reviewboard.diffviewer.parser

class ParsedDiff(parser, uses_commit_ids_as_revisions=False)[source]

Parsed information from a diff.

This stores information on the diff as a whole, along with a list of commits made to the diff and a list of files within each.

Extra data can be stored by the parser, which will be made available in DiffSet.extra_data.

This is flexible enough to accommodate a variety of diff formats, including DiffX files.

This class is meant to be used internally and by subclasses of BaseDiffParser.

New in version 4.0.5.

changes

The list of changes parsed in this diff. There should always be at least one.

Type

list of ParsedDiffChange

extra_data

Extra data to store along with the information on the diff. The contents will be stored directly in DiffSet.extra_data.

Type

dict

parser

The diff parser that parsed this file.

Type

BaseDiffParser

uses_commit_ids_as_revisions

Whether commit IDs are used as file revisions.

A commit ID will be used if an explicit revision isn’t available for a file. For instance, if a parent diff is available, and a file isn’t present in the parent diff, the file will use the parent diff’s parent commit ID as the parent revision.

Type

bool
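The fallback described above can be sketched roughly as follows. This is only an illustration: choose_parent_revision and its arguments are hypothetical names and are not part of Review Board.

```python
def choose_parent_revision(explicit_revision, parent_commit_id,
                           uses_commit_ids_as_revisions):
    """Return the revision to record for a file (hypothetical helper)."""
    # An explicit per-file revision always wins when one was parsed.
    if explicit_revision is not None:
        return explicit_revision

    # Otherwise fall back to the commit ID, but only when the parser
    # has opted into using commit IDs as file revisions.
    if uses_commit_ids_as_revisions:
        return parent_commit_id

    return None


fallback = choose_parent_revision(None, b'abc123', True)
```

Here fallback ends up holding the commit ID, because no explicit revision was supplied and the flag is enabled.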

class ParsedDiffChange(parsed_diff)[source]

Parsed change information from a diff.

This stores information on a change to a tree, consisting of a set of parsed files and extra data to store (in DiffCommit.extra_data).

This will often map to a commit, or just a typical collection of files in a diff. Traditional diffs will have only one of these. DiffX files may have many (but for the moment, only diffs with a single change can be handled when processing these results).

New in version 4.0.5.

extra_data

Extra data to store along with the information on the change. The contents will be stored directly in DiffCommit.extra_data.

Type

dict

files

The list of files parsed for this change. There should always be at least one.

Type

list of ParsedDiffFile

commit_id

The ID of the commit, parsed from the diff.

This may be None.

Type:

unicode

parent_commit_id

The ID of the primary parent commit, parsed from the diff.

This may be None.

Type:

unicode

property parent_parsed_diff[source]

The parent diff object.

Type:

ParsedDiff

class ParsedDiffFile(parser=None, parsed_diff_change=None, **kwargs)[source]

A parsed file from a diff.

This stores information on a single file represented in a diff, including the contents of that file’s diff, as parsed by DiffParser or one of its subclasses.

Parsers should set the attributes on this based on the contents of the diff, and should add any data found in the diff.

This class is meant to be used internally and by subclasses of BaseDiffParser.

Changed in version 4.0.6: Added old_symlink_target and new_symlink_target.

Changed in version 4.0.5: Diff parsers that manually construct instances must pass in parsed_diff_change instead of parser when constructing the object, and must call discard() after construction if the file isn’t wanted in the results.
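The construct-then-discard pattern from the 4.0.5 note can be illustrated with stand-in classes. SketchChange and SketchFile below are hypothetical and only mimic the relationship described here. In particular, the sketch assumes that constructing a file registers it with its parent change, which is why an unwanted file must be discarded explicitly.

```python
class SketchChange:
    """Hypothetical stand-in for ParsedDiffChange."""

    def __init__(self):
        self.files = []


class SketchFile:
    """Hypothetical stand-in for ParsedDiffFile."""

    def __init__(self, parsed_diff_change):
        self.parent_parsed_diff_change = parsed_diff_change
        self.orig_filename = None
        self.modified_filename = None

        # Assumption: construction adds the file to the parent's list.
        parsed_diff_change.files.append(self)

    def discard(self):
        # Remove this file from the parent change's list of files.
        self.parent_parsed_diff_change.files.remove(self)


change = SketchChange()

wanted = SketchFile(parsed_diff_change=change)
wanted.orig_filename = b'README'
wanted.modified_filename = b'README'

# The parser decides this entry is not needed after all.
unwanted = SketchFile(parsed_diff_change=change)
unwanted.discard()
```

After this runs, only the wanted file remains in change.files.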

binary

Whether this represents a binary file.

Type

bool

copied

Whether this represents a file that has been copied. The file may or may not be modified in the process.

Type

bool

deleted

Whether this represents a file that has been deleted.

Type

bool

delete_count

The number of delete (-) lines found in the file.

Type

int

insert_count

The number of insert (+) lines found in the file.

Type

int

is_symlink

Whether this represents a file that is a symbolic link to another file.

Type

bool

moved

Whether this represents a file that has been moved/renamed. The file may or may not be modified in the process.

Type

bool

parser

The diff parser that parsed this file.

Type

BaseDiffParser

skip

Whether this file should be skipped by the parser. If any of the parser methods set this, the file will stop parsing and will be excluded from results.

Type

bool

orig_filename

The parsed original name of the file.

Type:

bytes

orig_file_details

The parsed file details of the original file.

This will usually be a revision.

Type:

bytes or reviewboard.scmtools.core.Revision

modified_filename

The parsed modified name of the file.

This may be the same as orig_filename.

Type:

bytes

modified_file_details

The parsed file details of the modified file.

This will usually be a revision.

Type:

bytes or reviewboard.scmtools.core.Revision

index_header_value

The parsed value for an Index header.

If present in the diff, this usually contains a filename, but may contain other content as well, depending on the variation of the diff format.

Type:

bytes

old_symlink_target

The old target for a symlink.

New in version 4.0.6.

Type:

bytes

new_symlink_target

The new target for a symlink.

New in version 4.0.6.

Type:

bytes

old_unix_mode

The old UNIX mode for the file.

New in version 4.0.6.

Type:

int

new_unix_mode

The new UNIX mode for the file.

New in version 4.0.6.

Type:

int

origFile

The parsed original name of the file.

Deprecated since version 4.0: Use orig_filename instead.

origInfo

The parsed file details of the original file.

Deprecated since version 4.0: Use orig_file_details instead.

newFile

The parsed modified name of the file.

Deprecated since version 4.0: Use modified_filename instead.

newInfo

The parsed file details of the modified file.

Deprecated since version 4.0: Use modified_file_details instead.

index

The parsed value for an Index header.

Deprecated since version 4.0: Use index_header_value instead.

property parent_parsed_diff_change[source]

The parent change object.

New in version 4.0.5.

Type:

ParsedDiffChange

set(key, value)[source]

Set information on the parsed file from a diff.

This is a legacy implementation used to help diff parsers retain compatibility with the old dictionary-based ways of setting parsed file information. Callers should be updated to set attributes instead.

Deprecated since version 4.0: This will be removed in Review Board 5.0.

Parameters
  • key (str) – The key to set.

  • value (object) – The value to set.

get(key, default=None)[source]

Return information on the parsed file from a diff.

This is a legacy implementation used to help diff parsers retain compatibility with the old dictionary-based ways of setting parsed file information. Callers should be updated to access attributes instead.

Deprecated since version 4.0: This will be removed in Review Board 5.0.

Parameters
  • key (str) – The key to retrieve.

  • default (object, optional) – The default value to return.

Returns

The resulting value.

Return type

object

update(items)[source]

Update information on the parsed file from a diff.

This is a legacy implementation used to help diff parsers retain compatibility with the old dictionary-based ways of setting parsed file information. Callers should be updated to set individual attributes instead.

Deprecated since version 4.0: This will be removed in Review Board 5.0.

Parameters

items (dict) – The keys and values to set.
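When migrating callers away from set(), get(), and update(), the deprecation notes above give the attribute that replaces each legacy key. The sketch below applies that mapping to a plain object. LEGACY_TO_ATTR, apply_legacy_items, and SketchFile are hypothetical helpers for illustration and do not reflect how Review Board implements the compatibility layer.

```python
# Legacy dictionary keys and the attributes that replace them, taken
# from the deprecation notes in this section.
LEGACY_TO_ATTR = {
    'origFile': 'orig_filename',
    'origInfo': 'orig_file_details',
    'newFile': 'modified_filename',
    'newInfo': 'modified_file_details',
    'index': 'index_header_value',
}


class SketchFile:
    """Hypothetical stand-in for ParsedDiffFile."""


def apply_legacy_items(parsed_file, items):
    """Set attributes from a legacy-style dictionary (hypothetical)."""
    for key, value in items.items():
        # Keys without a legacy alias are assumed to already be
        # attribute names.
        setattr(parsed_file, LEGACY_TO_ATTR.get(key, key), value)


legacy_file = SketchFile()
apply_legacy_items(legacy_file, {
    'origFile': b'README',
    'newInfo': b'(working copy)',
    'binary': True,
})
```

New code would skip the dictionary entirely and assign orig_filename, modified_file_details, and the other attributes directly.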

property data[source]

The data for this diff.

This must be accessed after finalize() has been called.

discard()[source]

Discard this from the parent change.

This will remove it from the list of files. It’s intended for use when a diff parser is populating the diff but then determines the file is no longer needed.

New in version 4.0.5.

finalize()[source]

Finalize the parsed diff.

This makes the diff data available to consumers and closes the buffer for writing.

prepend_data(data)[source]

Prepend data to the buffer.

Parameters

data (bytes) – The data to prepend.

append_data(data)[source]

Append data to the buffer.

Parameters

data (bytes) – The data to append.
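The buffer lifecycle (append or prepend, then finalize, then read data) can be sketched with io.BytesIO. SketchFileBuffer is a hypothetical stand-in. The real class may manage its buffer differently, and the ValueError raised on early access is an assumption of this sketch.

```python
import io


class SketchFileBuffer:
    """Hypothetical stand-in for the data buffer on a parsed file."""

    def __init__(self):
        self._buffer = io.BytesIO()
        self._data = None

    def append_data(self, data):
        self._buffer.write(data)

    def prepend_data(self, data):
        # Rebuild the buffer with the new data placed first.
        existing = self._buffer.getvalue()
        self._buffer = io.BytesIO()
        self._buffer.write(data)
        self._buffer.write(existing)

    def finalize(self):
        # Capture the contents and close the buffer for writing.
        self._data = self._buffer.getvalue()
        self._buffer.close()

    @property
    def data(self):
        if self._data is None:
            raise ValueError('finalize() must be called first.')

        return self._data


file_buffer = SketchFileBuffer()
file_buffer.append_data(b'+new line\n')
file_buffer.prepend_data(b'--- a\n+++ b\n')
file_buffer.finalize()
```

Once finalize() has run, file_buffer.data holds the header lines followed by the appended line.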

class BaseDiffParser(data, uses_commit_ids_as_revisions=False)[source]

Base class for a diff parser.

This is a low-level, basic foundational interface for a diff parser. It performs type checking of the incoming data and defines a couple of methods for subclasses to implement.

Most SCM implementations will want to either subclass DiffParser or use DiffXParser.

New in version 4.0.5.

data

The diff data being parsed.

Type

bytes

uses_commit_ids_as_revisions

Whether commit IDs are used as file revisions.

See ParsedDiff.uses_commit_ids_as_revisions.

Type

bool

parse_diff()[source]

Parse the diff.

This will parse the content of the file, returning a representation of the diff file and its content.

This must be implemented by subclasses.

Returns

The resulting parsed diff information.

Return type

ParsedDiff

Raises

NotImplementedError – This wasn’t implemented by a subclass.

raw_diff(diffset_or_commit)[source]

Return a raw diff as a string.

This takes a DiffSet or DiffCommit and generates a new, single diff file that represents all the changes made. It’s used to regenerate a diff and serve it up for other tools or processes to use.

This must be implemented by subclasses.

Parameters

diffset_or_commit (reviewboard.diffviewer.models.diffset.DiffSet or reviewboard.diffviewer.models.diffcommit.DiffCommit) –

The DiffSet or DiffCommit to render.

If passing in a DiffSet, only the cumulative diff’s file contents will be returned.

If passing in a DiffCommit, only that commit’s file contents will be returned.

Returns

The diff composed of all the component FileDiffs.

Return type

bytes

Raises
  • NotImplementedError – This wasn’t implemented by a subclass.

  • TypeError – The provided diffset_or_commit wasn’t of a supported type.

normalize_diff_filename(filename)[source]

Normalize filenames in diffs.

This returns a normalized filename suitable for populating in FileDiff.source_file or FileDiff.dest_file, or for when presenting a filename to the UI.

By default, this strips off any leading slashes, which might occur due to differences in various diffing methods or APIs.

Subclasses can override this to provide additional methods of normalization.

Parameters

filename (unicode) – The filename to normalize.

Returns

The normalized filename.

Return type

unicode
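A minimal sketch of the default behavior described above, written as standalone functions rather than methods. It assumes "strips off any leading slashes" means removing every leading "/" character. The second function shows the kind of extra normalization a subclass might add, and its "./" handling is purely hypothetical.

```python
def normalize_diff_filename(filename):
    """Strip leading slashes from a filename (sketch of the default)."""
    return filename.lstrip('/')


def normalize_with_dot_prefix(filename):
    """Hypothetical override that also drops a leading './'."""
    filename = normalize_diff_filename(filename)

    if filename.startswith('./'):
        filename = filename[2:]

    return filename


normalized = normalize_diff_filename('/trunk/README')
```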

class DiffParser(data, **kwargs)[source]

Parses diff files, allowing subclasses to specialize parsing behavior.

This class provides the base functionality for parsing Unified Diff files. It looks for common information present in many variations of diffs, such as Index: lines, in order to extract files and their modified content from a diff.

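Subclasses can extend the parsing behavior to extract additional metadata or handle special representations of changes. They may want to override the parsing methods documented below, such as parse_special_header(), parse_diff_header(), parse_after_headers(), and parse_filename_header().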

INDEX_SEP = b'==================================================================='[source]

A separator string below an Index header.

This is commonly found immediately below an Index: header, meant to help locate the beginning of the metadata or changes made to a file.

Its presence and location are not guaranteed.

parse_diff()[source]

Parse the diff.

Subclasses should override this if working with a diff format that extracts more than one change from a diff.

New in version 4.0.5: Historically, parse() was the main method used to parse a diff. That’s now used exclusively to parse a list of files for the default parsed_diff_change. The old method is around for compatibility, but is no longer called directly outside of this class.

Returns

The resulting parsed diff information.

Return type

ParsedDiff

Raises

reviewboard.diffviewer.errors.DiffParserError – There was an error parsing part of the diff. This may be a corrupted diff, or an error in the parsing implementation. Details are in the error message.

parse()[source]

Parse the diff and return a list of files.

This will parse the content of the file, returning any files that were found.

Changed in version 4.0.5: Historically, this was the main method used to parse a diff. It’s now used exclusively to parse a list of files for the default parsed_diff_change, and parse_diff() is the main method used to parse a diff. This method is around for compatibility, but is no longer called directly outside of this class.

Returns

The resulting list of files.

Return type

list of ParsedDiffFile

Raises

reviewboard.diffviewer.errors.DiffParserError – There was an error parsing part of the diff. This may be a corrupted diff, or an error in the parsing implementation. Details are in the error message.

parse_diff_line(linenum, parsed_file)[source]

Parse a line of data in a diff.

This will append the line to the parsed file’s data, and if the content represents active changes to a file, its insert/delete counts will be updated to reflect them.

Parameters
  • linenum (int) – The 0-based line number.

  • parsed_file (ParsedDiffFile) – The current parsed diff file info.

Returns

The next line number to parse.

Return type

int
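The insert/delete counting can be illustrated with a standalone function. This sketch assumes the --- and +++ header lines have already been consumed, so every line starting with "+" or "-" is a real change. count_hunk_changes is a hypothetical name.

```python
def count_hunk_changes(hunk_lines):
    """Count insert and delete lines in a hunk body (hypothetical)."""
    insert_count = 0
    delete_count = 0

    for line in hunk_lines:
        if line.startswith(b'+'):
            insert_count += 1
        elif line.startswith(b'-'):
            delete_count += 1

    return insert_count, delete_count


hunk = [
    b'@@ -1,3 +1,4 @@',
    b' unchanged',
    b'-old line',
    b'+new line',
    b'+another new line',
]

insert_count, delete_count = count_hunk_changes(hunk)
```

The hunk header and context lines are ignored because they begin with "@" and a space respectively.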

parse_change_header(linenum)[source]

Parse a header before a change to a file.

This will attempt to parse the following information, starting at the specified line in the diff:

  1. Any special file headers (such as Index: lines) through parse_special_header()

  2. A standard Unified Diff file header (through parse_diff_header())

  3. Any content after the header (through parse_after_headers())

If the special or diff headers are able to populate the original and modified filenames and revisions/file details, and none of the methods above mark the file as skipped (by setting ParsedDiffFile.skip), then this will finish by appending all parsed data and returning a parsed file entry.

Subclasses that need to control parsing logic should override one or more of the above methods.

Parameters

linenum (int) – The line number to begin parsing.

Returns

A tuple containing the following:

  1. The next line number to parse

  2. The populated ParsedDiffFile instance for this file

Return type

tuple

Raises

reviewboard.diffviewer.errors.DiffParserError – There was an error parsing the change header. This may be a corrupted diff, or an error in the parsing implementation. Details are in the error message.
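The ordering of the three steps and the effect of the skip flag can be sketched as below. Everything here is a simplified stand-in: run_change_header_steps, SketchFile, and the fake step functions are hypothetical, and returning None for a skipped or incomplete file is an assumption of the sketch.

```python
class SketchFile:
    """Hypothetical stand-in for ParsedDiffFile."""

    def __init__(self):
        self.skip = False
        self.orig_filename = None
        self.modified_filename = None


def run_change_header_steps(linenum, steps):
    """Run header-parsing steps in order, honoring the skip flag."""
    parsed_file = SketchFile()

    for step in steps:
        linenum = step(linenum, parsed_file)

        if parsed_file.skip:
            # A step asked to skip this file; resume at linenum.
            return linenum, None

    if (parsed_file.orig_filename is None or
        parsed_file.modified_filename is None):
        # The headers did not provide enough information.
        return linenum, None

    return linenum, parsed_file


def fake_special_header(linenum, parsed_file):
    return linenum + 1  # Pretend one "Index:" line was consumed.


def fake_diff_header(linenum, parsed_file):
    parsed_file.orig_filename = b'README'
    parsed_file.modified_filename = b'README'
    return linenum + 2  # Pretend "---" and "+++" were consumed.


def fake_after_headers(linenum, parsed_file):
    return linenum  # Nothing to parse by default.


def skipping_step(linenum, parsed_file):
    parsed_file.skip = True
    return linenum + 1


next_linenum, parsed = run_change_header_steps(
    0, [fake_special_header, fake_diff_header, fake_after_headers])
skipped_linenum, skipped = run_change_header_steps(
    0, [skipping_step, fake_diff_header])
```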

parse_special_header(linenum, parsed_file)[source]

Parse a special diff header marking the start of a new file’s info.

This attempts to locate an Index: line at the specified line number, which usually indicates the beginning of a file’s information in a diff (for Unified Diff variants that support it). By default, this method expects the line to be found at linenum.

If present, the value found immediately after the Index: will be stored in ParsedDiffFile.index_header_value, allowing subclasses to make a determination based on its contents (which may vary between types of diffs, but should include at least a filename).

If the Index: line is not present, this won’t do anything by default.

Subclasses can override this to parse additional information before the standard diff header. They may also set ParsedDiffFile.skip to skip the rest of this file and begin parsing a new entry at the returned line number.

Parameters
  • linenum (int) – The line number to begin parsing.

  • parsed_file (ParsedDiffFile) – The file currently being parsed.

Returns

The next line number to parse.

Return type

int

Raises

reviewboard.diffviewer.errors.DiffParserError – There was an error parsing the special header. This may be a corrupted diff, or an error in the parsing implementation. Details are in the error message.
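A simplified sketch of Index: header detection follows. It only looks for the separator on the line directly after the Index: line, which is stricter than the documented behavior (the separator's presence and location are not guaranteed). parse_index_header is a hypothetical function, and the separator length of 67 characters is an assumption.

```python
# Assumed to match DiffParser.INDEX_SEP (67 "=" characters).
INDEX_SEP = b'=' * 67
INDEX_PREFIX = b'Index: '


def parse_index_header(lines, linenum):
    """Return (next_linenum, index_value) for an optional Index: line."""
    if linenum < len(lines) and lines[linenum].startswith(INDEX_PREFIX):
        value = lines[linenum][len(INDEX_PREFIX):].strip()
        linenum += 1

        # Consume the separator if it directly follows the header.
        if linenum < len(lines) and lines[linenum] == INDEX_SEP:
            linenum += 1

        return linenum, value

    # No Index: line here, so nothing is consumed.
    return linenum, None


diff_lines = [
    b'Index: trunk/README',
    INDEX_SEP,
    b'--- trunk/README\t(revision 1)',
    b'+++ trunk/README\t(working copy)',
]

next_line, index_value = parse_index_header(diff_lines, 0)
```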

parse_diff_header(linenum, parsed_file)[source]

Parse a standard header before changes made to a file.

This attempts to parse the --- (original) and +++ (modified) file lines, which are usually present right before any changes to the file. By default, this method expects the --- line to be found at linenum.

If found, this will populate ParsedDiffFile.orig_filename, ParsedDiffFile.orig_file_details, ParsedDiffFile.modified_filename, and ParsedDiffFile.modified_file_details.

This calls out to parse_filename_header() to help parse the contents immediately after the --- or +++.

Subclasses can override this to parse these lines differently, or to process the results of these lines (such as converting special filenames to states like “deleted” or “new file”). They may also set ParsedDiffFile.skip to skip the rest of this file and begin parsing a new entry at the returned line number.

Parameters
  • linenum (int) – The line number to begin parsing.

  • parsed_file (ParsedDiffFile) – The file currently being parsed.

Returns

The next line number to parse.

Return type

int

Raises

reviewboard.diffviewer.errors.DiffParserError – There was an error parsing the diff header. This may be a corrupted diff, or an error in the parsing implementation. Details are in the error message.
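The following sketch parses a --- / +++ pair and then post-processes the result, in the spirit of the subclass behavior mentioned above. parse_unified_header is hypothetical, it only handles tab-separated file details, and treating a modified filename of /dev/null as a deletion is a convention assumed for illustration.

```python
def parse_unified_header(lines, linenum):
    """Return (next_linenum, info) for a ---/+++ pair, or info=None."""
    if (linenum + 1 >= len(lines) or
        not lines[linenum].startswith(b'--- ') or
        not lines[linenum + 1].startswith(b'+++ ')):
        return linenum, None

    # Split each line into a filename and trailing details on a tab.
    orig_name, _, orig_details = lines[linenum][4:].partition(b'\t')
    mod_name, _, mod_details = lines[linenum + 1][4:].partition(b'\t')

    info = {
        'orig_filename': orig_name,
        'orig_file_details': orig_details,
        'modified_filename': mod_name,
        'modified_file_details': mod_details,
        # Assumed convention: /dev/null as the new file means deleted.
        'deleted': mod_name == b'/dev/null',
    }

    return linenum + 2, info


header_lines = [
    b'--- README\t(revision 4)',
    b'+++ /dev/null\t(working copy)',
    b'@@ -1 +0,0 @@',
]

after_header, header_info = parse_unified_header(header_lines, 0)
```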

parse_after_headers(linenum, parsed_file)[source]

Parse information after a diff header but before diff data.

This attempts to parse the information found after parse_diff_header() is called, but before gathering any lines that are part of the diff contents. It’s intended for the few diff formats that may place content at this location.

By default, this does nothing.

Subclasses can override this to provide custom parsing of any lines that may exist here. They may also set ParsedDiffFile.skip to skip the rest of this file and begin parsing a new entry at the returned line number.

Parameters
  • linenum (int) – The line number to begin parsing.

  • parsed_file (ParsedDiffFile) – The file currently being parsed.

Returns

The next line number to parse.

Return type

int

Raises

reviewboard.diffviewer.errors.DiffParserError – There was an error parsing the diff header. This may be a corrupted diff, or an error in the parsing implementation. Details are in the error message.

parse_filename_header(s, linenum)[source]

Parse the filename found in a diff filename line.

This parses the value after a --- or +++ indicator (or a special variant handled by a subclass), normalizing the filename and any following file details, and returning both for processing and storage.

Often, the file details will be a revision for the original file, but this is not guaranteed, and is up to the variation of the diff format.

By default, this will assume that a filename and file details are separated by either a single tab, or two or more spaces. If neither is found, this will fail to parse.

This must parse only the provided value, and cannot parse subsequent lines.

Subclasses can override this behavior to parse these lines another way, or to normalize filenames (handling escaping or filenames with spaces as needed by that particular diff variation).

Parameters
  • s (bytes) – The value to parse.

  • linenum (int) – The line number containing the value to parse.

Returns

A tuple containing:

  1. The filename (as bytes)

  2. The additional file information (as bytes)

Return type

tuple

Raises

reviewboard.diffviewer.errors.DiffParserError – There was an error parsing the diff header. This may be a corrupted diff, or an error in the parsing implementation. Details are in the error message.
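The default separator rule (a single tab, or two or more spaces) can be sketched with a regular expression. split_filename_header is a hypothetical function, and it raises ValueError as a stand-in for DiffParserError so that the example has no Review Board dependency.

```python
import re

# A tab, or a run of two or more spaces.
_SEPARATOR_RE = re.compile(br'\t|  +')


def split_filename_header(s, linenum):
    """Split a header value into (filename, file details)."""
    parts = _SEPARATOR_RE.split(s, maxsplit=1)

    if len(parts) != 2:
        raise ValueError(
            'No valid separator after the filename on line %d'
            % (linenum + 1))

    return parts[0], parts[1]


tab_result = split_filename_header(b'foo.c\t(revision 3)', 0)
space_result = split_filename_header(b'my file.c   2020-01-01', 1)
```

A single space is not treated as a separator, which is what lets filenames containing spaces survive in the second call.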

raw_diff(diffset_or_commit)[source]

Return a raw diff as a string.

This takes a DiffSet or DiffCommit and generates a new, single diff file that represents all the changes made. It’s used to regenerate a diff and serve it up for other tools or processes to use.

Subclasses can override this to provide any special logic for building the diff.

Parameters

diffset_or_commit (reviewboard.diffviewer.models.diffset.DiffSet or reviewboard.diffviewer.models.diffcommit.DiffCommit) –

The DiffSet or DiffCommit to render.

If passing in a DiffSet, only the cumulative diff’s file contents will be returned.

If passing in a DiffCommit, only that commit’s file contents will be returned.

Returns

The diff composed of all the component FileDiffs.

Return type

bytes

Raises

TypeError – The provided diffset_or_commit wasn’t of a supported type.

get_orig_commit_id()[source]

Return the commit ID of the original revision for the diff.

By default, this returns None. Subclasses would override this if they work with repositories that always look up changes to a file by the ID of the commit that made the changes instead of a per-file revision or ID.

Non-None values returned by this method will override the values being stored in FileDiff.source_revision.

Implementations would likely want to parse out the commit ID from some prior header and return it here. By the time this is called, all files will have been parsed already.

Returns

The commit ID used to override the source revision of any created FileDiff instances.

Return type

bytes
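As an illustration of the pattern described above, a parser might record a commit ID while scanning headers and return it later. SketchCommitParser is a hypothetical standalone class rather than a DiffParser subclass, and the "# Parent" header it looks for is an assumed format chosen for the example.

```python
class SketchCommitParser:
    """Hypothetical parser that remembers a parent commit ID."""

    PARENT_PREFIX = b'# Parent '

    def __init__(self, data):
        self.lines = data.splitlines()
        self.orig_commit_id = None

    def parse(self):
        # Record the commit ID from an earlier header while parsing.
        for line in self.lines:
            if line.startswith(self.PARENT_PREFIX):
                value = line[len(self.PARENT_PREFIX):].strip()
                self.orig_commit_id = value
                break

    def get_orig_commit_id(self):
        # Called after all files are parsed; None means no override.
        return self.orig_commit_id


commit_parser = SketchCommitParser(
    b'# Parent abc123\n'
    b'--- a/README\t(old)\n'
    b'+++ b/README\t(new)\n'
)
commit_parser.parse()
orig_commit_id = commit_parser.get_orig_commit_id()
```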

class DiffXParser(data, uses_commit_ids_as_revisions=False)[source]

Parser for DiffX files.

This will parse files conforming to the DiffX standard, storing the diff content provided in each file section, as well as all the information available in each DiffX section (options, preamble, metadata) as extra_data. This allows the diffs to be re-built on download.

This parser is sufficient for almost any DiffX need, but subclasses can be created that augment the stored extra_data for any of the parsed objects.

New in version 4.0.5: This is experimental in 4.0.x, with plans to make it stable for 5.0. The API may change during this time.

parse_diff()[source]

Parse the diff.

This will parse the content of the DiffX file, returning a representation of the diff file and its content.

Returns

The resulting parsed diff information.

Return type

ParsedDiff

Raises

reviewboard.diffviewer.errors.DiffParserError – There was an error parsing part of the diff. This may be a corrupted diff, or an error in the parsing implementation. Details are in the error message.

raw_diff(diffset_or_commit)[source]

Return a raw diff as a string.

This takes a DiffSet or DiffCommit and generates a new, single DiffX file that represents all the changes made, based on the previously-stored DiffX information in extra_data dictionaries. It’s used to regenerate a DiffX and serve it up for other tools or processes to use.

Parameters

diffset_or_commit (reviewboard.diffviewer.models.diffset.DiffSet or reviewboard.diffviewer.models.diffcommit.DiffCommit) –

The DiffSet or DiffCommit to render.

If passing in a DiffSet, the full uploaded DiffX file contents will be returned.

If passing in a DiffCommit, a new DiffX representing only that commit’s contents will be returned. This will lack the main preamble or metadata, or any other changes previously in the DiffX file.

Returns

The resulting DiffX file contents.

Return type

bytes

Raises

TypeError – The provided diffset_or_commit value wasn’t of a supported type.