reviewboard.diffviewer.parser

class ParsedDiffFile(parser=None)

Bases: object

A parsed file from a diff.

This stores information on a single file represented in a diff, including the contents of that file’s diff, as parsed by DiffParser or one of its subclasses.

Parsers should set the attributes on this based on the contents of the diff, and should add any data found in the diff.

This class is meant to be used internally and by subclasses of DiffParser.

binary

Whether this represents a binary file.

Type:bool
copied

Whether this represents a file that has been copied. The file may or may not be modified in the process.

Type:bool
deleted

Whether this represents a file that has been deleted.

Type:bool
delete_count

The number of delete (-) lines found in the file.

Type:int
insert_count

The number of insert (+) lines found in the file.

Type:int

Whether this represents a file that is a symbolic link to another file.

Type:bool
moved

Whether this represents a file that has been moved/renamed. The file may or may not be modified in the process.

Type:bool
parser

The diff parser that parsed this file.

Type:DiffParser
skip

Whether this file should be skipped by the parser. If any of the parser methods set this, parsing of this file will stop and the file will be excluded from the results.

Type:bool
orig_filename

The parsed original name of the file.

Type:bytes
orig_file_details

The parsed file details of the original file.

This will usually be a revision.

Type:bytes or reviewboard.scmtools.core.Revision
modified_filename

The parsed modified name of the file.

This may be the same as orig_filename.

Type:bytes
modified_file_details

The parsed file details of the modified file.

This will usually be a revision.

Type:bytes or reviewboard.scmtools.core.Revision
index_header_value

The parsed value for an Index header.

If present in the diff, this usually contains a filename, but may contain other content as well, depending on the variation of the diff format.

Type:bytes
origFile

The parsed original name of the file.

Deprecated since version 4.0: Use orig_filename instead.

origInfo

The parsed file details of the original file.

Deprecated since version 4.0: Use orig_file_details instead.

newFile

The parsed modified name of the file.

Deprecated since version 4.0: Use modified_filename instead.

newInfo

The parsed file details of the modified file.

Deprecated since version 4.0: Use modified_file_details instead.

index

The parsed value for an Index header.

Deprecated since version 4.0: Use index_header_value instead.
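
The deprecated names above are aliases for the 4.0 attributes. As an illustration only, here is a minimal sketch of how such aliases can be wired up with properties that forward to the new names and emit a DeprecationWarning. The helper `_legacy_alias` and the class `ParsedFileSketch` are hypothetical; this is not Review Board's implementation.

```python
import warnings


def _legacy_alias(new_name, old_name):
    """Build a property that forwards a deprecated name to a new attribute."""
    message = '%s is deprecated; use %s instead.' % (old_name, new_name)

    def getter(self):
        warnings.warn(message, DeprecationWarning, stacklevel=2)
        return getattr(self, new_name)

    def setter(self, value):
        warnings.warn(message, DeprecationWarning, stacklevel=2)
        setattr(self, new_name, value)

    return property(getter, setter)


class ParsedFileSketch:
    """Hypothetical stand-in for ParsedDiffFile, showing only the aliases."""

    def __init__(self):
        self.orig_filename = None
        self.orig_file_details = None
        self.modified_filename = None
        self.modified_file_details = None
        self.index_header_value = None

    # Deprecated (4.0) names mapped to their replacements.
    origFile = _legacy_alias('orig_filename', 'origFile')
    origInfo = _legacy_alias('orig_file_details', 'origInfo')
    newFile = _legacy_alias('modified_filename', 'newFile')
    newInfo = _legacy_alias('modified_file_details', 'newInfo')
    index = _legacy_alias('index_header_value', 'index')
```

Reading `origFile` on such an object returns whatever was stored in `orig_filename`, while warning callers to migrate.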

__init__(parser=None)

Initialize the parsed file information.

Parameters:parser (reviewboard.diffviewer.parser.DiffParser, optional) – The diff parser that parsed this file.
__setitem__(key, value)

Set information on the parsed file from a diff.

This is a legacy implementation used to help diff parsers retain compatibility with the old dictionary-based ways of setting parsed file information. Callers should be updated to set attributes instead.

Deprecated since version 4.0: This will be removed in Review Board 5.0.

Parameters:
  • key (str) – The key to set.
  • value (object) – The value to set.
__getitem__(key)

Return information on the parsed file from a diff.

This is a legacy implementation used to help diff parsers retain compatibility with the old dictionary-based ways of setting parsed file information. Callers should be updated to access attributes instead.

Deprecated since version 4.0: This will be removed in Review Board 5.0.

Parameters:key (str) – The key to retrieve.
Returns:The resulting value.
Return type:object
Raises:KeyError – The key is invalid.
__contains__(key)

Return whether an old parsed file key has been explicitly set.

This is a legacy implementation used to help diff parsers retain compatibility with the old dictionary-based ways of setting parsed file information. Callers should be updated to check attribute values instead.

Deprecated since version 4.0: This will be removed in Review Board 5.0.

Parameters:key (str) – The key to check.
Returns:True if the key has been explicitly set by a diff parser. False if it has not.
Return type:bool
set(key, value)

Set information on the parsed file from a diff.

This is a legacy implementation used to help diff parsers retain compatibility with the old dictionary-based ways of setting parsed file information. Callers should be updated to set attributes instead.

Deprecated since version 4.0: This will be removed in Review Board 5.0.

Parameters:
  • key (str) – The key to set.
  • value (object) – The value to set.
get(key, default=None)

Return information on the parsed file from a diff.

This is a legacy implementation used to help diff parsers retain compatibility with the old dictionary-based ways of setting parsed file information. Callers should be updated to access attributes instead.

Deprecated since version 4.0: This will be removed in Review Board 5.0.

Parameters:
  • key (str) – The key to retrieve.
  • default (object, optional) – The default value to return.
Returns:The resulting value.
Return type:object

update(items)

Update information on the parsed file from a diff.

This is a legacy implementation used to help diff parsers retain compatibility with the old dictionary-based ways of setting parsed file information. Callers should be updated to set individual attributes instead.

Deprecated since version 4.0: This will be removed in Review Board 5.0.

Parameters:items (dict) – The keys and values to set.
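
The legacy dictionary-style methods (__setitem__, __getitem__, __contains__, set(), get(), update()) can be pictured as a thin shim over attributes. The sketch below is hypothetical: the key-to-attribute mapping, raising KeyError for unknown keys on write, and get() returning the default for keys that were never explicitly set are all assumptions, and deprecation warnings are omitted for brevity.

```python
class LegacyDictSketch:
    """Hypothetical dict-compatible shim over attribute storage."""

    # Assumed mapping of old dictionary keys to 4.0 attribute names.
    _LEGACY_KEYS = {
        'origFile': 'orig_filename',
        'origInfo': 'orig_file_details',
        'newFile': 'modified_filename',
        'newInfo': 'modified_file_details',
        'index': 'index_header_value',
        'binary': 'binary',
    }

    def __init__(self):
        self.orig_filename = None
        self.orig_file_details = None
        self.modified_filename = None
        self.modified_file_details = None
        self.index_header_value = None
        self.binary = False
        # Legacy keys that were explicitly set through this API.
        self._explicit_keys = set()

    def __setitem__(self, key, value):
        # Unknown keys raise KeyError via the mapping lookup.
        setattr(self, self._LEGACY_KEYS[key], value)
        self._explicit_keys.add(key)

    def __getitem__(self, key):
        return getattr(self, self._LEGACY_KEYS[key])

    def __contains__(self, key):
        return key in self._explicit_keys

    def set(self, key, value):
        self[key] = value

    def get(self, key, default=None):
        # Assumption: fall back to the default unless explicitly set.
        if key in self:
            return self[key]
        return default

    def update(self, items):
        for key, value in items.items():
            self[key] = value
```

Tracking explicitly set keys separately is what lets __contains__ distinguish "set by a parser" from "still at its attribute default".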
data

The data for this diff.

This must be accessed after finalize() has been called.

finalize()

Finalize the parsed diff.

This makes the diff data available to consumers and closes the buffer for writing.

prepend_data(data)

Prepend data to the buffer.

Parameters:data (bytes) – The data to prepend.
append_data(data)

Append data to the buffer.

Parameters:data (bytes) – The data to append.
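
The data, finalize(), prepend_data(), and append_data() members describe a write-then-freeze buffer. A minimal sketch of that contract, assuming a list of byte chunks joined once at finalize() time; the class name DiffDataBuffer and the use of RuntimeError for misuse are hypothetical, since the documentation above does not say what happens on early reads or late writes.

```python
class DiffDataBuffer:
    """Hypothetical sketch of the prepend/append/finalize/data contract."""

    def __init__(self):
        self._chunks = []   # Pending writes, in order.
        self._data = None   # Set once by finalize().

    def prepend_data(self, data):
        self._check_writable()
        self._chunks.insert(0, data)

    def append_data(self, data):
        self._check_writable()
        self._chunks.append(data)

    def finalize(self):
        # Join all chunks once and close the buffer for writing.
        self._check_writable()
        self._data = b''.join(self._chunks)
        self._chunks = None

    @property
    def data(self):
        if self._data is None:
            raise RuntimeError('finalize() must be called before reading data.')
        return self._data

    def _check_writable(self):
        if self._chunks is None:
            raise RuntimeError('The buffer has already been finalized.')
```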
class DiffParser(data)

Bases: object

Parses diff files, allowing subclasses to specialize parsing behavior.

This class provides the base functionality for parsing Unified Diff files. It looks for common information present in many variations of diffs, such as Index: lines, in order to extract files and their modified content from a diff.

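Subclasses can extend the parsing behavior to extract additional metadata or handle special representations of changes. They may want to override one or more of the methods below that are documented as extension points, such as parse_special_header(), parse_diff_header(), parse_after_headers(), parse_filename_header(), raw_diff(), get_orig_commit_id(), and normalize_diff_filename().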

INDEX_SEP = '==================================================================='

A separator string below an Index header.

This is commonly found immediately below an Index: header, meant to help locate the beginning of the metadata or changes made to a file.

Its presence and location are not guaranteed.

__init__(data)

Initialize the parser.

Parameters:data (bytes) – The diff content to parse.
Raises:TypeError – The provided data argument was not a bytes type.
parse()

Parse the diff.

This parses the diff content, returning any files that were found in it.

Returns:The resulting list of files.
Return type:list of ParsedDiffFile
Raises:reviewboard.diffviewer.errors.DiffParserError – There was an error parsing part of the diff. This may be a corrupted diff, or an error in the parsing implementation. Details are in the error message.
parse_diff_line(linenum, parsed_file)

Parse a line of data in a diff.

This will append the line to the parsed file’s data, and if the content represents active changes to a file, its insert/delete counts will be updated to reflect them.

Parameters:
  • linenum (int) – The 0-based line number.
  • parsed_file (ParsedDiffFile) – The current parsed diff file info.
Returns:The next line number to parse.
Return type:int
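
The counting rule in parse_diff_line() can be sketched as a free function. This assumes lines is a list of bytes without trailing newlines and that only hunk content is passed in (so a leading - or + is a change, not a ---/+++ file header). FileCounts is a hypothetical stand-in for the ParsedDiffFile fields involved; Review Board's method works on the parser's own state instead.

```python
class FileCounts:
    """Hypothetical stand-in for the ParsedDiffFile fields used here."""

    def __init__(self):
        self.insert_count = 0
        self.delete_count = 0
        self.data = []


def parse_diff_line(lines, linenum, parsed_file):
    """Record one hunk line, update counts, and return the next line number."""
    line = lines[linenum]
    parsed_file.data.append(line)

    # Inside hunk content, a leading '+' or '-' marks an active change.
    if line.startswith(b'+'):
        parsed_file.insert_count += 1
    elif line.startswith(b'-'):
        parsed_file.delete_count += 1

    return linenum + 1
```

A caller drives it in a loop, feeding back the returned line number until the file's content is exhausted.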

parse_change_header(linenum)

Parse a header before a change to a file.

This will attempt to parse the following information, starting at the specified line in the diff:

  1. Any special file headers (such as Index: lines) through parse_special_header()
  2. A standard Unified Diff file header (through parse_diff_header())
  3. Any content after the header (through parse_after_headers())

If the special or diff headers are able to populate the original and modified filenames and revisions/file details, and none of the methods above mark the file as skipped (by setting ParsedDiffFile.skip), then this will finish by appending all parsed data and returning a parsed file entry.

Subclasses that need to control parsing logic should override one or more of the above methods.

Parameters:linenum (int) – The line number to begin parsing.
Returns:A tuple containing the following:
  1. The next line number to parse
  2. The populated ParsedDiffFile instance for this file
Return type:tuple
Raises:reviewboard.diffviewer.errors.DiffParserError – There was an error parsing the change header. This may be a corrupted diff, or an error in the parsing implementation. Details are in the error message.
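
The call sequence in parse_change_header() can be illustrated with a small, self-contained parser. Everything here (MiniDiffParser, MiniParsedFile, and returning None in place of a file when no header is found) is hypothetical and heavily simplified: the diff-header hook splits on a tab only, and the other two hooks are no-ops.

```python
class MiniParsedFile:
    """Hypothetical stand-in for ParsedDiffFile."""

    def __init__(self):
        self.orig_filename = None
        self.orig_file_details = None
        self.modified_filename = None
        self.modified_file_details = None
        self.skip = False
        self.header_lines = []


class MiniDiffParser:
    """Hypothetical sketch of the parse_change_header() call sequence."""

    def __init__(self, data):
        if not isinstance(data, bytes):
            raise TypeError('data must be bytes, not %s' % type(data).__name__)
        self.lines = data.splitlines()

    def parse_change_header(self, linenum):
        parsed_file = MiniParsedFile()
        start = linenum

        # Run the three hooks in order; any of them may set skip.
        for hook in (self.parse_special_header,
                     self.parse_diff_header,
                     self.parse_after_headers):
            linenum = hook(linenum, parsed_file)
            if parsed_file.skip:
                return linenum, None

        # Only produce a file if both names and details were found.
        if None in (parsed_file.orig_filename,
                    parsed_file.orig_file_details,
                    parsed_file.modified_filename,
                    parsed_file.modified_file_details):
            return linenum, None

        parsed_file.header_lines = self.lines[start:linenum]
        return linenum, parsed_file

    def parse_special_header(self, linenum, parsed_file):
        return linenum  # No special headers in this sketch.

    def parse_diff_header(self, linenum, parsed_file):
        lines = self.lines
        if (linenum + 1 < len(lines) and
                lines[linenum].startswith(b'--- ') and
                lines[linenum + 1].startswith(b'+++ ')):
            # Simplified: filename and details split on the first tab.
            name, _, details = lines[linenum][4:].partition(b'\t')
            parsed_file.orig_filename = name
            parsed_file.orig_file_details = details
            name, _, details = lines[linenum + 1][4:].partition(b'\t')
            parsed_file.modified_filename = name
            parsed_file.modified_file_details = details
            linenum += 2
        return linenum

    def parse_after_headers(self, linenum, parsed_file):
        return linenum  # Does nothing by default.
```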
parse_special_header(linenum, parsed_file)

Parse a special diff header marking the start of a new file’s info.

This attempts to locate an Index: line at the specified line number, which usually indicates the beginning of a file’s information in a diff (for Unified Diff variants that support it). By default, this method expects the line to be found at linenum.

If present, the value found immediately after the Index: will be stored in ParsedDiffFile.index_header_value, allowing subclasses to make a determination based on its contents (which may vary between types of diffs, but should include at least a filename).

If the Index: line is not present, this won’t do anything by default.

Subclasses can override this to parse additional information before the standard diff header. They may also set ParsedDiffFile.skip to skip the rest of this file and begin parsing a new entry at the returned line number.

Parameters:
  • linenum (int) – The line number to begin parsing.
  • parsed_file (ParsedDiffFile) – The file currently being parsed.
Returns:The next line number to parse.
Return type:int
Raises:reviewboard.diffviewer.errors.DiffParserError – There was an error parsing the special header. This may be a corrupted diff, or an error in the parsing implementation. Details are in the error message.
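
A simplified sketch of the default Index: handling, assuming the Index: line sits exactly at linenum and that a following INDEX_SEP line, when present, is consumed along with it. parse_index_header is a hypothetical free function that returns the new line number and the header value instead of storing it on a ParsedDiffFile.

```python
INDEX_SEP = b'==================================================================='


def parse_index_header(lines, linenum):
    """Hypothetical sketch of the default Index: handling.

    Returns a (next_linenum, index_header_value) tuple. The value is None,
    and linenum is returned unchanged, when no Index: line is present.
    """
    if linenum >= len(lines) or not lines[linenum].startswith(b'Index: '):
        return linenum, None

    # Keep everything after "Index: " (usually a filename).
    value = lines[linenum][len(b'Index: '):].strip()
    linenum += 1

    # The separator usually follows, but its presence is not guaranteed.
    if linenum < len(lines) and lines[linenum] == INDEX_SEP:
        linenum += 1

    return linenum, value
```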

parse_diff_header(linenum, parsed_file)

Parse a standard header before changes made to a file.

This attempts to parse the --- (original) and +++ (modified) file lines, which are usually present right before any changes to the file. By default, this method expects the --- line to be found at linenum.

If found, this will populate ParsedDiffFile.orig_filename, ParsedDiffFile.orig_file_details, ParsedDiffFile.modified_filename, and ParsedDiffFile.modified_file_details.

This calls out to parse_filename_header() to help parse the contents immediately after the --- or +++.

Subclasses can override this to parse these lines differently, or to process the results of these lines (such as converting special filenames to states like “deleted” or “new file”). They may also set ParsedDiffFile.skip to skip the rest of this file and begin parsing a new entry at the returned line number.

Parameters:
  • linenum (int) – The line number to begin parsing.
  • parsed_file (ParsedDiffFile) – The file currently being parsed.
Returns:The next line number to parse.
Return type:int
Raises:reviewboard.diffviewer.errors.DiffParserError – There was an error parsing the diff header. This may be a corrupted diff, or an error in the parsing implementation. Details are in the error message.
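
As an example of post-processing the --- and +++ results, a subclass for a Git-style diff might treat /dev/null as a marker for added or deleted files; in an override, this would typically run after calling the base parse_diff_header(). The sketch assumes that convention. HeaderState and its is_new flag are hypothetical (ParsedDiffFile documents deleted, but no “new file” attribute).

```python
DEV_NULL = b'/dev/null'


class HeaderState:
    """Hypothetical stand-in for the ParsedDiffFile fields used here."""

    def __init__(self, orig_filename, modified_filename):
        self.orig_filename = orig_filename
        self.modified_filename = modified_filename
        self.deleted = False
        self.is_new = False  # Hypothetical flag for this sketch only.


def apply_dev_null_convention(parsed):
    """Convert /dev/null filenames into deleted/new states, Git style."""
    if parsed.modified_filename == DEV_NULL:
        # "+++ /dev/null": the file was deleted; keep its real name.
        parsed.deleted = True
        parsed.modified_filename = parsed.orig_filename
    elif parsed.orig_filename == DEV_NULL:
        # "--- /dev/null": the file is newly added.
        parsed.is_new = True
        parsed.orig_filename = parsed.modified_filename
    return parsed
```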

parse_after_headers(linenum, parsed_file)

Parse information after a diff header but before diff data.

This attempts to parse the information found after parse_diff_header() is called, but before gathering any lines that are part of the diff contents. It’s intended for the few diff formats that may place content at this location.

By default, this does nothing.

Subclasses can override this to provide custom parsing of any lines that may exist here. They may also set ParsedDiffFile.skip to skip the rest of this file and begin parsing a new entry at the returned line number.

Parameters:
  • linenum (int) – The line number to begin parsing.
  • parsed_file (ParsedDiffFile) – The file currently being parsed.
Returns:The next line number to parse.
Return type:int
Raises:reviewboard.diffviewer.errors.DiffParserError – There was an error parsing the diff header. This may be a corrupted diff, or an error in the parsing implementation. Details are in the error message.

parse_filename_header(s, linenum)

Parse the filename found in a diff filename line.

This parses the value after a --- or +++ indicator (or a special variant handled by a subclass), normalizing the filename and any following file details, and returning both for processing and storage.

Often, the file details will be a revision for the original file, but this is not guaranteed; it depends on the variation of the diff format.

By default, this assumes that a filename and its file details are separated by either a single tab or two or more spaces. If neither is found, this will fail to parse.

This must parse only the provided value, and cannot parse subsequent lines.

Subclasses can override this behavior to parse these lines another way, or to normalize filenames (handling escaping or filenames with spaces as needed by that particular diff variation).

Parameters:
  • s (bytes) – The value to parse.
  • linenum (int) – The line number containing the value to parse.
Returns:A tuple containing:
  1. The filename (as bytes)
  2. The additional file information (as bytes)
Return type:tuple
Raises:reviewboard.diffviewer.errors.DiffParserError – There was an error parsing the diff header. This may be a corrupted diff, or an error in the parsing implementation. Details are in the error message.
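
A sketch of the default separator rule in parse_filename_header(). Checking for a tab before falling back to two or more spaces is an assumption (it lets filenames containing spaces parse cleanly when a tab is present), and FilenameParseError is a hypothetical stand-in for DiffParserError.

```python
import re


class FilenameParseError(Exception):
    """Hypothetical stand-in for DiffParserError."""

    def __init__(self, msg, linenum):
        super().__init__('%s (line %d)' % (msg, linenum))
        self.linenum = linenum


def parse_filename_header(s, linenum):
    """Split a ---/+++ value into (filename, file_details)."""
    if b'\t' in s:
        # A tab separator allows filenames that contain spaces.
        filename, details = s.split(b'\t', 1)
    elif b'  ' in s:
        # Two or more spaces: assume the filename has no runs of spaces.
        filename, details = re.split(rb' {2,}', s, maxsplit=1)
    else:
        raise FilenameParseError('No valid separator after the filename', linenum)
    return filename, details
```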

raw_diff(diffset_or_commit)

Return a raw diff as a byte string.

This takes a DiffSet or DiffCommit and generates a new, single diff file that represents all the changes made. It’s used to regenerate a diff and serve it up for other tools or processes to use.

Subclasses can override this to provide any special logic for building the diff.

Parameters:diffset_or_commit (reviewboard.diffviewer.models.diffset.DiffSet or reviewboard.diffviewer.models.diffcommit.DiffCommit) –

The DiffSet or DiffCommit to render.

If passing in a DiffSet, only the cumulative diff’s file contents will be returned.

If passing in a DiffCommit, only that commit’s file contents will be returned.

Returns:The diff composed of all the component FileDiffs.
Return type:bytes
Raises:TypeError – The provided diffset_or_commit wasn’t of a supported type.
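
The behavior of raw_diff() can be sketched with plain stand-in objects. This assumes the result is a byte concatenation of each selected file's stored diff; FileDiffStub, DiffSetStub, DiffCommitStub, and their diff, cumulative_files, and files fields are hypothetical substitutes for the real Django models.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FileDiffStub:
    diff: bytes  # This file's stored diff content.


@dataclass
class DiffSetStub:
    cumulative_files: List[FileDiffStub] = field(default_factory=list)


@dataclass
class DiffCommitStub:
    files: List[FileDiffStub] = field(default_factory=list)


def raw_diff(diffset_or_commit):
    """Join the selected files' diffs into a single byte string."""
    if isinstance(diffset_or_commit, DiffSetStub):
        # Only the cumulative diff's files for a diff set.
        files = diffset_or_commit.cumulative_files
    elif isinstance(diffset_or_commit, DiffCommitStub):
        # Only this commit's files.
        files = diffset_or_commit.files
    else:
        raise TypeError('%r is not a DiffSet or DiffCommit' % (diffset_or_commit,))
    return b''.join(f.diff for f in files)
```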
get_orig_commit_id()

Return the commit ID of the original revision for the diff.

By default, this returns None. Subclasses would override this if they work with repositories that always look up changes to a file by the ID of the commit that made the changes instead of a per-file revision or ID.

Non-None values returned by this method will override the values being stored in FileDiff.source_revision.

Implementations would likely want to parse out the commit ID from some prior header and return it here. By the time this is called, all files will have been parsed already.

Returns:The commit ID used to override the source revision of any created FileDiff instances.
Return type:bytes
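
A hypothetical override of get_orig_commit_id() for a diff format that carries the parent commit in a preamble line such as Parent-Commit: <id>. Both that header name and the BaseParserStub stand-in are assumptions made for illustration; a real subclass would more likely record the value in one of its header-parsing hooks.

```python
class BaseParserStub:
    """Hypothetical stand-in for DiffParser."""

    def __init__(self, data):
        self.lines = data.splitlines()

    def get_orig_commit_id(self):
        return None  # Default: no commit-based lookups.


class CommitIDParser(BaseParserStub):
    """Hypothetical parser for a format with a Parent-Commit: preamble."""

    HEADER = b'Parent-Commit: '

    def get_orig_commit_id(self):
        # All files are parsed by now; scan only the preamble lines.
        for line in self.lines:
            if line.startswith(b'--- '):
                break  # Reached the first file header.
            if line.startswith(self.HEADER):
                return line[len(self.HEADER):].strip()
        return None
```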
normalize_diff_filename(filename)

Normalize filenames in diffs.

This returns a normalized filename suitable for populating in FileDiff.source_file or FileDiff.dest_file, or for when presenting a filename to the UI.

By default, this strips off any leading slashes, which might occur due to differences in various diffing methods or APIs.

Subclasses can override this to provide additional methods of normalization.

Parameters:filename (unicode) – The filename to normalize.
Returns:The normalized filename.
Return type:unicode
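
The default normalization amounts to stripping leading slashes (read here as all of them). The second function shows the kind of extra step a subclass might add; converting backslashes to forward slashes is a hypothetical example, not documented behavior.

```python
def normalize_diff_filename(filename):
    """Default behavior described above: drop leading slashes."""
    return filename.lstrip('/')


def normalize_windows_diff_filename(filename):
    """Hypothetical subclass-style extension: unify path separators first."""
    return normalize_diff_filename(filename.replace('\\', '/'))
```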