reviewboard.diffviewer.diffutils

CHUNK_RANGE_RE = <_sre.SRE_Pattern object>[source]

A regex for matching a diff chunk header.

New in version 3.0.18.

convert_to_unicode(s, encoding_list)[source]

Return the passed string as a unicode object.

If conversion to unicode fails, we try the user-specified encoding, which defaults to ISO 8859-15. This can be overridden by users inside the repository configuration, which gives users repository-level control over file encodings.

Ideally, we’d like to have per-file encodings, but this is hard. The best we can do now is a comma-separated list of things to try.

Returns the encoding type which was used and the decoded unicode object.

Parameters:
  • s (bytes or bytearray or unicode) – The string to convert to Unicode.
  • encoding_list (list of unicode) – The list of encodings to try.
Returns:

A tuple with the following information:

  1. A compatible encoding (unicode).
  2. The Unicode data (unicode).

Return type:

tuple

Raises:
  • TypeError – The provided value was not a Unicode string, byte string, or a byte array.
  • UnicodeDecodeError – None of the encoding types were valid for the provided string.
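
The fallback behavior described above can be illustrated with a minimal, standalone sketch. This is not Review Board's implementation: the function name decode_with_fallbacks is hypothetical, and reporting 'utf-8' for input that is already Unicode is an assumption made for the example.

```python
def decode_with_fallbacks(data, encoding_list):
    """Return (encoding, text) for the first encoding that decodes data.

    Hypothetical helper for illustration only.
    """
    if isinstance(data, str):
        # Already Unicode. Reporting 'utf-8' here is an assumption.
        return 'utf-8', data

    if not isinstance(data, (bytes, bytearray)):
        raise TypeError('Expected a str, bytes, or bytearray value')

    last_error = None

    for encoding in encoding_list:
        try:
            return encoding, bytes(data).decode(encoding)
        except UnicodeDecodeError as e:
            # Remember the failure and move on to the next candidate.
            last_error = e

    if last_error is None:
        raise ValueError('encoding_list must not be empty')

    # None of the candidate encodings worked.
    raise last_error
```

With a list such as ['utf-8', 'iso-8859-15'], valid UTF-8 bytes decode with the first entry, while Latin-style bytes fall through to the second.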
convert_line_endings(data)[source]

Convert line endings in a file.

Some types of repositories provide files with a single trailing Carriage Return (\r), even if the rest of the file used a CRLF (\r\n) throughout. In these cases, GNU diff will add a \ No newline at end of file to the end of the diff, which GNU patch understands and will apply to files with just a trailing \r.

However, we normalize \r to \n, which breaks GNU patch in these cases. This function works around this by removing the last \r and then converting standard types of newlines to a \n.

This is not meant for use in providing byte-compatible versions of files, but rather to help with comparing lines-for-lines in situations where two versions of a file may come from different platforms with different newlines.

Parameters: data (bytes or unicode) – A string to normalize. This supports either byte strings or Unicode strings.
Returns: The data with newlines converted, in the original string type.
Return type: bytes or unicode
Raises: TypeError – The data argument provided is not a byte string or Unicode string.
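
As a rough illustration of the normalization described above (drop one trailing \r, then convert the standard newline sequences to \n), here is a standalone sketch. The name normalize_newlines is hypothetical, and the exact set and order of replacements is an assumption based on the description rather than a copy of the library code.

```python
def normalize_newlines(data):
    """Normalize newlines in a byte or Unicode string (illustrative sketch)."""
    if isinstance(data, bytes):
        cr, crcrlf, crlf, lf = b'\r', b'\r\r\n', b'\r\n', b'\n'
    elif isinstance(data, str):
        cr, crcrlf, crlf, lf = '\r', '\r\r\n', '\r\n', '\n'
    else:
        raise TypeError('data must be a byte string or Unicode string')

    # Remove a single trailing CR so it is not turned into a newline.
    if data.endswith(cr):
        data = data[:-1]

    # Replace the longest sequences first so CRLF is not counted twice.
    return data.replace(crcrlf, lf).replace(crlf, lf).replace(cr, lf)
```

The result keeps the input's string type, so byte strings stay bytes and Unicode strings stay Unicode.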
split_line_endings(data)[source]

Split a string into lines while preserving all non-CRLF characters.

Unlike str.splitlines(), this will only split on the following character sequences: \n, \r, \r\n, and \r\r\n.

This is needed to prevent the sort of issues encountered with Unicode strings when calling str.splitlines(), which also splits on form feed characters. patch and diff accept form feed characters as valid characters in diffs and don’t treat them as newlines, but str.splitlines() treats them as newlines anyway.

Parameters: data (bytes or unicode) – The data to split into lines.
Returns: The list of lines.
Return type: list of bytes or unicode
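
The difference from str.splitlines() can be seen in a small standalone sketch. The helper name split_on_newlines and its regular expression are assumptions for illustration, handling Unicode strings only. Dropping the trailing empty entry is also an assumption, made to mirror str.splitlines().

```python
import re

# Match only the newline sequences listed above, longest first.
NEWLINE_RE = re.compile(r'\r\r\n|\r\n|\r|\n')


def split_on_newlines(text):
    """Split text on \\n, \\r, \\r\\n, and \\r\\r\\n only (illustrative)."""
    lines = NEWLINE_RE.split(text)

    # A trailing newline leaves an empty final entry; drop it.
    if lines and lines[-1] == '':
        lines.pop()

    return lines


sample = 'a\x0cb\r\r\nc\rd\n'

# str.splitlines() also splits on the form feed (\x0c).
builtin_result = sample.splitlines()

# The sketch keeps the form feed inside the first line.
sketch_result = split_on_newlines(sample)
```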
patch(diff, orig_file, filename, request=None)[source]

Apply a diff to a file.

This delegates out to patch because no one except Larry Wall knows how to patch.

Parameters:
  • diff (bytes) – The contents of the diff to apply.
  • orig_file (bytes) – The contents of the original file.
  • filename (unicode) – The name of the file being patched.
  • request (django.http.HttpRequest, optional) – The HTTP request, for use in logging.
Returns:

The contents of the patched file.

Return type:

bytes

Raises:

reviewboard.diffviewer.errors.PatchError – An error occurred when trying to apply the patch.

get_original_file_from_repo(filediff, request=None, encoding_list=None)[source]

Return the pre-patched file for the FileDiff from the repository.

The parent diff will be applied if it exists.

New in version 4.0.

Parameters:
  • filediff (reviewboard.diffviewer.models.filediff.FileDiff) – The FileDiff to retrieve the pre-patch file for.
  • request (django.http.HttpRequest, optional) – The HTTP request from the client.
  • encoding_list (list of unicode, optional) –

    A custom list of encodings to try when processing the file. This will override the encoding list normally retrieved from the FileDiff and repository.

    If there’s already a known valid encoding for the file, it will be used instead.

    This is here for compatibility and will be removed in Review Board 5.0.

Returns:

The pre-patched file.

Return type:

bytes

Raises:
  • UnicodeDecodeError – The source file was not compatible with any of the available encodings.
  • reviewboard.diffviewer.errors.PatchError – An error occurred when trying to apply the patch.
  • reviewboard.scmtools.errors.SCMError – An error occurred while computing the pre-patch file.
get_original_file(filediff, request=None, encoding_list=None)[source]

Return the pre-patch file of a FileDiff.

Changed in version 4.0: The encoding_list parameter should no longer be provided by callers. Encoding lists are now calculated automatically. Passing a custom list will override the calculated one.

Parameters:
  • filediff (reviewboard.diffviewer.models.filediff.FileDiff) – The FileDiff to retrieve the pre-patch file for.
  • request (django.http.HttpRequest, optional) – The HTTP request from the client.
  • encoding_list (list of unicode, optional) –

    A custom list of encodings to try when processing the file. This will override the encoding list normally retrieved from the FileDiff and repository.

    If there’s already a known valid encoding for the file, it will be used instead.

Returns:

The pre-patch file.

Return type:

bytes

Raises:
  • UnicodeDecodeError – The source file was not compatible with any of the available encodings.
  • reviewboard.diffviewer.errors.PatchError – An error occurred when trying to apply the patch.
  • reviewboard.scmtools.errors.SCMError – An error occurred while computing the pre-patch file.
get_patched_file(source_data, filediff, request=None)[source]

Return the patched version of a file.

This will normalize the patch, applying any changes needed for the repository, and then patch the provided data with the patch contents.

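Parameters:
  • source_data (bytes) – The file contents to patch.
  • filediff (reviewboard.diffviewer.models.filediff.FileDiff) – The FileDiff whose patch will be applied.
  • request (django.http.HttpRequest, optional) – The HTTP request from the client.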
Returns:

The patched file contents.

Return type:

bytes

get_revision_str(revision)[source]
get_filenames_match_patterns(patterns, filenames)[source]

Return whether any of the filenames match any of the patterns.

This is used to compare a list of filenames to a list of patterns. The patterns are case-sensitive.

Parameters:
  • patterns (list of unicode) – The list of patterns to match against.
  • filenames (list of unicode) – The list of filenames.
Returns:

True if any filenames match any patterns. False if none match.

Return type:

bool
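
A minimal sketch of this kind of case-sensitive matching, assuming the patterns are shell-style globs handled by Python's fnmatch module. Both that assumption and the helper name any_filename_matches are illustrative and not taken from the library source.

```python
from fnmatch import fnmatchcase


def any_filename_matches(patterns, filenames):
    """Return True if any filename matches any glob pattern.

    fnmatchcase() is always case-sensitive, regardless of platform.
    """
    return any(
        fnmatchcase(filename, pattern)
        for pattern in patterns
        for filename in filenames
    )
```

For example, the pattern '*.py' would match 'docs/conf.py', while '*.PY' would not.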

get_filediff_encodings(filediff, encoding_list=None)[source]

Return a list of encodings to try for a FileDiff’s source text.

If the FileDiff already has a known encoding stored, then it will take priority. The provided encoding list, or the repository’s list of configured encodings, will be provided as fallbacks.

Parameters:
  • filediff (reviewboard.diffviewer.models.filediff.FileDiff) – The FileDiff to return encodings for.
  • encoding_list (list of unicode, optional) – An explicit list of encodings to try. If not provided, the repository’s list of encodings will be used instead (which is generally preferred).
Returns:

The list of encodings to try for the source file.

Return type:

list of unicode
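
The priority order can be sketched with plain values in place of a FileDiff. The function name candidate_encodings and its arguments are hypothetical, and skipping duplicates is an assumption made for the example.

```python
def candidate_encodings(known_encoding, fallback_encodings):
    """Return encodings to try, with any known encoding first (sketch)."""
    encodings = []

    for encoding in [known_encoding] + list(fallback_encodings):
        # Skip a missing known encoding and avoid listing one twice.
        if encoding and encoding not in encodings:
            encodings.append(encoding)

    return encodings
```

Here known_encoding stands in for the encoding stored on the FileDiff (or None), and fallback_encodings for either the explicit list or the repository's configured list.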

get_matched_interdiff_files(tool, filediffs, interfilediffs)[source]

Generate pairs of matched files for display in interdiffs.

This compares a list of filediffs and a list of interfilediffs, attempting to best match up the files in both for display in the diff viewer.

This will prioritize matches that share a common source filename, destination filename, and new/deleted state. Failing that, matches that share a common source filename are paired off.

Any entries in interfilediffs that don’t have any match in filediffs are considered new changes in the interdiff, and any entries in filediffs that don’t have entries in interfilediffs are considered reverted changes.

Parameters:
  • tool (reviewboard.scmtools.core.SCMTool) – The tool used for all these diffs.
  • filediffs (list of reviewboard.diffviewer.models.filediff.FileDiff) – The list of filediffs on the left-hand side of the diff range.
  • interfilediffs (list of reviewboard.diffviewer.models.filediff.FileDiff) – The list of filediffs on the right-hand side of the diff range.
Yields:

tuple – A paired off filediff match. This is a tuple containing two entries, each a FileDiff or None.
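
The two-pass pairing described above can be approximated with a simplified standalone sketch. It uses a hypothetical Entry tuple with only source and dest filenames, ignores the new/deleted state and the SCMTool, and yields pairs in an order chosen for the example, so it should be read as an illustration of the idea rather than the actual matching logic.

```python
from collections import namedtuple

# Hypothetical stand-in for a FileDiff, holding only the two filenames.
Entry = namedtuple('Entry', ['source', 'dest'])


def match_files(filediffs, interfilediffs):
    """Yield (filediff, interfilediff) pairs; either side may be None."""
    remaining = list(interfilediffs)
    matched = {}

    # Pass 1 matches on source and destination, pass 2 on source only.
    keys = (lambda e: (e.source, e.dest), lambda e: e.source)

    for key in keys:
        for index, entry in enumerate(filediffs):
            if index in matched:
                continue

            for candidate in remaining:
                if key(candidate) == key(entry):
                    matched[index] = candidate
                    remaining.remove(candidate)
                    break

    # Unmatched filediffs pair with None (reverted changes).
    for index, entry in enumerate(filediffs):
        yield entry, matched.get(index)

    # Unmatched interfilediffs pair with None (new changes).
    for candidate in remaining:
        yield None, candidate
```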

get_filediffs_match(filediff1, filediff2)[source]

Return whether two FileDiffs effectively match.

This is primarily checking that the patched versions of two files are going to be basically the same.

This will first check that we even have both FileDiffs. Assuming we have both, this will check the diff for equality. If not equal, we at least check that both files were deleted (which is equivalent to being equal).

The patched SHAs are then checked. These would be generated as part of the diff viewing process, so may not be available. We prioritize the SHA256 hashes (introduced in Review Board 4.0), and fall back on SHA1 hashes if not present.

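Parameters:
  • filediff1 (reviewboard.diffviewer.models.filediff.FileDiff) – The first FileDiff to compare.
  • filediff2 (reviewboard.diffviewer.models.filediff.FileDiff) – The second FileDiff to compare.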
Returns:

True if both FileDiffs effectively match. False if they do not.

Return type:

bool

Raises:

ValueError – None was provided for both filediff1 and filediff2.
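
The order of checks described above can be sketched with a small stand-in class. FakeFileDiff and its attribute names are hypothetical, and the way missing hashes are handled is an assumption, so this illustrates the decision order and is not the real comparison code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeFileDiff:
    """Hypothetical stand-in carrying only the fields used below."""
    diff: bytes
    deleted: bool = False
    patched_sha256: Optional[str] = None
    patched_sha1: Optional[str] = None


def filediffs_match(filediff1, filediff2):
    """Return whether two stand-in FileDiffs effectively match."""
    if filediff1 is None and filediff2 is None:
        raise ValueError('filediff1 and filediff2 cannot both be None')

    # Only one side exists, so there is nothing to match against.
    if filediff1 is None or filediff2 is None:
        return False

    # Identical diffs, or two deletions, count as a match.
    if filediff1.diff == filediff2.diff:
        return True
    if filediff1.deleted and filediff2.deleted:
        return True

    # Prefer SHA256 hashes when both are present, then fall back to SHA1.
    for attr in ('patched_sha256', 'patched_sha1'):
        hash1 = getattr(filediff1, attr)
        hash2 = getattr(filediff2, attr)

        if hash1 is not None and hash2 is not None:
            return hash1 == hash2

    return False
```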

get_diff_files(diffset, filediff=None, interdiffset=None, interfilediff=None, base_filediff=None, request=None, filename_patterns=None, base_commit=None, tip_commit=None)[source]

Return a list of files that will be displayed in a diff.

This will go through the given diffset/interdiffset, or a given filediff within that diffset, and generate the list of files that will be displayed. This file list will contain a bunch of metadata on the files, such as the index, original/modified names, revisions, associated filediffs/diffsets, and so on.

This can be used along with populate_diff_chunks() to build a full list containing all diff chunks used for rendering a side-by-side diff.

Parameters:
  • diffset (reviewboard.diffviewer.models.diffset.DiffSet) – The diffset containing the files to return.
  • filediff (reviewboard.diffviewer.models.filediff.FileDiff, optional) – A specific file in the diff to return information for.
  • interdiffset (reviewboard.diffviewer.models.diffset.DiffSet, optional) – A second diffset used for an interdiff range.
  • interfilediff (reviewboard.diffviewer.models.filediff.FileDiff, optional) –

    A second specific file in interdiffset used to return information for. This should be provided if filediff and interdiffset are both provided. If it’s None in this case, then the diff will be shown as reverted for this file.

    This may not be provided if base_filediff is provided.

  • base_filediff (reviewboard.diffviewer.models.filediff.FileDiff, optional) –

    The base FileDiff to use.

    This may only be provided if filediff is provided and interfilediff is not.

  • filename_patterns (list of unicode, optional) – A list of filenames or patterns used to limit the results. Each of these will be matched against the original and modified file of diffs and interdiffs.
  • base_commit (reviewboard.diffviewer.models.diffcommit.DiffCommit, optional) –

    An optional base commit. No FileDiffs from commits before that commit will be included in the results.

    This argument only applies to DiffSets with DiffCommits.

  • tip_commit (reviewboard.diffviewer.models.diffcommit.DiffCommit, optional) –

    An optional tip commit. No FileDiffs from commits after that commit will be included in the results.

    This argument only applies to DiffSets with DiffCommits.

Returns:

A list of dictionaries containing information on the files to show in the diff, in the order in which they would be shown.

Return type:

list of dict

populate_diff_chunks(files, enable_syntax_highlighting=True, request=None)[source]

Populates a list of diff files with chunk data.

This accepts a list of files (generated by get_diff_files) and generates diff chunk data for each file in the list. The chunk data is stored in the file state.

get_file_from_filediff(context, filediff, interfilediff)[source]

Return the file that corresponds to the filediff/interfilediff.

This is primarily intended for use with templates. It takes a RequestContext for looking up the user and for caching file lists, in order to improve performance and reduce lookup times for files that have already been fetched.

This function returns either exactly one file or None.

get_last_line_number_in_diff(context, filediff, interfilediff)[source]

Determine the last virtual line number in the filediff/interfilediff.

This returns the virtual line number to be used in expandable diff fragments.

get_last_header_before_line(context, filediff, interfilediff, target_line)[source]

Get the last header that occurs before the given line.

This returns a dictionary of left header and right header. Each header is either None or a dictionary with the following fields:

Field  Description
line   Virtual line number (union of the original and patched files)
text   The header text
get_file_chunks_in_range(context, filediff, interfilediff, first_line, num_lines)[source]

Generate the chunks within a range of lines in the specified filediff.

This is primarily intended for use with templates. It takes a RequestContext for looking up the user and for caching file lists, in order to improve performance and reduce lookup times for files that have already been fetched.

See get_chunks_in_range() for information on the returned state of the chunks.

get_chunks_in_range(chunks, first_line, num_lines)[source]

Generate the chunks within a range of lines of a larger list of chunks.

This takes a list of chunks, computes a subset of those chunks from the line ranges provided, and generates a new set of those chunks.

Each returned chunk is a dictionary with the following fields:

Variable  Description
change    The change type (“equal”, “replace”, “insert”, “delete”)
numlines  The number of lines in the chunk
lines     The list of lines in the chunk
meta      A dictionary containing metadata on the chunk

Each line in the list of lines is an array with the following data:

Index  Description
0      Virtual line number (union of the original and patched files)
1      Real line number in the original file
2      HTML markup of the original file
3      Changed regions of the original line (for “replace” chunks)
4      Real line number in the patched file
5      HTML markup of the patched file
6      Changed regions of the patched line (for “replace” chunks)
7      True if line consists of only whitespace changes
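
To make the line layout concrete, the following sketch builds one chunk by hand and reads values out by index. The sample values, including the shape of the changed-region entries, are made up for illustration, and line_number_pairs is a hypothetical helper.

```python
# A hand-written "replace" chunk with a single line. All values are
# invented sample data that follow the field and index tables above.
sample_chunk = {
    'change': 'replace',
    'numlines': 1,
    'lines': [
        [
            12,                  # 0: virtual line number
            10,                  # 1: real line number, original file
            'old <b>text</b>',   # 2: HTML markup, original file
            [(4, 8)],            # 3: changed regions, original line
            11,                  # 4: real line number, patched file
            'new <b>text</b>',   # 5: HTML markup, patched file
            [(4, 8)],            # 6: changed regions, patched line
            False,               # 7: whitespace-only change?
        ],
    ],
    'meta': {},
}


def line_number_pairs(chunk):
    """Return (original, patched) real line numbers for each line."""
    return [(line[1], line[4]) for line in chunk['lines']]
```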
get_enable_highlighting(user)[source]
get_line_changed_regions(oldline, newline)[source]

Returns regions of changes between two similar lines.

get_sorted_filediffs(filediffs, key=None)[source]

Sorts a list of filediffs.

The list of filediffs will be sorted first by their base paths in ascending order.

Within a base path, they’ll be sorted by base name (minus the extension) in ascending order.

If two files have the same base path and base name, we’ll sort by the extension in descending order. This will make *.h sort ahead of *.c/*.cpp, for example.

If the list being passed in is actually not a list of FileDiffs, it must provide a callable key parameter that will return a FileDiff for the given entry in the list. This will only be called once per item.
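
The ordering rules above can be sketched on plain path strings. The helper sort_paths is hypothetical and relies on two stable sorts, which is one possible way to combine ascending and descending keys, not necessarily how the library does it.

```python
import posixpath


def _split(path):
    """Split a path into (base path, base name, extension)."""
    dirname, filename = posixpath.split(path)
    stem, ext = posixpath.splitext(filename)
    return dirname, stem, ext


def sort_paths(paths):
    """Sort by base path and name ascending, then extension descending."""
    # First order by extension, descending. Python's sort is stable,
    # so this order survives the second pass for otherwise equal keys.
    by_ext = sorted(paths, key=lambda p: _split(p)[2], reverse=True)

    # Then order by base path and base name, ascending.
    return sorted(by_ext, key=lambda p: _split(p)[:2])
```

With this ordering, src/foo.h sorts ahead of src/foo.cpp, which sorts ahead of src/foo.c.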

get_displayed_diff_line_ranges(chunks, first_vlinenum, last_vlinenum)[source]

Return the displayed line ranges based on virtual line numbers.

This takes the virtual line numbers (the index in the side-by-side diff lines) and returns the human-readable line numbers, the chunks they’re in, and mapped virtual line numbers.

A virtual line range may start or end in a chunk not containing displayed line numbers (such as an “original” range starting/ending in an “insert” chunk). The resulting displayed line ranges will exclude these chunks.

Parameters:
  • chunks (list of dict) – The list of chunks for the diff.
  • first_vlinenum (int) – The first virtual line number. This uses 1-based indexes.
  • last_vlinenum (int) – The last virtual line number. This uses 1-based indexes.
Returns:

A tuple of displayed line range information, containing 2 items.

Each item will either be a dictionary of information, or None if there aren’t any displayed lines to show.

The dictionary contains the following keys:

display_range:

A tuple containing the displayed line range.

virtual_range:

A tuple containing the virtual line range that display_range maps to.

chunk_range:

A tuple containing the beginning/ending chunks that display_range maps to.

Return type:

tuple

Raises:

ValueError – The range provided was invalid.

get_diff_data_chunks_info(diff)[source]

Return information on each chunk in a diff.

This will scan through a unified diff file, looking for each chunk in the diff and returning information on their ranges and lines of context. This can be used to generate statistics on diffs and help map changed regions in diffs to lines of source files.

New in version 3.0.18.

Parameters: diff (bytes) – The diff data to scan.
Returns: A list of chunk information dictionaries. Each entry has an orig and modified dictionary containing the following keys:
  • chunk_start (int) – The starting line number of the chunk shown in the diff, including any lines of context. This is 0-based.
  • chunk_len (int) – The length of the chunk shown in the diff, including any lines of context.
  • changes_start (int) – The starting line number of a range of changes shown in a chunk in the diff. This is after any lines of context and is 0-based.
  • changes_len (int) – The length of the changes shown in a chunk in the diff, excluding any lines of context.
  • pre_lines_of_context (int) – The number of lines of context before any changes in a chunk. If the chunk doesn’t have any changes, this will contain all lines of context otherwise shown around changes in the other region in this entry.
  • post_lines_of_context (int) – The number of lines of context after any changes in a chunk. If the chunk doesn’t have any changes, this will be 0.
Return type: list of dict
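
A reduced sketch of the header-scanning part is shown below. It only derives chunk_start and chunk_len from unified diff chunk headers and does not compute the changes or context fields. The regular expression, the helper names, and the clamping of the 0-based start to zero are assumptions made for the example, not the library's CHUNK_RANGE_RE or logic.

```python
import re

# Matches unified diff chunk headers such as "@@ -10,7 +12,8 @@".
# A missing length (as in "@@ -30 +32,2 @@") means a length of 1.
HEADER_RE = re.compile(
    rb'^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@',
    re.MULTILINE)


def _range_info(start, length):
    """Convert a 1-based header range into 0-based chunk information."""
    return {
        # Clamping at 0 for empty ranges is an assumption.
        'chunk_start': max(int(start) - 1, 0),
        'chunk_len': int(length) if length is not None else 1,
    }


def scan_chunk_headers(diff):
    """Return orig/modified range information for each chunk header."""
    results = []

    for m in HEADER_RE.finditer(diff):
        orig_start, orig_len, mod_start, mod_len = m.groups()
        results.append({
            'orig': _range_info(orig_start, orig_len),
            'modified': _range_info(mod_start, mod_len),
        })

    return results
```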
check_diff_size(diff_file, parent_diff_file=None)[source]

Check the size of the given diffs against the maximum allowed size.

If either of the provided diffs is too large, an exception will be raised.

Parameters:
Raises:

reviewboard.diffviewer.errors.DiffTooBigError – The supplied files are too big.

get_total_line_counts(files_qs)[source]

Return the total line counts of all given FileDiffs.

Parameters: files_qs (django.db.models.query.QuerySet) – The queryset describing the FileDiffs.
Returns: A dictionary with the following keys:
  • raw_insert_count
  • raw_delete_count
  • insert_count
  • delete_count
  • replace_count
  • equal_count
  • total_line_count

Each entry maps to the sum of that line count type for all FileDiffs.

Return type: dict
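
The shape of the result can be illustrated by summing plain dictionaries in place of a queryset. The helper sum_line_counts is hypothetical, and treating a missing or None count as 0 is an assumption made for the example.

```python
COUNT_KEYS = (
    'raw_insert_count',
    'raw_delete_count',
    'insert_count',
    'delete_count',
    'replace_count',
    'equal_count',
    'total_line_count',
)


def sum_line_counts(per_file_counts):
    """Sum each line count type across a list of per-file dictionaries."""
    totals = dict.fromkeys(COUNT_KEYS, 0)

    for counts in per_file_counts:
        for key in COUNT_KEYS:
            # Treat a missing or None count as 0 (an assumption).
            totals[key] += counts.get(key) or 0

    return totals
```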