4596: Error displaying diff for utf-8 files on Perforce

yaqian
david
david

What version are you running?

ReviewBoard 2.5.13.1
RBTools 0.7.10
p4 executable and pypython package 2017.1
Perforce Server 2015.2

What's the URL of the page containing the problem?

https://<review-board-server-name>/r/203074/diff/1

What steps will reproduce the problem?

  1. Find an utf-8 file from the perforce server, make some changes
  2. Use the rbt tool to post the changes to reviewboard
  3. Go to reviewboard to look at the diff, it will show "There was an error displaying this diff."

What is the expected output? What do you see instead?

Expected: A visible diff view
Actual result: There was an error displaying this diff.

Error message:

The patch to '//user/yaqian/tmp/utf8files/demo.xml' didn't apply cleanly. The temporary files have been left in '/tmp/reviewboard.kMIK0e' for debugging purposes.
patch returned:

 This may be a bug in the software, a temporary outage, or an issue
 with the format of your diff.

 Please try again, and if you still have trouble,
 contact support.

Details

Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/reviewboard/diffviewer/views.py", line 299, in get
response = renderer.render_to_response(request)
File "/usr/lib/python2.7/site-packages/reviewboard/diffviewer/renderers.py", line 56, in render_to_response
return HttpResponse(self.render_to_string(request))
File "/usr/lib/python2.7/site-packages/reviewboard/diffviewer/renderers.py", line 74, in render_to_string
large_data=True)
File "/usr/lib/python2.7/site-packages/djblets/cache/backend.py", line 298, in cache_memoize
compress_large_data))
File "/usr/lib/python2.7/site-packages/djblets/cache/backend.py", line 252, in cache_memoize_iter
items = items_or_callable()
File "/usr/lib/python2.7/site-packages/djblets/cache/backend.py", line 295, in <lambda>
lambda: [lookup_callable()],
File "/usr/lib/python2.7/site-packages/reviewboard/diffviewer/renderers.py", line 73, in <lambda>
lambda: self.render_to_string_uncached(request),
File "/usr/lib/python2.7/site-packages/reviewboard/diffviewer/renderers.py", line 87, in render_to_string_uncached
request=request)
File "/usr/lib/python2.7/site-packages/reviewboard/diffviewer/diffutils.py", line 669, in populate_diff_chunks
chunks = list(generator.get_chunks())
File "/usr/lib/python2.7/site-packages/reviewboard/diffviewer/chunk_generator.py", line 786, in get_chunks
for chunk in super(DiffChunkGenerator, self).get_chunks(cache_key):
File "/usr/lib/python2.7/site-packages/reviewboard/diffviewer/chunk_generator.py", line 107, in get_chunks
large_data=True)
File "/usr/lib/python2.7/site-packages/djblets/cache/backend.py", line 298, in cache_memoize
compress_large_data))
File "/usr/lib/python2.7/site-packages/djblets/cache/backend.py", line 252, in cache_memoize_iter
items = items_or_callable()
File "/usr/lib/python2.7/site-packages/djblets/cache/backend.py", line 295, in <lambda>
lambda: [lookup_callable()],
File "/usr/lib/python2.7/site-packages/reviewboard/diffviewer/chunk_generator.py", line 106, in <lambda>
lambda: list(self.get_chunks_uncached()),
File "/usr/lib/python2.7/site-packages/reviewboard/diffviewer/chunk_generator.py", line 793, in get_chunks_uncached
new = get_patched_file(old, self.filediff, self.request)
File "/usr/lib/python2.7/site-packages/reviewboard/diffviewer/diffutils.py", line 233, in get_patched_file
return patch(diff, buffer, filediff.dest_file, request)
File "/usr/lib/python2.7/site-packages/reviewboard/diffviewer/diffutils.py", line 169, in patch
'output': stderr,
Exception: The patch to '//user/yaqian/tmp/utf8files/demo.xml' didn't apply cleanly. The temporary files have been left in '/tmp/reviewboard.kMIK0e' for debugging purposes.
patch returned:

What operating system are you using? What browser?

The Reviewboard server is running on Linux (Centos 7.4);
I'm using Chrome on Windows.

Please provide any additional information below.

  1. The error is from the patch command. I investigated and found it's because rbtool and reviewboard synced different content for the utf8 file, to be specific:
  2. the rbt tool is calling 'p4 print' with the -o option which reserves the BOM characters. https://github.com/reviewboard/rbtools/blob/master/rbtools/clients/perforce.py#L1317
  3. but the reviewboard is calling 'p4 print' without the -o option which removed the BOM. https://github.com/reviewboard/reviewboard/blob/master/reviewboard/scmtools/perforce.py#L556

You can use the following python code to test.

    import P4

    p4 = P4.P4()

    # update the following according to your actual perforce settings
    p4.port = 'perforce:1666'
    p4.user = 'yaqian'
    #p4.password = ''
    depot_file = '//user/yaqian/tmp/utf8files/demo.xml'

    p4.connect()

    print "p4 print -q '%s'" % depot_file
    res = p4.run_print('-q', depot_file)
    first5c = res[-1][0:5]
    print first5c
    print ':'.join(x.encode('hex') for x in first5c)

    print "p4 print -q -o /tmp/test.utf8.out '%s'" % depot_file
    p4.run_print('-q', '-o', '/tmp/test.utf8.out', depot_file)
    first5c = open('/tmp/test.utf8.out', 'rb').read()[0:5]
    print first5c
    print ':'.join(x.encode('hex') for x in first5c)

    p4.disconnect()
  1. This issue is only reproduciable with Perforce 2015.2+ which added the utf8 server file type - see the release notes - https://www.perforce.com/perforce/r15.2/user/relnotes.txt.

Major new functionality in 2015.2
...

998379 * **

UTF8 server file type added. Files detected to hold utf8 textual
contents can be stored as a utf8 file. These files will have
BOMs (Byte order marks) removed when stored on the server
and added back when synced.

  1. To fix the issue, rbtool and reviewboard should have sync behavior
#1 yaqian

I did not find place to update the ticket, so I will just post the update here.

For "To fix the issue, rbtool and reviewboard should have sync behavior", I meant that rbtool and reviewboard should have same sync behavior - maybe all use 'p4 print' with the -o option because it will preserve file type/content/permissions according to https://www.perforce.com/perforce/doc.current/manuals/cmdref/#CmdRef/p4_print.html

I made a change to the get_file function in https://raw.githubusercontent.com/reviewboard/reviewboard/release-2.5.13.1/reviewboard/scmtools/perforce.py to use the p4 print -o option (attached), that seems fixed the issue for my team.

  • +
    """Repository support for Perforce."""
    from __future__ import unicode_literals
    import logging
    import os
    import random
    import re
    import shutil
    import signal
    import socket
    import stat
    import subprocess
    import tempfile
    import time
    from contextlib import contextmanager
    from django.conf import settings
    from django.utils import six
    from django.utils.functional import cached_property
    from django.utils.translation import ugettext_lazy as _
    from djblets.util.filesystem import is_exe_in_path
    from reviewboard.diffviewer.parser import DiffParser
    from reviewboard.scmtools.certs import Certificate
    from reviewboard.scmtools.core import (SCMTool, ChangeSet,
                                           HEAD, PRE_CREATION)
    from reviewboard.scmtools.errors import (SCMError, EmptyChangeSetError,
                                             AuthenticationError,
                                             InvalidRevisionFormatError,
                                             RepositoryNot
#2 timbrown5

We are seeing this issue as well, and it's happening on a lot of our files.
I tried to work around it by setting the P4CHARSET=uft8 (not utf8-bom) but rbt post <changelist> returned 'CRITICAL: Path <changelist> does not match a valid Perforce path.' this issue:
https://groups.google.com/forum/#!topic/reviewboard/U3faiK2uOyg
https://stackoverflow.com/questions/28804908/failed-to-rbt-post-changenum-from-perforce-p4v

Is there a way to make rbtools not uses the -o option (or ignore the BOM) or do I have to wait for rbtools/reviewboard update?

david
#3 david

Perforce tries to be "smart" about BOMs, but it does a really poor job of that. We've found most of these issues can be solved by upgrading the version of p4python on the Review Board server. Can you try that and see if it helps?

  • -New
    +NeedInfo
#4 yaqian

David, upgrading p4python on the RB server did not work. The reason it’s not working is that rbtool and the RB server are calling ‘p4 print’ differently - rbtool (and post-review) is calling p4 print with the -o option - ‘p4 print -o outputfile //depot/file’, see code here https://github.com/reviewboard/rbtools/blob/master/rbtools/clients/perforce.py#L1317 (the print_file function will add the -o option); while the RB server is not – it calls ‘p4 print //depot/file’ directly, see its code here - https://github.com/reviewboard/reviewboard/blob/master/reviewboard/scmtools/perforce.py#L556 (the run_print function does not have the -o option)

According to the p4 print doc - https://www.perforce.com/perforce/doc.current/manuals/cmdref/#CmdRef/p4_print.html, the -o option redirects output to the specified output file on the local disk, preserving the same file type, attributes, and/or permission bits as the original file in the depot. With the -o option, the BOM characters are kept; without it, the BOM characters are removed.

So this is an issue related to the different print behavior in reviewboard/rbtool, triggered by the perforce 2015.2 utf8 feature. The fix is to make post-review and review-board call p4 print in the same way. I attached a fix in my last post, that resolved the issue for my team.

david
#5 david
  • -NeedInfo
    +PendingReview
  • +david
david
#6 david

Fixed in release-2.5.x (c84be1c). This will ship in 2.5.18 ad 3.0.2. Thanks!

  • -PendingReview
    +Fixed