June 2, 2010

Mike Conley
Review Board Tests: SCMTool Segmentation Fault

I can’t honestly say I’ve been doing much test-driven development on Review Board.  Django is a really cool web-framework, but I miss all of the nice testing tools from the Rails ecosystem.

Anyhow, today, I decided to run the tests, and BAM – Segmentation Fault:

Testing Perforce binary diff parsing … SKIP: perforce/p4python is not installed
Testing PerforceTool.get_changeset … SKIP: perforce/p4python is not installed
Testing Perforce empty and normal diff parsing … SKIP: perforce/p4python is not installed
Testing Perforce empty diff parsing … SKIP: perforce/p4python is not installed
Testing PerforceTool.get_file … SKIP: perforce/p4python is not installed
Testing parsing SVN diff with binary file … ok
Testing SVNTool.get_file … Segmentation Fault

Yikes.  Getting a segfault in Python is unusual, and a little jarring.  However, it did let me narrow down the problem a bit:  it must have to do with the pysvn bindings that Review Board uses to talk to Subversion, because those are written in C++ (which can certainly segfault).

The machine I was using had Ubuntu Hardy on it, and the Hardy packages have pysvn 1.5.2-1.  Taking a look at the pysvn homepage, the most recent version is actually 1.7.2.

So, I uninstalled pysvn, downloaded the source for 1.7.2, compiled it, and installed it manually.

Segfault gone.  Tests pass.  Awesome sauce.

May 12, 2010

Mike Conley
Python Eggs: Sunny Side Up, and Other Goodies (or How I Learned to Stop Worrying and Start Coding)

Cooking with Eggs

Every now and then, the computer gods smile and give me a freebie.

I’ve been worrying my mind out over a few problems / obstacles for my Review Board extensions GSoC project.  In particular, I’ve been worrying about dealing with extension dependencies, conflicts, and installation.

I racked my brain.  I came up with scenarios.  I drew lots of big scary diagrams on a wipe board.

And then light dawned.

Batteries Come Included

Enter Setuptools and Python Eggs.

All of those things I was worried about having to build and account for?  When using Python Eggs, It’s all built in. Dependencies?  Taken care of. Conflicts?  Don’t worry about it.  Installation?  That’s what Setuptools and Python Eggs were built for!

In fact, it even looks like Setuptools was designed with extensible applications in mind.

Wait, really?  How?

Here’s the setup.py file for the rb-reports extension in the rb-extensions-pack on Github:

from setuptools import setup, find_packages

PACKAGE="RB-Reports"
VERSION="0.1"

setup(
    name=PACKAGE,
    version=VERSION,
    description="""Reports extension for Review Board""",
    author="Christian Hammond",
    packages=["rbreports"],
    entry_points={
        'reviewboard.extensions':
        '%s = rbreports.extension:ReportsExtension' % PACKAGE,
    },
    package_data={
        'rbreports': [
            'htdocs/css/*.css',
            'htdocs/js/*.js',
            'templates/rbreports/*.html',
            'templates/rbreports/*.txt',
        ],
    }
)

Pay particular attention to the “entry_points” parameter.  What this is doing, is registering rbreports.extension:ReportsExtension to the entry point “reviewboard.extensions”.

“Hold up!”, I hear you asking. “What’s an entry point?”

Entry Points

An entry point is a unique identifier associated with an application that can accept extensions.

The unique identifier for Review Board extensions is “reviewboard.extensions”.

This is the first handshake, more or less, between Review Board and any extensions:  in order for Review Board to “see” the extension, the extension must register an entry point at “reviewboard.extensions”.

This blog post shows how extensions can be found and loaded up.

Other Goodies

INSTALLED_APPS and Django

I remember also being worried about how to create tables in Django for extension models.  I thought “holy smokes, I’m going to have to either shoehorn some raw SQL into the extension manager, or maybe even trust the extension developers to write the CREATE TABLE queries themselves!”.

Luckily, there’s a better alternative.

Django knows about its applications through a dictionary called INSTALLED_APPS. When you add a new model to a Django project, you simply add the model app to the INSTALLED_APPS dictionary, and run “manage.py syncdb”.  Django does the magic, bingo-bango, and boom – tables created.

So if a new extension has some tables it needs created, I simply insert the app name of the extension into INSTALLED_APPS when the extension is installed, and call syncdb programmatically.  Tables created:  no sweat.

django-evolution

Creating tables is easy.  But what if an extension gets updated, and the table needs to be modified?  Sounds like we’ve got a mess on our hands.

And don’t expect Django to save you.  When you modify a model in Django, they expect you to into that DB and alter that table by hand:

[syncdb] creates the tables if they don’t yet exist. Note that syncdb does not sync changes in models or deletions of models; if you make a change to a model or delete a model, and you want to update the database, syncdb will not handle that.
From The Django BookChapter 5: Models

Thankfully, there’s a mechanism that’s already built into Review Board that makes this trouble go away:  django-evolution.  Django-evolution, when used properly, will automatically detect changes in application models, and alter the database tables accordingly.  This is how Review Board does upgrades.

And to top that off, RB co-founder Christian Hammond just became the django-evolution maintainer.

Wow.  Everything is falling neatly into place.

May 5, 2010

Mike Conley
Python Metaclasses in Review Board

So, after diving into the Review Board extension code, I hit a little snag.

It turns out I never learned about Python metaclasses.  And the extension code in Djblets / Review Board uses them.  In the map I made of the Review Board extension code, I represented my confusion with an image of some kind of quantum-divide-by-zero implosion.

I’ll get to the why for metaclasses in a second.  First, I’ll demonstrate how it’s currently being used in the code.

The Way Things Are

This snippit is from the Djblets library, in the extensions folder, in base.py:

...

class ExtensionHook(object):
    def __init__(self, extension):
        self.extension = extension
        self.extension.hooks.add(self)
        self.__class__.add_hook(self)

    def shutdown(self):
        self.__class__.remove_hook(self)

class ExtensionHookPoint(type):
    def __init__(cls, name, bases, attrs):
        super(ExtensionHookPoint, cls).__init__(name, bases, attrs)

        if not hasattr(cls, "hooks"):
            cls.hooks = []

    def add_hook(cls, hook):
        cls.hooks.append(hook)

    def remove_hook(cls, hook):
        cls.hooks.remove(hook)

...

And here are those classes being used in the Review Board extension directory, where it defines its hooks:

...

class DashboardHook(ExtensionHook):
    __metaclass__ = ExtensionHookPoint

    def get_entries(self):
        raise NotImplemented

class NavigationBarHook(ExtensionHook):
    """
    A hook for adding entries to the main navigation bar.
    """
    __metaclass__ = ExtensionHookPoint

    def get_entry(self):
        """
        Returns the entry to add to the navigation bar.

        This should be a dict with the following keys:

            * `label`: The label to display
            * `url`:   The URL to point to.
        """
        raise NotImplemented
...

The idea here is that, if someone wanted to write an extension, they could add a hook to the navigation bar by subclassing the hooks, like so:

(from the rbreports prototype extension)

...

class ReportsDashboardHook(DashboardHook):
    def get_entries(self):
        return [{
            'label': 'Reports',
            'url': settings.SITE_ROOT + 'reports/',
        }]

class ReportsExtension(Extension):
    is_configurable = True

    def __init__(self):
        Extension.__init__(self)

        self.url_hook = URLHook(self, patterns('',
            (r'^reports/', include('rbreports.urls'))))

        self.dashboard_hook = ReportsDashboardHook(self)
...

So you can see the __metaclass__ thing up there in the DashboardHookDashboardHook subclasses ExtensionHook, and has ExtensionHookPoint as a metaclass.

Why? And what is a metaclass anyways?  Why is this useful at all?

Well, luckily, I think I’ve figured it out.

Behold – Metaclasses

Metaclasses are deeper magic than 99% of users should ever worry about. If you wonder whether you need them, you don’t (the people who actually need them know with certainty that they need them, and don’t need an explanation about why).
– Python Guru Tim Peters

There are plenty of resources available regarding Python’s metaclasses.  And I ended up reading a ton of them trying to figure out just what the hell metaclasses actually do.

As it turns out, the best explanation I got was from Wikipedia:

In object-oriented programming, a metaclass is a class whose instances are classes. Just as an ordinary class defines the behavior of certain objects, a metaclass defines the behavior of certain classes and their instances.

In particular, check out their example on Cars.  That, for me,  more or less set the record straight on how metaclasses are used.

But why does Review Board use them when defining those hooks, like DashboardHook?

Why?

Forget the metaclasses for just a second.

Something is missing from that code that I posted up.

I’ll pose it as a question:  How exactly is Review Board supposed to know about ReportsDashboardHook?  Like…if we’re rendering the Dashboard, and want to fire off calls to all of our DashboardHooks…how do we do it?  How can we find all of the active extension subclasses for DashboardHook?

One way you could do it, is to dive into the extensions directories, and use regular expressions to find defined classes.  That sounds like a lot of work, brittle, and prone to error.  What else?

Next, you could go back to those extension files, “eval” them (shudder), and extract the globals(), looking for ExtensionHook subclasses.  That also sounds like lots of work, brittle, and prone to error.

You could go for class.__subclasses__…but this gives you every subclass, and we just want the hooks that are for active extensions.  We could filter out all of the ones that aren’t activated, but we’d have to do that for every hook, and that blows.

Next, we could have developers add their hooks to a registry after defining them.  So, we could have:

class ReportsDashboardHook(ExtensionHook):
  # ...
  # code goes here
  # ...

dashboard_hook_list.append(ReportsDashboardHook)

And then when an extension is shut down, we just remove it from the global hook list.  That doesn’t look so bad, does it?  The only problem, is now we have to trust that extension developers will know to add those hooks to the global_hook_list?  It’s another step, another thing to forget, and another point of failure.

And it seems silly – I mean, why go through all this effort?  Review Board has already seen these hooks…why should it be so hard to just access the list activated hooks easily?

And there it is. That’s why I think we’re using metaclasses.

If you go back to ExtensionHook and ExtensionHookPoint, you’ll notice that ExtensionHookPoint has a list as an instance variable called hooks.  That’s the global hook list for that class of hook (DashboardHook).  And then ExtensionHook, in its constructor, automatically adds itself to the Extension’s internal list of hooks, as well as the global hook list, with the add_hook method.  Shutting down the hook removes that hook from the global hook list.

So now, all we need to do to add our implemented hook to the list of hooks, is to simply subclass DashboardHook, or NavigationBarHook, or any of those Review Board hooks.  No need to add the hook to the global list manually – Djblets / Review Board will take care of it.  So now, we can get all of the DashboardHook subclasses for activated hooks, easily and consistently, with no extra work for the extension developer.  Same with the NavigationBarHook subclasses.

Smooth as glass.  Nice.

Anyhow, that’s my understanding of it.  If I’m totally off, I’ll let you know when I find out.

May 4, 2010

Christian Hammond
Looking Back on Review Board

Just over 3.5 years ago, David Trowbridge and I spent some time discussing the annoyances of the typical patch submission and code review processes in the open source projects we participated in and at companies, and decided to play with some ideas for improving this. At the time, we knew very little about what we intended to do. We had a name for it pretty early on, but that was about all we had. We didn’t even know whether we’d get past an early prototyping stage. But here it is, over 3 years later, and we have the leading open source code review tool with an active support and development community, hundreds of companies using it, and exciting new innovations for aiding in the code review process.

I was thinking a few days ago about how far we’ve come and some of the decisions we made along the way. I went digging through our commit history in order to relive some of the past of our little project. Since so few people were even aware of Review Board’s existence at the time, I thought I’d share some of our history with you. Particularly the interesting and funny bits.

“Add the reviewboard.”

Commit #1. The very first thing we put in our Subversion tree on September 27, 2006. I don’t even remember what was in this change now. We transitioned to Git last year and this commit is now just plain empty. Maybe it was jut the directory structure? Who can say.

Early on, we didn’t refer to “Review Board” as a proper name. It was generally “the reviewboard” or something similar. The codebase was young. We didn’t actually do code review on the project at this point (and it shows!). The first few months are littered with odd or nonsensical commit messages, small breakages, and bad decisions.

A few of my favorite commit messages are:

  • “I suck. Make submitting of reviews.”
  • “Don’t stuff the list of files in the bug list. It’s impolite.”
  • “Avoid failing out with Christian’s wacko form”
  • “Gum.”
  • “Holy apple pancakes. It worked!”
  • “I suck… The array was empty… The tests never had a chance to fail. :(”
  • “‘This is a summary’ sucks. Now we use fortune for the summary, description, and testing done. ‘You’re ugly and your mother dresses you funny.’”
  • “Unbreak things before ChipX86 notices”
  • “I’m just… garhgh”

Nowadays, our commit messages look nothing like that, but that’s the fun of a new project. You get to go commit-crazy while you try to figure out what you’re building.

Dashboard, quips and fortunes

The UI of old looked quite different than the UI of today.

We had a dashboard from the very beginning (before the review request pages, even) but it wasn’t anything like the dashboard we had today. It was a simple page with a table containing all outgoing review requests and a table containing incoming review requests. But it also had one more thing: quips.

The beginning of quips functionality was being built. Quips are just little random quotes that are inserted in the UI. I think the plan was to put quips on certain pages, making Review Board a little more fun. We were using them in the dashboard for empty lists, with variations all saying something about the dashboard being empty. Quips are a neat feature that just never survived the early days of development.

Fortunes are similar. On Linux/Unix systems, there’s a little program called “fortune” that just displays a random quote. Since we at first had to test review request functionality without actually having a repository backend of any sort, and we didn’t want to input all the information each time, we just used fortune to generate the summary, description and testing done text. This made for some really funny review requests early on, but this is of course something that had no reason to survive initial development.

Sometimes we would create a bunch of review requests just to see what kind of quotes we’d get. :)

Multiple repositories? Almost didn’t happen.

One of the really critical parts of Review Board today is the ability to talk to a variety of different types of repositories in one instance. But, it turns out, this almost didn’t happen.

The initial goals were not that ambitious. Review Board talked to one repository per instance. Everything was basically hard-coded with one repository in mind. That type of repository, as well as its information, was customizable. You just couldn’t have more than one. At the time, this wasn’t a problem, but it didn’t take long until we had a need to talk to two repositories.

We discussed this and at first decided that if we needed to talk to two repositories, we could just set up two instances. It would have been a lot of work to update it for multiple repositories, after all. And really, this was a small project. Who would really need more than a couple repositories? This started to nag at me, though, and so I spent a couple nights rewriting all of the code as an experiment. It ended up working pretty nicely, and we were able to ditch the multi-instance model.

The importance of rewards

It’s always nice to have a little reward for milestones. Developers sometimes compete over cool bug numbers, revisions, etc. Initially, we were going to use quips to add some fun to the site, but we ended up settling on our current trophy system.

One of our first Review Board instances started to approach review request #1000, which was a huge milestone for us. I decided to commemorate the event by staying up and quickly hacking in a hidden feature for showing a trophy for review request #1000. The way we implemented it, you’d see the first ever trophy at 1,000, and from there you’d see it at every milestone number (1,000, 2,000, 3,000, 10,000, etc.). I didn’t want to stop there, though, so I added support for a second type of trophy, one that has confused people with its appearance to this day. Mission complete.

Of course, when we updated the server and someone finally hit 1000, it triggered a bug in the new trophy code and broke his review request. Oh well, I tried.

Diff viewers are hard

If I could pick one point during the whole history of Review Board where I was ready to completely give up, it would be during the creation of our diff viewer. All three diff viewers.

See, the first diff viewer was a complete and total hack. We generated a side-by-side diff using the diff tools and just parsed the output, basically generating a table of that. It was ugly, though, and limiting. It also caused problems where text on a row would either be truncated or would break the parser. I spent a long time working on this before I totally gave up and went on to try a new approach.

My second approach was closer to what we have today, but also limiting and very, very buggy. We were using Python’s built-in diff generation module, which implements a basic diff algorithm. It gave us insert and delete information, but not replace information. We had to hack that in ourselves, and it was really a hack. Try taking a bunch of inserts and deletes and find out which of those are really changed lines. No, really, try it. It’s harder than you think, and it’ll often be wrong.

Still, we stuck with this for a long time. It was slow, buggy, and didn’t generate the sort of output people expected from diff tools. Most people see diffs from GNU Diff, which implements the Meyers Diff Algorithm (with a few additions and tweaks). These Meyers diffs are much nicer to view than what Python gave us. Another problem we hit was that we didn’t have real line number information, so we had to output fake line numbers. They weren’t really line numbers so much as row numbers in the table. Ugh. Even getting this far was really hard and frustrating, and the result still wasn’t good.

Attempt #3. I decided to build our own diff parser and generator from scratch. What a project. I knew nothing about diff generation and hardly knew where to start. I spent probably a good month or so just trying to work on this new diff code, and was so close to giving up so many times. It ended up being completely worth it, though, as we ended up with a very nice, extensible diff parser.

Without that third attempt, we’d be in the stone age. Review Board would not be as nice to use. We wouldn’t have inter-line diffs (where we highlight what changed in a replace line), syntax highlighting, move detection (coming in 1.5), or function/class headers (where we show which function/class the part of the diff is in — also coming in 1.5).

What else…

Well, there’s a lot more I could talk about. Our initial attempts at JavaScript code for the UI, our trials and challenges with database migration, or our early problems storing diffs with different encodings in databases. This is getting long, though, so I’ll cover these in another post on lessons learned.

May 3, 2010

Mike Conley
The Review Board Extension Life-Cycle

According to the timeline, I’m still in the community-bonding period for GSoC.  Coding for my project is supposed to start sometime towards the end of May.

So I’m using the time to do the following:

  • Close as many small, easy tickets as I can for the  upcoming Review Board release.  I’ve already posted some patches for review.  More forthcoming.
  • Get to know the tools I’ll be using.  Review Board is hosted on Github, and I’m relatively new to the whole DVCS thing.  I’ve been figuring out how to use Git, how to post patches, merging, branching, etc.
  • Get to know the area I’ll be working in.  I’ve been figuring out how Django apps organize themselves.  I’ve also drawn up a map of the current state of the extension framework to help me visualize it.
  • Get to know the other developers working on Review Board.  I’ve been hanging out in the #reviewboard-soc FreeNode IRC channel.  Very nice, and helpful people to work with.
  • Develop a plan of attack for my project

And this last point is the one I want to talk about.

The Review Board Extension Life-Cycle

An extension isn’t just some isolated piece of code that gets crammed into an application.  When you’ve got multiple extensions already installed and running, installing and activating a new extension is like introducing a new animal into an ecosystem.  You have to make sure that your new animal plays nice with the others, and that, in the morning, there won’t be a pile of rotting corpses where your application used to be.

I may have gotten carried away with my metaphor.

So let’s look at what I’m envisioning as the life-cycle for a Review Board extension.  I’ll start right from the top.

Getting the Extension

Ideally, this will work as nicely as WordPress’s implementation:  a Review Board administrator is given a catalog of extensions to choose from within the Administrator interface, and one click later, the desired extensions are downloaded and ready to be installed.

For all you system administrators out there, that last idea might make your toes curl.  A Review Board administrator is not necessarily a system administrator, and the system administrator knows what he/she likes on their machine.  An application that can go and download other applications can be dangerous.  We have to ensure that the application that we’re downloading is the one we’re trying to download.  The last thing we need is some man-in-the-middle to do something cute and bork the code review machine.  And we want to ensure that the extension functions as advertised.  No hidden features.  No self-destruct mechanisms.  No back doors.

Do I have a plan for this part?  Well….no, not really.  I don’t imagine I’ll get that far – I consider it a little out of my scope.  So, for my project, I think it’ll satisfy if the system admin (or Review Board admin) can manually download the extension, decompress it, and place it where Review Board can work with it.  That other stuff can come later (and I’ll try to design so that it can come later easily).

Installing the Extension

Ok, so at this point, we’ve got our extension downloaded in a place where Review Board can see it.

So now what?

Now we need to install the extension.  To me, that means letting the extension put its roots into the RB install by creating database tables, preparing initial data, and generally doing everything to make conditions suitable for the extension to function.

When an extension install begins, I imagine it will consider the following questions (in no particular order):

  1. Do I (the extension) depend on other extensions to function?  If so, are those extensions present?  If not, let the user know so that they can go get them.
  2. Are there some extensions already installed that will conflict with me, or make me behave badly?  If so, let the user know so that they can either remove that conflict, or find an alternative extension.
  3. Is the user entirely aware of what I can do?  Make sure that my capabilities, limitations, and behavioural quirks are known to the user.

Once those 3 questions are answered, and everything is looking good for the install, the extension will create the database tables it needs (if any) in order to function.  If anything goes wrong during this process, the database changes will be rolled back, and the user will get a full read out about what went wrong.

If nothing goes wrong, the extension will be installed.  The user might then be asked to set some initial operating parameters for the extension.

Ok great – the extension is installed.  Now what?

Activating the Extension

For something that sounds so dramatic, the explanation about what happens is pretty short:  Review Board simply becomes aware that the extension is activated, and passes data through the necessary hooks in order for the extension to function properly.  Upon activation, the extension should do a quick double-check to ensure that all prerequisites for the extension have been met.  This is because we can’t trust users to activate an extension immediately after downloading them.

The Extension Runs

While it’s activated, the extension will probably react to various events that happen on Review Board.  Tables will be updated.  View methods will be run.  Templates will be rendered.

If anything, ever, goes horribly wrong with an extension, the following will happen:

  1. A log entry will be written, dumping the error, and information about what the user was trying to do
  2. An error message will be displayed to the user, trying to tell them what exactly happened
  3. The extension (and its dependents) will be deactivated.  They will no longer react to events on Review Board.  Their tables will still be there, the settings will still, but the extension will be, in essence, asleep.  Dormant.  Non-reactive.

The administrator could try to reactivate the extension at this point.  They might try to contact the extension developer for support.

The Extension is Un-installed

If the user wants to rid themselves of an extension, they must first deactivate it.  This will put it (and its dependents) in the dormant state.  It just switches them off, nothing else.

Deactivated extensions can then be un-installed.  If the user chooses to un-install the extension, the extension database tables and settings will be wiped out.

The extension itself won’t be deleted though – at least, not within the scope of my project.  The extension files will need to be removed from Review Board manually.

At this point, any dependents that this uninstalled extension had will no longer be able to be activated.

Anyhow, that’s how I envision the life-cycle.  It’s my first go at it, so I’d love to hear some feedback if you have any.

1 2 3 4 5 6 7 8 pages