svn blame output format

>svn blame http://core.svn.wordpress.org/trunk/license.txt@HEAD
  7131       ryan                   GNU GENERAL PUBLIC LICENSE
  7131       ryan                      Version 2, June 1991
  7131       ryan
 10085       matt  Copyright (C) 1989, 1991 Free Software Foundation, Inc.
 15668     scribu  51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA
 10085       matt
  7131       ryan  Everyone is permitted to copy and distribute verbatim copies
  7131       ryan  of this license document, but changing it is not allowed.
  7131       ryan

Most of us have seen output like that, and a few of us have needed to parse it now and then. Googling never seems to turn up a reasonable answer, so here it is: an explanation of the output format of svn blame.

Honestly, it’s quite simple: it’s just fixed-width, right-aligned columns, with a twist. The revision column is 6 characters wide by default, but expands to fit the largest revision number. The author column is padded in the same way (to 10 characters in the sample above). The author field is followed by a single space, and then the line data.
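To make the layout concrete, here is a minimal sketch that rebuilds one line of the sample output. The function name formatBlameLine is hypothetical, and the author width of 10 is read off the sample above rather than taken from the Subversion documentation, so treat it as an assumption.

```php
<?php
// Sketch only: the widths (6 for the revision, 10 for the author) are
// inferred from the sample output, not from Subversion's documentation.
function formatBlameLine(int $revision, string $author, string $line, int $revWidth = 6): string
{
    // Right-align the revision and the author, add a single space, then the file's line.
    return sprintf('%' . $revWidth . 'd %10s %s', $revision, $author, $line);
}

// The line in license.txt itself starts with a space, hence the two spaces after "matt".
echo formatBlameLine(10085, 'matt', ' Copyright (C) 1989, 1991 Free Software Foundation, Inc.'), "\n";
```

Passing a larger $revWidth shows how the revision column would widen while the rest of the line keeps the same shape.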

Why go to the trouble? Because although the output looked fixed-width, it couldn’t be purely fixed-width, so what rules are behind it? I checked the source of svn blame, and this comment explained how the column width is chosen:

  /* The standard column width for the revision number is 6 characters.
     If the revision number can potentially be larger (i.e. if the end_revnum
     is larger than 1000000), we increase the column width as needed. */
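Read literally, that comment suggests the revision column is at least 6 characters wide and grows with the number of digits in the last revision. A rough sketch of that rule follows; revisionColumnWidth is a hypothetical helper written for illustration, not Subversion’s actual code.

```php
<?php
// Hypothetical helper: guesses the revision column width from the highest
// revision number, following the source comment quoted above.
function revisionColumnWidth(int $endRevnum): int
{
    // Never narrower than 6; wider once the revision has 7 or more digits.
    return max(6, strlen((string) $endRevnum));
}

echo revisionColumnWidth(15668), "\n";   // a 5-digit revision still gets the default width
echo revisionColumnWidth(1234567), "\n"; // a 7-digit revision widens the column
```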

Updated regular expression (the original had a bug where it didn’t handle empty lines in the file):

preg_match_all('!^\s*(?P<revision>\d+)\s+(?P<author>.+?)( (?P<data>.*))?$!m',
 $output_from_svn_blame, $matches, PREG_SET_ORDER);

Example output:

array
  0 =>
    array
      'revision' => string '7131' (length=4)
      'author' => string 'ryan' (length=4)
      'data' => string '                     Version 2, June 1991' (length=41)
  1 =>
    array
      'revision' => string '7131' (length=4)
      'author' => string 'ryan' (length=4)
      'data' => string '' (length=0)
  2 =>
    array
      'revision' => string '10085' (length=5)
      'author' => string 'matt' (length=4)
      'data' => string ' Copyright (C) 1989, 1991 Free Software Foundation, Inc.' (length=56)

It’s probably not the most efficient, but it does the job for what I needed for now. Also note that I’ve removed the numeric keys from the example output (preg_match_all returns both named and numeric keys in each match array).
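If you want the numeric keys gone in code rather than by hand, something along these lines should work. It is a sketch: the two input lines are copied from the sample at the top of the post, the trailing space on the empty line is an assumption about what svn emits, and $rows is just a name picked for illustration.

```php
<?php
// Two lines from the sample output: one with data, one for an empty line in the file.
$output_from_svn_blame = implode("\n", [
    ' 10085       matt  Copyright (C) 1989, 1991 Free Software Foundation, Inc.',
    '  7131       ryan ',
]);

preg_match_all('!^\s*(?P<revision>\d+)\s+(?P<author>.+?)( (?P<data>.*))?$!m',
    $output_from_svn_blame, $matches, PREG_SET_ORDER);

// Keep only the named groups; filtering on the key drops the numeric entries.
$rows = array_map(function (array $m): array {
    $named = array_filter($m, 'is_string', ARRAY_FILTER_USE_KEY);
    // Guard for the case where the optional data group did not take part in the match.
    return $named + ['data' => ''];
}, $matches);

print_r($rows);
```

Each entry in $rows then has only the revision, author and data keys, which is the shape shown in the example output.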



2 thoughts on “svn blame output format”

  1. I’m surprised no one picked up on the bug. Empty data lines were not handled by that regex, which led to the 2nd array in that example including an entire svn blame line in the data field.
