>svn blame http://core.svn.wordpress.org/trunk/license.txt@HEAD
7131 ryan GNU GENERAL PUBLIC LICENSE
7131 ryan Version 2, June 1991
7131 ryan
10085 matt Copyright (C) 1989, 1991 Free Software Foundation, Inc.
15668 scribu 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA
10085 matt
7131 ryan Everyone is permitted to copy and distribute verbatim copies
7131 ryan of this license document, but changing it is not allowed.
7131 ryan
So, most of us have seen that, and a few of us have needed to parse it now and then. Googling never seems to turn up a reasonable answer either, so here it is: an explanation of the output format of svn blame.
Honestly, it's quite simple: it's just fixed-width columns, with a twist. By default the columns are 6 characters wide, but they expand to fit the largest revision ID or the longest committer's username. The author field is followed by a single space, and then the line's data.
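If you'd rather not use a regular expression, the same layout can be pulled apart with plain string functions. This is a minimal sketch, assuming the padding is made of spaces and usernames contain no spaces; `parseBlameLine` is a hypothetical helper name, not part of PHP or Subversion.

```php
<?php
// Hypothetical helper: split one line of plain `svn blame` output into its
// three fields without a regex. Assumes space padding, a numeric revision,
// an author with no spaces, exactly one separator space, then the data.
function parseBlameLine(string $line): ?array
{
    // Skip the padding in front of the (right-aligned) revision number.
    $rest = ltrim($line, ' ');

    // Revision: the run of digits at the start of the trimmed line.
    $revLen = strspn($rest, '0123456789');
    if ($revLen === 0) {
        return null; // not a blame line
    }
    $revision = substr($rest, 0, $revLen);

    // Author: skip the padding, then read up to the next space.
    $rest = ltrim(substr($rest, $revLen), ' ');
    $authorLen = strcspn($rest, ' ');
    if ($authorLen === 0) {
        return null; // no author field found
    }
    $author = substr($rest, 0, $authorLen);

    // Data: drop exactly one separator space and keep the rest verbatim,
    // including any leading whitespace that belongs to the file itself.
    // The cast covers older PHP versions where substr() may return false.
    $data = (string) substr($rest, $authorLen + 1);

    return ['revision' => $revision, 'author' => $author, 'data' => $data];
}
```

Dropping exactly one separator space, rather than splitting on every run of whitespace, is what keeps indented lines (like the GPL header) intact.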
Why go to the trouble? Well, although the output looked like fixed width, it couldn't simply be fixed width, so what rules are behind it? I checked the svn blame source, and this comment told me how the column width is chosen:
/* The standard column width for the revision number is 6 characters. If the revision number can potentially be larger (i.e. if the end_revnum is larger than 1000000), we increase the column width as needed. */
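To make that rule concrete, here is a small PHP sketch of deriving the revision column width from the end revision of the blamed range. It re-expresses the quoted comment in PHP and is not Subversion's own code; `revisionColumnWidth` and `formatRevisionColumn` are hypothetical names, and right-aligning the number within the column is an assumption based on printf-style padding. The exact boundary (whether 1000000 itself already widens the column) may differ from the real implementation.

```php
<?php
// Sketch of the quoted width rule (not Subversion's actual C code).
function revisionColumnWidth(int $endRevnum): int
{
    // 6 columns by default; one more for each extra digit that the
    // largest revision in the requested range needs.
    return max(6, strlen((string) $endRevnum));
}

// Assumption: the revision number is right-aligned within its column.
function formatRevisionColumn(int $revision, int $width): string
{
    return str_pad((string) $revision, $width, ' ', STR_PAD_LEFT);
}

$width = revisionColumnWidth(15668);
echo '[' . formatRevisionColumn(7131, $width) . "]\n"; // [  7131]
echo revisionColumnWidth(1234567), "\n";                // 7
```

Because the width follows from the end revision of the run rather than from each individual line, all lines of a single blame run should share one revision column width.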
Updated regular expression (the original had a bug where it didn't handle empty lines in the file):
preg_match_all('!^\s*(?P<revision>\d+)\s+(?P<author>.+?)( (?P<data>.*))?$!m', $output_from_svn_blame, $matches, PREG_SET_ORDER);
array
  0 =>
    array
      'revision' => string '7131' (length=4)
      'author' => string 'ryan' (length=4)
      'data' => string ' Version 2, June 1991' (length=41)
  1 =>
    array
      'revision' => string '7131' (length=4)
      'author' => string 'ryan' (length=4)
      'data' => string '' (length=0)
  2 =>
    array
      'revision' => string '10085' (length=5)
      'author' => string 'matt' (length=4)
      'data' => string ' Copyright (C) 1989, 1991 Free Software Foundation, Inc.' (length=56)
It's probably not the most efficient approach, but it does the job for what I needed. Also note that I've removed the numeric keys from the example output above (preg_match_all returns both named and numeric entries for each match in the result set).
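Putting it together, here is a minimal sketch of running the regex above and dropping the numeric keys, so each row looks like the trimmed dump. The `$blame` string is a hand-written sample (its padding is assumed, not copied from real output), and `$rows` is just an illustrative name.

```php
<?php
// Hand-written sample input; the padding is an assumption, not real output.
// The second line is an empty source line (note the trailing separator space).
$blame = "  7131       ryan GNU GENERAL PUBLIC LICENSE\n"
       . "  7131       ryan \n"
       . " 10085       matt  Copyright (C) 1989, 1991 Free Software Foundation, Inc.\n";

preg_match_all(
    '!^\s*(?P<revision>\d+)\s+(?P<author>.+?)( (?P<data>.*))?$!m',
    $blame,
    $matches,
    PREG_SET_ORDER
);

$rows = [];
foreach ($matches as $match) {
    // Each set holds both numeric and named keys; keep only the named ones.
    $named = array_filter($match, 'is_string', ARRAY_FILTER_USE_KEY);
    // If the optional data group did not take part in the match, PHP can
    // leave its key out entirely, so default it to an empty string.
    $named['data'] = $named['data'] ?? '';
    $rows[] = $named;
}
```

Each entry of `$rows` then has just `revision`, `author` and `data`, and a missing data group is normalised to an empty string instead of an absent key.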