Find the most used first word (and every word) in your git commit messages

Via Twitter, I found this interesting gist (https://gist.github.com/2140115) which contains a one-line bash command for “Find[ing] the most used verbs in your git commit messages” in a git repo.

$ git log --pretty=format:'%s' | cut -d " " -f 1 | sort | uniq -c | sort -nr
      2 Added
      1 Improved

In the following, I have extracted the necessary information from the help/man pages to understand how this is achieved. After each command is explained, I also show the intermediate output it produces.

git log # "Show commit logs"(git log --help)
git log --pretty=format:'%s' # print each commit log as a line containing solely its subject

$ git log --pretty=format:'%s'
Improved README
Added CHANGELOG
Added README

cut # "print selected parts of lines from each FILE to standard output."(cut --help)
cut -d, --delimiter=DELIM # "use DELIM instead of TAB for field delimiter"(cut --help)
cut -f, --fields=LIST # "output only these fields"(cut --help)
cut -d " " -f 1 # split line by " " and output only the first field

$ git log --pretty=format:'%s' | cut -d " " -f 1
Improved
Added
Added

sort # "Write sorted concatenation of all FILE(s) to standard output."(sort --help)

$ git log --pretty=format:'%s' | cut -d " " -f 1 | sort
Added
Added
Improved

uniq # "Discard all but one of successive identical lines from INPUT (or standard input), writing to OUTPUT (or standard output)"(uniq --help)
uniq -c, --count # "prefix lines by the number of occurrences"(uniq --help)

$ git log --pretty=format:'%s' | cut -d " " -f 1 | sort | uniq -c
      2 Added
      1 Improved
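
Because uniq only merges successive identical lines, the preceding sort matters as soon as equal words are not already next to each other. A minimal sketch with made-up first words (the sample input is an assumption for illustration, not taken from the gist):

```shell
# Hypothetical first words in commit order, as cut would emit them.
words='Added
Improved
Added'

# Without sort, the two "Added" lines are not adjacent, so uniq -c
# reports three groups with a count of one each.
printf '%s\n' "$words" | uniq -c

# With sort, identical words become adjacent and are counted together.
printf '%s\n' "$words" | sort | uniq -c
```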

sort # "Write sorted concatenation of all FILE(s) to standard output."(sort --help)
sort -r # "reverse the result of comparisons"(sort --help)
sort -n # "compare according to string numerical value, imply -b"(sort --help)
sort -b # "ignore leading blanks in sort fields or keys"(sort --help)

$ git log --pretty=format:'%s' | cut -d " " -f 1 | sort | uniq -c | sort -nr
      2 Added
      1 Improved
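
The -n flag starts to matter once a count reaches two digits. A small sketch with invented counts (the lines below are illustrative and written without the leading padding that uniq -c adds) compares a plain reverse sort with a numeric one:

```shell
# Invented "count word" lines, not taken from a real repository.
counts='9 Fixed
10 Added
2 Updated'

# Plain reverse sort compares character by character, so "9" and "2"
# both end up ahead of "10" (LC_ALL=C pins byte-wise comparison).
printf '%s\n' "$counts" | LC_ALL=C sort -r

# Numeric reverse sort compares the leading numbers as values.
printf '%s\n' "$counts" | LC_ALL=C sort -nr
```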

If you want to count all the words (and not only the first word of each commit message), the cut command does not suffice. In that case, sed, a “stream editor for filtering and transforming text”(sed man page), can be used to put every word on its own line. This is shown in the following.

$ git log --pretty=format:'%s'
Improved README
Added CHANGELOG
Added README

sed # "stream editor for filtering and transforming text"(sed man page)
sed s/regexp/replacement/ # "Attempt to match regexp against the pattern space. If successful, replace that portion matched with replacement."(sed man page)
\s # whitespace character
\n # new line
g # flag: replace every match in a line, not just the first one
sed 's/\s/\n/g' # replace all whitespace characters by new lines

$ git log --pretty=format:'%s' | sed 's/\s/\n/g'
Improved
README
Added
CHANGELOG
Added
README

$ git log --pretty=format:'%s' | sed 's/\s/\n/g' | sort
Added
Added
CHANGELOG
Improved
README
README

$ git log --pretty=format:'%s' | sed 's/\s/\n/g' | sort | uniq -c
      2 Added
      1 CHANGELOG
      1 Improved
      2 README

$ git log --pretty=format:'%s' | sed 's/\s/\n/g' | sort | uniq -c | sort -nr
      2 README
      2 Added
      1 Improved
      1 CHANGELOG
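
Two caveats apply to the sed step: \s in the pattern and \n in the replacement are GNU sed extensions that other sed implementations may not support, and a subject with two consecutive spaces produces an empty line that is then counted like a word. As a hedged alternative (tr swapped in for sed, which is not what the gist uses), tr -s translates every whitespace character to a new line and squeezes repeated new lines into one. The function name split_words and the sample subjects are assumptions for illustration:

```shell
# Split stdin into one word per line; -s squeezes the runs of new lines
# that consecutive whitespace would otherwise produce.
split_words() {
  tr -s '[:space:]' '\n'
}

# Sample subjects standing in for `git log --pretty=format:'%s'`;
# note the double space in the first subject.
printf '%s\n' 'Improved  README' 'Added CHANGELOG' 'Added README' |
  split_words | sort | uniq -c | sort -nr
```

In a real repository, the printf line would be replaced by git log --pretty=format:'%s'.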

But beware: for a repository with many commits, the output can be longer than your terminal's scrollback buffer. This can be solved by piping the result to the pager less.

$ git log --pretty=format:'%s' | sed 's/\s/\n/g' | sort | uniq -c | sort -nr | less
      2 README
      2 Added
      1 Improved
      1 CHANGELOG

The pipeline can still be improved, e.g., by converting all upper-case letters to lower case so that "Added" and "added" are counted as the same word. This is possible using tr '[:upper:]' '[:lower:]'; tr is a tool that is able to “translate, squeeze, and/or delete characters from standard input, writing to standard output”(tr --help). Here it translates every upper-case letter to its lower-case counterpart and copies all other characters unchanged.
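
Slotted into the pipeline, the lower-casing step could look like the following sketch. It assumes GNU sed, as the commands above do; the function name count_words_lower and the sample subjects (with a deliberately lower-case "added") are made up for illustration:

```shell
# Count every word on stdin case-insensitively, most frequent first.
count_words_lower() {
  tr '[:upper:]' '[:lower:]' |   # fold case so "Added" and "added" match
    sed 's/\s/\n/g' |            # one word per line (GNU sed)
    sort | uniq -c | sort -nr
}

# Sample subjects standing in for `git log --pretty=format:'%s'`.
printf '%s\n' 'Improved README' 'added CHANGELOG' 'Added README' |
  count_words_lower
```

In a real repository this would be used as git log --pretty=format:'%s' | count_words_lower | less.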
