Via Twitter, I found this interesting gist (https://gist.github.com/2140115) which contains a one-line bash command for “Find[ing] the most used verbs in your git commit messages” in a git repo.
```
$ git log --pretty=format:'%s' | cut -d " " -f 1 | sort | uniq -c | sort -nr
      2 Added
      1 Improved
```
Below, I have extracted the necessary information from the help/man pages to understand how this works. I have also kept the intermediate output, which is shown right after each command is explained.
```
git log                        # "Show commit logs" (git log --help)
git log --pretty=format:'%s'   # print each commit log as a line containing solely its subject

$ git log --pretty=format:'%s'
Improved README
Added CHANGELOG
Added README

cut                            # "print selected parts of lines from each FILE to standard output." (cut --help)
cut -d, --delimiter=DELIM      # "use DELIM instead of TAB for field delimiter" (cut --help)
cut -f, --fields=LIST          # "output only these fields" (cut --help)
cut -d " " -f 1                # split each line at " " and output only the first field

$ git log --pretty=format:'%s' | cut -d " " -f 1
Improved
Added
Added

sort                           # "Write sorted concatenation of all FILE(s) to standard output." (sort --help)

$ git log --pretty=format:'%s' | cut -d " " -f 1 | sort
Added
Added
Improved

uniq                           # "Discard all but one of successive identical lines from INPUT (or standard input), writing to OUTPUT (or standard output)" (uniq --help)
uniq -c, --count               # "prefix lines by the number of occurrences" (uniq --help)

$ git log --pretty=format:'%s' | cut -d " " -f 1 | sort | uniq -c
      2 Added
      1 Improved

sort                           # "Write sorted concatenation of all FILE(s) to standard output." (sort --help)
sort -r                        # "reverse the result of comparisons" (sort --help)
sort -n                        # "compare according to string numerical value, imply -b" (sort --help)
sort -b                        # "ignore leading blanks in sort fields or keys" (sort --help)

$ git log --pretty=format:'%s' | cut -d " " -f 1 | sort | uniq -c | sort -nr
      2 Added
      1 Improved
```
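To try the pipeline without a git repository, the commit subjects can be simulated with printf. This is a minimal sketch: the function name first_word_counts is hypothetical, and the three sample subjects are the ones from the walkthrough above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads one commit subject per line from stdin and
# prints how often each first word occurs, most frequent first.
first_word_counts() {
  cut -d " " -f 1 | sort | uniq -c | sort -nr
}

# Sample subjects standing in for the output of: git log --pretty=format:'%s'
printf '%s\n' "Improved README" "Added CHANGELOG" "Added README" | first_word_counts
```

Inside a repository, the same helper can be fed directly: git log --pretty=format:'%s' | first_word_counts. The exact amount of leading padding in front of the counts depends on the uniq implementation.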
If you want to count all the words (and not only the first word of each commit subject), the cut command does not suffice. In that case, sed, a “stream editor for filtering and transforming text” (sed man page), can be used, as shown in the following.
```
$ git log --pretty=format:'%s'
Improved README
Added CHANGELOG
Added README

sed                            # "stream editor for filtering and transforming text" (sed man page)
sed s/regexp/replacement/      # "Attempt to match regexp against the pattern space. If successful, replace that portion matched with replacement." (sed --help)
\s                             # whitespace character
\n                             # new line
g                              # flag: replace every match in a line, not only the first
sed 's/\s/\n/g'                # replace all whitespace characters by new lines

$ git log --pretty=format:'%s' | sed 's/\s/\n/g'
Improved
README
Added
CHANGELOG
Added
README

$ git log --pretty=format:'%s' | sed 's/\s/\n/g' | sort
Added
Added
CHANGELOG
Improved
README
README

$ git log --pretty=format:'%s' | sed 's/\s/\n/g' | sort | uniq -c
      2 Added
      1 CHANGELOG
      1 Improved
      2 README

$ git log --pretty=format:'%s' | sed 's/\s/\n/g' | sort | uniq -c | sort -nr
      2 README
      2 Added
      1 Improved
      1 CHANGELOG
```
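The \s class and the \n in the replacement are GNU sed extensions, so the sed command above may not work as intended with other sed implementations (for example, the BSD sed shipped with macOS). A minimal sketch of a more portable variant swaps in tr -s '[:space:]' '\n', which turns every run of whitespace into a single newline; the function name all_word_counts is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: counts every whitespace-separated word read from
# stdin, most frequent first. tr -s '[:space:]' '\n' replaces each run of
# whitespace with one newline, avoiding the GNU-only \s and \n of sed.
all_word_counts() {
  tr -s '[:space:]' '\n' | sort | uniq -c | sort -nr
}

# Sample subjects standing in for the output of: git log --pretty=format:'%s'
printf '%s\n' "Improved README" "Added CHANGELOG" "Added README" | all_word_counts
```

The counts match the sed version (2 README, 2 Added, 1 Improved, 1 CHANGELOG); only the order among words with equal counts may differ between sort implementations.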
But beware: in a repository with many commits, the list can be longer than your terminal window (or its scrollback buffer) can show. This can be solved by piping the result to less, which lets you page through it.
```
$ git log --pretty=format:'%s' | sed 's/\s/\n/g' | sort | uniq -c | sort -nr | less
      2 README
      2 Added
      1 Improved
      1 CHANGELOG
```
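If only the most frequent words are of interest, an alternative to paging with less is to cut the list off with head. A minimal sketch, where the function name top_words and the default of 10 entries are hypothetical choices; it splits words with tr -s '[:space:]' '\n' instead of sed so that it does not rely on GNU sed extensions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints only the N most frequent words (default 10)
# read from stdin, instead of paging through the whole list with less.
top_words() {
  local n="${1:-10}"
  # Put one word per line, count, order by frequency, keep the top N.
  tr -s '[:space:]' '\n' | sort | uniq -c | sort -nr | head -n "$n"
}

# Sample subjects standing in for the output of: git log --pretty=format:'%s'
printf '%s\n' "Improved README" "Added CHANGELOG" "Added README" | top_words 2
```

In a repository this would be git log --pretty=format:'%s' | top_words 20, for example.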
The pipeline can be improved further, e.g., by converting all uppercase letters to lowercase so that words differing only in case (such as README and readme) are counted together. This is possible with tr '[:upper:]' '[:lower:]', which uses the tr tool that is able to “translate, squeeze, and/or delete characters from standard input, writing to standard output” (tr --help). Here it translates every uppercase letter to its lowercase counterpart and copies all other characters unchanged.
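Putting the pieces together, the tr case conversion can run before the words are split. This is a minimal sketch: the function name word_counts_ci and the sample subjects are made up for illustration, and the splitting uses tr -s '[:space:]' '\n' rather than sed so that it does not depend on GNU sed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: case-insensitive word counts for lines on stdin.
word_counts_ci() {
  tr '[:upper:]' '[:lower:]' |   # fold uppercase letters to lowercase
    tr -s '[:space:]' '\n' |     # put one word per line
    sort | uniq -c | sort -nr    # count the words and order by frequency
}

# Made-up sample subjects in which some words differ only in letter case.
printf '%s\n' "Added README" "added tests" "Fixed readme" | word_counts_ci
```

Here "Added"/"added" and "README"/"readme" each end up with a count of 2. In a repository: git log --pretty=format:'%s' | word_counts_ci | less.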