Analyzing Your Git Repository

Sometimes it is interesting to see who has committed how often in a git repository, how many lines of code each person contributed, etc.

There are several statistics tools available, such as gitstats and gitstat, as well as online services like ohloh. However, I didn’t want to install additional software, so I looked at the available git commands.

You can use git shortlog to get a list of authors and their commits. The --summary flag reduces the output to a commit count per author, while the --numbered flag sorts the authors by number of commits in descending order.

git shortlog --summary --numbered
git shortlog -sn # short form

However, this counts merge commits, too. These commits do not create value and are unnecessary, as one could use git rebase instead to aim for a cleaner git history. You can exclude merge commits by adding the --no-merges option. This option is not listed in the man page of git shortlog, but it works because git shortlog is based on git log, which understands --no-merges as stated on its man page.

git shortlog --summary --numbered --no-merges
git shortlog -sn --no-merges # short form (there is no one-letter flag for --no-merges)
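
To see the effect of --no-merges, here is a minimal sketch that builds a throwaway repository containing a single merge commit and compares both counts. The author name, email address, branch name and commit messages are placeholders made up for this illustration. Note that the script passes HEAD explicitly: when no revision is given and standard input is not a terminal, git shortlog reads a log from standard input instead of the repository.

```shell
#!/bin/sh
# Sketch only: throwaway repository with exactly one merge commit.
# All names, addresses and messages below are placeholders.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo Author"
git config user.email "demo@example.com"

git commit -q --allow-empty -m "initial commit"
git checkout -q -b feature
git commit -q --allow-empty -m "work on feature"
git checkout -q -                      # back to the initial branch
git commit -q --allow-empty -m "work on main line"
git merge -q --no-ff -m "merge feature" feature

# The first column of the summary output is the commit count.
with_merges=$(git shortlog -sn HEAD | awk '{print $1}')
without_merges=$(git shortlog -sn --no-merges HEAD | awk '{print $1}')
echo "with merges:    $with_merges"
echo "without merges: $without_merges"
```

The two numbers should differ by exactly the one merge commit (4 versus 3 in this setup).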

A problem can occur if developers with the same name have different email addresses within your git commit history. With the command above, commits are grouped by name, so you cannot differentiate between these two people and their individual commits. To keep them apart, add the --email option, which includes the email address in each entry so that commits of developers with the same name are not aggregated.

git shortlog --summary --numbered --email
git shortlog -sne # short form
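
The following sketch illustrates the difference in a scratch repository. The name "Alex Doe", both email addresses and the helper function commit_as are hypothetical and exist only for this example. HEAD is passed explicitly because git shortlog would otherwise read from standard input when run from a script.

```shell
#!/bin/sh
# Sketch only: two different people who share a name (placeholder identities).
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Hypothetical helper: create an empty commit as the given name and email.
commit_as() {
    GIT_AUTHOR_NAME="$1" GIT_AUTHOR_EMAIL="$2" \
    GIT_COMMITTER_NAME="$1" GIT_COMMITTER_EMAIL="$2" \
        git commit -q --allow-empty -m "$3"
}

commit_as "Alex Doe" "alex@company.example"  "first commit"
commit_as "Alex Doe" "alex@company.example"  "second commit"
commit_as "Alex Doe" "alex.doe@home.example" "third commit"

# Count the number of entries (lines) in each summary.
by_name=$(git shortlog -sn HEAD | wc -l | tr -d ' ')
by_email=$(git shortlog -sne HEAD | wc -l | tr -d ' ')
echo "entries grouped by name only:      $by_name"
echo "entries grouped by name and email: $by_email"
```

Without -e both identities collapse into a single entry, while with -e each name/email pair gets its own line.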

Another problem can occur if a developer uses different names and/or email addresses within your git commit history. This can only be solved by adding a mapping file that states which developer has which names and email addresses. The file has to be named .mailmap and located at the top level of the repository. Each line defines one mapping from a commit name and/or commit email address to a proper name and/or proper email address. If a developer uses several different commit names and/or email addresses, you may need several mappings for this developer.

For example, the developer Max Mustermann uses the following name/email pairs for his commits:

Max Mustermann <max.mustermann@mail.com>
Max <max.mustermann@mail.com>
Max Mustermann <max@mail.com>
Max <max@mail.com>

The aim is to identify this user by Max Mustermann <max.mustermann@mail.com> only. Therefore, the .mailmap file has to look as follows:

# same name but different mail address
Max Mustermann <max.mustermann@mail.com> <max@mail.com>

# same mail address but different names (applies to any name used with this address)
Max Mustermann <max.mustermann@mail.com>

# different name and different email address
Max Mustermann <max.mustermann@mail.com> Max <max@mail.com>

The example shows all possible combinations (same email, same name, different email and name) and can be used as a guiding example for building your very own .mailmap file. For more details on the structure of such a file, please refer to the man page.
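
As a rough check, the following sketch recreates the four identities from the example in a scratch repository and compares the shortlog before and after adding a .mailmap file. The helper function commit_as and the commit messages are made up for this illustration. The same-address case is written in the `Proper Name <proper@email>` form, and HEAD is passed explicitly because git shortlog would otherwise read from standard input when run from a script.

```shell
#!/bin/sh
# Sketch only: the four Max Mustermann identities, before and after .mailmap.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Hypothetical helper: create an empty commit as the given name and email.
commit_as() {
    GIT_AUTHOR_NAME="$1" GIT_AUTHOR_EMAIL="$2" \
    GIT_COMMITTER_NAME="$1" GIT_COMMITTER_EMAIL="$2" \
        git commit -q --allow-empty -m "commit by $1 <$2>"
}

commit_as "Max Mustermann" "max.mustermann@mail.com"
commit_as "Max"            "max.mustermann@mail.com"
commit_as "Max Mustermann" "max@mail.com"
commit_as "Max"            "max@mail.com"

before=$(git shortlog -sne HEAD | wc -l | tr -d ' ')

# The mapping file lives at the top level of the working tree.
cat > .mailmap <<'EOF'
# same name but different mail address
Max Mustermann <max.mustermann@mail.com> <max@mail.com>
# same mail address but different names
Max Mustermann <max.mustermann@mail.com>
# different name and different email address
Max Mustermann <max.mustermann@mail.com> Max <max@mail.com>
EOF

after=$(git shortlog -sne HEAD | wc -l | tr -d ' ')
echo "entries before .mailmap: $before"
echo "entries after .mailmap:  $after"
git shortlog -sne HEAD
```

Before the mapping exists, each name/email pair shows up as its own entry. Afterwards all four commits should be attributed to Max Mustermann <max.mustermann@mail.com>.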

This approach only displays the number of commits per developer. It does not take the changes (lines added/lines deleted) into account. I will investigate and implement this in another blog post.
