Have you ever had to understand a codebase where the lack of documentation reached every level? No wiki at all, cryptic file names, meaningless commit messages, and only one saving grace: Git version control. This happened to me recently, and I had to dive deep into Git to understand how the system works. Let me walk you through that investigation; it’ll be fun, I promise.


In this blog post, we are not reviewing good practices for documenting projects up front. Instead, we are diving into the git tools you can use when you need to investigate the history of a project in depth, solve the mysteries within it and ultimately regain control over it. None of these tools will solve an enigma by itself, but each one should point you in the right direction, and used with intelligence and perseverance, together they will help you solve the puzzle. If you also act on each new discovery, especially by documenting as you go, then you and everyone else on your team will regain full control over the system.

Now, let's have a look at the tools I recommend.

git shortlog

When facing an unknown system, the first thing you want to do is find the technical experts who are still around, either in your team or in the community. To do that, find out who the top contributors to the system are. With luck, they will have the answers you need.

git shortlog -e -n -s prints the list of contributors to the whole repository, ordered by number of commits: -s shows a commit count per author instead of every commit subject, -n sorts by that count, and -e adds each author's email address.

git shortlog -e -n -s -- [path] prints the same list restricted to the specified [path].

For example:

> git shortlog -n -s -e 
164 Willy Wonka <willy.wonka@fakedomain.com>
59  Indiana Jones <indy@fakedomain.com>
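If you want to reuse this in a script, one detail matters: when no revision is given and standard input is not a terminal (for instance in a CI job or a cron task), git shortlog summarizes a log read from standard input instead of walking the repository. The sketch below passes HEAD explicitly to avoid that and keeps only the first few entries. The helper name top_contributors is hypothetical, not a git command.

```shell
#!/bin/sh
# top_contributors is a hypothetical helper, not part of git.
# Usage: top_contributors [limit] [path]
top_contributors() {
    limit="${1:-10}"   # number of contributors to show (default: 10)
    path="${2:-.}"     # path to restrict to (default: current directory)
    # HEAD is passed explicitly: without a revision, git shortlog reads the
    # log from standard input whenever stdin is not a terminal.
    git shortlog -s -n -e HEAD -- "$path" | head -n "$limit"
}
```

For instance, top_contributors 3 path/to/module would list the three most active authors of that directory.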

git effort

At this point, you still have no clue what the system does, and all you see is a bunch of seemingly random files. Perhaps they are not so random, so let's figure out which files the old team put the most effort into.

git effort prints the number of commits that touched each file in the repository, together with the number of days on which the file was changed ("active days").
git effort --above number prints only the files touched by more than number commits.

For example:

> git effort --above 50
path                                       commits        active days
path/to/file/verychangedfile.cfg           77             33
path/to/file/anotherverychangedfile.cfg    67             32

Note: git effort is part of the git-extras package, so you might not have it out of the box. It is most likely available for your OS.
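If you can't install git-extras, plain git log gives a rough approximation. The sketch below counts how many commits reachable from HEAD touched each file. Unlike git effort, it reports no active days, and a renamed file is counted separately under each of its names. The helper name commits_per_file is hypothetical.

```shell
#!/bin/sh
# commits_per_file is a hypothetical helper approximating "git effort".
# Usage: commits_per_file [above]
commits_per_file() {
    above="${1:-0}"   # only show files changed in more than this many commits
    # --name-only lists the files each commit touched; the empty format
    # suppresses the commit header, leaving blank separator lines to drop.
    git log --name-only --pretty=format: HEAD \
        | grep -v '^$' \
        | sort \
        | uniq -c \
        | sort -rn \
        | awk -v min="$above" '$1 > min'
}
```

Running commits_per_file 50 would then list, most-changed first, the files above a 50-commit threshold, similar in spirit to git effort --above 50.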

git blame

git blame is useful once you have identified an interesting file and want to know who changed it and, more importantly, what they changed.

git blame prints the file content with each line annotated with the hash of the commit that last changed it, the author and the date of that change.

For example:

> git blame interestingfile.cfg
225c29ba (CATS 2018-09-27 03:34:38 -0500  1) all your
225c29ba (CATS 2018-09-27 03:34:38 -0500  2) base
225c29ba (CATS 2018-09-27 03:34:38 -0500  3) are
225c29ba (CATS 2018-09-27 03:34:38 -0500  4) belong
225c29ba (CATS 2018-09-27 03:34:38 -0500  5) to us
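Once blame has pointed you at a commit hash, git show <hash> displays that commit's full message and diff, which often explains why a line looks the way it does. The sketch below chains the two steps for a single line. It relies on the --porcelain output format, whose first field is the full commit hash; the helper name why_line is hypothetical.

```shell
#!/bin/sh
# why_line is a hypothetical helper, not part of git.
# Usage: why_line <file> <line-number>
why_line() {
    file="$1"
    line="$2"
    # -L n,n restricts blame to one line; with --porcelain, the first field
    # of the first output line is the full hash of the commit that last
    # changed that line.
    hash=$(git blame -L "$line,$line" --porcelain -- "$file" | head -n 1 | cut -d ' ' -f 1)
    # Print that commit's message and its complete diff.
    git show "$hash"
}
```

For example, why_line interestingfile.cfg 3 would show the whole commit behind the line "are".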

git grep

You might not know yet how the system works, but you are following the trail of a concept or idea and want to search across all the files that could be related to it. git grep regexp prints every line in the tracked files of your working tree that matches regexp, prefixed with the file name.

For example, you know that the word username is or was written somewhere inside the system, but you don't know where:

> git grep username
very/hidden/path/in/system/crypticfilename.txt:	username=anonymous
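Keep in mind that plain git grep only searches the files as they are now. To find text that was written somewhere in the past and later deleted, you can hand every commit to git grep as a tree to search. This is a sketch with a hypothetical helper name, grep_history; on a very large repository the commit list may exceed the shell's argument-length limit, in which case git log -G is a better fit.

```shell
#!/bin/sh
# grep_history is a hypothetical helper, not part of git.
# Usage: grep_history <regexp>
# Output lines look like <commit>:<path>:<line number>:<matching line>.
grep_history() {
    pattern="$1"
    # $(git rev-list --all) is left unquoted on purpose so that every commit
    # reachable from any ref becomes a separate tree argument to search.
    git grep -n -e "$pattern" $(git rev-list --all) --
}
```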

git log

This is the most powerful git tool for understanding the history of the system. git log prints the history of commits in the repository, newest first, along with the author, date and commit message of each one. All of this information is great, but sometimes you need more, so let's look at some options that make the search more granular.

git log -G

This command works in a similar fashion to git grep, so you might use it in similar situations. The difference is that git log -G regexp lists the commits whose diff adds or removes a line matching regexp.

For example, continuing the search for where username could appear in the system, let's see which commits are related to it:

> git log -G username
commit 71669696c5570f6e627f4ccc5722647e8ff14514
Author: Agent 007 <agent@not-that-noob.com>
Date:   Tue Jul 18 12:36:02 2017 -0700

    Remove username and password from file

commit 9082a4abd6497e4dee348d5ccd74c472167b4955
Author: Agent -007 <agent@noob.com>
Date:   Mon Jul 10 12:35:27 2017 -0700

    Add secure details to file
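git log also has a close relative of -G: -S string lists only the commits that change the number of occurrences of string. The difference matters when a line is edited but the string is neither added nor removed, as the sketch below illustrates with a hypothetical helper, commits_touching. Adding -p to either command would also print the matching diffs.

```shell
#!/bin/sh
# commits_touching is a hypothetical helper, not part of git.
# Usage: commits_touching <pattern>
commits_touching() {
    pattern="$1"
    echo "== -G: diff adds or removes a line matching the regexp =="
    git log --oneline -G "$pattern"
    echo "== -S: number of occurrences of the string changes =="
    git log --oneline -S "$pattern"
}
```

A commit that turns username=anonymous into username=admin appears only in the -G section, because the file still contains exactly one occurrence of username.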

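Note: Sorry for going off-topic, but this is important.
Obviously Mr. Agent -007 made a huge mistake by committing sensitive information to the repository, but his successor Mr. Agent 007 didn't make things much better: deleting the credentials in a new commit leaves them readable in the history, as this very search shows. He should have removed that change from the repository history completely and, since anything that was ever pushed must be treated as leaked, changed the exposed credentials too.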

git log --follow -- filename

Aha! You eventually found an apparently relevant file in the system, and you want to understand the changes applied to it and, hopefully, make sense of it. This is a very good use case for git log --follow -- filename, which prints the history of that particular file, following it across renames.

For example:

> git log --follow -- file.mk 
commit a35da6d8414b199e5e0629237ba047edd07783d3
Author: Rocky Balboa <rocky.balboa@stallion.com>
Date:   Wed Oct 3 03:27:14 2018 -0500

    Enables building with a right hook

commit f07b8fdc3ea77f161e2bd1d4153a6478f19d426f
Author: Rocky Balboa <rocky.balboa@stallion.com>
Date:   Thu Sep 27 04:48:34 2018 -0500

    Enables building with a left hook
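--follow matters when files were renamed: without it, the history stops at the commit that gave the file its current name. The sketch below (file_story is a hypothetical helper name) prints one compact line per commit, followed by the path the file had in that commit, so renames stand out. git log --follow -p -- filename would show the full diffs instead.

```shell
#!/bin/sh
# file_story is a hypothetical helper, not part of git.
# Usage: file_story <file>
file_story() {
    # --follow keeps tracking the file across renames (one path only);
    # --name-only prints the file's path as it was in each commit.
    git log --follow --name-only --format='%h %ad %an: %s' --date=short -- "$1"
}
```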

git log [--since=date] [--until=date]

Perhaps you still need more research. A good idea is to dig into the repository history again, this time filtering by date: --since=date and/or --until=date restrict the output to commits made in that period, which makes a more granular search possible.

For example:

> git log --since="27/09/2018" --until="28/09/2018"
commit f07b8fdc3ea77f161e2bd1d4153a6478f19d426f
Author: Rocky Balboa <rocky.balboa@stallion.com>
Date:   Thu Sep 27 04:48:34 2018 -0500

    Enables building with a left hook
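The day/month/year format in the example above can confuse readers used to month/day/year. ISO 8601 dates avoid that, and adding an explicit time of day leaves no doubt about where the window starts and ends. In this sketch (commits_between is a hypothetical helper name), note that --since and --until compare against the committer date, which can differ from the author date that git log shows by default.

```shell
#!/bin/sh
# commits_between is a hypothetical helper, not part of git.
# Usage: commits_between "<since>" "<until>"
commits_between() {
    # One line per commit: short hash, committer date (ISO), author, subject.
    git log --since="$1" --until="$2" --date=iso --format='%h %cd %an: %s' HEAD
}
```

For example: commits_between "2018-09-27 00:00" "2018-09-28 00:00".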

git bisect

You are starting to understand how the system works and, voilà, you realise there is a bug. You still don’t have enough information, and you want to find out which change introduced it. git bisect performs a binary search over the history to find the commit that introduced the bug. You first mark a "bad" commit, where the system is broken, and a "good" commit, where it isn't. git bisect then checks out a commit between those two and asks you whether it is "good" or "bad", and repeats until the responsible commit is identified.

Let's see a simple example. Suppose you are trying to find a bug that is present now but was not present in commit 71669696c5570f6e627f4ccc5722647e8ff14514:

> git bisect start
> git bisect bad
> git bisect good 71669696c5570f6e627f4ccc5722647e8ff14514

This starts the bisection. git now guides you through it by checking out a candidate commit, which you have to test and mark as good or bad.

Bisecting: 2 revisions left to test after this (roughly 2 steps)
[2f9c695953cf9854c50108806c22a38ab9ae1d5a] Coding after party.

## Time to test and evaluate
> git bisect bad         ## Your test showed the system is broken

Repeat the process...

Bisecting: 0 revisions left to test after this (roughly 1 step)
[03bc5ebe773ce1ade24a60aecfdec704002b10e7] Introduce enhanced output to build system

## Time to test and evaluate
> git bisect good        ## Your test showed the system works

And there you go! git bisect finishes by showing you the first bad commit:

2f9c695953cf9854c50108806c22a38ab9ae1d5a is the first bad commit
commit 2f9c695953cf9854c50108806c22a38ab9ae1d5a
Author: Ivan Drago <drago@russianexpress.com>
Date:   Fri Oct 5 22:22:14 2018 -0500

    Coding after party.

Another very useful feature of git bisect is that you can automate the bisection with an external command that decides whether the system works in its current state. For example, you can write a script, let’s name it check.sh, that determines whether the system is working as expected. The script should exit with code 0 when the current state is good and with a code between 1 and 127, other than 125, when the state is bad. Then you can run git bisect run ./check.sh to bisect automatically.

> git bisect start HEAD v1.2 --      # HEAD is bad, v1.2 is good
> git bisect run ./check.sh          # "check.sh" checks the system
> git bisect reset                   # quit the bisect session

Note: exit code 125 is reserved for revisions that cannot be tested; when the script returns it, git bisect skips that commit. Any exit code above 127 aborts the bisection altogether.
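A check.sh could look like the following minimal sketch. The test inside is a hypothetical stand-in, assuming the bug shows up as the word broken in a file named build.cfg; replace it with whatever reproduces your actual bug, such as your build or test suite. The exit codes follow the git bisect run convention: 0 for good, 1 to 127 (except 125) for bad, 125 to skip.

```shell
#!/bin/sh
# check.sh: hypothetical check script for "git bisect run".
# Assumption for illustration only: the bug is visible as the word
# "broken" in build.cfg. Swap in your real build/test command.
check_system() {
    if [ ! -f build.cfg ]; then
        return 125      # this revision cannot be tested: git bisect skips it
    fi
    if grep -q broken build.cfg; then
        return 1        # bug reproduced: this commit is bad
    fi
    return 0            # bug absent: this commit is good
}

# The script's exit status is the status of this last command.
check_system
```

Remember to make the script executable (chmod +x check.sh) before running git bisect run ./check.sh.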


There is no magic wand that gives you all the answers when a system lacks documentation, but that shouldn't stop you from moving forward. As a Software Crafter, you should be able to reach into your toolbox and combine many tools to solve your problem.

Sometimes the solution won't come directly, but with perseverance, effort and the right tools you can always move in the right direction. Git is a very powerful tool, and you shouldn't limit yourself to the everyday commands you use for development; instead, push yourself to dig deeper and embrace Git’s full potential.