Have you ever had to understand a codebase where the lack of documentation reached every level? No wiki at all, cryptic file names, meaningless commit messages, and Git version control as the only saving grace. This happened to me recently, and I had to dig deep with Git to understand how the system worked. Let me walk you through that investigation. It’ll be fun, I promise.
In this blog post, we are not reviewing good practices for documenting projects up front. Instead, we are diving into the Git tools we can use when we need to investigate the history of a project, resolve the mysteries within it and ultimately regain control over it. No single tool will solve an enigma by itself, but each one should point you in the right direction. Used with intelligence and perseverance, these tools will help you work through most puzzles. If you also act on each new discovery, especially by documenting as you go, you and everyone else on your team will regain control over the system.
Now, let's have a look at the set of tools I recommend in this post.
git shortlog
When fighting an unknown system, the first thing you want to do is look for the technical experts who are still around, in the team or in the community. To find them, you need to know who the top contributors to the system are. Hopefully, they will have the answers you need.
git shortlog -e -n -s
will print the list of contributors in the whole repository ordered by number of contributions.
git shortlog -e -n -s -- [path]
will print the list of contributors to the specified [path]
ordered by number of contributions. The -- separator keeps Git from mistaking the path for a revision.
For example:
> git shortlog -n -s -e
164 Willy Wonka <willy.wonka@fakedomain.com>
59 Indiana Jones <indy@fakedomain.com>
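If you want to try this without touching a real project, here is a minimal sketch that builds a throwaway repository and ranks contributors for the whole history and for a single directory. The `commit_as` helper, the file names and the commit counts are made up for illustration. One detail worth knowing: when `git shortlog` runs without a revision and standard input is not a terminal (as in a script), it reads the log from standard input, so the sketch names `HEAD` explicitly.

```shell
#!/bin/sh
# Sketch: rank contributors in a throwaway repository, overall and per directory.
set -eu
repo=$(mktemp -d) && cd "$repo" && git init -q

n=0
# Hypothetical helper: append a line to file $3 and commit it as author $1 <$2>.
commit_as() {
    n=$((n + 1))
    mkdir -p "$(dirname "$3")"
    echo "change $n" >> "$3"
    git add "$3"
    GIT_AUTHOR_NAME="$1" GIT_AUTHOR_EMAIL="$2" \
    GIT_COMMITTER_NAME="$1" GIT_COMMITTER_EMAIL="$2" \
        git commit -q -m "Change $n"
}

commit_as "Willy Wonka" willy.wonka@fakedomain.com config/factory.cfg
commit_as "Willy Wonka" willy.wonka@fakedomain.com config/factory.cfg
commit_as "Indiana Jones" indy@fakedomain.com config/factory.cfg
commit_as "Indiana Jones" indy@fakedomain.com docs/notes.txt
commit_as "Indiana Jones" indy@fakedomain.com docs/notes.txt
commit_as "Indiana Jones" indy@fakedomain.com docs/notes.txt

# Whole repository. HEAD is named explicitly because stdin is not a terminal here.
overall=$(git shortlog -n -s -e HEAD)
# Only the config/ directory.
config_only=$(git shortlog -n -s -e HEAD -- config)

printf 'Overall:\n%s\n\nconfig/ only:\n%s\n' "$overall" "$config_only"
```

In this made-up history the overall leader and the leader for config/ are different people, which is exactly why narrowing by path is worth the extra argument.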
git effort
At this point, you still have no clue about what the system does, and all you see is a bunch of seemingly random files. Perhaps they are not so random, so let's figure out which files the old team put the greatest effort into.
git effort
will print, for each file in the repository, the number of commits that affected it and the number of days on which it was changed.
git effort --above number
will print only the files affected by more than number commits.
For example:
> git effort --above 50
path commits active days
path/to/file/verychangedfile.cfg 77 33
path/to/file/anotherverychangedfile.cfg 67 32
Note: git effort
is part of the git-extras
package, so you might not have it out of the box. It is most likely available for your OS.
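If installing git-extras is not an option, a rough approximation is possible with plain Git and standard Unix tools. The sketch below is not how git effort is implemented and it only counts commits per file (no active days); the repository and file names are invented for the demo.

```shell
#!/bin/sh
# Sketch: approximate the commit counts of `git effort` with plain Git.
set -eu
repo=$(mktemp -d) && cd "$repo" && git init -q
export GIT_AUTHOR_NAME="Example Author" GIT_AUTHOR_EMAIL="author@example.invalid"
export GIT_COMMITTER_NAME="Example Author" GIT_COMMITTER_EMAIL="author@example.invalid"

# Hypothetical history: hot.cfg changes three times, cold.cfg once.
for i in 1 2 3; do
    echo "setting $i" >> hot.cfg
    git add hot.cfg
    git commit -q -m "Tweak hot.cfg ($i)"
done
echo "stable" > cold.cfg
git add cold.cfg
git commit -q -m "Add cold.cfg"

# List the files touched by every commit, then count how often each one appears.
ranking=$(git log --pretty=format: --name-only \
    | sed '/^$/d' | sort | uniq -c | sort -rn | head -n 10)
echo "$ranking"
```

The pipeline at the end is the only part you need in a real repository; everything before it just creates something to measure.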
git blame
git blame
is useful once you have identified an interesting file and want to know who changed it and, more importantly, what they changed.
Running git blame
will output the file content, annotating each line with the hash, author and date of the commit that last changed it.
For example:
> git blame interestingfile.cfg
225c29ba (CATS 2018-09-27 03:34:38 -0500 1) all your
225c29ba (CATS 2018-09-27 03:34:38 -0500 2) base
225c29ba (CATS 2018-09-27 03:34:38 -0500 3) are
225c29ba (CATS 2018-09-27 03:34:38 -0500 4) belong
225c29ba (CATS 2018-09-27 03:34:38 -0500 5) to us
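Blame tells you who, and the commit hash it prints is the bridge to what. A small sketch, assuming a throwaway repository with an invented second commit: restrict blame to one line with -L, take the hash, and hand it to git show.

```shell
#!/bin/sh
# Sketch: blame a single line, then inspect the commit that last changed it.
set -eu
repo=$(mktemp -d) && cd "$repo" && git init -q
export GIT_AUTHOR_NAME="CATS" GIT_AUTHOR_EMAIL="cats@example.invalid"
export GIT_COMMITTER_NAME="CATS" GIT_COMMITTER_EMAIL="cats@example.invalid"

printf 'all your\nbase\nare\nbelong\nto us\n' > interestingfile.cfg
git add interestingfile.cfg
git commit -q -m "Add the famous quote"

# A later (hypothetical) commit rewrites line 2 only.
printf 'all your\ncastle\nare\nbelong\nto us\n' > interestingfile.cfg
git commit -q -am "Replace base with castle"

# -L 2,2 restricts blame to line 2; --porcelain prints the full hash first.
hash=$(git blame -L 2,2 --porcelain interestingfile.cfg | head -n 1 | cut -d ' ' -f 1)

# Show what that commit changed.
git show --stat --format='%h %an: %s' "$hash"
subject=$(git log -1 --format=%s "$hash")
```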
git grep
You might not know yet how the system works, but you are following the trail of a concept or idea and want to search across multiple files that could potentially be related to that. git grep regexp
will print the lines in all the tracked files where the regexp
is present.
For example, you know that the word username
is written somewhere inside the system, but you don't know where:
> git grep username
/very/hidden/path/in/system/crypticfilename.txt: username=anonymous
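A couple of options make the search more comfortable: -n adds line numbers, -i ignores case, and a pathspec after -- narrows the search to certain files. The sketch below uses a throwaway repository whose two files are invented for the demo.

```shell
#!/bin/sh
# Sketch: case-insensitive git grep with line numbers, optionally limited by path.
set -eu
repo=$(mktemp -d) && cd "$repo" && git init -q
export GIT_AUTHOR_NAME="Example Author" GIT_AUTHOR_EMAIL="author@example.invalid"
export GIT_COMMITTER_NAME="Example Author" GIT_COMMITTER_EMAIL="author@example.invalid"

mkdir -p conf src
echo "username=anonymous" > conf/crypticfilename.txt
echo "# TODO: read USERNAME from the environment" > src/login.sh
git add .
git commit -q -m "Add sample files"

# -n prints line numbers, -i ignores case.
everywhere=$(git grep -n -i username)
# A pathspec after -- limits the search to matching files.
txt_only=$(git grep -n -i username -- '*.txt')

printf 'Everywhere:\n%s\n\nOnly *.txt:\n%s\n' "$everywhere" "$txt_only"
```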
git log
This is the most powerful Git tool for understanding the history of the system. git log
will output the history of commits in the repository, newest first, along with each commit's author, date and message. All of this information is great, but sometimes you need more, so let's look at some options that make our search more granular.
git log -G
This command works in a very similar fashion to git grep
so you might want to use it in similar situations. The difference is that git log -G regexp
will output the commits whose diff contains an added or removed line matching regexp
, even if that text no longer exists in the current files.
For example, continuing with the research about where username
could be present in the system, let's see which commits are related to it:
> git log -G username
commit 9082a4abd6497e4dee348d5ccd74c472167b4955
Author: Agent -007 <agent@noob.com>
Date: Tue Jul 18 12:36:02 2017 -0700
Add secure details to file
commit 71669696c5570f6e627f4ccc5722647e8ff14514
Author: Agent 007 <agent@not-that-noob.com>
Date: Mon Jul 10 12:35:27 2017 -0700
Remove username and password from file
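To make the difference with git grep concrete, here is a sketch on a throwaway repository (file name and contents are invented): the word is added in one commit and removed in the next, so a search of the current files comes back empty while git log -G still finds both commits. Adding -p prints the matching diffs as well.

```shell
#!/bin/sh
# Sketch: git grep searches the current files, git log -G searches the diffs.
set -eu
repo=$(mktemp -d) && cd "$repo" && git init -q
export GIT_AUTHOR_NAME="Example Author" GIT_AUTHOR_EMAIL="author@example.invalid"
export GIT_COMMITTER_NAME="Example Author" GIT_COMMITTER_EMAIL="author@example.invalid"

printf 'host=example.invalid\nusername=anonymous\n' > settings.cfg
git add settings.cfg
git commit -q -m "Add secure details to file"

printf 'host=example.invalid\n' > settings.cfg
git commit -q -am "Remove username from file"

# The word is gone from the current files, so git grep finds nothing
# (it exits with status 1 on no match, hence the "|| true").
current=$(git grep -c username || true)

# Both the commit that added the line and the one that removed it match -G.
touching=$(git log -G username --format=%s)
echo "$touching"

# With -p, each matching commit is followed by its diff.
git log -G username -p --format='--- %h %s'
```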
Note: Sorry for going off topic, but this is important.
Obviously Mr. Agent -007 made a huge mistake by committing sensitive information into the repository, but his successor Mr. Agent 007 didn't make it better, because the credentials are still readable in the earlier commit. What the latter should have done is remove that change completely from the repository history.
git log --follow -- filename
Aha! You eventually found an apparently relevant file in your system, and you want to understand the changes applied to it and, potentially, make sense of it. This is a very good use case for git log --follow -- filename
which will output the history of a particular file, continuing past renames.
For example:
> git log --follow -- file.mk
commit a35da6d8414b199e5e0629237ba047edd07783d3
Author: Rocky Balboa <rocky.balboa@stallion.com>
Date: Wed Oct 3 03:27:14 2018 -0500
Enables building with a right hook
commit f07b8fdc3ea77f161e2bd1d4153a6478f19d426f
Author: Rocky Balboa <rocky.balboa@stallion.com>
Date: Thu Sep 27 04:48:34 2018 -0500
Enables building with a left hook
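The rename handling is the reason to reach for --follow. A sketch on a throwaway repository, where the earlier file name build.mk is an assumption made up for the demo: without --follow the history of file.mk starts at the rename, with it the history continues back to the original file.

```shell
#!/bin/sh
# Sketch: compare the history of a renamed file with and without --follow.
set -eu
repo=$(mktemp -d) && cd "$repo" && git init -q
export GIT_AUTHOR_NAME="Rocky Balboa" GIT_AUTHOR_EMAIL="rocky.balboa@stallion.com"
export GIT_COMMITTER_NAME="Rocky Balboa" GIT_COMMITTER_EMAIL="rocky.balboa@stallion.com"

printf 'all:\n\techo building\n' > build.mk
git add build.mk
git commit -q -m "Add build file"

git mv build.mk file.mk
git commit -q -m "Rename build.mk to file.mk"

printf 'hook:\n\techo left hook\n' >> file.mk
git commit -q -am "Enables building with a left hook"

# Count the commits each variant reports for file.mk.
plain=$(git log --format=%s -- file.mk | wc -l | tr -d ' ')
followed=$(git log --follow --format=%s -- file.mk | wc -l | tr -d ' ')
echo "without --follow: $plain commits, with --follow: $followed commits"
```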
git log [--since=date] [--until=date]
Perhaps you still need more research. A good idea might be to dig into the repository history again, this time filtering by date. By specifying --since=date
and/or --until=date
you can restrict the history to a time window, which makes a more granular search possible.
For example:
> git log --since="2018-09-27 00:00" --until="2018-09-28 00:00"
commit f07b8fdc3ea77f161e2bd1d4153a6478f19d426f
Author: Rocky Balboa <rocky.balboa@stallion.com>
Date: Thu Sep 27 04:48:34 2018 -0500
Enables building with a left hook
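Git's date parser is quite lenient, and a date written without a time of day or in an ambiguous day/month order can be resolved in surprising ways. When the boundaries matter, I find it safer to spell out the full timestamp, including the UTC offset. The sketch below does that on a throwaway repository; the `commit_on` helper and the first commit are invented for the demo.

```shell
#!/bin/sh
# Sketch: filter history by an explicit time window.
set -eu
repo=$(mktemp -d) && cd "$repo" && git init -q
export GIT_AUTHOR_NAME="Rocky Balboa" GIT_AUTHOR_EMAIL="rocky.balboa@stallion.com"
export GIT_COMMITTER_NAME="Rocky Balboa" GIT_COMMITTER_EMAIL="rocky.balboa@stallion.com"

# Hypothetical helper: commit message $2 with author and committer date $1.
commit_on() {
    echo "$2" >> file.mk
    git add file.mk
    GIT_AUTHOR_DATE="$1" GIT_COMMITTER_DATE="$1" git commit -q -m "$2"
}

commit_on "2018-09-20 10:00:00 -0500" "Add build file"
commit_on "2018-09-27 04:48:34 -0500" "Enables building with a left hook"
commit_on "2018-10-03 03:27:14 -0500" "Enables building with a right hook"

# Only commits made during 27 September 2018 (UTC-5) fall inside the window.
in_window=$(git log --since="2018-09-27 00:00:00 -0500" \
                    --until="2018-09-28 00:00:00 -0500" --format=%s)
echo "$in_window"
```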
git bisect
You are starting to understand how the system works and, voilà, you realise there is a bug. You still don’t have enough information and want to understand which change introduced it. git bisect
performs a binary search to find the commit that introduced the bug. You first mark a "bad" commit, where the system is broken, and a "good" commit, where it isn't. Then git bisect
will check out a commit between those two and ask you whether it is "good" or "bad", and so on until the responsible commit is identified.
Let's see a simple example. Suppose you are hunting a bug that is present now but was not present at commit 71669696c5570f6e627f4ccc5722647e8ff14514
:
> git bisect start
> git bisect bad
> git bisect good 71669696c5570f6e627f4ccc5722647e8ff14514
This starts the bisection. From now on git
will drive you through it, checking out one proposed commit at a time, which you have to evaluate as bad
or good
.
Bisecting: 2 revisions left to test after this (roughly 2 steps)
[2f9c695953cf9854c50108806c22a38ab9ae1d5a] Coding after party.
## Time to test and evaluate
> git bisect bad ## Your test showed the system is broken
Repeat the process...
Bisecting: 0 revisions left to test after this (roughly 1 step)
[03bc5ebe773ce1ade24a60aecfdec704002b10e7] Introduce enhanced output to build system
## Time to test and evaluate
> git bisect good ## Your test showed the system works
And there you go! Finally git bisect
ends by showing you the first bad commit:
2f9c695953cf9854c50108806c22a38ab9ae1d5a is the first bad commit
commit 2f9c695953cf9854c50108806c22a38ab9ae1d5a
Author: Ivan Drago <drago@russianexpress.com>
Date: Wed Oct 5 22:22:14 2018 -0500
Coding after party.
Another very interesting feature of git bisect
is that the bisection can be automated with an external tool that evaluates the correctness of the system in its current state. For example, you can write a script, let’s name it check.sh
, that determines whether the system is currently working as expected. The script should exit with 0
when the current state is good
and with a value between 1
and 127
when the system state is bad
. Then you can run git bisect run ./check.sh
to trigger an automatic bisection.
> git bisect start HEAD v1.2 -- # HEAD is bad, v1.2 is good
> git bisect run ./check.sh # "check.sh" checks the system
> git bisect reset # quit the bisect session
Note: The exit code 125
is reserved for when the system cannot be checked, in which case git bisect
will skip the commit that made the script return that value.
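To see the whole loop end to end, here is a minimal sketch that builds a throwaway history with one deliberately broken commit and lets Git find it. Everything in it is an assumption made for the demo: the calc.sh "system", the `save` helper, the commit messages, and a check script that treats the output 4 as good. The check script is stored outside the repository so that checkouts never touch it, and it is run through sh so it does not need to be executable.

```shell
#!/bin/sh
# Sketch: automatic bisection over a throwaway history with one bad commit.
set -eu
repo=$(mktemp -d) && cd "$repo" && git init -q
export GIT_AUTHOR_NAME="Example Author" GIT_AUTHOR_EMAIL="author@example.invalid"
export GIT_COMMITTER_NAME="Example Author" GIT_COMMITTER_EMAIL="author@example.invalid"

# Hypothetical helper: write content $2 to file $1 and commit with message $3.
save() { printf '%s\n' "$2" > "$1"; git add "$1"; git commit -q -m "$3"; }

save calc.sh 'echo $((2 + 2))' "Add calculator"
git tag v1.2
save README 'A tiny calculator' "Add readme"
save calc.sh 'echo $((2 + 3))' "Coding after party."
save build.log 'verbose' "Introduce enhanced output to build system"
save README 'A tiny calculator, now documented' "Extend readme"

# The check script: 125 means "cannot test", 0 means good, 1 means bad.
check=$(mktemp)
cat > "$check" <<'EOF'
#!/bin/sh
[ -f calc.sh ] || exit 125
[ "$(sh calc.sh)" = "4" ]
EOF

git bisect start HEAD v1.2 --    # HEAD is bad, v1.2 is good
git bisect run sh "$check"
# After a finished bisection, refs/bisect/bad points at the first bad commit.
first_bad=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset
echo "First bad commit: $first_bad"
```

In a real project the check script would typically build the code and run a test, returning 125 for commits that do not even compile.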
Conclusions
There is no magic wand that gives you all the answers when a system lacks documentation, but that shouldn't prevent you from moving forward. As a Software Crafter, you should be able to look into your toolbox and combine many tools to solve your problem.
Sometimes the solution won't come directly, but with perseverance, effort, and using the right tools you can always move in the right direction. git
is a very powerful tool and you shouldn't limit yourself to the straightforward commands you use daily for development; instead push yourself to dig deeper and embrace Git’s full potential.