Have you ever had to understand a codebase where the lack of documentation reached all levels? No wiki at all, completely cryptic file names, meaningless commit messages, and Git version control as the only saving grace. This happened to me recently, and I had to dig deep with Git to understand how the system works. Let me walk you through that investigation. It’ll be fun, I promise.
In this blog post, we are not reviewing good practices for documenting projects up front. Instead, we dive into the Git tools we can use when we need to investigate the history of a project, resolve the mysteries within it, and ultimately regain control over it. None of these tools will solve an enigma by itself, but each one should point you in the right direction, and using them with intelligence and perseverance will help you work through most puzzles. If you also act on each new discovery, especially by documenting as you go, you and everyone else on your team will regain control over the system.
Now, let's have a look at the set of tools I recommend in this post.
When fighting an unknown system, the first thing you want to do is look for the technical experts who are still around you, in the team or in the community. To do that, you need to find out who the top contributors to the system are. Hopefully, they will have the answers you need.
git shortlog -e -n -s prints the list of contributors to the whole repository, ordered by number of contributions (-n sorts by number of commits, -s shows only the counts, and -e adds the email address).
git shortlog -e -n -s [path] prints the list of contributors to the specified [path], ordered by number of contributions.
> git shortlog -n -s -e
   164  Willy Wonka <email@example.com>
    59  Indiana Jones <firstname.lastname@example.org>
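If you want to try this without touching a real project, here is a minimal sketch that builds a throwaway repository and runs both variants. The directory names and the example.com addresses are made up for illustration. Note that HEAD is passed explicitly: when no revision is given and standard input is not a terminal, git shortlog reads a log from standard input instead, which tends to surprise people in scripts.

```shell
#!/bin/sh
# Sketch only: a throwaway repository with two hypothetical authors.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Willy Wonka"
git config user.email "willy@example.com"

mkdir -p config docs
echo "a" > config/app.cfg
git add config
git commit -q -m "Add config"
echo "b" >> config/app.cfg
git commit -q -a -m "Tune config"
echo "c" > docs/notes.txt
git add docs
git commit -q -m "Add notes" --author="Indiana Jones <indiana@example.com>"

# Whole repository, most active contributor first.
all=$(git shortlog -n -s -e HEAD)
echo "$all"

# Only the commits that touched the config directory.
config_only=$(git shortlog -n -s -e HEAD -- config)
echo "$config_only"
```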
At this point, you still have no clue about what the system does, and all you see is a bunch of seemingly random files. Perhaps they are not so random, so let's figure out which files the old team put the most effort into.
git effort prints the number of commits that affected each file in the repository.
git effort --above number prints only the files affected by more than number commits.
> git effort --above 50
path                                       commits   active days
path/to/file/verychangedfile.cfg           77        33
path/to/file/anotherverychangedfile.cfg    67        32
git effort is part of the git-extras package, so you might not have it out of the box. It is most likely available for your OS.
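If installing git-extras is not an option, the commit count per file can be approximated with plain Git and standard Unix tools. The sketch below is not how git effort is implemented, and it only reproduces the "commits" column, not the "active days" one. The file names are hypothetical.

```shell
#!/bin/sh
# Sketch only: approximate the "commits" column of git effort with plain git.
set -eu
repo=$(mktemp -d)   # throwaway repository, for illustration only
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# hot.cfg is changed in three commits, cold.cfg in one.
for i in 1 2 3; do
    echo "change $i" >> hot.cfg
    git add hot.cfg
    git commit -q -m "Change hot.cfg ($i)"
done
echo "once" > cold.cfg
git add cold.cfg
git commit -q -m "Add cold.cfg"

# List the files touched by every commit, then count how often each appears.
counts=$(git log --format= --name-only | sed '/^$/d' | sort | uniq -c | sort -rn)
echo "$counts"

# Keep only the files changed in more than 2 commits, similar to --above 2.
hot=$(echo "$counts" | awk -v min=2 '$1 > min { print $2 }')
echo "$hot"
```

The awk step prints only the second field, so this version assumes file names without spaces.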
git blame is useful when you have already identified an interesting file and you want to know who changed it and, more importantly, what they changed.
git blame file outputs the content of the file and, next to each line, the hash, author, and date of the last commit that modified it.
> git blame interestingfile.cfg
225c29ba (CATS 2018-09-27 03:34:38 -0500 1) all your
225c29ba (CATS 2018-09-27 03:34:38 -0500 2) base
225c29ba (CATS 2018-09-27 03:34:38 -0500 3) are
225c29ba (CATS 2018-09-27 03:34:38 -0500 4) belong
225c29ba (CATS 2018-09-27 03:34:38 -0500 5) to us
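On a large file you rarely want the whole annotation. As a small sketch, the -L option limits git blame to a line range, and --line-porcelain produces machine-readable output that is easy to filter. The second author below is hypothetical and only exists to make the result visible.

```shell
#!/bin/sh
# Sketch only: blame a line range in a throwaway repository.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "CATS"
git config user.email "cats@example.com"

printf 'all your\nbase\nare\n' > interestingfile.cfg
git add interestingfile.cfg
git commit -q -m "Add first lines"

# A second, hypothetical author appends lines 4 and 5.
printf 'belong\nto us\n' >> interestingfile.cfg
git commit -q -a -m "Add last lines" --author="Hypothetical Author <author@example.com>"

# Annotate only lines 4 to 5 instead of the whole file.
git blame -L 4,5 interestingfile.cfg

# Extract the distinct author names for that range.
authors=$(git blame --line-porcelain -L 4,5 interestingfile.cfg | sed -n 's/^author //p' | sort -u)
echo "$authors"
```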
You might not know yet how the system works, but you are following the trail of a concept or idea and want to search across multiple files that could potentially be related to that.
git grep regexp prints the lines in all the tracked files that match regexp.
For example, you know that the word username is or was written somewhere inside the system, but you don't know where:
> git grep username
/very/hidden/path/in/system/crypticfilename.txt: username=anonymous
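By default git grep searches the files as they are now. If the word "was" there at some point, you can also pass a revision and search the files as they were in that commit. The sketch below uses a throwaway repository and a hypothetical settings directory to show both cases.

```shell
#!/bin/sh
# Sketch only: search current files and the files of an older commit.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

mkdir -p settings
echo "username=anonymous" > settings/crypticfilename.txt
git add settings
git commit -q -m "Add settings"
git rm -q settings/crypticfilename.txt
git commit -q -m "Remove settings"

# Current tracked files: no match. git grep exits with status 1 in that case,
# so "|| true" keeps "set -e" from aborting the script.
now=$(git grep -n username || true)

# Same search against the files of the previous commit.
before=$(git grep -n username HEAD~1 || true)
echo "$before"
```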
git log is the most powerful Git tool for understanding the history of the system.
git log outputs the history of commits in the repository, most recent first, together with the author, the date, and the commit message. All of this information is great, but sometimes you need more, so let's look at some options that make the search more granular.
git log -G regexp works in a very similar fashion to git grep, so you might want to use it in similar situations. The difference is that git log -G regexp outputs the commits whose diff adds or removes a line matching regexp.
For example, continuing the research into where username could be present in the system, let's see which commits are related to it:
> git log -G username
commit 9082a4abd6497e4dee348d5ccd74c472167b4955
Author: Agent -007 <email@example.com>
Date:   Tue Jul 18 12:36:02 2017 -0700

    Add secure details to file

commit 71669696c5570f6e627f4ccc5722647e8ff14514
Author: Agent 007 <firstname.lastname@example.org>
Date:   Mon Jul 10 12:35:27 2017 -0700

    Remove username and password from file
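To see that -G looks at the diff and not at the commit message, here is a small sketch in a throwaway repository. The file names are hypothetical. Only the commits that add or remove a line containing username are listed, and the unrelated commit in between is filtered out.

```shell
#!/bin/sh
# Sketch only: list the commits whose diff mentions "username".
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "username=anonymous" > credentials.txt
git add credentials.txt
git commit -q -m "Add secure details to file"

echo "some notes" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

echo "# moved elsewhere" > credentials.txt
git commit -q -a -m "Clean up file"

# One subject line per matching commit, most recent first.
subjects=$(git log -G username --format=%s)
echo "$subjects"
```

Adding -p to the same command also prints the matching diffs, which is usually the next thing you want to read.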
Note: Sorry for going off topic, but this is important.
Obviously, Mr. Agent -007 made a huge mistake by committing sensitive information to the repository, but his successor, Mr. Agent 007, didn't make it better: deleting the lines in a new commit leaves them readable in the history. What he should have done is remove that change completely from the repository history.
Aha! You eventually found an apparently relevant file in your system, and you want to understand the changes applied to it and, potentially, make sense of it. This is a very good use case for git log --follow -- filename, which outputs the history of a particular file, including the commits made before it was renamed.
> git log --follow -- file.mk
commit a35da6d8414b199e5e0629237ba047edd07783d3
Author: Rocky Balboa <email@example.com>
Date:   Wed Oct 3 03:27:14 2018 -0500

    Enables building with a right hook

commit f07b8fdc3ea77f161e2bd1d4153a6478f19d426f
Author: Rocky Balboa <firstname.lastname@example.org>
Date:   Thu Sep 27 04:48:34 2018 -0500

    Enables building with a left hook
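The effect of --follow only shows up when a file has been renamed. As a sketch, the throwaway repository below renames a hypothetical build.mk to file.mk and compares the history with and without the option.

```shell
#!/bin/sh
# Sketch only: history of a renamed file, with and without --follow.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Rocky Balboa"
git config user.email "rocky@example.com"

printf 'all:\n\techo left hook\n' > build.mk
git add build.mk
git commit -q -m "Enables building with a left hook"

git mv build.mk file.mk
git commit -q -m "Rename build.mk to file.mk"

# With --follow the history continues past the rename.
with_follow=$(git log --follow --format=%s -- file.mk | wc -l)
# Without it, the history stops at the commit that created file.mk.
without_follow=$(git log --format=%s -- file.mk | wc -l)
echo "with --follow: $with_follow, without: $without_follow"
```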
Perhaps you still need more research. A good idea might be to dig into the repository history again, this time filtering by date. By specifying --since=date and --until=date, you can restrict the history to a time window, which makes a more granular search possible.
> git log --since="27/09/2018" --until="28/09/2018"
commit f07b8fdc3ea77f161e2bd1d4153a6478f19d426f
Author: Rocky Balboa <email@example.com>
Date:   Thu Sep 27 04:48:34 2018 -0500

    Enables building with a left hook
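Slash-separated dates are easy to misread, so the sketch below uses ISO-style dates with an explicit time and UTC offset instead. This is a choice made for this example, on the assumption that a date given without a time may be completed with the current time of day, which would make the window depend on when you run the command. The commit dates are set artificially through the GIT_AUTHOR_DATE and GIT_COMMITTER_DATE environment variables so that the result is reproducible.

```shell
#!/bin/sh
# Sketch only: filter history by an explicit time window.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Rocky Balboa"
git config user.email "rocky@example.com"

commit_on() {
    # $1 = date for both author and committer, $2 = commit message
    echo "$2" >> file.mk
    git add file.mk
    GIT_AUTHOR_DATE="$1" GIT_COMMITTER_DATE="$1" git commit -q -m "$2"
}

commit_on "2018-09-26 10:00:00 +0000" "Initial makefile"
commit_on "2018-09-27 04:48:34 +0000" "Enables building with a left hook"
commit_on "2018-10-03 03:27:14 +0000" "Enables building with a right hook"

# Only the commits made during 27 September 2018 (UTC).
found=$(git log --since="2018-09-27 00:00:00 +0000" \
                --until="2018-09-28 00:00:00 +0000" --format=%s)
echo "$found"
```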
You are starting to understand how the system works and voila, you realise there is a bug. You still don’t have enough information and want to understand what change introduced the bug.
git bisect performs a binary search to find which commit introduced the bug. You first select a "bad" commit, where the system is broken, and a "good" commit, where the system isn't broken. Then git bisect selects a commit between those two and asks you whether it is "good" or "bad", and so on until the responsible commit is identified.
Let's see a simple example. Suppose you are trying to find a bug that is present in the current commit but not in an older commit that you know to be good:
> git bisect start
> git bisect bad
> git bisect good 71669696c5570f6e627f4ccc5722647e8ff14514
This starts the bisection. Now git drives you through it, checking out a proposed commit that you have to evaluate, again, as "good" or "bad":
Bisecting: 2 revisions left to test after this (roughly 2 steps)
[2f9c695953cf9854c50108806c22a38ab9ae1d5a] Coding after party.
## Time to test and evaluate
> git bisect bad ## Your test showed the system is broken
Repeat the process...
Bisecting: 0 revisions left to test after this (roughly 1 step)
[03bc5ebe773ce1ade24a60aecfdec704002b10e7] Introduce enhanced output to build system
## Time to test and evaluate
> git bisect good ## Your test showed the system works
And there you go! Finally, git bisect ends by showing you the first bad commit:
2f9c695953cf9854c50108806c22a38ab9ae1d5a is the first bad commit
commit 2f9c695953cf9854c50108806c22a38ab9ae1d5a
Author: Ivan Drago <firstname.lastname@example.org>
Date:   Wed Oct 5 22:22:14 2018 -0500

    Coding after party.
Another very interesting feature of git bisect is the possibility of automating the bisection with an external tool that evaluates the correctness of the system in its current state. For example, you can write a script, let’s name it check.sh, that determines whether the system is currently working as expected. The script should return 0 when the current state is good and a value between 1 and 127 (except 125) when the state is bad. Then you can run git bisect run ./check.sh to trigger an automatic bisection.
> git bisect start HEAD v1.2 -- # HEAD is bad, v1.2 is good
> git bisect run ./check.sh # "check.sh" checks the system
> git bisect reset # quit the bisect session
Note: The return code 125 is reserved for when the system cannot be checked, in which case git bisect skips the commit that made the script return that value.
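To make the exit codes concrete, here is a minimal end-to-end sketch. Everything in it is hypothetical: the throwaway repository, the status.txt file that stands in for "the system", the known-good tag, and the check script itself, which is written to a temporary file outside the repository so that checking out old commits cannot change it.

```shell
#!/bin/sh
# Sketch only: automated bisection in a throwaway repository.
set -eu
repo=$(mktemp -d)
check=$(mktemp)     # hypothetical check script, kept outside the repository
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# Exit status: 0 = good, 1 = bad, 125 = this commit cannot be tested.
cat > "$check" <<'EOF'
#!/bin/sh
[ -f status.txt ] || exit 125
grep -q '^ok$' status.txt
EOF

commit() {
    # $1 = content of status.txt, $2 = commit message
    echo "$1" > status.txt
    echo "$2" >> history.txt
    git add status.txt history.txt
    git commit -q -m "$2"
}

commit ok     "Initial version"
git tag known-good
commit ok     "Introduce enhanced output to build system"
commit broken "Coding after party."
commit broken "Tidy up"
commit broken "More tidying"

git bisect start HEAD known-good --   # HEAD is bad, known-good is good
git bisect run sh "$check"
# When the bisection ends, refs/bisect/bad points at the first bad commit.
culprit=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset
echo "First bad commit: $culprit"
```

The script is invoked through sh here so that it does not need to be executable. With a script stored in the working directory you would run it as ./check.sh instead.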
There is no magic wand to get all the answers you need when facing a lack of documentation on a system, but that shouldn't prevent you from moving forward. As a Software Crafter, you should be able to look into your toolbox and make use of many tools in order to resolve your problem.
Sometimes the solution won't come directly, but with perseverance, effort, and using the right tools you can always move in the right direction.
Git is a very powerful tool, and you shouldn't limit yourself to the straightforward commands you use daily for development. Instead, push yourself to dig deeper and embrace Git’s full potential.