One of the most critical features of modern-day version control relies on a poorly understood algorithm –
diff3. What is it, how does it work, and why it could be improved?
There are two critical operations that need to know the differences between files:
- Reviewing new changes
- Merging multiple changes
First, version control systems like
git don't actually store the patched differences between files. Instead,
git stores completely new files for every revision and computes the differences at runtime. That's why you can switch between really old revisions and new revisions quickly – otherwise, every patch would need to be sequentially applied.
So how does
git know the differences between files? Comparing two files
git uses a set of algorithms that mostly follow Myers' An O(ND) Difference Algorithm and Its Variations. It's closely related to a common interview question, the longest common subsequence problem.
But one of the most challenging aspects of version control is the 3-way merge. Let's say you, and a coworker independently make different revisions to a common document and then try to reconcile the differences. It's possible that some changes can be merged together automatically, but there might exist conflicts.
One merge tool that
git uses is called
diff3 or sometimes
kdiff3, which, as you can guess, produces a merged output of 3 different files that share a common ancestor.
diff3 was originally written in 1976 in Version 7 of Unix. The algorithm is a set of heuristics that tries to capture most edge cases of merging text together, but it's not straightforward or easy to understand. Doug McIlroy (or an engineer named Paul Jensen) wrote the first version of
diff3 for Unix. You can read McIlroy's paper, An Algorithm for Differential File Comparison.
You can enable printing merge conflicts in
diff3 format with the command. This has the benefit of showing you the diff output that includes the common ancestor, which usually helps when resolving conflicts.
$ git config --global merge.conflictstyle diff3
You can browse the source code of