Engineers often find themselves resolving difficult merge conflicts manually.

Merges, and especially merges with conflicts, are built on diffs. So why isn't diff3 good enough to resolve conflicts automatically? Without going into the theory, I'll go over some of the issues with diff3. If you're interested in diving deeper, you can read A Formal Investigation of Diff3.

Problems with diff3
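
To see one of these issues concretely, here is a minimal line-based three-way merge in Python. It is a simplified sketch built on the standard library's `difflib.SequenceMatcher`, not the actual algorithm used by GNU diff3 or git, and `merge3`, `hunks`, and `apply_hunks` are hypothetical names. Both branches add an unrelated import right after `import os`. Because both insertions land at the same position in the base file, a purely line-based merge has no way to order them and reports a conflict, even though keeping both lines is obviously correct.

```python
from difflib import SequenceMatcher


def hunks(base, side):
    """Regions where `side` differs from `base`, as (start, end, new_lines):
    base[start:end] was replaced by new_lines (start == end means insertion)."""
    sm = SequenceMatcher(a=base, b=side, autojunk=False)
    return [(i1, i2, side[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]


def apply_hunks(base, side_hunks, start, end):
    """Rebuild base[start:end] with one side's (sorted) hunks applied."""
    out, p = [], start
    for i1, i2, new in side_hunks:
        out.extend(base[p:i1])
        out.extend(new)
        p = i2
    out.extend(base[p:end])
    return out


def merge3(base, ours, theirs):
    """Line-based three-way merge. Returns (merged_lines, had_conflict)."""
    tagged = sorted([(h, "ours") for h in hunks(base, ours)] +
                    [(h, "theirs") for h in hunks(base, theirs)],
                    key=lambda t: (t[0][0], t[0][1]))
    merged, pos, conflict, i = [], 0, False, 0
    while i < len(tagged):
        # Cluster hunks whose base ranges overlap or touch (conservative).
        start, end = tagged[i][0][0], tagged[i][0][1]
        cluster = [tagged[i]]
        i += 1
        while i < len(tagged) and tagged[i][0][0] <= end:
            end = max(end, tagged[i][0][1])
            cluster.append(tagged[i])
            i += 1
        merged.extend(base[pos:start])  # unchanged lines before this cluster
        sides = {s for _, s in cluster}
        mine = apply_hunks(base, [h for h, s in cluster if s == "ours"], start, end)
        other = apply_hunks(base, [h for h, s in cluster if s == "theirs"], start, end)
        if sides == {"ours"}:
            merged.extend(mine)   # only our branch touched this region
        elif sides == {"theirs"}:
            merged.extend(other)  # only their branch touched this region
        elif mine == other:
            merged.extend(mine)   # both branches made the identical change
        else:
            conflict = True       # both changed the same spot, differently
            merged += ["<<<<<<< ours", *mine, "||||||| base", *base[start:end],
                       "=======", *other, ">>>>>>> theirs"]
        pos = end
    merged.extend(base[pos:])
    return merged, conflict


base = ["import os", "", "def main():", "    pass"]
ours = ["import os", "import sys", "", "def main():", "    pass"]
theirs = ["import os", "import json", "", "def main():", "    pass"]

merged, conflict = merge3(base, ours, theirs)
print(conflict)  # True
print("\n".join(merged))
```

Treating touching or overlapping changes as conflicts is the safe choice for a tool that only knows where lines sit, not what they mean. If their branch had instead edited the body of `main`, the two changes would be far apart and merge cleanly.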

What's been tried

Semantic merge strategies for specific languages. SemanticMerge, for example, works for C#, Java, and C. There's also difftastic, which produces structural diffs for over 20 languages. However, difftastic does not generate patches or handle merges.
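
Tools like SemanticMerge and difftastic parse code before comparing it. As a rough illustration of why that helps (not how either tool works internally), the Python sketch below uses the standard-library `ast` module to compare top-level declarations by syntax tree instead of by line. `top_level_defs` and `changed_defs` are hypothetical helpers. Reformatting and comments disappear during parsing, so only the function whose behavior changed is reported.

```python
import ast
import difflib


def top_level_defs(source):
    """Map each top-level function/class name to a dump of its syntax tree.
    ast.dump leaves out line/column positions by default, and the parser
    discards comments, so formatting-only edits produce identical dumps."""
    tree = ast.parse(source)
    return {node.name: ast.dump(node)
            for node in tree.body
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))}


def changed_defs(before, after):
    """Names of top-level declarations that were added, removed, or modified."""
    old, new = top_level_defs(before), top_level_defs(after)
    return sorted(name for name in old.keys() | new.keys()
                  if old.get(name) != new.get(name))


before = "def total(xs):\n    return sum(xs)\n\ndef biggest(xs):\n    return max(xs)\n"
after = ("def total(xs):  # add up the values\n"
         "    return sum(\n        xs\n    )\n\n"
         "def biggest(xs):\n    return max(xs, default=None)\n")

# A line-based diff flags lines in both functions...
touched = [line for line in difflib.unified_diff(before.splitlines(), after.splitlines(), lineterm="")
           if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))]
print(len(touched) > 0)  # True
# ...but only one function actually changed structurally.
print(changed_defs(before, after))  # ['biggest']
```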

Patch-based algebras, like the one in darcs, an alternative version control system to git. You can read about the theory behind how darcs stores patches and resolves conflicts here.
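
The central operation in darcs's patch theory is commutation: swapping the order of two patches while adjusting them so the combined result stays the same, or detecting that one patch depends on the other. Here is a toy version in Python for pure line insertions. The `Insert` type and `commute` function are hypothetical and far simpler than darcs's real patch types; in particular, this sketch conservatively treats an insertion that touches lines created by the other patch as a dependency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    """Hypothetical minimal patch: insert `lines` before index `pos`."""
    pos: int
    lines: tuple

    def apply(self, doc):
        return doc[:self.pos] + list(self.lines) + doc[self.pos:]


def commute(a, b):
    """Given patch `a` followed by patch `b`, return (b2, a2) such that applying
    b2 then a2 has the same effect, or None if `b` depends on `a`."""
    n_a, n_b = len(a.lines), len(b.lines)
    if b.pos < a.pos:
        # b lands before a's block: b is unchanged, a shifts down by n_b lines.
        return Insert(b.pos, b.lines), Insert(a.pos + n_b, a.lines)
    if b.pos > a.pos + n_a:
        # b lands after a's block: without a present, b sits n_a lines earlier.
        return Insert(b.pos - n_a, b.lines), Insert(a.pos, a.lines)
    # b touches lines that a created; this sketch treats that as a dependency.
    return None


doc = ["a", "b", "c", "d"]
A = Insert(1, ("x1", "x2"))   # doc becomes a x1 x2 b c d
B = Insert(4, ("y",))         # applied after A: a x1 x2 b y c d
B2, A2 = commute(A, B)
print(B2, A2)  # Insert(pos=2, lines=('y',)) Insert(pos=1, lines=('x1', 'x2'))
print(A2.apply(B2.apply(doc)) == B.apply(A.apply(doc)))  # True

C = Insert(2, ("z",))         # lands between x1 and x2, so it needs A
print(commute(A, C))          # None
```

Patches that commute can be reordered freely, which is what lets a patch-based system pull in one change without the unrelated ones around it.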

Machine Learning for Merge Conflicts

What if we could train a model to resolve common merge conflicts? Millions of public merge conflict resolutions on GitHub could serve as a data set. With a little extra work, for example by re-running the merge of each merge commit's two parents, we could probably recreate the original conflicts as well.
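
As a sketch of the data-collection step, assume each conflict has been recreated by checking out a merge commit's first parent and running `git -c merge.conflictStyle=diff3 merge <second-parent>`, with the committed merge result serving as the human resolution. The Python below pulls the ours/base/theirs sections out of the diff3-style conflict markers so each hunk can be paired with its resolution. `ConflictHunk`, `parse_conflicts`, and `to_training_example` are hypothetical names, and the record format is an assumption. Locating which lines of the resolved file correspond to each hunk is left out.

```python
from dataclasses import dataclass, field


@dataclass
class ConflictHunk:
    """One conflicted region: our lines, the merge base's lines, and theirs."""
    ours: list = field(default_factory=list)
    base: list = field(default_factory=list)
    theirs: list = field(default_factory=list)


def parse_conflicts(text):
    """Extract diff3-style conflict hunks from a file's text.
    Only marker prefixes are checked, since the labels after them vary."""
    found, state, cur = [], None, None
    for line in text.splitlines():
        if line.startswith("<<<<<<<"):
            state, cur = "ours", ConflictHunk()
        elif line.startswith("|||||||") and state == "ours":
            state = "base"
        elif line == "=======" and state in ("ours", "base"):
            state = "theirs"
        elif line.startswith(">>>>>>>") and state == "theirs":
            found.append(cur)
            state, cur = None, None
        elif state is not None:
            getattr(cur, state).append(line)  # content line inside a hunk
    return found


def to_training_example(hunk, resolved_lines):
    """Hypothetical record format: the three sides as input, the fix as target."""
    return {"ours": hunk.ours, "base": hunk.base,
            "theirs": hunk.theirs, "resolution": resolved_lines}


conflicted = """import os
<<<<<<< ours
import sys
||||||| base
=======
import json
>>>>>>> theirs

def main():
    pass
"""
conflicts = parse_conflicts(conflicted)
print(to_training_example(conflicts[0], ["import sys", "import json"]))
# {'ours': ['import sys'], 'base': [], 'theirs': ['import json'], 'resolution': ['import sys', 'import json']}
```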

This seems like the best way to capture semantic differences across many languages, the kind of resolution you would normally only get by parsing a language-specific AST or otherwise understanding the syntax. Tricky patterns, such as dependency-management conflicts in configuration files that live outside any AST, could be learned and fixed too.
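
For a concrete instance of the dependency-management patterns mentioned above, suppose both branches bump the same pinned package in a `requirements.txt`-style file to different versions. One plausible resolution, which a model might pick up from enough examples (an assumption here, not a measured result), is to keep the higher version. The hand-written Python rule below is a hypothetical baseline expressing that single pattern. It only handles plain `name==X.Y.Z` pins; real tooling would want a full version parser such as the one in the `packaging` library.

```python
def parse_pin(line):
    """Split a hypothetical 'name==X.Y.Z' pin into (name, (X, Y, Z))."""
    name, _, version = line.partition("==")
    return name.strip(), tuple(int(part) for part in version.strip().split("."))


def resolve_version_conflict(ours_line, theirs_line):
    """If both sides pin the same package, keep the higher version.
    Returns None for anything this simple rule does not understand."""
    try:
        ours_name, ours_version = parse_pin(ours_line)
        theirs_name, theirs_version = parse_pin(theirs_line)
    except ValueError:
        return None  # not a plain numeric pin, e.g. '2.0rc1'
    if ours_name != theirs_name:
        return None  # different packages: not this pattern
    return ours_line if ours_version >= theirs_version else theirs_line


print(resolve_version_conflict("requests==2.31.0", "requests==2.32.3"))  # requests==2.32.3
print(resolve_version_conflict("requests==2.31.0", "urllib3==2.2.1"))    # None
```

A learned model would ideally generalize beyond one hard-coded rule like this, but a baseline of this kind gives something to compare against.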