Combining Unrelated Git Repositories: When Projects Collide! | by Sebastian Royer | May, 2022

How do you merge two Git repositories?

I recently started a project to build an AI-powered bookmarking extension for Chrome, using separate Git repositories to get the guts created: one for the Chrome extension and one for a React/TypeScript front-end.

I figured it would be simpler to start with isolated repositories and combine them once I had basic functionality working for each part. Eventually I needed to combine these repositories in order to use the front-end for the popup and the Bookmark page override in the Chrome extension.

I went Google hunting and found a concise and well-voted Stack Overflow answer on how to merge one repository into another.

It is definitely worth reading and is by definition the TLDR for this article.

After digging through the comments there I decided to go line by line through the recommended approach and check it against the git documentation in order to better understand what was happening under “the porcelain” as they say. Lucky for you, I took notes.

For reference here is the top-voted and accepted solution to merge project-a into project-b, with line numbers that link to the explanations below:

1 cd path/to/project-b
2 git remote add project-a /path/to/project-a
3 git fetch project-a --tags
4 git merge --allow-unrelated-histories project-a/master # or whichever branch you want to merge
5 git remote remove project-a

We start easily enough:

1 cd path/to/project-b

The cd command changes your working directory to project-b so that we can merge project-a into it. Simple 🙂

2 git remote add project-a /path/to/project-a

The git remote command ‘manages the set of repositories (“remotes”) whose branches are tracked’ and we’ll be using it again later as well.
git remote add registers a remote repository under the given name, pointing at the specified path or URL. So here we are adding project-a as a remote of the project-b repository, located at /path/to/project-a.
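To see what this step actually changes, here is a minimal sketch run against two throwaway repositories. The temporary sandbox directory is my own assumption for illustration; substitute your real project paths.

```shell
set -e
# Throwaway sandbox; these directories stand in for the real project paths.
sandbox=$(mktemp -d)
git init -q "$sandbox/project-a"
git init -q "$sandbox/project-b"

cd "$sandbox/project-b"
git remote add project-a "$sandbox/project-a"

# List the configured remotes with their fetch and push locations.
git remote -v

# The remote is only an entry in project-b's config; nothing is downloaded yet.
remote_url=$(git config --get remote.project-a.url)
echo "$remote_url"
```

Note that adding a remote transfers no commits. That only happens in the fetch step.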

3 git fetch project-a --tags 

git fetch ‘downloads objects and refs from another repository’.
fetch downloads branches and tags (collectively, ‘refs’) from the repositories named along with the objects necessary to complete their histories. Any tag that points into the referenced histories is also fetched.
Using the --tags option fetches all tags from the remote ‘refs/tags/’ directory into the local repository’s tags with the same name.
As these projects are both quite young and I have not begun to version them, I do not have any tags of value to combine, so I am not going to include the --tags flag in my command.

For a bit more depth on tags refer to my diversion below.
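As a rough illustration of what the fetch leaves behind, the sketch below builds a tiny project-a with one commit and one tag, then fetches it into an empty project-b. The sandbox paths, the demo identity, and the v0.1 tag are assumptions made up for this example.

```shell
set -e
# Demo identity so commits work on a machine without git config (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

sandbox=$(mktemp -d)

# project-a: one commit on master plus a lightweight tag.
git init -q "$sandbox/project-a"
cd "$sandbox/project-a"
git symbolic-ref HEAD refs/heads/master   # force the branch name used in the article
echo "a" > a.txt
git add a.txt
git commit -q -m "Initial commit in project-a"
git tag v0.1

# project-b: an empty repository that fetches from project-a.
git init -q "$sandbox/project-b"
cd "$sandbox/project-b"
git remote add project-a "$sandbox/project-a"
git fetch -q project-a --tags

# Fetched branches appear as remote-tracking refs, not as local branches.
remote_branches=$(git branch -r)
fetched_tags=$(git tag --list)
echo "remote-tracking branches: $remote_branches"
echo "tags: $fetched_tags"
```

The fetched branch is only visible as project-a/master, which is why the merge command in the next step names it that way.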

4 git merge --allow-unrelated-histories project-a/master # or whichever branch you want to merge 

Here we get to the meat of the process with git merge, which ‘joins two or more development histories together’.

The merge command incorporates changes from the named commit(s) into the current branch. It will ‘replay’ the changes made on the named commit’s branch since it diverged from the current branch and record the result in a new commit, along with the names of the two parent commits and a log message from the user. As you might already know, this is a common scenario with multiple branches in a single repository, and that is how the merge command is generally designed and used. By default it does this with a three-way merge: it compares the state of each file on both branches against their common ancestor to work out what each side changed. Non-overlapping changes are applied automatically, while overlapping changes, or ‘conflicts’, are presented to the user to resolve by hand.

Rather than dig too deep on the underlying mechanics I’ll just point to this simple answer as a good introduction to the three-way-merge algorithm that git defaults to, a good overview of the different merge strategies, and Atlassian’s worthy tutorial entry.

In our case we are not merging branches with a shared history, so the --allow-unrelated-histories flag explicitly lets us merge two histories that do not have a common ancestor. Since Git 2.9, merge refuses to do this by default, because joining unrelated histories is usually a mistake: accidentally pulling a freshly re-initialized copy of a project into the original, for example, would leave a confusing parallel history in the repository. The flag is the escape hatch for the rare case where combining two independently started projects is exactly what we want, as it is here.
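The effect of the flag is easy to check in a sandbox. The sketch below is an illustration under my own assumptions (temporary directories, a demo identity, one placeholder file per project), not the setup from the original answer. It shows that the two histories share no merge base, that a plain merge is refused, and that the flagged merge produces a commit with two parents.

```shell
set -e
# Demo identity so commits work on a machine without git config (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

sandbox=$(mktemp -d)
for name in project-a project-b; do
  git init -q "$sandbox/$name"
  git -C "$sandbox/$name" symbolic-ref HEAD refs/heads/master
  echo "$name" > "$sandbox/$name/$name.txt"
  git -C "$sandbox/$name" add .
  git -C "$sandbox/$name" commit -q -m "Initial commit in $name"
done

cd "$sandbox/project-b"
git remote add project-a "$sandbox/project-a"
git fetch -q project-a

# No common ancestor: merge-base prints nothing and exits non-zero.
if git merge-base HEAD project-a/master >/dev/null; then
  has_base=yes
else
  has_base=no
fi

# Without the flag, Git (2.9 and later) refuses the merge outright.
if git merge --no-edit project-a/master >/dev/null 2>&1; then
  plain_merge=succeeded
else
  plain_merge=refused
fi

# With the flag, the merge goes through and records both parents.
git merge -q --no-edit --allow-unrelated-histories project-a/master
parent_count=$(( $(git rev-list --parents -n 1 HEAD | wc -w) - 1 ))
echo "merge base: $has_base, plain merge: $plain_merge, parents: $parent_count"
```

Because the two placeholder files have different names, this merge has no conflicts. Real projects that share file names such as .gitignore or README should expect some.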

For the truly adventurous, here is the actual implementation of git merge, where a quick Find in Page for the string ‘allow_unrelated’ will take you to where the --allow-unrelated-histories flag is handled.

An important note picked up from the docs: before merging, make sure your work on both the named and current branches (or, in our case, repositories) is committed, with no outstanding uncommitted changes that you care about. The docs discourage running git merge with non-trivial uncommitted changes, since a conflict can leave you in a state that is hard to back out of, and those changes risk being lost.
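One way to run that check before merging is to look at the output of git status --porcelain, which is empty when the working tree is clean. This is a minimal sketch; the sandbox repository and the notes.txt file are assumptions that exist only to trigger the warning.

```shell
set -e
sandbox=$(mktemp -d)
git init -q "$sandbox/project-b"
cd "$sandbox/project-b"
echo "scratch" > notes.txt   # an uncommitted file, to make the tree dirty

# --porcelain prints one line per changed or untracked path, nothing if clean.
pending=$(git status --porcelain)
if [ -n "$pending" ]; then
  echo "Uncommitted changes found; commit or stash them before merging:"
  echo "$pending"
  tree_state=dirty
else
  tree_state=clean
fi
```

Running the same check in both repositories before starting keeps the merge easy to back out of if it goes wrong.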

5 git remote remove project-a

Finally, we return to git remote and use git remote remove to remove the named project-a from project-b’s remote tracking. All remote-tracking branches and configuration settings for project-a are removed from project-b, so that our now-merged project no longer refers to the still-existing project-a. This completes our process by uncoupling the two repositories after we have successfully incorporated project-a’s files and history into project-b.
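To confirm that removing the remote discards only the bookkeeping and not the merged history, here is a sketch that runs the whole sequence in a sandbox and inspects the result. The temporary directories, demo identity, and placeholder files are assumptions for illustration.

```shell
set -e
# Demo identity so commits work on a machine without git config (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

sandbox=$(mktemp -d)
for name in project-a project-b; do
  git init -q "$sandbox/$name"
  git -C "$sandbox/$name" symbolic-ref HEAD refs/heads/master
  echo "$name" > "$sandbox/$name/$name.txt"
  git -C "$sandbox/$name" add .
  git -C "$sandbox/$name" commit -q -m "Initial commit in $name"
done

cd "$sandbox/project-b"
git remote add project-a "$sandbox/project-a"
git fetch -q project-a
git merge -q --no-edit --allow-unrelated-histories project-a/master
git remote remove project-a

# The remote and its remote-tracking refs are gone...
remaining_remotes=$(git remote)
tracking_refs=$(git for-each-ref refs/remotes/project-a)

# ...but both initial commits and the merge commit are still in the history.
commit_count=$(git rev-list --count HEAD)
git log --oneline --graph
echo "commits reachable from HEAD: $commit_count"
```

Any tags brought in earlier with --tags live under refs/tags rather than under the remote, so they also stay behind after the remote is removed.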

Although it is not part of the original stackoverflow.com answer, we could add a line like rm -rf path/to/project-a if we no longer wanted to keep the copy of the original project-a.

I wound up cleaning up both projects’ working trees and then ran these exact commands to merge my two projects. It went as smoothly as advertised in the answer comments, with only one line of conflict in my .gitignore file which was easily fixed. If you’ve read this far, I hope you’ve got a couple of repos to smash together! Good luck!

While working through the git fetch --tags step, I felt the need to update myself on the tag functionality in git. Once again, the excellent git docs lay it out.

Essentially, tags in git create a label for a particular commit, allowing you to reference it later. They give a fixed reference point to a specific commit, as opposed to a branch, which starts at a specific commit and then moves along as changes are added. This is particularly useful for versioning, so that a specific commit can be labeled as e.g. v1.4 and later referenced via git commands.

There are two types of tags: ‘lightweight’ and ‘annotated’.

A lightweight tag stores only the tag name and a reference to the commit checksum. It suits temporary tags that mark transitional states, or more generally tags that are not expected to be maintained or shared.

Annotated tags include a message as well as a tag name, and store a full checksummed object in the git database including the tagger’s information and tagging date. This is the recommended type of tag for most tagging as it provides full information and can be signed and verified with GPG if needed.
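The difference between the two types shows up when you ask git what kind of object each tag name points at. This is a small sketch in a throwaway repository; the tag names, messages, and demo identity are assumptions chosen for the example.

```shell
set -e
# Demo identity so commits and annotated tags work without git config (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

sandbox=$(mktemp -d)
git init -q "$sandbox/demo"
cd "$sandbox/demo"
echo "hello" > file.txt
git add file.txt
git commit -q -m "First commit"

git tag v0.1-scratch                 # lightweight: just a name for the commit
git tag -a v1.4 -m "Release 1.4"     # annotated: a separate tag object

# cat-file -t reports the type of object each tag name resolves to.
light_type=$(git cat-file -t v0.1-scratch)
annot_type=$(git cat-file -t v1.4)
echo "lightweight -> $light_type, annotated -> $annot_type"

# For the annotated tag, show prints the tagger, date, and message first.
git show -s v1.4
```

The lightweight tag resolves directly to the commit, while the annotated tag resolves to its own tag object, which is where the tagger, date, and message are stored.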

A nice feature is the ability to tag commits after the fact so that the labeling process in the case of versioning does not have to take place in real-time but can be managed separately.
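Tagging after the fact just means passing a commit to git tag instead of letting it default to HEAD. Here is a sketch with a two-commit throwaway repository; the file contents, tag name, and demo identity are illustrative assumptions.

```shell
set -e
# Demo identity so commits and annotated tags work without git config (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

sandbox=$(mktemp -d)
git init -q "$sandbox/demo"
cd "$sandbox/demo"

echo "one" > file.txt
git add file.txt
git commit -q -m "First commit"
first_commit=$(git rev-parse HEAD)

echo "two" >> file.txt
git commit -q -am "Second commit"

# Tag the older commit by passing its hash as the final argument.
git tag -a v0.1 -m "Retroactive tag for the first commit" "$first_commit"

# ^{commit} peels the annotated tag down to the commit it labels.
tagged_commit=$(git rev-parse "v0.1^{commit}")
echo "v0.1 -> $tagged_commit"
```

In practice you would take the hash from git log rather than capture it in a variable as this sketch does.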

Tags are not shared by default when using, e.g., git push to a remote, so just as with our git fetch --tags call, a git push --tags option exists to send tags to a remote repository if desired.
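The default behaviour can be checked against a local bare repository standing in for a remote. In this sketch the sandbox paths, the origin remote name, the v1.0 tag, and the demo identity are all assumptions, and it relies on git's default settings (in particular, push.followTags not being enabled).

```shell
set -e
# Demo identity so commits and annotated tags work without git config (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

sandbox=$(mktemp -d)
git init -q --bare "$sandbox/remote.git"   # stands in for a hosted remote
git init -q "$sandbox/work"
cd "$sandbox/work"
echo "hello" > file.txt
git add file.txt
git commit -q -m "First commit"
git tag -a v1.0 -m "Release 1.0"
git remote add origin "$sandbox/remote.git"

# A plain push of the current branch leaves the tag behind.
branch=$(git symbolic-ref --short HEAD)
git push -q origin "$branch"
tags_after_plain_push=$(git ls-remote --tags origin)

# Pushing with --tags sends every local tag to the remote.
git push -q origin --tags
tags_after_tag_push=$(git ls-remote --tags origin)
echo "$tags_after_tag_push"
```

To share a single tag rather than all of them, git push origin v1.0 pushes just that tag name.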
