Migrating Cafu to distributed version control – Part 2

In "Migrating Cafu to distributed version control – Part 1", I outlined the fundamental considerations for the migration. This post continues the subject with the more specific and technical details.


Goals and Requirements

Specifically for Cafu, what are the goals and requirements when converting to Git?

As a first step, our source code repository must technically be converted from Subversion to Git. I wanted the conversion to be done in a careful, accurate and complete manner: It should include all branches, vendor branches, tags and authors, and the proper and complete merge history. In fact, I wanted the resulting Git repository to look as if we had used Git right from the start.

Conversions like this are generally well described in the documentation and books about Git, but it turned out that our use of Vendor Branches, a normal and useful feature in Subversion, was very difficult to migrate to Git, and that very little related documentation can be found on the internet. Vendor branches are important to us because we use them to manage our external libraries.

Besides my normal work, I spent a lot of time on this, on and off for several months, slowly putting the related pieces together. I also described the problem on the git-users mailing list in the thread "Importing Subversion vendor-branches to Git". The thread provides both a good technical summary and the remaining bits that I was still missing at that time. This post is the synopsis of the gathered results.

Secondly, as the issue tracker is closely related to the repository, it needs to be updated accordingly. Options include:

  1. stay with Trac (but update it to work with Git),
  2. migrate it to the issue tracker provided with the Cafu repository at BitBucket,
  3. migrate it to Atlassian JIRA.

The first option would preserve the full existing flexibility of Trac, keep us a certain degree of independence, and not require getting used to something else.
The second option would probably be the least complex and the most comfortable, but it remains to be determined if the BitBucket issue tracker is powerful enough for our needs. You can see a BitBucket issue tracker live at the BitBucket site itself.
The third option would be the most powerful, but it might as well overwhelm us.

I've not yet formed an opinion about it though, much less made a decision. Fortunately, the migration of the issue tracker can be done largely independently of the migration of the repository, so that it does not stall the progress of the repository.


Migration Hotspots

While the bulk of the conversion is flawlessly and quickly done by the git svn ... commands, I found plenty of occasions where manual tweaking of the process, or post-processing and clean-up work was necessary in order to achieve the desired result. This is especially true whenever the Subversion source repository deviates from the classic "trunk, branches, tags" layout, or subtleties of Subversion merges prevent proper automatic conversion to Git.

In this section, I list the issues that I found the most prominent (from a Git learner's perspective), along with the solutions that I eventually applied.

Branches outside branches/

If some branches follow the standard Subversion repository layout and live in branches/ while others are elsewhere, or if the standard layout was never used and the branches are scattered arbitrarily across the Subversion repository, it is not immediately clear whether and how these extra branches can be imported properly into Git. The solution is to split the call to git svn clone into this sequence:
> git svn init https://srv7.svn-repos.de/dev123/projects/cafu -s Cafu
> cd Cafu
> git config svn.authorsfile ../authors.txt
> git config --add svn-remote.svn.fetch "vendor:refs/remotes/vendor"
> git svn fetch
The second-to-last line causes the subsequent fetch to load the directory vendor/ as a Git branch, as if it were another branch in branches/.
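The authors.txt file referenced in the sequence above maps each Subversion login to a Git identity, one "login = Full Name <email>" entry per line. As a rough sketch (not necessarily how the Cafu file was created), a first draft of such a file can be derived from the output of svn log -q. The function name authors_from_svn_log and the example.invalid addresses are placeholders of mine; the real names and e-mail addresses must be filled in by hand afterwards.

```shell
#!/bin/sh
# Sketch: turn `svn log -q` output (read from stdin) into a draft authors file.
# Each revision line looks like:
#   r123 | login | 2012-01-05 12:34:56 +0100 (Thu, 05 Jan 2012)
# The second " | "-separated field is the Subversion login.
authors_from_svn_log() {
    awk -F ' [|] ' '/^r[0-9]+ [|] / {
        # Placeholder identity: replace name and address manually later.
        print $2 " = " $2 " <" $2 "@example.invalid>"
    }' | sort -u
}

# Possible usage (repository URL as in the sequence above):
#   svn log -q https://srv7.svn-repos.de/dev123/projects/cafu | authors_from_svn_log > authors.txt
```

The sort -u step collapses the many revisions per author into a single entry each.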

Missing Merges

The converted commit history sometimes misses merges where merges were performed in Subversion:
    ------B-----D---- master
         /
    ----A-----C------ pristine
A was merged into master, yielding B, and the merge is properly reproduced in Git.
C was merged into master as well, yielding D, but only in Subversion. In Git, the merge is missing.

Among other reasons, this can happen if in Subversion the merge was performed not at the top directory level, but as a "partial" merge from subdirectory to subdirectory, e.g. from forum/themes/firenzie in pristine directly to the same directory in master.

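The solution for such cases is to record the missing parents in the .git/info/grafts file (one line per commit: the commit's hash followed by the hashes of all its parents), and then to make the grafted history permanent with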
> git filter-branch --tag-name-filter cat -- --all
The --tag-name-filter cat part makes sure that attached tags are rewritten as well.
If rewriting the commits succeeded and the result is as desired, the grafts file and the references to the original commits should be deleted:
> rm .git/info/grafts
> rm -rf .git/refs/original/
The next call to git svn fetch will automatically rebuild the rev_map that is needed for continued bidirectional communication with the source Subversion repository.
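For the history sketched above, the graft entry for D lists D itself, then its existing parent, then C as the additional parent. A minimal sketch of composing such a line follows; graft_line is a hypothetical helper name, and D and C in the usage comment stand for whatever names or hashes identify the real commits.

```shell
#!/bin/sh
# Sketch: print one grafts-file line "<commit> <parent1> [<parent2> ...]".
# All arguments must be full 40-character hexadecimal commit hashes.
graft_line() {
    if [ "$#" -lt 2 ]; then
        echo "usage: graft_line <commit> <parent>..." >&2
        return 1
    fi
    for h in "$@"; do
        case $h in
            *[!0-9a-f]*) echo "not a hex hash: $h" >&2; return 1 ;;
        esac
        if [ "${#h}" -ne 40 ]; then
            echo "not a full 40-character hash: $h" >&2
            return 1
        fi
    done
    # "$*" joins the arguments with single spaces (default IFS).
    printf '%s\n' "$*"
}

# Possible usage inside the converted repository, before running filter-branch:
#   graft_line "$(git rev-parse D)" "$(git rev-parse D^)" "$(git rev-parse C)" >> .git/info/grafts
```

Newer Git versions deprecate the grafts file in favor of git replace --graft, so this sketch applies to the grafts-based procedure described here.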

Fixing Tags

As Subversion treats tags exactly like branches, the Git branches that should be tags must be fixed manually after the conversion. A good solution is described by Haenel and Plentz in their book, but it unfortunately only works with lightweight tags and thus cannot account for the tag message, which in our case is a longer text. The best solution that I have found that works with annotated tags is from this Atlassian blog post, which I had to modify slightly to make it work as desired:
> type convert_tags.sh
#!/bin/sh
# CF: from http://blogs.atlassian.com/2012/01/moving-confluence-from-subversion-to-git/ with small modifications.
# Based on https://github.com/haarg/convert-git-dbic
set -u
set -e

# The pattern is quoted so that the shell cannot expand the * itself.
git for-each-ref --format='%(refname)' 'refs/remotes/tags/*' | while read -r r; do
    tag=${r#refs/remotes/tags/}
    # CF: Note the ^ in the next line: We create the converted tag at the *parent* of the original tag.
    sha1=$(git rev-parse "$r^")

    # Reuse the author name, e-mail and date of the original tag commit for the new tag object.
    commiterName="$(git show -s --pretty='format:%an' "$r")"
    commiterEmail="$(git show -s --pretty='format:%ae' "$r")"
    commitDate="$(git show -s --pretty='format:%ad' "$r")"
    # Print the raw commit body (commit message).
    git show -s --pretty='format:%B' "$r" | \
        env GIT_COMMITTER_EMAIL="$commiterEmail" GIT_COMMITTER_DATE="$commitDate" GIT_COMMITTER_NAME="$commiterName" \
        git tag -a -F - "$tag" "$sha1"
    echo "Tag: ${tag} sha1: ${sha1} using '${commiterName}', '${commiterEmail}' on '${commitDate}'"

    # Remove the original refs/remotes/tags/* ref
    git update-ref -d "$r"
done
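After running the script, it can be worth checking that the converted tags really are annotated tags rather than lightweight ones. One way to do that, as a small sketch with a helper name (list_tag_kinds) of my own choosing:

```shell
#!/bin/sh
# Sketch: list every tag together with the type of object it points to.
# Annotated tags are reported as "tag", lightweight tags as "commit".
list_tag_kinds() {
    git for-each-ref --format='%(refname:short) %(objecttype)' refs/tags
}

# Possible usage inside the converted repository:
#   list_tag_kinds
```

Any tag reported as "commit" carries no tag message of its own.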

Move to Subdirectory

Before I could fix missing merges in our (partially) converted Cafu repository, I had to move the contents of all commits in the "vendor" branch into a subdirectory. The documentation for git filter-branch has a related example, which I modified according to this discussion, yielding:
> git filter-branch --index-filter '
      rm -f "$GIT_INDEX_FILE"
      git read-tree --prefix=ExtLibs/ "$GIT_COMMIT"
  ' refs/heads/vendor

Vendor Branches

In our Subversion repositories, we make use of Vendor Branches, a normal and very useful feature in Subversion that is used to manage "external" software. Vendor branches are however very difficult to migrate to Git, and only very little related documentation can be found on the internet.
Possible solutions are:

  1. Git submodules,
  2. Git subtrees,
  3. normal Git branches.

Git submodules are mentioned relatively frequently, but they really do not seem to be a good fit for vendor branches. We don't consider them any further for the reasons detailed in the "Importing Subversion vendor-branches to Git" thread.

Git subtrees look very interesting and well suited to the problem, and I spent a lot of time digging into them. There is a subtree extension that is likely to be integrated into the Git core soon, and Jakub Suder describes a solution using it that we might have adopted (without the --squash).

Using normal Git branches as vendor branches is beautifully explained in this blog post by Dominic Mitchell. In fact, our website was structured in "live" and "pristine" branches right from the start, and mapping these 1:1 to normal Git branches was straightforward and a clear choice (but still required the "Branches outside branches/" and "Missing Merges" techniques above).

For the vendor branches in Cafu, the matter was less clear: candidates were Git subtrees or again normal Git branches. As mentioned before, I posted a detailed and complete description of the problem in the thread "Importing Subversion vendor-branches to Git".

Eventually, I opted for the "normal Git branches" approach (even though it required the "Move to Subdirectory" step from above), because it is the simplest and clearest approach that requires no "extras" at all, neither for the DVCS nor for its users, and as a side effect we keep the door open to a future migration to another VCS such as Mercurial.
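To illustrate what working with a normal Git branch as a vendor branch can look like, here is one possible routine for importing a new release of an external library. It is only a sketch under assumptions: the function name update_vendor_lib, its arguments and the commit messages are mine, and it assumes the branch names "vendor" and "master" and the ExtLibs/ subdirectory used earlier in this post.

```shell
#!/bin/sh
# Sketch: import a new upstream release into the vendor branch,
# then merge the vendor branch into master.
#   $1  library directory name below ExtLibs/ (hypothetical, e.g. "zlib")
#   $2  directory holding the unpacked new upstream release
#   $3  version label used in the commit messages
update_vendor_lib() {
    lib=${1:?library name required}
    src=${2:?source directory required}
    ver=${3:?version label required}

    git checkout -q vendor &&
    # Replace the old pristine copy entirely, so that files removed
    # upstream also disappear from the vendor branch.
    rm -rf "ExtLibs/$lib" &&
    mkdir -p "ExtLibs/$lib" &&
    cp -R "$src"/. "ExtLibs/$lib"/ &&
    git add -A "ExtLibs/$lib" &&
    git commit -q -m "Import $lib $ver" &&
    # Bring the upstream changes into the branch with our local modifications.
    git checkout -q master &&
    git merge -q --no-ff -m "Merge $lib $ver from vendor" vendor
}

# Possible usage from the top of the working tree:
#   update_vendor_lib zlib /tmp/zlib-unpacked 1.2.7
```

The --no-ff option keeps an explicit merge commit in master even when a fast-forward would be possible, so each library update stays visible in the history.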


Migration Details

I give the exact technical steps of converting the Cafu Subversion repository to Git in a comment to this post, in order to keep this prose text readable and clear.


The next steps

Our official Git repository of the Cafu Engine is now available at:


Naturally, we will not immediately abandon the Subversion repository, but enter a gradual transition period where everyone can make the switch at a comfortable pace, and where we can deal with details like the issue tracker.

Personally, for a short while I expect to continue working mainly with Subversion, updating the Git repository in a separate step.
Thereafter, I'll probably switch to work mainly with Git, but continue to update the Subversion repository in a separate step.
Only when all users and all technical indications clearly suggest that we can do entirely without Subversion will the Subversion repository finally be switched off.
