In "Migrating Cafu to distributed version control – Part 1", I outlined the
fundamental considerations for the migration. This post continues the
subject with the more specific and technical details.
Goals and Requirements
Specifically for Cafu, what are the goals and requirements when converting to
Git?
As a first step, our source code repository must be converted from Subversion
to Git. I wanted the conversion to be careful, accurate and complete: it should
include all branches, vendor branches, tags and authors, as well as the proper
and complete merge history. In fact, I wanted the resulting Git repository to
look as if we had used Git right from the start.
Conversions like this are generally well described in the documentation and
books about Git, but it turned out that our use of Vendor Branches, a normal
and useful feature in Subversion, was very difficult to migrate to Git, and
that very little related documentation can be found on the internet. Vendor
branches are important to us because we use them to manage our external
libraries.
Alongside my normal work, I spent a lot of time on this, on and off for several
months, slowly putting the related pieces together. I also described the
problem on the git-users mailing list in the thread "Importing Subversion
vendor-branches to Git". The thread provides both a good technical summary and
the remaining bits that I was still missing at that time. This post is a
synopsis of the gathered results.
Secondly, as the issue tracker is closely related to the repository, it needs
to be updated accordingly. Options include:
- stay with Trac (but update it to work with Git),
- migrate it to the issue tracker provided with the Cafu repository at
BitBucket,
- migrate it to Atlassian JIRA.
The first option would preserve the full existing flexibility of Trac, keeping
us a certain degree of independence, and not require getting used to something
else.
The second option would probably be the least complex and the most comfortable,
but it remains to be determined whether the BitBucket issue tracker is powerful
enough for our needs. You can see a BitBucket issue tracker live at the
BitBucket site itself.
The third option would be the most powerful, but it might just as well
overwhelm us.
I have not yet formed an opinion about this, much less made a decision.
Fortunately, the migration of the issue tracker can be done largely
independently of the migration of the repository, so that it does not stall the
progress of the repository.
Migration Hotspots
While the bulk of the conversion is done flawlessly and quickly by the
git svn ... commands, I found plenty of occasions where manual tweaking of the
process, or post-processing and clean-up work, was necessary in order to achieve
the desired result. This is especially true whenever the Subversion source
repository deviates from the classic "trunk, branches, tags" layout, or when
subtleties of Subversion merges prevent proper automatic conversion to Git.
In this section, I list the issues that I found the most prominent (from a Git
learner's perspective), along with the solutions that I eventually applied.
Branches outside branches/
If some branches follow the standard Subversion repository layout and live in
branches/, but others are elsewhere, or if the standard layout was never used
and the branches are scattered arbitrarily across the Subversion repository, it
is not immediately clear if and how these extra branches can be accounted for
so that they are properly imported into Git. The solution is to split the call
to git svn clone into this sequence:
> git svn init https://srv7.svn-repos.de/dev123/projects/cafu -s Cafu
> cd Cafu
> git config svn.authorsfile ../authors.txt
> git config --add svn-remote.svn.fetch "vendor:refs/remotes/vendor"
> git svn fetch
The second-to-last line causes the subsequent fetch to load the directory
vendor/ as a Git branch, as if it were another branch in branches/.
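The authors.txt file referenced by svn.authorsfile maps each Subversion login name to a Git author, one entry per line in the form loginname = Full Name <email>. A first draft of that file could be generated from the output of svn log -q, roughly as sketched below. The function name authors_template and the example.com addresses are placeholders of mine, and the generated entries still need to be completed by hand:

```shell
#!/bin/sh
# Hypothetical helper: reads the output of "svn log -q" on stdin and prints
# one "login = login <login@example.com>" line per distinct Subversion author.
# The names and e-mail addresses are placeholders that must be edited by hand.
authors_template() {
    awk -F'|' '/^r[0-9]+ [|]/ {
        login = $2
        gsub(/^ +| +$/, "", login)   # trim the blanks around the login name
        print login " = " login " <" login "@example.com>"
    }' | sort -u
}

# Possible usage (needs access to the Subversion repository):
#   svn log -q https://srv7.svn-repos.de/dev123/projects/cafu | authors_template > authors.txt
```

If git svn fetch meets a login name that is not listed in the file, it stops, so it pays off to have the list complete before starting the import.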
Missing Merges
The converted commit history sometimes misses merges where merges were
performed in Subversion:
------B-----D---- master
     /
----A-----C------ pristine
A was merged into master, yielding B, and the merge is properly
reproduced in Git.
C was merged into master as well, yielding D, but only in Subversion. In Git,
the merge is missing.
Among other reasons, this can happen if the merge in Subversion was not
performed at the top directory level, but as a "partial" merge from
subdirectory to subdirectory, e.g. from forum/themes/firenzie in pristine
directly to the same directory in master.
The solution for such cases is to record the missing parents in the
.git/info/grafts file, and to make the grafted history permanent with
> git filter-branch --tag-name-filter cat -- --all
The --tag-name-filter cat part makes sure that attached tags are rewritten as
well.
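For reference, each line of the grafts file names a commit followed by the complete list of parents it should have, all as full SHA-1 hashes separated by blanks. The entry for the missing merge in the diagram above has to exist before the filter-branch command is run, and could be written roughly as sketched below. The function name add_merge_graft is hypothetical, its two arguments stand for the commits D and C, and the sketch assumes a Git version that still honors the grafts file:

```shell
#!/bin/sh
# Hypothetical helper: record that commit $1 (D in the diagram) is a merge of
# its existing first parent and commit $2 (C in the diagram).
add_merge_graft() {
    child=$(git rev-parse --verify "$1^{commit}") || return 1
    first_parent=$(git rev-parse --verify "$1^") || return 1
    merged=$(git rev-parse --verify "$2^{commit}") || return 1
    git_dir=$(git rev-parse --git-dir) || return 1
    mkdir -p "$git_dir/info"
    # Line format: <commit> <parent 1> <parent 2>
    echo "$child $first_parent $merged" >> "$git_dir/info/grafts"
}

# Possible usage, with D and C replaced by the actual commit ids:
#   add_merge_graft D C
#   git log --graph --oneline    # D should now be displayed as a merge
```

Inspecting the history with git log --graph before rewriting anything is a cheap way to catch a wrong graft while it can still be fixed by editing the file.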
If rewriting the commits succeeded and the result is as desired, the grafts
file and the references to the original commits should be deleted:
> rm .git/info/grafts
> rm -rf .git/refs/original/
The next call to git svn fetch will automatically rebuild the rev_map that is
needed for continued bidirectional communication with the source Subversion
repository.
Fixing Tags
As Subversion treats tags exactly like branches, the Git branches that should
be tags must be fixed manually after the conversion. A good solution is
described by Haenel and Plentz in their book, but unfortunately it only works
with lightweight tags and thus cannot account for the tag message, which in our
case is a longer text. The best solution that I have found that works with
annotated tags is from this Atlassian blog post, which I had to modify slightly
to make it work as desired:
> type convert_tags.sh
#!/bin/sh
# CF: from http://blogs.atlassian.com/2012/01/moving-confluence-from-subversion-to-git/ with small modifications.
# Based on https://github.com/haarg/convert-git-dbic
set -u
set -e
git for-each-ref --format='%(refname)' 'refs/remotes/tags/*' | while read -r r; do
    tag=${r#refs/remotes/tags/}
    # CF: Note the ^ in the next line: We create the converted tag at the *parent* of the original tag.
    sha1=$(git rev-parse "$r^")
    commiterName="$(git show -s --pretty='format:%an' "$r")"
    commiterEmail="$(git show -s --pretty='format:%ae' "$r")"
    commitDate="$(git show -s --pretty='format:%ad' "$r")"
    # Print the raw commit body (commit message).
    git show -s --pretty='format:%B' "$r" | \
        env GIT_COMMITTER_EMAIL="$commiterEmail" GIT_COMMITTER_DATE="$commitDate" GIT_COMMITTER_NAME="$commiterName" \
        git tag -a -F - "$tag" "$sha1"
    echo "Tag: ${tag} sha1: ${sha1} using '${commiterName}', '${commiterEmail}' on '${commitDate}'"
    # Remove the original refs/remotes/tags/* ref.
    git update-ref -d "$r"
done
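After running the script, it may be worth checking that every tag really is an annotated tag. The following sketch lists the tags that are still lightweight. It relies on git for-each-ref reporting the object type "tag" for annotated tags and "commit" for lightweight ones; the function name lightweight_tags is my own:

```shell
#!/bin/sh
# Hypothetical check: print the names of all tags that are NOT annotated.
# Annotated tags point to a tag object, lightweight tags directly to a commit.
lightweight_tags() {
    git for-each-ref --format='%(objecttype) %(refname:short)' refs/tags |
        awk '$1 != "tag" { print $2 }'
}

# Possible usage inside the converted repository:
#   lightweight_tags        # no output means that all tags are annotated
#   git tag -n3             # show each tag with the first lines of its message
```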
Move to Subdirectory
Before I could fix missing merges in our (partially) converted Cafu repository,
I had to move the contents of all commits in the "vendor" branch into a
subdirectory. The documentation for git filter-branch has a related example,
which I modified according to this discussion, yielding:
> git filter-branch --index-filter '
rm -f "$GIT_INDEX_FILE"
git read-tree --prefix=ExtLibs/ "$GIT_COMMIT"
' refs/heads/vendor
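To check the result of the rewrite, one can verify that the top-level directory of every commit on the rewritten branch contains nothing but ExtLibs. A minimal sketch of such a check follows; the function name check_prefix is hypothetical:

```shell
#!/bin/sh
# Hypothetical check: succeed only if the top-level tree of every commit
# reachable from branch $1 consists of the single entry $2 (e.g. ExtLibs).
# Commits with an empty tree are reported as well.
check_prefix() {
    for commit in $(git rev-list "$1"); do
        entries=$(git ls-tree --name-only "$commit")
        if [ "$entries" != "$2" ]; then
            echo "unexpected top-level entries in $commit" >&2
            return 1
        fi
    done
}

# Possible usage after the filter-branch run:
#   check_prefix refs/heads/vendor ExtLibs && echo "all commits moved"
```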
Vendor Branches
In our Subversion repositories, we make use of Vendor Branches, a normal and
very useful feature in Subversion that is used to manage "external" software.
Vendor branches are, however, very difficult to migrate to Git, and very little
related documentation can be found on the internet.
Possible solutions are:
- Git submodules,
- Git subtrees,
- normal Git branches.
Git submodules are mentioned relatively frequently, but they really do not seem
to be a good fit for vendor branches. We don't consider them any further for
the reasons detailed in the "Importing Subversion vendor-branches to Git"
thread.
Git subtrees look very interesting and well suited to the problem, and I spent
a lot of time digging into them. There is a subtree extension that is likely to
be integrated into the Git core soon, and Jakub Suder describes a solution
using it that we might have adopted (without the --squash).
Using normal Git branches as vendor branches is beautifully explained in this
blog post by Dominic Mitchell. In fact, our website was structured in "live"
and "pristine" branches right from the start, and mapping these 1:1 to normal
Git branches was straightforward and a clear choice (but still required the
"Branches outside branches/" and "Missing Merges" techniques above).
For the vendor branches in Cafu, the matter was less clear: candidates were Git
subtrees or, again, normal Git branches. As mentioned before, I posted a
detailed and complete description of the problem in the thread "Importing
Subversion vendor-branches to Git".
Eventually, I opted for the "normal Git branches" approach (even though it
required the "Move to Subdirectory" step from above), because it is the
simplest and clearest approach that requires no "extras" at all, neither for
the DVCS nor for its users. As a side effect, we keep the door open to a future
migration to another VCS such as Mercurial.
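With the "normal Git branches" approach, updating an external library later on follows the usual vendor branch pattern: import the new upstream release on the vendor branch, then merge that branch into master. The following sketch illustrates this under assumptions of mine: the branch names vendor and master, the ExtLibs/ prefix from the "Move to Subdirectory" step, and the hypothetical function import_vendor_drop with its three arguments:

```shell
#!/bin/sh
# Hypothetical helper: import a new upstream release of an external library
# on the "vendor" branch and merge the result into "master".
#   $1: library name below ExtLibs/     (e.g. zlib)
#   $2: directory with the unpacked, pristine upstream release
#   $3: version label used in the commit messages
import_vendor_drop() {
    git checkout -q vendor || return 1
    rm -rf "ExtLibs/$1"
    mkdir -p "ExtLibs/$1"
    cp -R "$2/." "ExtLibs/$1/" || return 1
    git add -A "ExtLibs/$1"
    git commit -q -m "Import $1 $3 into the vendor branch." || return 1
    git checkout -q master || return 1
    git merge -q -m "Merge $1 $3 from the vendor branch." vendor
}

# Possible usage:
#   import_vendor_drop zlib /tmp/zlib-1.2.8 1.2.8
```

Because the vendor branch only ever contains pristine upstream code, the merge carries exactly the upstream changes into master, and local modifications to the library show up as ordinary merge conflicts where they collide.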
Migration Details
I give the exact technical steps of converting the Cafu Subversion repository
to Git in a comment to this post, in order to keep this prose text readable and
clear.
The next steps
Our official Git repository of the Cafu Engine is now available at:
Naturally, we will not immediately abandon the Subversion repository, but enter
a gradual transition period where everyone can make the switch at a comfortable
pace, and where we can deal with details like the issue tracker.
Personally, I expect to continue working mainly with Subversion for a short
while, updating the Git repository in a separate step.
Thereafter, I'll probably switch to working mainly with Git, but continue to
update the Subversion repository in a separate step.
Only when all users and all technical indications clearly suggest that we can
do entirely without Subversion will the Subversion repository finally be
switched off.