Not Safe From Wolves

Hating on git-submodule

I feel that I should explain and defend my stance on git submodules. There are no doubt times when splitting a module into parts, or using submodules to import vendored dependencies, is convenient. If that works for you, that’s cool too.

If you only reference publicly available repositories that don’t change often and that you don’t actively work on, I think it’s possibly even a nice workflow.

Some of the drawbacks that I’ve experienced with submodules:

Modifying files within a submodule

It’s easy to make changes in a submodule, commit and push, and think that everyone else has your change. Depending on where you pushed, either they have a new commit recorded for the submodule in the outer repository that they can’t reach, or you’ve pushed to the library’s repository without updating the recorded commit, and you’re puzzled why they don’t see your changes. (The pinned commit lives in the outer repository’s tree; .gitmodules only holds the submodule’s path and URL.)

Updating the code in a submodule and making that visible to everyone requires two commits and two pushes. Depending on your continuous integration flow, that might also trigger two sets of tests.
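To make the two commits and two pushes concrete, here is a minimal sketch that runs entirely in a throwaway directory. The repository names (lib.git, app.git), the vendor/lib path and the demo identity are all made up for illustration, and the protocol.file.allow setting is only there because the pretend server is a local path.

```shell
#!/bin/sh
set -eu

# Throwaway identity and sandbox; every name and path below is made up for the demo.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
work=$(mktemp -d)
cd "$work"

# Two bare repositories stand in for the shared git server.
git init -q --bare lib.git
git init -q --bare app.git

# Seed the library with a first commit.
git init -q lib-seed
(cd lib-seed && echo v1 > lib.txt && git add lib.txt &&
  git commit -q -m 'lib: v1' && git push -q ../lib.git HEAD)

# The outer project vendors the library as a submodule at vendor/lib.
git init -q app
cd app
git remote add origin "$work/app.git"
echo app > README
git add README
git commit -q -m 'app: initial commit'
# protocol.file.allow is only needed because the demo "server" is a local path.
git -c protocol.file.allow=always submodule --quiet add "$work/lib.git" vendor/lib
git commit -q -m 'app: add lib as a submodule'
git push -q origin HEAD

# Commit and push number one: inside the submodule.
(cd vendor/lib && echo v2 > lib.txt &&
  git commit -q -am 'lib: v2' && git push -q origin HEAD)

# Commit and push number two: record the new submodule commit in the outer project.
git add vendor/lib
git commit -q -m 'app: bump vendor/lib'
git push -q origin HEAD

# The commit pinned by the outer project should now match the submodule's HEAD.
recorded=$(git rev-parse HEAD:vendor/lib)
actual=$(git -C vendor/lib rev-parse HEAD)
echo "recorded=$recorded actual=$actual"
```

Skip either half and your colleagues end up in one of the two confusing states described above.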

Dangling submodule references

I’ve experienced a developer pushing the top-level repository and then going on holiday, out of contact, with the updated submodule code existing only on their laptop.

This leaves all of the downstream users unable to do a git submodule update until the submodule commit recorded in the top-level repository is manually reset to a previous version (and the work that relies on the module reverted or moved to a branch).

Yes, this is human error, but it’s both easier to do and more painful to fix than the alternatives.
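If you are stuck with submodules, git push has a --recurse-submodules=check option that is meant to catch exactly this mistake. The sketch below is a sandboxed demonstration, not a description of anyone’s real setup: the repositories, paths and identity are invented, and the first push is expected to be refused because the submodule commit has not been pushed anywhere.

```shell
#!/bin/sh
set -eu

# Throwaway identity and sandbox; every name and path below is made up for the demo.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
work=$(mktemp -d)
cd "$work"

# Bare repositories stand in for the shared git server.
git init -q --bare lib.git
git init -q --bare app.git
git init -q lib-seed
(cd lib-seed && echo v1 > lib.txt && git add lib.txt &&
  git commit -q -m 'lib: v1' && git push -q ../lib.git HEAD)

# Outer project with the library as a submodule, already published once.
git init -q app
cd app
git remote add origin "$work/app.git"
echo app > README
git add README
git commit -q -m 'app: initial commit'
# protocol.file.allow is only needed because the demo "server" is a local path.
git -c protocol.file.allow=always submodule --quiet add "$work/lib.git" vendor/lib
git commit -q -m 'app: add lib as a submodule'
git push -q origin HEAD

# Commit inside the submodule but "forget" to push it, then bump the reference.
(cd vendor/lib && echo v2 > lib.txt && git commit -q -am 'lib: v2')
git add vendor/lib
git commit -q -m 'app: bump vendor/lib'

# With the check enabled, git should refuse to publish a reference nobody can fetch.
if git push -q --recurse-submodules=check origin HEAD 2>/dev/null; then
  first=pushed
else
  first=blocked
fi

# Push the submodule, then retry the outer push.
(cd vendor/lib && git push -q origin HEAD)
if git push -q --recurse-submodules=check origin HEAD; then
  second=pushed
else
  second=blocked
fi
echo "first=$first second=$second"
```

Setting push.recurseSubmodules to check in your git config should make this the default, although it only helps the people who remember to set it.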

Merge ordering matters

If you push your outer project before any changed submodules, you may find that when you go to push a submodule you need to merge or rebase first, because someone else has been working on it. Until you merge or rebase, push the submodule, update the reference in your outer project, commit, and push again, your project is in an unbuildable state.

Relying on a single git URL

A git submodule encodes the git URL into the repository and repository history. I work at an organisation with over 1,000 git repositories, most of them private, and we prefer to keep our git server private.

For situations where customers are working on code too, we mirror the source code or move it to a third party, such as a private GitHub repository.

Using submodules means either giving access to a third party to our git server (unacceptable) or duplicating the code in a place that they can access (expensive, with private GitHub repositories).

To make things worse, we foolishly renamed our git server a few years ago and still find references to the old name, which require a git commit to change. Checking out old versions of our source which reference the old names always requires a manual step to get them working.
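One thing that may take some of the sting out of a rename is git’s url.<base>.insteadOf setting, which rewrites URLs on the client without touching history. This is a sketch rather than what we actually did, and both host names are hypothetical. It uses a repository-local setting so that it stays inside the sandbox; for submodules you would probably want --global instead, since each submodule is its own repository with its own local config.

```shell
#!/bin/sh
set -eu

# Sandbox repository so the sketch never touches real configuration.
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo

# Hypothetical host names: git-old is the pre-rename server, git-new the current one.
git remote add origin https://git-old.example.invalid/team/lib.git

# Ask git to rewrite any URL that starts with the old prefix.
git config url."https://git-new.example.invalid/".insteadOf "https://git-old.example.invalid/"

# --get-url expands the remote's URL with the rewrite applied, without contacting any server.
resolved=$(git ls-remote --get-url origin)
echo "$resolved"
```

It is still a manual step, but it is one per machine rather than one per old checkout.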

Solving these problems

git subtree is a fix for basically all of these problems. It does feel like you’re using some dangerous git magic, and it’s definitely possible to get things wrong, but by and large you can carry on working on a repository containing a subtree as if it were just one big repository.

Sharing with a customer means only exporting one git URL for them, and not hard-coding any URLs which may change into the history of the project.
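For a feel of what that looks like, here is a small sandboxed sketch of git subtree add. The lib and app repositories, the vendor/lib prefix and the demo identity are invented for the example, and it assumes git subtree is installed (it lives in git’s contrib area, so some systems package it separately).

```shell
#!/bin/sh
set -eu

# Throwaway identity and sandbox; every name and path below is made up for the demo.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
work=$(mktemp -d)
cd "$work"

# A stand-in library repository with one commit.
git init -q lib
(cd lib && echo v1 > lib.txt && git add lib.txt && git commit -q -m 'lib: v1')
branch=$(git -C lib symbolic-ref --short HEAD)

# The outer project needs at least one commit before a subtree can be added.
git init -q app
cd app
echo app > README
git add README
git commit -q -m 'app: initial commit'

# Copy the library into vendor/lib as ordinary tracked files, squashing its history.
git subtree add --prefix=vendor/lib --squash "$work/lib" "$branch"

# The library's files are now part of the outer repository itself.
tracked=$(git ls-files vendor/lib)
echo "$tracked"
```

Nothing here writes a .gitmodules file, so a clone of app is complete on its own. Later updates go through git subtree pull and git subtree push with the same --prefix.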

Another approach that we’ve used, which does have some of the downsides of git submodules, is using the language vendor’s own packaging to ship our libraries, such as wrapping them in pip installable Python Eggs or Composer repositories. That’s bringing in more tools to learn, though, so YMMV if you’re not already using those.

Thank you for reading. I feel much better.

This is archived content. New updates will appear on insom.github.io.