Ollie Armstrong

Linux // AWS // Go



Why I host my own git server

27 Nov 2016

I'm one of a small group of people who, instead of using GitHub (or GitLab, Bitbucket et al.), host their git repositories on their own server.

I don't support GitHub becoming a monopoly for open source projects, I don't like their Terms of Service, I dislike relying on a central service in general and I want my data to be portable. I choose not to use GitHub.

Monopoly

A monopoly on open source hosting is dangerous. GitHub is very close to being the sole host for open source projects. I'd say at least 90% of the projects I browse to turn out to be hosted on GitHub, often along with their website through GitHub Pages.

There are also the myriad build tools and dependency managers that clone from a git URI. Given that the vast majority of libraries are on GitHub, this leads to a big dependency on GitHub. GitHub down? Builds broken (maybe). Every GitHub outage makes its way to /r/webdev within seconds, with developers screaming that they are unable to work. In some cases this may be because their company relies on GitHub, which is another issue altogether.

The biggest worry is if GitHub shuts down. Maybe they run out of money. Maybe some governmental involvement requires a cessation of service. Maybe they get hit by Mr Robot (spoiler alert for link). There are countless scenarios that can lead to GitHub being unavailable for a very long period of time, or permanently. The outcome of this? 90% (my guesstimate figure) of open source projects are no longer accessible. Links are broken throughout the internet. And we all struggle to find our favourite project's new homepage.

I don't support a single entity becoming the open source project host. Remember SourceForge?

Terms of Service

GitHub has Terms of Service in place just like every other SaaS offering on the web. I admit the majority is fairly standard stuff but it's also quite inconvenient.

GitHub, in its sole discretion, has the right to suspend or terminate your account and refuse any and all current or future use of the Service, or any other GitHub service, for any reason at any time.

GitHub reserves the right at any time and from time to time to modify or discontinue, temporarily or permanently, the Service (or any part thereof) with or without notice.

You shall defend GitHub against any claim, demand, suit or proceeding made or brought against GitHub by a third-party alleging that Your Content, or Your use of the Service in violation of this Agreement, infringes or misappropriates the intellectual property rights of a third-party or violates applicable law, and shall indemnify GitHub for any damages finally awarded against, and for reasonable attorney’s fees incurred by, GitHub in connection with any such claim, demand, suit or proceeding; [...]

We may, but have no obligation to, remove Content and Accounts containing Content that we determine in our sole discretion are unlawful, offensive, threatening, libelous, defamatory, pornographic, obscene or otherwise objectionable or violates any party's intellectual property or these Terms of Service.

All quotes from the GitHub Terms of Service, 27 November 2016. All emphasis mine.

Now I am not a lawyer, but that third quote (Terms F.3) is particularly scary to me. GitHub also goes ahead and removes repositories that it deems offensive. They have made it clear they will follow through on this by nuking a repository that contained the word "retard". There's no doubt that the author should have chosen better wording, but censorship on this level is unacceptable to me, especially over a word that we often hear online, however disgusting its use.

I haven't checked all the alternative public git hosts but I would bet they have very similar clauses. It is mostly standard for SaaS products.

Git is distributed, don't forget it

Git was designed as a distributed system. We've then adapted it back into a centralised model. Even though we don't lose all the benefits of a distributed SCM, we've got to be careful not to rely upon a single entity for our code. This isn't Subversion; we don't require a single remote anymore.

Were you aware that you can pull directly from a coworker's PC? I know people who never even considered that it might be possible. Let's not forget that git is a distributed system: we can have many remotes and our code can live in many places. Git has great support for mirroring, provided by git clone --mirror - let's use it!
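To make the mirroring idea concrete, here is a minimal Python sketch that builds the git commands for cloning a repository as a mirror and pushing that mirror to several other hosts. The function names, the repo.git working directory and the example URLs are placeholders chosen for illustration, not part of any real setup; git clone --mirror, git push --mirror and git -C are standard git options.

```python
import subprocess


def mirror_commands(source_url, mirror_urls, workdir="repo.git"):
    """Return the git commands that copy source_url to every URL in mirror_urls."""
    # A mirror clone is bare and carries every ref (branches, tags, notes).
    commands = [["git", "clone", "--mirror", source_url, workdir]]
    for url in mirror_urls:
        # push --mirror makes the remote's refs match the local mirror exactly.
        commands.append(["git", "-C", workdir, "push", "--mirror", url])
    return commands


def run_all(commands):
    """Execute each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


# Hypothetical usage (both hosts are made up):
# run_all(mirror_commands(
#     "git@git.example.com:project.git",
#     ["git@mirror.example.org:me/project.git"],
# ))
```

Something like this could run on a schedule, although a repeat run would want to fetch into the existing mirror rather than clone it again.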

This particular argument isn't that you shouldn't use GitHub, just that you shouldn't rely on a single host. Use many hosts. And don't forget that a host isn't even needed!

Pull requests aren't the only way to do merges either. Git has fantastic support for managing patches and sending patches via email. Seriously, it's not a difficult thing to do any more. This is literally how the largest open source project on the planet (the Linux kernel) works. They send each other emails with patches. No GitHub needed, no pull requests, no central server at all. When a particular merge becomes unwieldy to send via email, they can pull directly from each other or each other's servers.
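For anyone who hasn't tried the email workflow, the following Python sketch shows roughly which git commands are involved on each side. It is an illustration under assumptions rather than a description of how any particular project does it: the function names, the outgoing directory, the origin/master branch and the address are all hypothetical, and git send-email needs to be installed and configured with an SMTP server before it will work. The underlying commands (git format-patch, git send-email, git am) are standard git.

```python
import subprocess


def export_patches(upstream="origin/master", outdir="outgoing"):
    """Contributor: write one .patch file per commit that upstream lacks."""
    return ["git", "format-patch", "-o", outdir, upstream]


def send_patches(recipient, outdir="outgoing"):
    """Contributor: email every patch in outdir to the recipient."""
    return ["git", "send-email", "--to=" + recipient, outdir]


def apply_patches(mailbox):
    """Maintainer: apply patches from a saved mailbox file as commits."""
    return ["git", "am", mailbox]


def run(argv):
    """Execute one git command, raising if it fails."""
    subprocess.run(argv, check=True)


# Hypothetical usage:
# run(export_patches())
# run(send_patches("maintainer@example.com"))
# ...and on the maintainer's machine:
# run(apply_patches("patches.mbox"))
```

Because git am keeps the author and commit message from each email, the resulting history looks much the same as it would after a merged pull request.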

Don't get locked in

Using all the other features that usually come with a git host is an easy way to get yourself locked in. I'm talking about "Issues" or bug tracking in general. With GitHub at least, the only way to get that data out is via the API. You'll have to grab it and make it portable yourself.

Just imagine the case where you want to switch hosts. Suppose GitHub decides your repository is offensive and you disagree: you'll want a new host. But then you could well be stuck when your new service cannot automatically import GitHub Issues, or has an issue (pun intended) pulling all the data.

I hope you back your data up, but how would you do this for your Issues? Maybe there are scripts out there already to pull from the API, but you aren't going to be able to restore that to GitHub and I haven't come across a git host that'll let you upload a big ass file to import Issues.
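As a rough idea of what such a script might look like, here is a Python sketch that pulls Issues from GitHub's REST API and flattens them into plain text that could be committed alongside the code. Treat it as a starting point under assumptions: the function names and output layout are my own invention, it ignores comments, labels and authentication, and unauthenticated requests are rate limited. The endpoint and the number, title, state and body fields are part of GitHub's documented API, which also returns pull requests from the same endpoint, so those are filtered out.

```python
import json
import urllib.request

API = ("https://api.github.com/repos/{owner}/{repo}/issues"
       "?state=all&per_page=100&page={page}")


def fetch_issues(owner, repo):
    """Download every issue (open and closed) for a repository, page by page."""
    issues, page = [], 1
    while True:
        url = API.format(owner=owner, repo=repo, page=page)
        with urllib.request.urlopen(url) as resp:
            batch = json.load(resp)
        if not batch:
            return issues
        # Pull requests show up here too, marked by a "pull_request" key.
        issues.extend(i for i in batch if "pull_request" not in i)
        page += 1


def issues_to_text(issues):
    """Render issues as plain text, oldest first, one block per issue."""
    entries = []
    for issue in sorted(issues, key=lambda i: i["number"]):
        body = (issue.get("body") or "").strip()  # body can be null
        entries.append("#{} [{}] {}\n{}\n".format(
            issue["number"], issue["state"], issue["title"], body))
    return "\n".join(entries)


# Hypothetical usage:
# with open("ISSUES.txt", "w") as f:
#     f.write(issues_to_text(fetch_issues("someuser", "someproject")))
```

This only solves the backup half of the problem; restoring that file into another host's tracker is still up to you.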

So what's the solution? I don't know. I tend to manage my bugs in a plaintext file in my repository. I get that this won't scale beyond a very small project but there's still got to be a better way than using Issues. Maybe a SaaS bug tracker that is more open to data portability?


So don't think you need GitHub. Don't think you need a central server at all. Make sure your code exists on more than one host so we aren't relying on you to get it pushed when your only host gets shut down. Keep your data portable.

My solution is a server running gitolite and cgit for a web interface. In the near future I will also mirror all of my projects on a public host. Probably GitLab (they are awesome).

Finally, if your company is relying on GitHub to do business, I think you're making a terrible mistake. I'll have a follow-up post on running an internal git service for your company soon.


I welcome all comments and questions by email. My address is on the homepage of this site.