I love git. I’ll state that up front. I also love github, which I’ve expressed in the past. Both are making me look at software in a new way. I also think the pair of them are changing some of the rules we know for how open source projects emerge and move forward.
Recently I was working on building a Rails-based Event Calendar for MHVLUG. This gave me a chance to dig in on ical, which has fascinated me since a set of talks at YAPC a decade ago. There were 2 ruby ical libraries out there (icalendar.rb and vpim.rb). Neither did quite what I wanted, and both projects were more or less dormant (the mailing lists were lots of “is anyone alive?” posts). Ugh. I was stuck, and if I had to start from scratch on ical, that was all I’d end up doing, never getting to my application.
I googled some more… and lo and behold found a github.com fork of icalendar.rb, and forks of that. Those forks implemented about 50% of the fixes I needed to get ical generation with timezones to work. So I forked from one of those and, 6 changesets later, had what I needed. I then built my application, and life was good.
A few days later I decided to collect up all the changes in all the github icalendar trees, and merge them into my tree. While git itself can be somewhat confusing, github adds this really slick web interface on top of git trees, that makes the merge process pretty painless. This is one of their key innovations, and it’s just incredible. I selected all the outstanding changes that would merge cleanly, pulled them in, and now had a tree which largely encompassed the 8 existing forks on github.com. I posted back to the dead mailing list and let people know there was this now living github tree where the project had seemed dead. I got a couple of new patches people wanted in, and 2 months later the maintainer actually showed up again and gave me admin access to the icalendar project so I could publish official versions.
This pattern repeated a few more times with other projects. I found a piece of code on github that did 90% of what I needed, but I needed a change. I created my fork, added my feature, and pushed it back out (with a pull request). A few days later the maintainer pulled my changes back in, and now they are officially part of the project. I’m not vested in those projects, but I had relevant fixes, and because we were all using a tool that makes it easy to be a casual contributor, they are now part of the open source projects in the sky.
If you haven’t seen the paper on participation inequality, go and read it… now! Previously most of the studies on open source community participation focused on big projects like the Linux Kernel, or Apache. That’s sort of like trying to understand patterns of home construction by looking at Frank Lloyd Wright’s houses. Those projects are outliers in how communities work. This study took a much broader look at online communities and found the striking 1-9-90 pattern.
This is how communities work. 1% of the population does most of the work, 9% are casual contributors, and 90% are just consumers. Your user base is a silent majority. In an open source world the 1% are the core contributors, and possibly the heavy power users. The 9% are the people who file a bug now and then, maybe a patch or two. Everyone else just downloads your code and you never hear from them. This pattern more or less holds true for all volunteer efforts.
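To make those percentages concrete, here is a minimal Python sketch of the arithmetic. The function name split_community and the 5,000-user figure are made up for illustration, and the 1-9-90 split is a rule of thumb, not an exact measurement for any real project.

```python
def split_community(total_users):
    """Roughly split a user base with the 1-9-90 rule of thumb."""
    core = total_users // 100             # ~1% do most of the work
    casual = total_users * 9 // 100       # ~9% contribute now and then
    silent = total_users - core - casual  # everyone else just consumes
    return {"core": core, "casual": casual, "silent": silent}


# Hypothetical project with 5,000 users
print(split_community(5000))
# {'core': 50, 'casual': 450, 'silent': 4500}
```

So for a project like that, the 9% is on the order of a few hundred people who might each have one fix to offer.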
In open source we’ve got an issue, which is that getting code from the 9% is hard. The 1% typically has access to a central source management repository, and can merge code fixes as soon as they see them. The 9% has to follow a completely different process, posting patches to trackers or mailing lists, many of which get lost because there are a bunch more manual steps to pull them into the main tree. If any process requires more effort by the 1%, it typically won’t happen, because they are full up on time as it is.
And this is where git and github start making things interesting. While I run a number of open source efforts, I end up in the 9% all the time. If you are using git for your main tree, the 9% and the 1% are now using the same tools, which allow seamless inclusion of code. The merge algorithm in git is really wondrous. I’ve had instances of massive renaming of files while trying to integrate external fixes in those files, and everything just worked. It actually surprised the hell out of me.
The 9% just want to casually contribute something; they aren’t signing up for a lifestyle. Get my fix out there, if other people want it, great, if not, so be it. The fact that integration is 2 mouse clicks and 10 seconds of effort makes capturing those changes much more likely.
Recovering the Brown Field
Ever look at sourceforge.net, or any of the clones? 50% of those projects never got off the ground. Another 40% have died out for other reasons: the contributors had a family, started working for a company that doesn’t let them work in OSS, got bored with the project, died, or became inactive for any number of other reasons. When open source software exploded in 2000, there was a lot of green field. Everyone was out there building new stuff that no one had done before. But now we have a lot of brown field. A lot of 1/2 planned, 1/2 finished pieces of code that have useful bits in them, but have been abandoned by their original creators.
Tools like git and github help you recover that brown field. In the last couple of months I’ve run into project after project that petered out in 2006, but has a bunch of good code. That means they are about 2 critical bug fixes away from being useful on modern systems. It’s really not much work, but in the old system, with the projects locked up in a forge with an SVN or CVS source management system, they were dead. You had to start over. With github you can import that tree and keep working.
It’s a new pattern for how the open source community is going to function. While it could be built on any distributed SCM, the fact that git has a really good 2 way svn bridge, and that github made itself “person oriented” vs. “project oriented”, really makes me believe that it’s creating a genuinely new pattern for both recovering the brown field of open source and enabling the 9% to be much more effective with their output.
Software in the era of drive-by contribution
Now that we’ve got a set of tools that really were designed for helping the 1% and the 9% work together, I think we’re going to see a whole new blossoming of open source software. The rules of what it means to be a project contributor are changing, in really exciting ways. Forking used to be cheap, and merging expensive, which is why forking was considered an insult. But with tools like git, merging is cheap, so the offensiveness of forking goes away. It opens things up for more experimentation, and for more complex contributions happening outside the 1% group. All this increases the velocity of contribution, and thus the volume of open source software out there.
I really think distributed source control is changing a lot of assumptions for how software gets developed. So if you haven’t yet dug into the space, do it.