Software in the era of drive by contribution

Tuesday, April 28th, 2009

I love git.  I’ll state that up front.  I also love github, which I’ve expressed in the past.  Both are making me look at software in a new way.  I also think the pair of them are changing some of the rules we know for how open source projects emerge and move forward.

Recently I was working on building a Rails based Event Calendar for MHVLUG.  This gave me a chance to dig in on ical, which has fascinated me since a set of talks at YAPC a decade ago.  There were 2 ruby ical libraries out there (icalendar.rb and vpim.rb), neither did quite what I wanted, and both projects were more or less dormant (the mailing lists were lots of “is anyone alive?” posts).  Ug, I was stuck, and if I had to start from scratch on ical, that was all I’d end up doing, never getting to my application.

I googled some more… and lo and behold found a github.com fork of icalendar.rb, and forks of that.  Those forks implemented about 50% of the fixes I needed to get ical generation with timezones to work.  So I forked from one of those and 6 changesets later, had what I needed.  I then built my application, and life was good.

A few days later I decided to collect up all the changes in all the github icalendar trees, and merge them into my tree.  While git itself can be somewhat confusing, github adds this really slick web interface on top of git trees, that makes the merge process pretty painless.  This is one of their key innovations, and it’s just incredible.  I selected all the outstanding changes that would merge cleanly, pulled them in, and now had a tree which largely encompassed the 8 existing forks on github.com.  I posted back to the dead mailing list and let people know there was this now living github tree where the project had seemed dead.  I got a couple of new patches people wanted in, and 2 months later the maintainer actually showed up again and gave me admin access to the icalendar project so I could publish official versions.
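For anyone who prefers the command line to the web interface, collecting changes from another fork can be sketched roughly as below.  This is not the exact sequence used on icalendar, just a minimal illustration: the directory names (upstream, mine, theirs), file names, and commit messages are all made up, and local directories stand in for github forks so the script runs without a network.

```shell
#!/bin/sh
# Sketch: pull another fork's changes into your own tree.
# Every name here is hypothetical; local directories play the role of github forks.
set -e

work=$(mktemp -d)
cd "$work"

# Placeholder identity so the demo commits succeed on an unconfigured machine.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

# 1. A stand-in for the original project, with a single commit.
git init -q upstream
cd upstream
echo "base library" > README
git add README
git commit -q -m "initial import"
cd ..

# 2. Two forks of it: mine and someone else's.
git clone -q upstream mine
git clone -q upstream theirs

# 3. The other fork gains a fix.
cd theirs
echo "timezone fix" > fix.txt
git add fix.txt
git commit -q -m "fix timezone handling"
cd ..

# 4. My fork gains its own change.
cd mine
echo "my feature" > feature.txt
git add feature.txt
git commit -q -m "add my feature"

# 5. Register the other fork as a remote, fetch it, and merge its branch.
#    The default branch name varies between git versions, so look it up.
branch=$(git symbolic-ref --short HEAD)
git remote add theirs ../theirs
git fetch -q theirs
git merge -q -m "merge fixes from theirs" "theirs/$branch"

# My tree now contains both sets of changes.
git log --oneline
```

With several forks you would repeat step 5 once per fork, which is essentially the bookkeeping the web interface takes off your hands.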

This pattern repeated a few more times on the project.  I found a piece of code on github that did 90% of what I needed, but I needed a change.  I created my fork, added my feature, and pushed it back out (with a pull request).  A few days later the maintainer pulled my changes back in, and now they are officially part of the project.  I’m not vested in those projects, but I had relevant fixes, and because we were all using a tool that makes it easy to be a casual contributor, they are now part of the open source projects in the sky.

Casual Contributions

If you haven’t seen the paper on participation inequality, go and read it… now!  Previously most of the studies on open source community participation focussed on big projects like the Linux Kernel, or Apache.  That’s sort of like trying to understand patterns of home construction by looking at Frank Lloyd Wright’s houses.  Those projects are outliers in how communities work.  This study did a much broader look at online communities and found the striking 1-9-90 pattern.

This is how communities work.  1% of the population does most of the work, 9% are casual contributors, and 90% are just consumers.  Your user base is a silent majority.  In an open source world the 1% are the core contributors, and possibly the heavy power users.  The 9% are the people that file a bug now and then, maybe a patch or two; everyone else just downloads your code and you never hear from them.  This pattern more or less holds true for all volunteer efforts.

In open source we’ve got an issue, which is that getting code from the 9% is hard.  The 1% typically has access to a central source management repository, and can merge code fixes as soon as they see them.  The 9% has to follow a completely different process, posting patches to trackers or mailing lists, many of which get lost because there are a bunch more manual steps to pull them into the main tree.  If any process requires more effort by the 1%, it typically won’t happen, they are full up on time as it is.

And this is where git and github start making things interesting.  While I run a number of open source efforts, I end up in the 9% all the time.  If you are now using git for your main tree the 9% and the 1% are now using the same tools, which allow seamless inclusion of code.  The merge algorithm on git is really wondrous.  I’ve had instances of massive renaming of files while trying to integrate external fixes in those files, and everything just worked.  It actually surprised the hell out of me.
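The rename case is easy to try at small scale.  The sketch below is only an illustration with made-up file and branch names (old_name.txt, new_name.txt, external-fix): one branch renames a file, another branch edits the same file under its old name, and the merge carries the edit over to the renamed file.

```shell
#!/bin/sh
# Sketch: an outside fix to a file that has since been renamed in the main tree.
# File and branch names are hypothetical.
set -e

repo=$(mktemp -d)
cd "$repo"

# Placeholder identity so the demo commits succeed on an unconfigured machine.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

git init -q .
printf 'line one\nline two\nline three\nline four\nline five\n' > old_name.txt
git add old_name.txt
git commit -q -m "initial import"

# Remember the default branch name, which differs between git versions.
main=$(git symbolic-ref --short HEAD)

# An external contributor fixes one line, still using the old file name.
git checkout -q -b external-fix
sed 's/line three/line three (fixed)/' old_name.txt > tmp
mv tmp old_name.txt
git commit -q -a -m "fix line three"

# Meanwhile the main tree renames the file.
git checkout -q "$main"
git mv old_name.txt new_name.txt
git commit -q -m "rename file"

# Merge the external fix; git matches the renamed file by content similarity.
git merge -q -m "merge external fix" external-fix

# The fix now lives in the renamed file.
cat new_name.txt
```

Git does not record renames explicitly; it infers them at merge time from how similar the contents are, so a file that was both renamed and heavily rewritten may not be matched up this cleanly.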

The 9% just want to casually contribute something; they aren’t signing up for a lifestyle.  Get my fix out there, if other people want it, great, if not, so be it.  The fact that integration is 2 mouse clicks and 10 seconds of effort makes the chance of capturing those changes much more likely.

Recovering the Brown Field

Ever look at sourceforge.net?  Or any of the clones?  50% of those projects never got off the ground.  Another 40% have died out for other reasons, the contributors: had a family, started working for a company that doesn’t let them work in OSS, got bored with the project, died, or became inactive for any number of other reasons.  When open source software exploded in 2000, there was a lot of greenfield.  Everyone was out there building new stuff that no one had done before.  But now we have a lot of brown field.  A lot of 1/2 planned, 1/2 finished pieces of code that have useful bits in them, but have been abandoned by their original creators.

Tools like git and github help you recover that brown field.  In the last couple of months I’ve run into project after project that petered out in 2006, but has a bunch of good code.  That means they are about 2 critical bug fixes away from being useful on modern systems.  It’s really not much work, but in the old system, with the projects locked up in a forge with an SVN or CVS source management system, they were dead.  You had to start over.  With github you can import that tree and keep working.

It’s a new pattern for how the open source community is going to function.  While it could be built on any distributed SCM, the fact that git has a really good 2 way svn bridge, and that github made itself “person oriented” vs. “project oriented”, really makes me believe that it’s creating a uniquely new pattern for both recovering the brown field of open source, and enabling the 9% to be much more effective with their output.

Software in the era of drive by contribution

Now that we’ve got a set of tools that really were designed for helping the 1% and the 9% work together, I think we’re going to see a whole new blossoming of open source software.  The rules of what it means to be a project contributor are changing, in really exciting ways.  Forking used to be cheap, and merging expensive, which is why forking was considered an insult.  But with tools like git merging is cheap, so the offensiveness of forking goes away.  It opens up for more experimentation, and more complex contributions happening outside the 1% group.  All this increases the velocity of contribution, and thus the volume of open source software out there.

I really think distributed source control is changing a lot of assumptions for how software gets developed.  So if you haven’t yet dug into the space, do it.

ACM talk tonight on Open Source development

Monday, April 20th, 2009

I’ll be giving a talk at the Poughkeepsie chapter ACM meeting tonight on Open Source development.  Some recent experiences with github have got me thinking on some of the new patterns emerging out of Open Source development.  The talk tonight is a first attempt at trying to show the emergence of these patterns.  While I’m not sure I’ve got all the right art or slides for that, I’ve got some really good notes, so I expect this will be a very fun and lively session.

If you are in the Poughkeepsie area tonight (Monday April 20th), you should stop by.

In praise of github

Tuesday, December 30th, 2008

A few years ago I became sold on distributed source control.  Being able to do offline work, try out new ideas cheaply, and throw them away, all were great things.  I started with mercurial, but over the summer started using git.  A couple of things pushed me over the edge.

  • git appeared more modular, though at the end of the day this wasn’t really true.  The lack of a libgit was actually very disappointing (especially after I had sworn there was one), as I’ve got a number of interesting ideas stalled behind that one.
  • the git-svn plugin, which provides really good 2 way integration between svn and git trees.  I’ve stopped making anon svn clones, I now do a git-svn clone.  If I want to fix something locally, I can now version that fix.
  • github – free social hosting of git trees

Github helps you over the hump in publicly hosting git trees.  Honestly, the hump isn’t very high, but the documentation out there could be a bit more straightforward.  I’d been chugging along using github for all my random open source projects, some that are active, some which are stalled.  But the source code is out there for others to take a look at.  Github provides nice instructions for people to clone the work, and run with it.  It’s definitely a prettier interface.

Github really started to shine for me this past weekend though.  I was looking for ical generation code for ruby to replace an email tool that I wrote in perl for our MHVLUG monthly meeting emails.  There exist 2 ruby ical projects, vpim and icalendar, neither of which supports timezones in the ical generation, and both with pretty inactive mailing lists.  Once it became clear that the problem was not solved, I decided to dig in and see if I could come up with something workable.

But once you go social, github really shines

There had been a post on the icalendar devel list a few months back from someone who said he had fixed a couple of timezone issues and provided a github url.  I cloned that project, and realized that while it got closer to what I needed, it still didn’t quite do what I needed.  So I clicked the fork button.

I was now given my own fork of the icalendar source.  But more importantly, it also showed me all the other forks on github, of which there were 5 others.  I made my fixes, pushed them back public, and then proceeded to start to accumulate up some of the other changes out there.  There is even a fork queue which shows all the outstanding changes in other forks out there, as well as odds on whether or not the patches will apply.

While you could figure all this out on your own with the command line, that kind of discovery and view is really a help and a timesaver.

And it’s even better if you are doing ruby

Github is written in ruby, though I’m not sure on the framework behind it.  As an added bonus to people hosting ruby code on the site, the team built a gem build service into github.  You add a specially formatted gem spec file to your github tree, and you’ll get a gem built on each checkin.  My 2 ruby libraries that are there now are configured to build gems, easy for all to install.

If you haven’t checked out git, or github, you should.  While I found the learning curve on git to be higher than I really wanted to deal with, the community is very active, and the number of things that support git now is quite high.  Rails generators even support git now, automatically source managing via git or svn if you ask them to.  Github popped out of nowhere in 2008, and I can’t wait to see where they are going to go in 2009.

gcolor2 – just the application I was looking for

Monday, October 6th, 2008

I was working on the MHVLUG wiki, and needed to find a good color of orange.  Typically I just launch gimp, and use the color wheel in there.  But I stopped this time, and did this instead:

apt-cache search color picker

which returned 3 results, including gcolor2.

First off, this is exactly the application I needed.  It launches fast, and just gives me a color wheel to pick colors.  But, it turns out it has something else that’s great.

You see that little eye dropper?  You can click it and then click any pixel on your computer, and it will give you the RGB color of that pixel.  It doesn’t need to be anything special, as it’s pulling directly from the xbuffer.  So handy.  I can’t believe I didn’t know this existed until now.

Learning to Love Mediawiki

Sunday, October 5th, 2008

Mediawiki is the engine that powers Wikipedia.  While that gives it lots of props, it is written in php, which has historically had security issues.  Over time, I’ve gotten over my php allergy, mostly because Wordpress is just too damn good (and what runs this site).

Over the past year I’ve fallen into running 3 Mediawiki instances… and I’m impressed.  So, I made converting mhvlug.org from MoinMoin to Mediawiki my Linux Fest project yesterday.  2 hours to port our theme, an hour to figure out how to export / import all our content, a couple of hours tracking down how to get short urls to work right, then an hour or two of deleting pages that probably shouldn’t have transferred in the first place, and we’re done!

I’m very happy with the new site, especially after adding plugins that do calendar and maps.  I’m hoping that over the next month we can migrate everything else and get rid of the old site entirely.

It’s also amazing how much more you use software you really love.  I’m definitely a fan of mediawiki at this point, and being able to use things I learn in one instance on other instances is really handy.

Hello Thunderbird

Monday, September 8th, 2008

Push finally came to shove, and I’ve now entered the 21st century by making Thunderbird my email client (I actually tried Evolution for a day, but after 20 crashes gave up. But that’s a different story.) Previously I was using mutt. There were a bunch of reasons to do this, though the biggest one for me was getting to turn off a box at home that was my IRC proxy, gateway to my home network, and ssh point for reading my email in mutt. That should save us at least a few hundred watts.

The New Configuration – Server Side

I’ve moved to using dovecot as my imap server. This has the advantage of being able to handle a home directory full of mboxes nicely (which courier could not). This means I can keep my perl based dynamic mail filtering working on the server until I manage to rewrite it as a thunderbird extension. I was using IMAPS before just as a secure POP, but now I’m actually taking full advantage of having imap remote folders.

My IRC proxy moved to my linode, which was probably a better place for it to be anyway. I even bothered to package it as a ppa for ubuntu, which means you can easily install it as well.

Lastly, my gateway box is now a kvm guest running on my big home media / backup server. I was quite impressed by how nice virt-manager made the system install and setup from an ubuntu 8.04 iso. I had to do a little manual effort to configure bridge networking correctly, and deal with conflicting dnsmasq instances, but after that all was good.

The New Configuration – Client Side

I’ve now got thunderbird set up for 3 IMAP accounts (dague.net, gmail, and work), plus news groups (all work ones). This gives me a really nice consolidated view of my email. I was pretty impressed by how well thunderbird handles the 4 identities, and routes outbound mail correctly. For dague.net and gmail email is filtered server side; I’ve client side filters for work because it’s sieve, and I really don’t want to learn another filtering specification language.

On top of that I’ve got a ton of extensions. I found that thunderbird out of the box was ok, but I lost a lot of mutt functionality. After a hunt through the extensions I got most, if not all of that back. For the record here are the extensions I currently have installed:

  • Attachment Reminder – this fires off a warning and prompt if you hit its heuristic rules for an email that might need an attachment but doesn’t have one. I’ve seen the warning 4 times now, though they were all false positives. I do like the idea though, so I’ll keep it around.
  • Colored Diffs – brilliant if you are on mailing lists where patches are sent around
  • Display Mail User Agent – because I’m curious on who uses what. I always had this header visible in my mutt configs.
  • Display mailing list header – way more useful than I thought. It basically puts a set of links across the top of the email for Subscribe, Unsubscribe, Archive, etc. It makes it a lot easier to get off lists that you realize you don’t really care about any more.
  • Enigmail (from ubuntu package) – there was no way I was giving up pgp. It also has the advantage of making pgp policy setting much simpler.
  • Extension Developer – more on this later
  • Import Export Tools – because I had a lot of saved off mbox files that I needed to get back into thunderbird.
  • keyconfig – actually works on all mozilla base tools, but I needed it to redo a few key bindings
  • Lightning – this is the Sunbird calendar program as an embedded addon. It’s actually quite nice for calendaring and task lists.
  • Mnenhy – this gave me more control over mail headers. IIRC display mailing list header needs it to function.
  • Mutt Keys – my own extension, more on that in a bit
  • Nostalgy – gives you a set of nice key bindings and input field for save & copy of email. Very handy.
  • Provider for Google Calendar – a lightning plugin that lets you have good 2-way google calendar support. This is something evolution promised, but it didn’t work. It works great on thunderbird with this extension.
  • Quote Colors – if people bother to follow standard quoting models for email this does a really nice job of coloring the different posters to make it much easier to read.
  • Track Package – gives you highlight + right click to track packages based on emails. While it’s not everything I want, it is pretty useful.

But it could be better…

Thunderbird is now very useful to me, but I have found ways in which I could make the whole thing better. Mutt keys was a quick dive into making my own thunderbird extension that was nothing much more than key bindings (based on the now unmaintained mouseless extension). It’s rough, but it let me figure out some of the basic structure of writing thunderbird extensions.

Since then I installed extension developer, which has a great tab completable javascript shell, and have been exploring making an extension that lets me quickly make a calendar task out of an email. I have a bunch of ideas queued up behind this, but that is a short term useful one to dig into. I actually quite like the component interface model that thunderbird has, though I wish there were a few more API docs or examples to figure out what possibilities exist.

As I figure out more, I’m sure I’ll post it here. I have definitely found that developing thunderbird extensions is pretty tall grass, as very few folks have really written down much on it. I’m going to try to be a good citizen and stick stuff in the mozilla wiki as I figure it out.