Software in the era of drive-by contribution

I love git.  I'll state that up front.  I also love github, which I've expressed in the past.  Both are making me look at software in a new way, and I think the pair of them are changing some of the rules for how open source projects emerge and move forward.

Recently I was working on building a Rails-based Event Calendar for MHVLUG.  This gave me a chance to dig in on ical, which has fascinated me since a set of talks at YAPC a decade ago.  There were 2 ruby ical libraries out there (icalendar.rb and vpim.rb), neither of which did quite what I wanted, and both projects were more or less dormant (the mailing lists were lots of "is anyone alive?" posts).  Ugh, I was stuck, and if I had to start from scratch on ical, that was all I'd end up doing, never getting to my application.

I googled some more... and lo and behold found a github.com fork of icalendar.rb, and forks of that.  Those forks implemented about 50% of the fixes I needed to get ical generation with timezones to work.  So I forked from one of those, and 6 changesets later had what I needed.  I then built my application, and life was good.

A few days later I decided to collect up all the changes in all the github icalendar trees and merge them into my tree.  While git itself can be somewhat confusing, github adds a really slick web interface on top of git trees that makes the merge process pretty painless.  This is one of their key innovations, and it's just incredible.  I selected all the outstanding changes that would merge cleanly, pulled them in, and now had a tree which largely encompassed the 8 existing forks on github.com.  I posted back to the dead mailing list and let people know there was now a living github tree for a project that had seemed dead.  I got a couple of new patches people wanted in, and 2 months later the maintainer actually showed up again and gave me admin access to the icalendar project so I could publish official versions.
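For the curious, what the web interface is doing boils down to ordinary git operations.  A rough command-line equivalent of that merge looks something like this (the user and fork names here are just placeholders, not the actual trees involved):

  git clone git://github.com/youruser/icalendar.git
  cd icalendar
  # pull in the changes from another fork ("otherdev" is a made-up name)
  git remote add otherdev git://github.com/otherdev/icalendar.git
  git fetch otherdev
  git merge otherdev/master
  # publish the combined tree back to your own github fork
  git push origin master

The difference is that github turns each of those merges into a couple of clicks, and shows you ahead of time which ones will apply cleanly.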

This pattern repeated a few more times on the project.  I found a piece of code on github that did 90% of what I needed, but I needed a change.  I created my fork, added my feature, and pushed it back out (with a pull request).  A few days later the maintainer pulled the changes back in, and now they are officially part of the project.  I'm not invested in those projects, but I had relevant fixes, and because we were all using a tool that makes it easy to be a casual contributor, they are now part of the open source projects in the sky.

Casual Contributions

If you haven't seen the paper on participation inequality, go and read it... now!  Previously, most of the studies on open source community participation focused on big projects like the Linux Kernel or Apache.  That's sort of like trying to understand patterns of home construction by looking at Frank Lloyd Wright's houses.  Those projects are outliers in how communities work.  This study took a much broader look at online communities and found the striking 1-9-90 pattern:

This is how communities work.  1% of the population does most of the work, 9% are casual contributors, and 90% are just consumers.  Your user base is a silent majority.  In the open source world the 1% are the core contributors, and possibly the heavy power users.  The 9% are the people who file a bug now and then, maybe a patch or two; everyone else just downloads your code and you never hear from them.  This pattern more or less holds true for all volunteer efforts.

In open source we've got an issue: getting code from the 9% is hard.  The 1% typically has access to a central source management repository, and can merge code fixes as soon as they see them.  The 9% has to follow a completely different process, posting patches to trackers or mailing lists, many of which get lost because there are a bunch more manual steps to pull them into the main tree.  If a process requires more effort from the 1%, it typically won't happen; they are full up on time as it is.

And this is where git and github start making things interesting.  While I run a number of open source efforts, I end up in the 9% all the time.  If you are using git for your main tree, the 9% and the 1% are using the same tools, which allows seamless inclusion of code.  The merge algorithm in git is really wondrous.  I've had instances of massive renaming of files while trying to integrate external fixes to those files, and everything just worked.  It actually surprised the hell out of me.

The 9% just want to casually contribute something; they aren't signing up for a lifestyle.  Get my fix out there: if other people want it, great; if not, so be it.  The fact that integration is 2 mouse clicks and 10 seconds of effort makes it much more likely those changes get captured.

Recovering the Brown Field

Ever look at sourceforge.net?  Or any of the clones?  50% of those projects never got off the ground.  Another 40% have died out for other reasons: the contributors had a family, started working for a company that doesn't let them work on OSS, got bored with the project, died, or became inactive for any number of other reasons.  When open source software exploded in 2000, there was a lot of green field.  Everyone was out there building new stuff that no one had done before.  But now we have a lot of brown field: half-planned, half-finished pieces of code that have useful bits in them, but have been abandoned by their original creators.

Tools like git and github help you recover that brown field.  In the last couple of months I've run into project after project that petered out in 2006 but has a bunch of good code.  That means they are about 2 critical bug fixes away from being useful on modern systems.  It's really not much work, but in the old system, with the projects locked up in a forge behind an SVN or CVS source management system, they were dead.  You had to start over.  With github you can import that tree and keep working.

It's a new pattern for how the open source community is going to function.  While it could be built on any distributed SCM, the fact that git has a really good 2-way svn bridge, and that github made itself "person oriented" rather than "project oriented", really makes me believe this pair is creating a uniquely new pattern for both recovering the brown field of open source and enabling the 9% to be much more effective with their output.

Software in the era of drive-by contribution

Now that we've got a set of tools that really were designed for helping the 1% and the 9% work together, I think we're going to see a whole new blossoming of open source software.  The rules of what it means to be a project contributor are changing, in really exciting ways.  Forking used to be cheap and merging expensive, which is why forking was considered an insult.  But with tools like git, merging is cheap too, so the offensiveness of forking goes away.  That opens the door to more experimentation, and to more complex contributions happening outside the 1% group.  All this increases the velocity of contribution, and thus the volume of open source software out there.

I really think distributed source control is changing a lot of assumptions for how software gets developed.  So if you haven't yet dug into the space, do it.

Things I learned this week

In no particular order, here's a quick rundown of some things I learned this week:

Ruby / Ruby on Rails

  • In Ruby: don't use f.readlines.each to process a stream in a loop, as readlines waits for all the output and then iterates.  Use f.readline instead, but be prepared to catch the EOFError exception when the stream ends (it's a documented part of that interface); see the sketch after this list.
  • In Ruby on Rails: rss is a valid format (at least in 2.2.2), and can be used in a builder
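To make the first point concrete, here is a minimal sketch of reading a subprocess a line at a time (the command is just an example):

  # reading a pipe line by line; readlines would block until the
  # command finished and then hand back everything at once
  IO.popen("ping -c 3 localhost") do |f|
    begin
      loop do
        line = f.readline      # returns as soon as a full line is available
        puts "got: #{line}"
      end
    rescue EOFError
      # readline signals end of stream by raising EOFError
    end
  end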

Mono / C#

  • In Mono: File.Exists returns false for directories.  Even though directories are just special files in Linux, the implementation matches the .NET behavior and treats them separately.  Use Directory.Exists instead.

Networking

  • IPv6 finally made sense to me, after implementing a 3-site topology for my Network Lab graduate class.

Ubuntu Jaunty Roundup

I've now migrated my work laptop to Ubuntu 9.04 (Jaunty), which went pretty smoothly.  I played some games to use internal mirrors, but still used the graphical update process (instead of just dist-upgrade), and it all worked out well.

New in Jaunty

One of the bigger items that got press for Jaunty was the new notification system.  It really does rock.  It looks slick, it's very consistent, and I'm a fan.  I'm also a fan of the new splash screen.  All these bits are cosmetic, but something that looks beautiful matters in a computing environment you use every day.

Bugs Fixed

A number of bugs that I used to have to work around are now fixed:

  • There used to be a race in bringing up superswitcher when gnome started, which meant it didn't get to lock out the caps lock key, so I had to stop and restart it after a fresh login.  That appears to be fixed.
  • Jaunty now understands the right suspend settings for my nvidia card; no need to adjust that in the acpi hal configs any more.
  • emacs-snapshot is now current enough that it loads my configs perfectly.  For the first time in 10 years I'm now running a prebuilt version of emacs/xemacs for daily development.  /usr/local just got a bit smaller for me.

Dear Amarok... why do you suck now?

The Amarok team took their application off a cliff with version 2.0 (which is what's now in Jaunty).  All support for syncing devices is gone.  While some aspects of the UI are neat, including podcast search, I'm really not interested in going back to rsync for device management.  It's also really unclear whether that support is ever coming back.  Fortunately, banshee seems to have gotten pretty good, so that's where I'm at now.

Update notifier, where did you go?

Update manager doesn't display the orange star for daily updates any more.  There is a workaround listed in the bug, and a lot of this is wrapped up in the philosophy of the new notification system.  However, I really liked my daily updates.  I get that the team was trying to get stuff out of the notification tray, but this seems to be throwing the baby out with the bathwater.

Final Thoughts

It's really nice to see Canonical push Linux into something that is beautiful, consistent, and flexible.  I find myself tweaking my volume settings just to get the nice notifications. 🙂

ACM talk tonight on Open Source development

I'll be speaking at the Poughkeepsie chapter ACM meeting tonight on Open Source development.  Some recent experiences with github have gotten me thinking about some of the new patterns emerging out of Open Source development.  The talk tonight is a first attempt at showing the emergence of these patterns.  While I'm not sure I've got all the right art or slides for it, I've got some really good notes, so I expect this will be a very fun and lively session.

If you are in the Poughkeepsie area tonight (Monday April 20th), you should stop by.

What's with all this Java complaining about AppEngine

I've seen all manner of people in the twitterverse complaining that Google's AppEngine Java support is a subset of Java, and how that "breaks a decade of compatibility".

Seriously?

I mean, really, seriously?!?

I've got to have 3 JVMs installed on my system to use roughly 5 Java applications in total.  So I'm not buying the compatibility complaint, as "best practice" in the Java world is to ship your own copy of the VM.

And I definitely sympathize with the Google folks, who really don't want to be running millions of idle VMs with 2 GB memory footprints.  It is basically free, after all, so what's up with all the complaining?  And, honestly, if it gets Java folks rethinking whether they really need 5000 classes floating around at all times, I think that's doing the world a favor. 🙂

1 thing you don't know about me

Much like other facebook memes, I passed on the whole 25 things cycle that went around.  But here is 1 thing you probably didn't know about me: for grades 1 - 5 I attended a one-room schoolhouse, and had the same teacher for 5 years.

The one-room schoolhouse can be thought of as a historical throwback.  Prior to the invention of the automobile, you had to walk to school.  That meant schools needed to be within the daily walking distance of a 6 year old, so the concept of the one-room school was born: a single teacher for a village, and a different school for each village.  With the introduction of the bus in the 1920s, most of these were wiped out in the face of progress.

But there were holdouts, typically in small rural towns.  I happened to grow up in one of these towns.  When I was in first grade there were 18 kids in the school, a single room, a single teacher, and 6 grades.  That averaged 3 students per grade, but at a sample size that small a grade might have 6 students or just 1.  Lessons were run at the front of the room, and students would then go back to their desks and work on assigned tasks.  The older students were each buddied up with 1st and 2nd graders and helped them with reading assignments.  Recess was the same for all, and with that few individuals there was no room for cliques to spawn.  We were all there together.  Grades became a bit more fluid; with that level of individual attention you could be challenged based on your aptitude.  By the end of 2nd grade I'd started in on a 4th grade math book, but was with the rest of the 2nd grade class on other subjects.

By the time I got to the end of 5th grade, my teacher, Eula Bannister, who had been teaching in that school, in that way, for decades, retired.  The school had shrunk to 5 grades at that point (the population was going up, so 6th grade went to a neighboring town), and it was to be the last year of grade 5.  To me, leaving that system, there was a perfect symmetry to that.  I owe a lot of who I am to that school, and that experience.  A big part of my personal drive came from a set of values that Eula instilled in me over the course of 5 years.

Last week, during the annual school meeting, the town decided to close the doors on the Granville one-room schoolhouse (I could wax eloquent about the fact that the decision was made by direct, in-person democracy, another value that comes out of rural Vermont, but that's probably for another post).  It was a hard decision for everyone, and a decision that was many years in the making.  There are so many challenges to keeping a school like that functioning and correctly serving the students.  No matter how romantic the idea, the important thing is that the students are being best served.  One of the huge challenges is finding a teacher with the range to handle the task, the energy to sustain it, and the willingness to take the pay a small rural town can afford.  In this day and age, there probably isn't a place for a school like that.

I feel special to have had this experience, knowing that, much like the passenger pigeon and the mill wheel, it's a thing of the past.  Granville's school did an incredible service over 158 years of operation.  I'm glad that I had the opportunity to spend 5 of those years with it.

OpenSim Infrastructure Updates: fresh os, git mirror, and automated release building

Yesterday I upgraded the opensimulator.org machine (kindly provided by Adam Frisby) to the latest version of Debian.  The upgrade went seamlessly.  Now that we are on Debian 5.0, we've got fresher software on the machine, which makes it possible to provide a few new things as part of the basic OpenSim infrastructure.

OpenSim via Git

We are now mirroring the experimental upstream code (aka subversion trunk) via git.  At least 5 of us on the OpenSim core team have been using git personally with the git-svn bridge for our own OpenSim work (I started doing this nearly a year ago).  Git provides some advantages in making it easy to try things out in a local tree, and throw away branches if things go wrong.  If you read my blog, you know, I love git. 🙂
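For anyone curious what that personal workflow looks like, here is a rough sketch (the svn URL is illustrative, check the wiki for the canonical one, and the last step only applies if you have svn commit access):

  # clone svn trunk into a local git tree (slow the first time)
  git svn clone http://opensimulator.org/svn/opensim/trunk opensim
  cd opensim
  # hack on a throwaway branch; if the idea doesn't pan out, just delete it
  git checkout -b wild-idea
  # ... commit locally as often as you like ...
  # pull in new trunk revisions, replaying your local work on top
  git svn rebase
  # push your commits back to svn (core developers only)
  git svn dcommit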

While subversion remains our main tree, this git mirror will make it easy for developers (or budding developers) to experiment with this alternative source control system.  You can use viewgit to browse the git mirror, or clone it via:

git clone http://opensimulator.org/git/opensim

In addition, the viewgit system provides a very handy rss feed of changes, which is another way to keep tabs on what's changing in trunk.  There is up to a 10 minute lag in changes getting from svn into the git mirror, but hopefully that won't bother anyone.

Automated Release Building for OpenSim

Something else I threw together last night was an automated release builder for OpenSim.  One of the challenges we had was that getting all the parts of a release sorted out once a release tag was made was sometimes onerous, which meant a release might exist only as a subversion tag for days or even weeks before source tarballs made their way into the world.

I've now got a system in place that looks for all numeric tags in our source tree, checks them out, runs prebuild on them, and bundles them up as both a .zip and a .tar.gz.  This means they should be ready to compile with nant or MSVS.  It runs hourly on the OpenSim machine and publishes all results to http://dist.opensimulator.org.  One of the immediate things you'll see is that it gives us a full set of historical releases.
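For the curious, the basic shape of the script is something like the following ruby sketch.  The paths, the svn URL, and the prebuild invocation are illustrative rather than the exact code running on the machine:

  #!/usr/bin/env ruby
  require 'fileutils'

  SVN_TAGS = "http://opensimulator.org/svn/opensim/tags"   # illustrative URL
  DEST     = "/var/www/dist.opensimulator.org"             # illustrative path

  # find the numeric release tags (e.g. 0.6.4)
  tags = `svn ls #{SVN_TAGS}`.split("\n").map { |t| t.chomp("/") }
  tags = tags.select { |t| t =~ /^[\d.]+$/ }

  tags.each do |tag|
    dir = "opensim-#{tag}"
    next if File.exist?("#{DEST}/#{dir}.tar.gz")   # already built, skip it
    system("svn export -q #{SVN_TAGS}/#{tag} #{dir}")
    system("cd #{dir} && ./runprebuild.sh")        # generate the nant / MSVS build files
    system("tar czf #{DEST}/#{dir}.tar.gz #{dir}")
    system("zip -qr #{DEST}/#{dir}.zip #{dir}")
    FileUtils.rm_rf(dir)                           # clean up the working copy
  end

Run from cron every hour, something like this keeps the dist site in sync with whatever release tags exist in svn.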

I hope you enjoy these extra bits of infrastructure for the project.  Please feel free to drop me a comment here if you have any thoughts or questions on them; feedback is always appreciated.