Tag Archives: github

The challenge of upstream in Drupal

I’ve been using Drupal for a number of community websites for the past year.  Overall I quite like the system and how customizable it is.

What I’m not really thrilled by is the way module development is organized on drupal.org.  Drupal core is a very small number of modules.  The bulk of useful functionality comes from user contributed modules.  So far so good, many projects run the same way.

The real issue is that the Drupal project maintains a centralized CVS infrastructure for module developers.  There is a pretty formal process to get accepted as a module author, and no real way to get CVS access to just fix bugs.  The net result is that after finding 3 bugs in the date and calendaring modules, creating patches for each on my own environment, and submitting them upstream, only 1 has gotten looked at.  The fixes were submitted about 8 months ago, and I’ve pinged a few times.  I also wrote the author directly on one of the modules.

The root cause of this issue is that Drupal has centralized infrastructure, but decentralized decision making.  It is a classic case of a community where github would be a good fit.

I’ve been an absentee maintainer before, I know that sometimes interests change, and you move on to other things.  But at least with something like github, other folks can very easily extend what you’ve got and make the fixes themselves.  Hunting out RCS style patches in drupal discussion forums is just really problematic.  At this point I’ve got quite a number of local fixes, so module upgrade and maintenance is tricky.  The only saving grace is that most of those fixes are on modules that are effectively abandoned so I shouldn’t have to worry about an update coming down the pipe.

The Drupal community is soldiering on even with bad developer infrastructure.  I just wonder how much further along they’d be with better developer infrastructure.

Software in the era of drive by contribution

I love git.  I’ll state that up front.  I also love github, which I’ve expressed in the past.  Both are making me look at software in a new way.  I also think the pair of them are changing some of the rules we know for how open source projects emerge and move forward.

Recently I was working on building a Rails based Event Calendar for MHVLUG.  This gave me a chance to dig in on ical, which has fascinated me since a set of talks at YAPC a decade ago.  There were 2 ruby ical libraries out there (icalendar.rb and vpim.rb).  Neither did quite what I wanted, and both projects were more or less dormant (the mailing lists were lots of “is anyone alive?” posts).  Ugh, I was stuck, and if I had to start from scratch on ical, that was all I’d end up doing, never getting to my application.

I googled some more… and lo and behold found a github.com fork of icalendar.rb, and forks of that.  Those forks implemented about 50% of the fixes I needed to get ical generation with timezones to work.  So I forked from one of those and 6 changesets later, had what I needed.  I then built my application, and life was good.

A few days later I decided to collect up all the changes in all the github icalendar trees, and merge them into my tree.  While git itself can be somewhat confusing, github adds this really slick web interface on top of git trees, that makes the merge process pretty painless.  This is one of their key innovations, and it’s just incredible.  I selected all the outstanding changes that would merge cleanly, pulled them in, and now had a tree which largely encompassed the 8 existing forks on github.com.  I posted back to the dead mailing list and let people know there was this now living github tree where the project had seemed dead.  I got a couple of new patches people wanted in, and 2 months later the maintainer actually showed up again and gave me admin access to the icalendar project so I could publish official versions.

This pattern repeated a few more times on the project.  I found a piece of code on github that did 90% of what I needed, but I needed a change.  I created my fork, added my feature, and pushed it back out (with a pull request).  A few days later the maintainer pulled them back in, and now they are officially part of the project.  I’m not vested in those projects, but I had relevant fixes, and because we were all using a tool that makes it easy to be a casual contributor, they are now part of the open source projects in the sky.

Casual Contributions

If you haven’t seen the paper on participation inequality, go and read it… now!  Previously most of the studies on open source community participation focused on big projects like the Linux kernel or Apache.  That’s sort of like trying to understand patterns of home construction by looking at Frank Lloyd Wright’s houses.  Those projects are outliers in how communities work.  This study took a much broader look at online communities and found the striking 1-9-90 pattern.

This is how communities work.  1% of the population does most of the work, 9% are casual contributors, and 90% are just consumers.  Your user base is a silent majority.  In an open source world the 1% are the core contributors, and possibly the heavy power users.  The 9% are the people that file a bug now and then, maybe a patch or two; everyone else just downloads your code and you never hear from them.  This pattern more or less holds true for all volunteer efforts.
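To make those proportions concrete, here is a small sketch of the arithmetic.  The function name and the rounding choices are assumptions made for illustration; the 1-9-90 split is a rule of thumb, not an exact law.

```javascript
// Split a community of `total` people using the 1-9-90 rule of thumb.
// participationSplit is a hypothetical helper, not taken from the study.
function participationSplit(total) {
    var core = Math.round(total / 100);        // the 1% doing most of the work
    var casual = Math.round(total * 9 / 100);  // the 9% of occasional contributors
    var silent = total - core - casual;        // everyone else just consumes
    return { core: core, casual: casual, silent: silent };
}

var split = participationSplit(2000);
console.log(split.core + ' core, ' + split.casual + ' casual, ' + split.silent + ' silent');
// prints "20 core, 180 casual, 1800 silent"
```

So a project with 2000 users can expect roughly 20 people carrying it, and about 180 who might send a fix if the process is easy enough.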

In open source we’ve got an issue, which is that getting code from the 9% is hard.  The 1% typically has access to a central source management repository, and can merge code fixes as soon as they see them.  The 9% has to follow a completely different process, posting patches to trackers or mailing lists, many of which get lost because there are a bunch more manual steps to pull them into the main tree.  If any process requires more effort by the 1%, it typically won’t happen, they are full up on time as it is.

And this is where git and github start making things interesting.  While I run a number of open source efforts, I end up in the 9% all the time.  If you are using git for your main tree, the 9% and the 1% are now using the same tools, which allow seamless inclusion of code.  The merge algorithm in git is really wondrous.  I’ve had instances of massive renaming of files while trying to integrate external fixes in those files, and everything just worked.  It actually surprised the hell out of me.
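The idea underneath that merge is a three-way merge: compare both sides against their common ancestor, and keep whichever side actually changed.  The sketch below is a toy version working on whole files, with made-up file names.  It is not git's implementation, which merges at the line level and also detects renames; it only shows why changes that don't overlap combine without anyone having to intervene.

```javascript
// Toy three-way merge over whole files.  Each tree maps a path to its content;
// a missing path means the file does not exist in that tree.
// threeWayMerge is a hypothetical helper for illustration only.
function threeWayMerge(base, ours, theirs) {
    var merged = {}, conflicts = [], paths = {};
    [base, ours, theirs].forEach(function (tree) {
        Object.keys(tree).forEach(function (p) { paths[p] = true; });
    });
    Object.keys(paths).forEach(function (p) {
        var b = base[p], o = ours[p], t = theirs[p], result;
        if (o === t) { result = o; }         // both sides agree (or both deleted)
        else if (o === b) { result = t; }    // only their side changed
        else if (t === b) { result = o; }    // only our side changed
        else { conflicts.push(p); return; }  // both changed it differently
        if (result !== undefined) { merged[p] = result; }
    });
    return { merged: merged, conflicts: conflicts };
}

// Hypothetical trees: we edited the README, they fixed a bug and added a file.
var base   = { 'README': 'v1', 'lib/a.rb': 'code v1' };
var ours   = { 'README': 'v2', 'lib/a.rb': 'code v1' };
var theirs = { 'README': 'v1', 'lib/a.rb': 'code v1 + fix', 'lib/b.rb': 'new file' };
var out = threeWayMerge(base, ours, theirs);
console.log(out.merged, out.conflicts);
```

Here the result keeps our README, their fix, and their new file, with no conflicts.  A path only lands in the conflicts list when both sides changed it in different ways.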

The 9% just want to casually contribute something; they aren’t signing up for a lifestyle.  Get my fix out there, if other people want it, great, if not, so be it.  The fact that integration is 2 mouse clicks and 10 seconds of effort makes capturing those changes much more likely.

Recovering the Brown Field

Ever look at sourceforge.net?  Or any of the clones?  50% of those projects never got off the ground.  Another 40% have died out for other reasons: the contributors had a family, started working for a company that doesn’t let them work in OSS, got bored with the project, died, or became inactive for any number of other reasons.  When open source software exploded in 2000, there was a lot of green field.  Everyone was out there building new stuff that no one had done before.  But now we have a lot of brown field.  A lot of half planned, half finished pieces of code that have useful bits in them, but have been abandoned by their original creators.

Tools like git and github help you recover that brown field.  In the last couple of months I’ve run into project after project that petered out in 2006, but has a bunch of good code.  That means they are about 2 critical bug fixes away from being useful on modern systems.  It’s really not much work, but in the old system, with the projects locked up in a forge with an SVN or CVS source management system, they were dead.  You had to start over.  With github you can import that tree and keep working.

It’s a new pattern for how the open source community is going to function.  While it could be built on any distributed SCM, the fact that git has a really good 2 way svn bridge, and that github made itself “person oriented” vs. “project oriented”, really makes me believe that it’s creating a genuinely new pattern for both recovering the brown field of open source, and enabling the 9% to be much more effective with their output.

Software in the era of drive by contribution

Now that we’ve got a set of tools that really were designed for helping the 1% and the 9% work together, I think we’re going to see a whole new blossoming of open source software.  The rules of what it means to be a project contributor are changing, in really exciting ways.  Forking used to be cheap, and merging expensive, which is why forking was considered an insult.  But with tools like git merging is cheap, so the offensiveness of forking goes away.  It opens things up for more experimentation, and more complex contributions happening outside the 1% group.  All this increases the velocity of contribution, and thus the volume of open source software out there.

I really think distributed source control is changing a lot of assumptions for how software gets developed.  So if you haven’t yet dug into the space, do it.

Fixing Github with Greasemonkey

Github is great, I really can’t say enough good things about them.  What’s not so great is their css, as it breaks badly if you change your dpi.  I finally found the root cause: the search field gets squeezed vertically and collides with the div under it, forcing that to the right, and forcing the right sidebar down.  I sent in some feedback, but am not sure if/when it might get applied.

So, in the short term I added the following greasemonkey script for github.com:

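// Inject a <style> element carrying the given CSS into the page head.
function addGlobalStyle(css) {
    var head, style;
    head = document.getElementsByTagName('head')[0];
    if (!head) { return; }  // no <head> to attach to, so give up quietly
    style = document.createElement('style');
    style.type = 'text/css';
    style.innerHTML = css;
    head.appendChild(style);
}

// Pin the header search box to a fixed width so it stops colliding with the div under it.
addGlobalStyle('#header .topsearch {width: 25em}');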

And voila, the site is readable again.
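For anyone wanting to install something similar, a Greasemonkey script also needs a metadata block at the top telling it which pages to run on.  The version below is a sketch of how the complete file might look.  The script name, namespace, and @include pattern are assumptions rather than the values from my own copy, and buildRule is a small helper added here for illustration.

```javascript
// ==UserScript==
// @name        GitHub search width fix (hypothetical name)
// @namespace   http://example.com/userscripts
// @include     https://github.com/*
// ==/UserScript==

// Build one CSS rule from a selector and a map of property -> value.
function buildRule(selector, declarations) {
    var parts = Object.keys(declarations).map(function (prop) {
        return prop + ': ' + declarations[prop];
    });
    return selector + ' {' + parts.join('; ') + '}';
}

// Append a <style> element carrying the given CSS to the head of `doc`.
// Returns false when the page has no <head> to attach to.
function addGlobalStyle(doc, css) {
    var head = doc.getElementsByTagName('head')[0];
    if (!head) { return false; }
    var style = doc.createElement('style');
    style.type = 'text/css';
    style.textContent = css;
    head.appendChild(style);
    return true;
}

var rule = buildRule('#header .topsearch', { width: '25em' });
// Only touch the DOM when one exists, so the file also loads outside a browser.
if (typeof document !== 'undefined') {
    addGlobalStyle(document, rule);
}
```

Passing the document in as an argument is a design choice for this sketch: it keeps the function easy to try out against a stand-in object, while behaving the same as the original on a real page.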