There is a great article done by a member of the team that did the New York Times Netflix infographic. I especially love the fact that they wrote a scraper in ruby to pull in some of the data they needed off of google search results.
Over the past couple of weeks I redid the MHVLUG site as a Drupal site. Drupal is a content management system, which is a fancy way of saying it’s a website that lets you modify most of its parts via a web interface and that contains semi-structured data. I wrote a bit about it in the past.
I got motivated to redo the MHVLUG website after working on the farmproject website redesign, which also uses Drupal. The MHVLUG website had had 3 previous iterations: static html stored in CVS, MoinMoin wiki, and Mediawiki. The reasons for each previous switch are beyond the scope here, but each time I felt we took a step forward.
While the wiki approach worked ok for the LUG, the biggest set of edits on the wiki was around monthly meetings. Before each meeting I’d need to move the meeting content onto the front page. After each meeting someone would need to copy and paste that into its own page (sometimes this got lost). Meetings would get presentations added after the fact, but because it was a wiki, content and presentation were all wrapped up together. Having the meeting data stored separately from presentation, and being able to create different slices of it for the site (next meeting on the front page, full meeting pages for the archives, lists of past meetings, lists of future events, calendars) was enough to get me over the hump.
The basic Drupal environment is a lot more bare bones than you would imagine. It would make an ok blog out of the box, but that’s about it. Before you get started with any real Drupal project you’ll need at least the following addon modules:
- cck – content construction kit, this lets you define custom types with custom fields
- views2 – this is a basic query builder that lets you create custom slices of data to display as pages, blocks, or in a number of other ways.
- devel – this gives you a really handy set of add on functions for debugging your types and views
- admin menu – this will save you a lot of time
I’m giving you my wisdom in hindsight here. I didn’t have devel or admin menu during the first bits of launching the new mhvlug.org… and man would they have saved me a bunch of time.
Meetings and Events
Beyond basic pages and stories (aka news), the mhvlug site has 2 main special types of data: Meetings and Events. You could coax the 2 into 1 type, which simplifies some things, but meetings have enough extra bits of data (uploaded presentations, presenter info) that I chose to not do it that way. Meetings and events are both a collection of a place, a time, and content. After installing the date module, I had the functionality that I needed on the time front. I added some custom fields for presenter and presentation on the meeting front, and with a not too complicated set of views had it so the next meeting showed on the front page automatically. When Jan 7 rolls around, no one will have to go adjust the front page any more. At this early stage I already had the win I was looking for.
While I had the time and content portions worked out pretty quickly for our Meetings, location was a bit more challenging. We only have about 6 locations that we ever use for MHVLUG events or meetings, so I didn’t want to add addresses manually to each piece of content. Fortunately, with cck Drupal has the concept of a node reference, which lets you build a relationship to another piece of content. So now I had a 3rd custom type, Location.
Given that this is 2009, the minute you have an address, you want to have a google map to go along with it. I spent a couple of days trying all manner of google map modules before I threw up my hands on this one. None of the modules I experimented with did quite what I wanted, and each needed an awful lot of configuration.
Eventually, I took the simple way out. I just made a text field into which I could embed the chunk of iframe code from a google my map. When the field is printed out, you get the map. This turns out to be particularly useful as some of our locations don’t really have geocodable addresses. So this challenge was overcome without any new modules.
Custom Display of Meetings
Although you can get a certain amount of customization out of cck for custom node types, I found I couldn’t really get what I wanted when it came to the display for meetings. Fortunately Drupal provides a mechanism for dealing with this, which is custom templates per node type. By default the core content of each page is rendered using the node.tpl.php in your theme. If you create node-meeting.tpl.php, it will use that to render meeting types instead. This works for any custom node type.
It occurred to me that if someone was looking at a meeting or event that was coming up, they’d immediately want to know where it was. I wanted to do more than just print the name of the location; I wanted to pull in and display the map as well. This was where I learned a few really important tricks.
dprint_r is the first one. If you have the devel module enabled, you get access to some functions that help you debug what’s going on in drupal. dprint_r spits out a nicely formatted html version of the data structure you pass it. You can thus use it in your templates to see what’s going on. As someone who thinks in data, this was critical to getting my head wrapped around what drupal was actually doing.
When you load an event page, drupal loads all the data for the event object you are accessing, and it loads the id, title, and url for any referenced objects, in my case the location. To get more you use the node_load function, which loads any arbitrary object in drupal by its node id. This let me pull in the whole location object and embed the map on meeting pages. node_load has performance implications, so don’t use it everywhere all the time, but in this case it turned out to be cheap and powerful.
The php templates are just php code, so you can get even trickier. Meeting location is only interesting prior to the meeting, so I adjusted the template so the location is only displayed when the meeting/event date is in the future. Then the archive isn’t polluted with maps. Works great once you figure out the chaining of date, time, and datetime objects you need. Plus, make sure to get your timezone right!
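The real template is PHP, but the logic of the check is small enough to sketch in Python. The names here (`should_show_location`, `SITE_TZ`) are made up for illustration, and the fixed UTC-5 offset is only an assumption standing in for the site's timezone. The point is the timezone trap: a date stored without a timezone has to be interpreted as site-local time before it is compared with "now".

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-5 offset stands in for the site's timezone.
# A real site would use a named zone so daylight saving is handled.
SITE_TZ = timezone(timedelta(hours=-5))


def should_show_location(meeting_start, now=None, site_tz=SITE_TZ):
    """Return True while the meeting start is still in the future."""
    if meeting_start.tzinfo is None:
        # Treat a naive timestamp as site-local time, not as UTC.
        meeting_start = meeting_start.replace(tzinfo=site_tz)
    if now is None:
        now = datetime.now(site_tz)
    # Both values are timezone-aware here, so the comparison is unambiguous.
    return meeting_start > now
```

Passing `now` explicitly makes the check easy to exercise against a fixed date instead of the wall clock.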
I went live with basically the functionality above, but I realized that if I could get drupal to spit back out ical, I could get rid of the parallel calendar site I was maintaining. There is a lot of documentation on how to do this with the calendar module… it’s all very confusing. Once you have both date and calendar installed you’ll have the option in the administration panel to use the Date Wizard to create your calendar. Do it! It creates 8 linked views that give you a calendar at all levels (year, month, week, day), upcoming lists, and an ical feed. If you don’t like the date field it’s creating, just get rid of it after the fact. Building those views by hand is just going to be a pain, and it’s much easier to tweak them after the fact.
I finally hit a point here where I needed to dive into code, because the ical specific portions of drupal are lacking. Here are the bugs I found, fixed, and submitted patches upstream for.
- iCal feeds do not pass ical validation
- Only 1 item with RRULE allowed in an ical export
- tweaks for date_ical_escape_text to make it closer to ical spec conformance
In English, Drupal isn’t doing the line wrapping for long and multiline fields that the spec requires. ical wrapping is odd, so I understand why people haven’t fully implemented it yet. The spec says a line should not be longer than 75 octets, and you wrap wherever that limit falls, not on word breaks. That’s right, cutting up a word in the middle is part of the spec, and things actually work better when you do that. Each wrapped line continues after “\r\n ” (a CRLF followed by a single space), which is really important. The other issue is that no one tested the recurrence rules much in drupal. It turns out that in default drupal you can only have 1 event that follows a recurrence rule, e.g. every 2nd Tuesday of the month. The processor bails before it gets to the second rule.
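To make the wrapping rule concrete, here is a minimal Python sketch of text escaping and line folding as I read RFC 5545. It is not the patch that went upstream (that is PHP inside the date module); `escape_text` and `fold_line` are names invented for this example, and the default `limit` of 75 octets is the RFC's recommendation, which you can lower if a client needs it.

```python
def escape_text(value):
    """Escape a TEXT value: backslash, semicolon, comma, and newlines."""
    return (
        value.replace("\\", "\\\\")  # must run first so later escapes survive
        .replace(";", "\\;")
        .replace(",", "\\,")
        .replace("\r\n", "\\n")
        .replace("\n", "\\n")
    )


def fold_line(line, limit=75):
    """Fold one content line so no physical line exceeds `limit` octets."""
    chunks = []
    current = b""
    for char in line:
        encoded = char.encode("utf-8")
        if len(current) + len(encoded) > limit:
            chunks.append(current)
            # A continuation line starts with one space, which counts
            # toward that line's length.
            current = b" "
        current += encoded
    chunks.append(current)
    # Lines are joined with CRLF; words are cut wherever the limit falls.
    return b"\r\n".join(chunks).decode("utf-8")
```

Counting encoded bytes one character at a time keeps a multi-byte UTF-8 character from being split across two lines. A reader undoes the folding by deleting every CRLF that is followed by a single space.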
I fixed all this locally, built the patches, and sent them upstream. This is the only time I had to dive into the code and fix things. While it would have been great to not have to dive in even here, I’ve yet to pick up an ical code base that I didn’t need to go tweak. I had the same amount of work on the ruby icalendar stack when I played with it. iCal is just a weird specification that doesn’t look like what you’d expect. This is what you get when Lotus and Microsoft build an interchange protocol in the late 90s.
Why iCal? We found that 50% of our users are using Google Calendar now. Export to iCal is required for anything that is time based, as Google Calendar is the de facto client for this (though Lightning in Thunderbird does quite a nice job as well).
Mailing List integration
MHVLUG has a mailman mailing list where most of our communication takes place. Previously you needed to have an account on the website to edit, and a different account on the mailing list. Through the user mailman manager you can let people easily subscribe and manage their list subscriptions. This works really nicely, and has already gotten a few people on our mailing list that didn’t ever join before.
Fighting the Spam Bots
The moment you open up registration to the web, you get spam bots trying to log in and post Chinese drug company links on your website. It is the price we pay for such an open medium. Spam bots nearly killed us under the moin wiki. MediaWiki fared better, but I still needed to go and revert a couple of pages every couple of months. I found that in the first 2 days of drupal being up we had 3 bots signing up, never a good sign.
So here is the formula I’m using for drupal that seems to be quite effective:
- Require that users confirm their registration via email activation. This is the default for drupal, and helps quite a bit.
- Add the CAPTCHA and reCAPTCHA modules to prevent bots from bothering you with partial registrations in the first place.
- And finally, use LoginToboggan to put non-confirmed users into a penalty box group, which it will automatically purge after 30 days.
While the first 2 actions will protect your site, you’ll still pile up plenty of partial registrations, which just clutter up user management. Having the system auto purge non-confirmed accounts makes it all the more self tending.
Better URLs and URL migrations
Out of the box drupal has this totally ugly node/# model for urls. While it is valid, it really sucks to look at, and google often penalizes you for that as well. While there is some support for pretty urls out of the box, you really want to install pathauto right off the bat, which builds the url from the title of the node.
Specific to this migration was that we had over a hundred pages in the old MediaWiki (mostly meetings). That means people will have linked to something in the past and won’t find it on the new site. There was no way I was going to map every old url to a new one. There is an interesting partial solution that seems to work well: the search404 module. When an unknown url comes in, it breaks up the url path into words and runs them through the search engine. If there is exactly 1 hit, it just takes you there. If there are more or fewer, it leaves you at the search404 page with the search results provided. It’s not perfect, but I’m hoping it eases the transition (though I’m still getting hits from search for urls from the moin wiki, so some amount of that isn’t going to go away any time soon).
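The idea is simple enough to sketch in a few lines of Python. This is an illustration of the behavior described above, not the module's actual code: `path_keywords` and `resolve_missing` are hypothetical names, and `search` stands in for whatever search backend the site has (any callable that takes a list of words and returns a list of matching urls).

```python
import re
from urllib.parse import unquote, urlparse


def path_keywords(url):
    """Split the path of a url into lowercase search words."""
    path = unquote(urlparse(url).path)
    # Anything that is not a letter or digit acts as a separator.
    return [word.lower() for word in re.split(r"[^A-Za-z0-9]+", path) if word]


def resolve_missing(url, search):
    """Redirect on exactly one hit, otherwise hand back the result list."""
    hits = search(path_keywords(url))
    if len(hits) == 1:
        return ("redirect", hits[0])
    return ("results", hits)
```

An old wiki url such as `/wiki/Meetings/Python_Intro` turns into the words wiki, meetings, python, intro, which is usually enough for the search engine to find the migrated page.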
Sending out Announcements
This is one place where I didn’t find anything in Drupal out of the box that did what I wanted, which is to take the meeting or event text, wrap it in a template with standard boilerplate, and email it to the mailing list. I tried a few things, like simplenews, which turned out to be anything but. I wasn’t yet at the point where I wanted to build a module from scratch (though I’ll probably get there at some point), so I did the next best thing, and hacked the crap out of an existing module to make it do what I want. The module I hacked up was the print module.
Print provides printer friendly pages, as well as send by email functionality. I exposed the print function to all users, but kept send by email as admin only, and gutted it so that for meeting node types it built me exactly the kind of boilerplate I wanted. Using mimemail it sends the announcement in both html and text, and it looks pretty good. It is a hack, but one I’m willing to live with for now.
Making Editing Pleasant
Pulling it all together
With the modules and configuration I’ve laid out here, I now have quite a good community site that supports events, calendars, users that edit pages, users able to manage their mailing list subscriptions, and a front page that is always going to show the next meeting. And beyond that, it’s just pretty. I’ve also started playing with things like twitter integration, and sending email to the list on new news stories (which I’m manually doing now).
I learned a couple of lessons on Drupal in the process. First, don’t be afraid of modules. Drupal modules, especially the good ones, each do one small, specific job. To have a robust site that does what you want is going to take adding quite a few. Building this up from scratch is what Drupal often gets dinged for compared to Joomla, but I actually like it better this way, as it provides more flexibility.
Second, sometimes there is no module to do what you want. In that case you have 2 options, see if you can do it simpler (like I did on the map field), or see if you can hack up the function you need on a related module (like I did with the email announcement). Both work, depending on what you are going for.
Lastly, the keystone to any site is really Views, CCK, and how you create your custom node types. Think about this one the most. What is a meeting? What is an event? You can always modify them later, but consider figuring out what your custom types are to be your first and most important mission when building a site.
Internet Relay Chat (IRC) has been around for just about ever. In the 90s it was used for chat rooms and warez sites mostly. Over the past decade it has become one of the key pillars of communication for the Open Source Community. It has the advantage that the client and server are free and open, and there is an inherent redundancy system built in.
One of the challenges of IRC, over say email, is that you need to be online to see the discussion. On a really global project this is a problem, because of the pesky fact that daytime is determined by facing the Sun, and living on an orb, only 1/2 the earth gets to do that at a time. Life would be so much simpler if the flat earthers were actually right. But there is a way to stay connected, even when you are not, which is using an IRC proxy.
Using an IRC Proxy
To get started you will need a Linux machine that you can run code on and that is always on. It doesn’t need to be on your network, but you need shell access. If you are a Linux geek level 4 or higher, this is probably not an issue. You probably either have a Linode or a home server that’s always on. If not… well sorry… your journey ends here. There is not, as of yet, a cloud service to provide this for you. Please come back once you level up.
The next step is the actual IRC proxy. IRC is a simple enough protocol, which goes over clear text, that many people have written a man in the middle server for it. You connect the proxy to the IRC server as you, then you point your IRC client at the proxy. When you are connected to the proxy, everything works as normal. Your messages are sent back and forth in real time. When you disconnect from the proxy, the proxy keeps you logged into irc and logs everything that goes on. The moment you reconnect to the proxy all those messages are replayed to your client. You now have full offline capability.
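If the replay behavior is hard to picture, this toy Python model may help. It is only a sketch of the bookkeeping described above, with no networking at all, and it says nothing about how any particular proxy is implemented; the class and method names are invented for the example.

```python
class ProxyBuffer:
    """Toy model of the replay behaviour, not a working IRC proxy."""

    def __init__(self):
        self.client_connected = False
        self.backlog = []    # messages logged while no client is attached
        self.delivered = []  # messages the client has actually received

    def on_server_message(self, message):
        # Pass messages straight through while a client is attached,
        # otherwise keep them for later.
        if self.client_connected:
            self.delivered.append(message)
        else:
            self.backlog.append(message)

    def attach_client(self):
        # Replay everything that was logged while the client was away.
        self.client_connected = True
        self.delivered.extend(self.backlog)
        self.backlog.clear()

    def detach_client(self):
        # The proxy stays on IRC; only the client side goes away.
        self.client_connected = False
```

The server side never notices the client coming and going, which is why everyone else on the channel sees you as always present.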
My favorite IRC Proxy
There are many out there. A few years back I spent some time looking for one that I didn’t hate, and I landed on miau. I’ve even packaged it for ubuntu, so if you are on that platform, it should be easy to install. Once installed, read the sample miaurc for configuration; it’s really well documented, and it should be easy enough to get rolling.
Although miau supports a password to connect to it, I don’t really trust running another service connected to the internet that just has a password in clear text. My solution here is to have miau listen only on localhost (127.0.0.1), and to forward a port to that machine over ssh. Pick a port (like 4098) on your local machine and have that forwarded whenever you connect to that server. In linux this would look like the following in your .ssh/config.
LocalForward 4098 localhost:4098
Then have your IRC client (like XChat) connect to localhost:4098. This means that you will only be connected to IRC when you have an ssh link to your proxy server. It works quite well, and is about as secure as you’ll get.
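If the client refuses to connect, the first thing to check is whether the forwarded port is actually open. A small Python check like the one below can tell you; `tunnel_is_up` is just a name picked for this sketch, and 4098 is the example port used above.

```python
import socket


def tunnel_is_up(host="127.0.0.1", port=4098, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError when the port is closed
        # or the attempt times out.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A False result usually means the ssh session with the forward isn't running. A True result only shows that the port accepts connections, not that the proxy behind it is healthy.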
If you made it this far, you probably already know why. When development conversations happen at 4am your time on IRC, you are probably never going to participate directly. But having access to the conversation when you connect in the morning is a very good thing, and I’ve walked people through this setup enough times in the past that writing it down for posterity seemed like a good idea.