I became one of the co-maintainers of the epublish drupal module this year, as we are an active user of it at the Poughkeepsie Farm Project, where we use it for our newsletter. I had about 6 big fixes to it that we needed, and so I was eventually just added to the project so I could commit them myself. Since then I’ve tried to clear out some of the backlog of issues, and added a few features.
In doing so I realized how fragile some pieces of the code were, as I broke things pretty regularly. Fragile code is the biggest inhibitor to innovation, because no one wants to try anything exciting and new when just keeping things working after simple patches is so exciting (and not in a good way).
So I decided to take on the project of revitalizing the code, in a couple of steps.
Add Automated Tests
Before actually making any substantial changes, I wanted to actually nail down current behavior, so that I know if I’m going to break anything. Over the past three weeks I’ve been figuring out how SimpleTest works, and writing tests for the code. I’m now up to 798 assertions, and have covered a lot of basic workflow, creating / deleting / updating publications and editions. I’d say I’ve probably got 1/3 of the coverage I should have, but I’ve ensured I’ve covered the goofy bits around article ordering that I broke multiple times in the last six months. I consider this a pretty good start.
As with all testing, the hardest part is the first test. That takes about eight hours to understand the system, and build and run a test that makes sense. I did that on a rainy Saturday before vacation. After that adding tests is easily done in 30 minute increments, which I puttered on in the mornings before Susan woke up during our vacation. In the process of getting these tests to work, I found and fixed a few critical bugs as well (when things didn’t work like they really should). Epublish 1.13 reflects these changes.
Extract the Database Layer
Step 2, which I’m starting on now, is to remove the database layer from the core epublish.module. My goal is to have no SQL statements in the core module file, and instead have those all in an epublish_api.inc file. They can be replaced with functions like “epublish_list_publications” and “epublish_add_publication($pub)”.
Again, this can be done in small increments, mornings while sipping coffee, or after work. I’m mostly looking for the kinds of SQL statements currently used, and trying to come up with the sane and well understood function name that they would map to.
Additional possible refactoring
One thing I found in doing the lending module is that it’s actually pretty nice to isolate the 3 form functions (display, validate, submit) for a specific form into it’s own include file. This makes it easy to know you are looking at everything about a transaction in one place. I’m going to see if this makes sense after the database pull out.
Theming functions also make sense to have in a separate place, they are logically quite different than other code in a drupal module, as they are the one place you’ll get markup included in the file.
All of this is really about adding a new and better interface for managing the articles in an edition, as the current interface leaves a lot to be desired, especially because of the automatically imported calendar entries we’ve got on the PFP site. It will also mean that a port to Drupal 7 can really happen. There are lots of other things holding the PFP site back on Drupal 6, but this is one of the bigger ones.
When you are a young and eager developer you always want to start from scratch on projects. The assumption is you can write a replacement in less time than it takes to figure out what the existing code does.
I’m now of the opinion that a whole sale rewrite is almost always the wrong thing to do. It means a very long period of time where you have nothing working, which means there is a good chance you give up on the project all together, with a lot of wasted effort.
This kind of incremental refactoring, adding tests along the way, lets you make pieces of the code better, and continue regular releases on existing function (even finding some bug fixes as you go). If this becomes uninteresting to me at any point, I’ll still have made the code easier to read, created a solid battery of tests, and handled edge conditions much better.
And this is how the majority of real software projects work. You aren’t there at the beginning, you aren’t there at the end. Your impact is really whether you can leave things better for the guy or gal that comes after you.
One thought on “Revitalizing a software project”
My favorite type of work is taking a legacy system that has become a mess and making it streamlined, structured and transparent, fixing bugs and gaining consistency in behavior. Over a period of time you end up with a new system, but you always have a working one to guide your work. It’s like doing a complete organ transplant while keeping the patient alive.