Archive for the 'hobbies' Category

Ruby gymnastics

Sunday, November 4th, 2007

# Every strip from the last 7 days, across all of the user's comics, oldest first
@strips = @user.comics.collect {|c|
    c.strips.find(:all, :conditions => ["date > ?", 7.days.ago])}.
    flatten.
    compact.
    sort_by {|c| c.date}

The code above shows Ruby in much of its glory, and it is code that I needed this weekend for one of my side hacking projects.  I’ve colored it to match what xemacs shows me, just to make it a little more clear.

First off, it shows off the power of mixins. 7.days.ago does exactly what you would expect, providing you with a time object for the moment seven days before now.
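
A minimal sketch of how a mixin can give plain numbers this kind of vocabulary, in plain Ruby with no Rails loaded. The module name DaysAgo and the optional `now` argument are made up for illustration, and this is not the ActiveSupport source:

```ruby
# Hypothetical stand-in for the Rails helpers (an assumption, not ActiveSupport).
module DaysAgo
  SECONDS_PER_DAY = 86_400

  # Treat the receiver as a number of days and convert it to seconds.
  def days
    self * SECONDS_PER_DAY
  end

  # Treat the receiver as a number of seconds and step back from `now`.
  def ago(now = Time.now)
    now - self
  end
end

# Mix the helpers into Integer so number literals respond to them.
class Integer
  include DaysAgo
end

week_ago = 7.days.ago   # a Time one week before the call
```

Passing a fixed time, as in `7.days.ago(Time.at(1_000_000))`, makes the result predictable, which is handy when testing.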

Second, it shows off the power of collect (aka map in many other languages).  Collect lets you iterate through a list you have and return a new collection based on an arbitrary transform.  In this case it returns a list of strip objects for each comic.

And lastly, it shows that collection operations can be chained.  My list of lists becomes a single flat list, I purge out nils (probably redundant at this point), and then sort all the objects by their date field.
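
The same chain can be tried without a database. This is a sketch under assumptions: Comic and Strip are stand-in Structs rather than the real models, and select replaces the ActiveRecord find call:

```ruby
require "date"

# Stand-in models (assumptions for illustration, not the real classes).
Strip = Struct.new(:title, :date)
Comic = Struct.new(:name, :strips)

# Collect each comic's recent strips, flatten the list of lists,
# drop any nils, and sort what is left by date (oldest first).
def recent_strips(comics, since)
  comics.collect { |comic| comic.strips.select { |s| s.date > since } }
        .flatten
        .compact
        .sort_by { |s| s.date }
end

comics = [
  Comic.new("A", [Strip.new("a1", Date.new(2007, 11, 3)),
                  Strip.new("a0", Date.new(2007, 10, 1))]),
  Comic.new("B", [Strip.new("b1", Date.new(2007, 11, 1))]),
]

recent = recent_strips(comics, Date.new(2007, 10, 28))
puts recent.map(&:title).inspect   # prints ["b1", "a1"]
```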

Ruby is such a fun language to program in. :)


A semester of search

Wednesday, October 17th, 2007

My grad school class this semester is the Project Course, where the whole semester is spent on a group project.  No tests, no other grading besides the project, which is actually what I expected more of when starting grad work at Marist (that’s a different post though).  The project domain is search.

We have to build an application with an integrated search engine tuned for a specific problem.  Our group’s problem is a user-driven online restaurant review site.  Our canonical example is searching for “boston seafood”, which should return all the posts that a human would, given the same task.  That means “the best lobster in bean town” counts as a hit that you’d want.  Guess what: SQL LIKE clauses and regexes aren’t going to cut it here.

But that’s ok, we don’t have to do everything from scratch.  We’re expected to base our solution on Lucene, which is a search SDK.  You build custom indexer, analyzer, and searcher classes from the Lucene base classes, and feed it documents.  Lucene does the heavy lifting of building the inverted index, and scoring the results based on the rules, weights, and policies you’ve given it.  A project like this is pretty open ended, as you can always make it better given more time, and more interesting analysis tricks.
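
For anyone who hasn’t met an inverted index before, here is a toy version in Ruby. It is only a sketch of the idea: it is not Lucene’s API or data layout, and the helper names and sample posts are made up:

```ruby
# Split text into lowercase word tokens.
def tokenize(text)
  text.downcase.scan(/[a-z0-9]+/)
end

# Build the inverted index: term => list of ids of documents containing it.
def build_index(docs)
  index = Hash.new { |hash, term| hash[term] = [] }
  docs.each do |id, text|
    tokenize(text).uniq.each { |term| index[term] << id }
  end
  index
end

# Return the ids of documents that contain every term in the query.
def search(index, query)
  postings = tokenize(query).map { |term| index.fetch(term, []) }
  return [] if postings.empty?
  postings.reduce { |common, ids| common & ids }
end

# Made-up sample posts.
docs = {
  1 => "Great seafood near Boston harbor",
  2 => "The best lobster in bean town",
  3 => "Boston burgers and fries",
}

index = build_index(docs)
p search(index, "boston seafood")   # prints [1]
```

Note that post 2 is not returned for “boston seafood”. Plain term matching misses it, which is exactly the gap that the analysis work has to close.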

The whole team is making nice progress, so for the last two weeks I’ve been able to focus squarely on the Lucene integration code itself.  Pass one got some basic queries working in Lucene.  Pass two was earlier this week, when scoring started to be useful.  Pass three will be tonight, when I’ll start to integrate synonym support so that lobster is understood as a type of seafood and burgers are understood to be American food.  I’ll have to think about how to make sure crab cakes don’t show up in the dessert category, though maybe we just need a hybrid seafood dessert category.
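
One way to picture synonym support is as term expansion at index time. The sketch below is an assumption about the general approach, not the project’s actual analyzer, and the synonym table is invented:

```ruby
# Hypothetical synonym table: specific term => broader categories.
SYNONYMS = {
  "lobster" => ["seafood"],
  "crab"    => ["seafood"],
  "burgers" => ["american"],
}.freeze

# Tokenize the text, then append the category terms for any known word.
def expand_terms(text, synonyms = SYNONYMS)
  terms = text.downcase.scan(/[a-z]+/)
  (terms + terms.flat_map { |term| synonyms.fetch(term, []) }).uniq
end

p expand_terms("Fresh lobster")   # prints ["fresh", "lobster", "seafood"]
p expand_terms("Crab cakes")      # prints ["crab", "cakes", "seafood"]
```

Crab cakes stay out of the dessert category here only because the table has no entry for “cakes”. A real table would need that kind of care for every ambiguous word.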

A few interesting lessons have come out of the work so far.  First, search is way harder than most people think.  While Lucene gives you lots of knobs and levers to tweak how documents are ranked, the results of that tweaking aren’t always what you expect.  It’s sort of like moving furniture by throwing bowling balls at it: you may get things close, but you do a lot of collateral damage in the process.  Recently I was attempting to boost scores based on terms showing up in the subject of posts, which completely overwhelmed our post rating scoring, making low quality posts show up at the top of the list.

You also notice when people are using search badly, or more specifically using bad search.  Using SQL LIKE clauses is not search, it’s grep.  Unfortunately most PHP sites do that because they don’t have anything better (Lucene has been ported to a lot of language environments, PHP is not one of them).  The Gentoo wikis fall into this category.

Finally, you realize that Google’s scoring, while good in general, may not actually be what you want for your problem domain.  The fact that the word seafood shows up 3 times in a post doesn’t make it a better post, but default scoring gives it a boost based on the number of times relevant terms show up.  Badger, Badger, Badger, while being non-kosher, shouldn’t be scored highly in our results, even if we had a category fully dedicated to badgers and mushrooms.
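
A small sketch of the difference between counting every repetition and counting a term once. The raw count is a deliberate simplification and not Lucene’s actual scoring formula; the function names and sample posts are made up:

```ruby
# Count how many times a term appears in the text.
def term_count(text, term)
  text.downcase.scan(/[a-z]+/).count(term.downcase)
end

# Raw count: every repetition adds to the score.
def raw_score(text, term)
  term_count(text, term).to_f
end

# Presence only: a term counts once, no matter how often it repeats.
def presence_score(text, term)
  term_count(text, term) > 0 ? 1.0 : 0.0
end

spam   = "badger badger badger mushroom"
review = "grilled badger with mushroom sauce"

p raw_score(spam, "badger")        # prints 3.0
p raw_score(review, "badger")      # prints 1.0
p presence_score(spam, "badger")   # prints 1.0
```

With presence scoring the two posts tie on the term, so something else (like the post rating) gets to decide the order.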


The New Dague.Net

Tuesday, September 4th, 2007

After years on LiveJournal, I’ve decided to migrate off to my own WordPress installation. This will hopefully help consolidate some of my content which has been spread out on a number of different sites recently. Expect to see some more traffic here shortly once I get all the changes in place.


C# moment of clarity

Tuesday, August 7th, 2007

The good thing about changing technical focus is all the new exciting things to learn. The bad thing is… all those new exciting things to learn mean your development output drops to the floor for some period of time. It’s always a frustrating window of time, be it a month or two, where you feel like an idiot. Having done these changes enough times in the past, I know this too will pass. That doesn’t change the fact that while you may have read 200 pages of developer documentation on a given day, your emacs buffer looks eerily similar at the end of the day to how it looked when the day kicked off.

Inevitably, you hit a breakthrough, and now all that example test code that didn’t compile, and you didn’t know why, starts working, and patterns fall into place. Yesterday I had such a moment of clarity around C# and ADO.NET (which is MS’s db interface layer). It turns out that in the constructor “SqlConnection(string)”, Sql doesn’t mean “generic sql engine”. Sql actually means “MSSQL vendor extension”. Some set of compile errors yesterday got me to change that to SqliteConnection on a lark, and stuff worked. A lot of stuff worked, all at once.

I had to step back from the computer and make sure no evil spirits had come or gone in the process. Leave it to Microsoft to very clearly muddle the difference between “something generic” and “something only we have”, as to them the whole world looks like something only they have. Boo Microsoft!

With that set of filters back in hand, the O’Reilly books around C# are now falling into place much more quickly. The persistence engine for OpenSim should have a good first pass by the end of the day, and I’m not feeling so stupid any more.

I also have to give MS some credit on ADO.NET. While C# looks a lot like Java, the patterns and objects they created for database interfacing look way more like a dynamic language (be it PHP, Perl, or Ruby), especially on the read side. There, what takes 50 lines of code in C# would probably be 200 lines in Java. So not boo to Microsoft there.

Time to get back to that emacs buffer.


The end of “the space”

Tuesday, July 24th, 2007

For those that had not previously heard, NYCCCP (aka “the space”) is coming to an end. The space was the idea of Porkchop and Mike, based on hacker spaces that existed in Boston and Philly. The idea is relatively simple. Rent a reasonably sized (in our case 50′ x 20′) location, and set up desks for all the people that are interested in joining. Build a server room, and get a pretty decent symmetric DSL line in. Cost for the space is distributed among its members, all of whom pay a monthly membership fee to keep the place running. I wasn’t originally part of the space, but did join up a year later, and it was a good place to host some Xen test servers.

The space has been running for 3 years, but over the last year people were going there less often, and interest had definitely waned a bit. A few members were lost as they moved a bit further away. Two weeks ago, the overall financials went from self sustaining, to dropping at a relatively sharp pace. In 3 years, the dynamics of the group changed. A lot of us met through the LUG, but became friends outside of it. Originally we only had computers in common, and the space was a good gathering point. But now we do scifi night every week at my house, see each other for lunch a couple days a week, and do plenty of things on the weekends (like biking and hiking).

So, Mike, Porkchop, and I agreed it was time to call the space a grand experiment that was a good thing, but whose time had passed. The space will shut down at the end of September, and we’re in the process of getting everything/one sorted out there and out of the building (there are a bunch of other folks with servers there that will need to move as well).

It’s a sad thing to see go, but times do change.


Fun with visualization

Saturday, July 21st, 2007

In an effort to wrap my head around some of the code for OpenSim, I took a detour and started adding C# support to autodia. Autodia was originally written as something to create dia UML diagrams from Perl code, but extended from there to support many languages, and many output formats. Unfortunately, C# is not yet one of those.

Right now I’ve got class and attribute parsing pretty well under control (except for generics). Autodia definitely evolved on less object oriented languages than C#, as one of the things I’m most interested in knowing is the contains relationships in the codebase, which isn’t supported in the current version (though I now know how to add it, I just need a couple of hours). One of the things I’m trying to expose is one of the gotchas of object design: the inbreeding that can come from having parents and members all be the same base class. I’m sure there is some good banjo joke in there, but I’m a cup of coffee short of finding it.

The results are quite pretty:

Once the work is in a more finished state, I’ll be pushing it back upstream, so others can benefit as well.


Refuctoring

Friday, June 15th, 2007

Ah, those internets.

Refuctoring is the process of taking a well-designed piece of code and, through a series of small, reversible changes, making it completely unmaintainable by anybody except yourself. Comprehensive regression testing guarantees that nobody will be any the wiser.

Read the whole thing here.


LUG Radio, Redmonk, and other things I learned recently

Monday, June 11th, 2007

I was attempting to find a useful podcast tool on Linux so that I can get This American Life as a podcast, instead of my normal method of timeshifting our local NPR station. After a few attempts I found Castpodder, which had the best interface of any of the pieces of software that I could just package install off the network. And off I was to start setting up podcasts.

Castpodder had the benefit of prepopulating the tool with a couple of podcasts, one of which was LUG Radio, a regular podcast by a bunch of Linux Users in the UK. While there are parts of it where I think they could get their facts a bit better, overall it is a pretty amusing show, and it has definitely let me know about a few things I wouldn’t have otherwise.

One of them was Redmonk, an open source analyst firm. These guys do analysis of open source software and communities from a business perspective, and post all their content online. From their charter:

RedMonk is the first analyst firm built on open source. We’re dedicated to providing high quality research at no cost, and believe that the dialog that follows is beneficial to us, our community and our clients.

They also have a podcast, though I haven’t started listening to it yet as I’m getting through some of the LUG Radio backlog right now. However, as we start looking more at Linux in schools, it’s good to get some information on best practices in Open Source beyond just my own personal experience. Redmonk looks like a reasonable place to gather some of that information.

The last thing I learned is that C# doesn’t kill puppies, at least not that many of them. I’ve been looking at it a bit recently, and basically it’s Java with all the rough edges scrubbed off. It helps that there is an actual open implementation that works, and that it comes with nearly every distro now.


Pidgin 2.0

Wednesday, May 9th, 2007

Pidgin 2.0 was just released, which has given me a chance to start looking at IM plugins (which I wasn’t going to do during Gaim’s 2 year beta cycle, when that interface was always changing). Pidgin 2.0 has some nice changes to it, including final unification of buddy accounts, which I was so used to from Everybuddy years ago.

I’ve already found a couple of bugs in pidgin, but I’m also starting to get familiar enough with the source base that I’m starting to figure out fixes at the same time (at least for the less subtle ones). The last couple of days I spent some time forward porting a private set of plugins from 2.0b2 -> pidgin 2.0 without using the gaim-compat layer. I’ve still got one subtle segfault (looks like a race in buddy status update), but it was a great learning experience on the source base and the API. Hopefully I’ll manage to have enough time to hack up some other plugins over the next many moons.


Find of the week - two-mode-mode

Saturday, February 10th, 2007

I just came across two-mode-mode when looking for some xemacs extensions for editing ruby on rails code, and it is awesome.

The basic problem is one that anyone editing PHP, or even HTML with embedded CSS or Javascript, has run into. Your editor locks to HTML editing mode, with indentation and font coloring for it, great. Then you get into your code block, which is really a different language, and you no longer have an editing mode appropriate to the lines you are changing. Along comes two-mode-mode (or probably better called multi-mode-mode). The primary mode is HTML. Any time you enter blocks that look like something else (e.g. CSS, Javascript, Ruby, PHP, Python) emacs changes modes, re-font-locks, and life is good. You exit the block, and emacs flips you back to HTML mode.

I made a couple of minor changes so that PHP mode used PHP, and CSS and Javascript modes were also detected. There were also 2 functions that xemacs didn’t know about, but a quick M-x apropos let me find close enough versions for xemacs. Things like this are another reason I just love xemacs. :)
