IPython Notebook Experiments

A week of vacation at home means some organizing, physical and logical, some time spent with friends, and some time just letting your brain wander wherever it wants to. One of the problems that I’ve been scratching my head over is finding a sane basis for doing data analysis for elastic recheck, our tooling for automatically categorizing races in the OpenStack code base.

Right before I went on vacation I discovered Pandas, the Python data analysis library. The online docs are quite good. However, for something this dense, having a great O’Reilly book is even better. It has a bunch of really good examples working with public data sets, like census naming data. It also has very detailed background on the IPython notebook framework, which is used for the whole book, and is frankly quite amazing. It brought back the best of my physics days using Mathematica.

screenshot_109

With the notebook server, IPython isn’t just a better interactive Python shell. It’s also a powerful web UI, including Python autocomplete. There is even good Emacs integration, which includes support for the inline graphing toolkit. Anything that’s created in a cell will be available to future cells, and cells are individually executable. Looking at the example above, I’m setting up the basic JSON return from Elasticsearch, which I only need to do once after starting the notebook.

screenshot_110

Pandas is all about data series. It’s really a mere mortal’s interface on top of NumPy, with a bunch of statistics and time series convenience functions added in. You’ll find yourself doing data transforms a lot in it. Like my college physics professors used to say, all problems are trivial in the right coordinate space. Getting there is the hard part.

With the Elasticsearch data, a bit of massaging is needed to get a list of dictionaries that is easily convertible into a Pandas data set. In order to do interesting time series things I also needed to create a new column holding a datetime conversion of @timestamp, and pivot it out into an index.
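The massaging described above might look roughly like the sketch below. It is not the code from the notebook: the response is a hand-made stand-in for an Elasticsearch result, and the field names build_name and build_status are assumptions chosen for illustration (only @timestamp comes from the post).

```python
import pandas as pd

# Hypothetical, trimmed-down search response. Real hits carry many more
# fields; build_name and build_status are assumed names.
response = {
    "hits": {
        "hits": [
            {"_source": {"@timestamp": "2014-01-02T14:30:00",
                         "build_name": "job-a", "build_status": "FAILURE"}},
            {"_source": {"@timestamp": "2014-01-01T09:00:00",
                         "build_name": "job-a", "build_status": "SUCCESS"}},
            {"_source": {"@timestamp": "2014-01-01T11:45:00",
                         "build_name": "job-b", "build_status": "SUCCESS"}},
        ]
    }
}

# Flatten the nested hits into a plain list of dictionaries.
records = [hit["_source"] for hit in response["hits"]["hits"]]
df = pd.DataFrame(records)

# Add a real datetime column parsed from @timestamp and make it the index.
df["timestamp"] = pd.to_datetime(df["@timestamp"])
df = df.set_index("timestamp").sort_index()

# Keep only the columns of interest so the output stays readable.
df = df[["build_name", "build_status"]]
```

With a datetime index in place, the time series helpers (resampling, slicing by date string, and so on) become available on the frame.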

You also get a good glimpse of the output facilities. By default the value of the last line of an In[] block is output to the screen. There is a nice convenience method called head() that shows just the first few rows (useful for sanity checking). Also, this data actually has about 20 columns, so before sanity checking I sliced it down to the 2 relevant ones just to make the output easier to grok.

screenshot_111

It took a couple of days to get this far. Again, this is about data transforms, and figuring out how to get from point a to point z. That might include building an index, doing a transform on it (to reduce the resolution to day level), then resetting the index, building some computed columns, rolling everything back up in groupby clauses to compute the total number of successes and runs for each job on a certain day, and adding another computed column for the success rate. Here I’m also slicing out only the jobs that didn’t have a 100% success rate.
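As a rough illustration of that chain of transforms, here is a minimal sketch on made-up data. It takes a slightly shorter route than the one described (a day column instead of transforming and resetting the index), and the column names build_name and build_status are assumptions, not necessarily what the notebook uses.

```python
import pandas as pd

# Made-up job runs; column names are assumed for illustration.
runs = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            [
                "2014-01-01 09:00",
                "2014-01-01 13:00",
                "2014-01-01 10:00",
                "2014-01-01 15:00",
                "2014-01-02 08:00",
            ]
        ),
        "build_name": ["job-a", "job-a", "job-b", "job-b", "job-a"],
        "build_status": ["SUCCESS", "FAILURE", "SUCCESS", "SUCCESS", "SUCCESS"],
    }
)

# Reduce the resolution to day level.
runs["day"] = runs["timestamp"].dt.normalize()

# Computed column: True where the run succeeded.
runs["success"] = runs["build_status"] == "SUCCESS"

# Roll up per job per day: number of successes and total number of runs.
daily = (
    runs.groupby(["day", "build_name"])["success"]
    .agg(successes="sum", runs="count")
    .reset_index()
)

# Another computed column: success rate as a percentage.
daily["success_rate"] = 100.0 * daily["successes"] / daily["runs"]

# Slice out only the jobs that did not have a 100% success rate that day.
flaky = daily[daily["success_rate"] < 100.0]
```

In this toy data only job-a on the first day ends up in flaky, since it succeeded in one of its two runs.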

screenshot_112

And nothing would be quite complete without being able to visualize data inline. These are the same graphs that John Dickinson was creating from graphite, except at day resolution. The data here is coming from Elasticsearch, so we do miss a class of failures where the console logs never make it. That difference should be small at this point.

Overall this has been a pretty fruitful experiment. Once I’m back in the saddle I’ll be porting a bunch of these ideas back into Elastic Recheck itself. I actually think this will make it easier to ask the interesting follow-on questions like “why does a particular job fail 40% of the time?” because we can compare it to known ER bugs, as well as figure out what our unclassified percentages look like.

For anyone that wants to play, here is the raw IPython Notebook.

Unity and Pidgin

One of the things that happened once I got to Ubuntu 12.04 was that gnome-do started acting up on me. Given that it’s a very minimally maintained project, I decided it was time to move on. Ubuntu’s dash provides a lot of the same functionality, so I finally started using it. But I missed a few things.

Gnome-do isn’t just a launcher for programs, it’s an actions engine. It has support for Pidgin buddy lists, and I even wrote an NX launcher for it. I didn’t really want to give either of those up, so I started trying to figure out how to add those to Unity’s launcher itself.

Unity lenses, the plugins that supply results, are written in either Vala or Python. Given that I’m trying to flex my Python muscles now, I decided that was my way in. After a few false starts I found the One Hundred Scopes project, which is an attempt to build a whole set of Unity lenses to add functions and examples for the world.

The Wikipedia example is a good starting point. It gives you an idea of how to build a custom search and return results. That enabled me to build a basic launcher that served up file URLs for launching NX sessions, and I created and pushed unity-lens-nx, which implements that.

But Pidgin is a little harder. There is no file to open for Pidgin; this is about communicating with another program over D-Bus, and catching the action to do something else with D-Bus. Fortunately, with the help of David Callé, I figured it out. Also, once I had it working I found some good tricks (and a couple of undocumented D-Bus methods) at https://github.com/gregorl/Unity-Pidgin-Lens, which I used as inspiration.

The net result is unity-lens-pidgin.

2 key MHVLUG people online right now

Press Super + b and search your buddy list. It only displays currently online buddies, and available buddies are preferred over unavailable ones. You can get it via PPA here.
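The filtering and ordering rule described above can be sketched in plain Python. This is only an illustration of the idea, not the lens code: the Buddy class and search_buddies function are hypothetical, and the real lens gets its buddy data from Pidgin over D-Bus rather than from a hard-coded list.

```python
from dataclasses import dataclass


@dataclass
class Buddy:
    """Hypothetical stand-in for a buddy as reported by Pidgin."""
    alias: str
    online: bool
    available: bool


def search_buddies(buddies, query):
    """Return online buddies matching query, available ones first."""
    needle = query.lower()
    matches = [b for b in buddies if b.online and needle in b.alias.lower()]
    # False sorts before True, so available buddies come first;
    # ties are broken alphabetically by alias.
    return sorted(matches, key=lambda b: (not b.available, b.alias.lower()))


buddies = [
    Buddy("Sam Away", online=True, available=False),
    Buddy("Sandy Offline", online=False, available=False),
    Buddy("Sara Here", online=True, available=True),
    Buddy("Bob Here", online=True, available=True),
]

results = [b.alias for b in search_buddies(buddies, "sa")]
```

Here results holds the available match first, then the away one; the offline buddy and the non-matching one are dropped.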

There is plenty more to do. The search results should be smarter, especially taking into account most recently contacted buddies, which means integrating with zeitgeist or something equivalent. There are ideas floating around for that. I’d also like to do some overlays with status icons, just to give a visual cue on either protocol or current status.

If anyone else wants to help, or has their own ideas, I’d encourage you to join in the conversation. The One Hundred Scopes community is pretty cool, and I’m happy to bring their vision a little closer to reality.