Tag Archives: development

Python Design Patterns

A friend pointed me to this talk by Brandon Rhodes on Python design patterns from PyOhio a couple of years ago.

The talk asks an interesting question: why aren’t design patterns seen and talked about in the Python community? He walks through the patterns in Design Patterns: Elements of Reusable Object-Oriented Software one by one, and points out some that are features of the language, some that are used in the standard library, and some that are really applicable, all with some nice small code examples.
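As one illustration of a pattern that is simply a feature of the language (this is my own example, not necessarily one from the talk), the Iterator pattern needs no dedicated classes in Python. A generator function gives you the whole thing. The function name below is made up for the example.

```python
def countdown(start):
    """Yield start, start - 1, ..., 1 lazily, one value at a time."""
    current = start
    while current > 0:
        yield current
        current -= 1


# Any generator works with for-loops, list(), sum(), and so on,
# without writing a separate iterator class.
numbers = list(countdown(3))
total = sum(countdown(4))
```

In a language without generators you would typically write an iterator class with explicit state; here the language keeps that state for you.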

The thing that got me thinking, though, was a comment he makes both at the beginning and end of the talk: the reason you don’t see these patterns in Python is that Python developers tend not to write the kind of software where they are needed. They focus on small tools that connect other components, or live within a framework.

I’m a newcomer to the community; I’ve been doing Python full time for only a few years on OpenStack. So I can’t be sure whether or not it’s true. However, there are times when I’m surprised by things that I would have expected to be solved already in the language, or by incompatibilities that didn’t need to be there in the Python 2 to 3 transition. I wonder if these come from this community not having a ton of experience with large code bases and long-lived code bases, and the kinds of deprecation and upgrade guarantees needed there.

OpenStack Emacs Tools

Over the past few weeks I’ve been tweaking my emacs configs, and writing some new modes to help with OpenStack development. Here is a non-comprehensive list of some of the emacs integration I’m finding useful in developing for OpenStack (especially things that have come up in conversation). URLs are provided, though I won’t walk through all the configuration in depth.

tramp

Tramp is a built-in facility in emacs that lets you open files on remote machines via ssh (and other protocols). This means your emacs runs locally, configured as you like and with all the latency gains that brings, but editing can be done across multiple systems. A remote file url looks like /servername:/file/path. All the normal file completion that you expect works after that point.

I tend to do code that doesn’t need a full stack run locally on my laptop or desktop, but full stack code happens on my NUC devstack box, and tramp lets me do that from a single emacs instance.

More info about tramp at emacswiki.

fly-hack

Emacs has an in-buffer syntax checker called flymake. There are various tutorials out there for integrating pyflakes or flake8 into that. However in OpenStack we have hacking, which extends flake8 with new rules. Also, every project turns on custom ignores. Also, many projects extend flake8 further with custom rules for that repo.

screenshot_151

fly-hack uses the flake8 in the .tox/pep8 venv for each project, and uses the tox.ini config for each project, so when in Nova, nova rules will be enforced, when in Cinder, cinder rules will be enforced. Mousing over the error will pop up what it is (you can see the H306 in that screenshot). It has a fallback mode when using it over tramp that’s a reasonably sane flake8 least common denominator for OpenStack projects.
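The lookup described above can be sketched in Python. This is not fly-hack’s actual code (fly-hack is Emacs Lisp), and the function names and the plain flake8 fallback are assumptions for illustration. The idea is to walk up from the file being edited to the directory holding tox.ini, then prefer the flake8 inside that project’s .tox/pep8 venv.

```python
import os


def find_project_root(path):
    """Walk upward from path until a directory containing tox.ini is found."""
    current = os.path.abspath(path)
    if not os.path.isdir(current):
        current = os.path.dirname(current)
    while True:
        if os.path.isfile(os.path.join(current, "tox.ini")):
            return current
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            return None
        current = parent


def pick_flake8(path, fallback="flake8"):
    """Return (command, project root) to use when checking the file at path."""
    root = find_project_root(path)
    if root is not None:
        candidate = os.path.join(root, ".tox", "pep8", "bin", "flake8")
        if os.path.isfile(candidate):
            return candidate, root
    # No project venv yet: fall back to whatever flake8 is on the PATH.
    return fallback, root
```

Running the chosen command from the project root is what lets flake8 pick up that project’s tox.ini settings, so Nova files get Nova’s ignores and Cinder files get Cinder’s.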

More info at fly-hack github page – https://github.com/sdague/fly-hack

stacktest

Our testr / subunit / testtools testing toolchain gets a bit heavy handed when trying to iterate on test development. testr discovery takes 20 – 30s on the Nova source tree, even if you are trying to only run 1 test. I became inspired at the Ops Summit in Philly to see if I could do better. And stacktest.el was born.

It’s mostly a port from nosemacs which allows you to run tests from an emacs buffer, however it does so using tox, subunit, or testtools, depending on whether you want to run a top level target, test a file, test an individual test function, and/or use pdb. It works over tramp, it works with pdb, and it uses the subunit-trace tooling if available.
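To give a feel for why running a single test can skip the slow discovery step, here is a rough Python sketch. It is not how stacktest.el is written, and the exact command line it builds is an assumption on my part. The helper names are hypothetical; python -m testtools.run is the testtools command line runner.

```python
import os


def dotted_test_id(repo_root, test_file, class_name=None, method_name=None):
    """Turn a test file path into a dotted id like pkg.tests.test_x.Class.method."""
    relative = os.path.relpath(test_file, repo_root)
    module = os.path.splitext(relative)[0].replace(os.sep, ".")
    parts = [module]
    if class_name:
        parts.append(class_name)
        if method_name:
            parts.append(method_name)
    return ".".join(parts)


def single_test_command(repo_root, test_file, class_name, method_name):
    """Build a command that runs exactly one test by name."""
    target = dotted_test_id(repo_root, test_file, class_name, method_name)
    return ["python", "-m", "testtools.run", target]
```

An editor integration only needs the buffer’s file name and the class and function around the cursor to build this id, which is why it can start the test right away.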

I’ve now bound F12 to stacktest-one, which is a super fast way to iterate on test development.

More info at the stacktest github page – https://github.com/sdague/stacktest

pycscope

OpenStack is a lot of code, and uses a ton of libraries. git grep works ok in a single repo, but the moment some piece of code ends up calling into an oslo library, that breaks down.

Peter Portante, OpenStack Swift contributor, maintains a pythonized version of cscope. It parses the AST of all the files to build a quite rich cscope symbol database. This lets you search for definitions (searching down), calling points (searching up), and references (searching sideways), which very quickly lets you dive through a code path and figure out where you end up.
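To show the general idea of AST-based indexing, here is a minimal sketch using Python’s standard ast module. It is not pycscope’s implementation and it does not write the cscope database format; it only collects where names are defined and where they are called. The function name index_source is made up for the example.

```python
import ast
from collections import defaultdict


def index_source(source, filename="<string>"):
    """Return (definitions, calls), each mapping a name to [(file, line), ...]."""
    definitions = defaultdict(list)
    calls = defaultdict(list)
    for node in ast.walk(ast.parse(source, filename)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            definitions[node.name].append((filename, node.lineno))
        elif isinstance(node, ast.Call):
            func = node.func
            # Plain calls look like f(); method calls look like obj.f().
            if isinstance(func, ast.Name):
                calls[func.id].append((filename, node.lineno))
            elif isinstance(func, ast.Attribute):
                calls[func.attr].append((filename, node.lineno))
    return definitions, calls


example = '''
def helper():
    return 1

def caller():
    return helper() + helper()
'''
definitions, calls = index_source(example, "example.py")
```

Looking up a name in definitions is the "searching down" case, and looking it up in calls is the "searching up" case. A real index would also need to resolve which module a name belongs to, which this sketch ignores.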

The only drawback is that the AST parse is time consuming on something as large as the Nova tree, especially if you index all the .tox directories, which I do to let myself get all the way back through the library stacks that we include.

You can learn more about pycscope at its github page – https://github.com/portante/pycscope

flyspell-prog-mode

Emacs includes a spell checker called flyspell. Very useful for text files. What I only learned last year is that there is also a flyspell-prog-mode, which is like flyspell, but only acts on comments and strings that are semantically parsed by Emacs. This helps avoid a spelling mistake when writing inline documentation.

More information at Emacs wiki.

lambda-mode

This is totally gratuitous, but fun. There is a small mode that does a display adjustment of the word ‘lambda’ to an actual ‘λ’. It’s a display adjustment only, this is still 6 characters in the buffer. But it makes the lambda expressions pop out a bit more.

More information at Emacs wiki.

The important thing about having an extensible editor is actually extending it to fit your workflow. If you are an emacs user, hopefully this will give you some fun things to dive into. If you use something else, hopefully this will inspire you to go looking into your toolchain for similar optimizations.

We live on fragile hopes and dreams

OpenSSL isn’t formally verified!?

No, neither is any part of your browser, your kernel, your hardware, the image rendering libraries that your browser uses, the web servers you talk to, or basically any other part of the system you use.

The closest to formally verified in your day-to-day life that you’re going to get may well be the brakes on your car, or the control systems on a jet engine. I shit you not.

We live on fragile hopes and dreams.

via My Heart Bleeds for OpenSSL | Coder in a World of Code.

A lot of the internet is learning a lot more about how software in the wild functions after heartbleed. I found that statement to be one of the best summaries.

Devstack Vagrant

Devstack is tooling for OpenStack to make it easy to bring up an OpenStack environment based on the latest git trees. It’s used extensively in upstream testing, and by many OpenStack developers to set up dev/test environments.

One of my personal challenges in working on Devstack was testing devstack itself. Relying on the upstream gate means we have a limited number of configurations, and when something goes wrong, iterating on a fix is hard. Even more importantly, the upstream gate is currently only a single node test environment.

A month ago I hacked out a new tool – devstack-vagrant (available on github).

DevstackVagrant

Devstack vagrant provides a customized Vagrant environment that will build a 2 node devstack cluster under VirtualBox. The basic model is 2 devstack nodes (a controller and a compute) that bridge through a physical interface on your host. The bridged interface is set as the default route in the nodes so that 2nd level guests created on top of this 2 node devstack can route to the outside world.

The nodes start and build from official Ubuntu 12.04 cloud images, and are customized using the puppet provisioning support in vagrant. There are a few config variables you need to set in a config.yaml, including hostnames, bridge interface, and the password hash you want your stack user to have. Basically enough to bootstrap the environment and then run devstack from upstream git.

I added a bit of additional logic to the end of the normal devstack process that includes installing an Ubuntu 12.04 and Fedora 20 cloud image in your glance, injecting the ssh public key for the stack user into the keyserver, and opening up ping and ssh in all the security groups.

I still consider this an expert tool at this point, as in, if it breaks you get to keep all the pieces. However, this has been useful to me so far, and given the pull requests I got the other day, seemingly is useful for others as well. Patches definitely welcomed. And if it looks like more folks want to contribute I’ll happily move to stackforge.

One of the things I’d love to do is sort out a functioning libvirt backend for vagrant (there are 2, they are both a little wonky) because then the 2nd level guests could use nested KVM and not be ridiculously slow.

This tool has already proved very useful to me, so hopefully it will be useful to others as well.

Tools vs. Process

From Rafe Colburn’s post on Seven signs of dysfunctional engineering teams:

Preference for process over tools. As engineering teams grow, there are many approaches to coordinating people’s work. Most of them are some combination of process and tools. Git is a tool that enables multiple people to work on the same code base efficiently (most of the time). A team may also design a process around Git — avoiding the use of remote branches, only pushing code that’s ready to deploy to the master branch, or requiring people to use local branches for all of their development. Healthy teams generally try to address their scaling problems with tools, not additional process. Processes are hard to turn into habits, hard to teach to new team members, and often evolve too slowly to keep pace with changing circumstances.

You can think of it another way: tools encode behavior in a way that takes away choices. That is great, because then you don’t have to worry about making the wrong choice, and you can focus your mental energies on real problems.

Github vs. Gerrit

Julien Danjou, the project technical lead for the OpenStack Ceilometer project, had some choice words to say about github pull requests, which resonate very strongly with me:

The pull-request system looks like an incredible easy way to contribute to any project hosted on Github. You’re a click away to send your contribution to any software. But the problem is that any worthy contribution isn’t an effort of a single click.

Doing any proper and useful contribution to a software is never done right the first time. There’s a dance you will have to play. A slowly rhythmed back and forth between you and the software maintainer or team. You’ll have to dance it until your contribution is correct and can be merged.

But as a software maintainer, not everybody is going to follow you on this choregraphy, and you’ll end up with pull-request you’ll never get finished unless you wrap things up yourself. So the gain in pull-requests here, isn’t really bigger than a good bug report in most cases.

This is where the social argument of Github isn’t anymore. As soon as you’re talking about projects bigger than a color theme for your favorite text editor, this feature is overrated.

After working on OpenStack for the last year, I’m completely spoiled by our workflow and how it enables developer productivity. Recently I went back to just using git without gerrit to try to work on a 4 person side project, and it literally felt like developing in a thick sea of tar.

A system like Gerrit, and pre-merge interactive reviews, lets you build project culture quickly (it’s possible to do it other ways, but I’ve seen gerrit really facilitate it). The onus is on the contributors to get it right before it’s merged, and they get the feedback to get a patch done the right way. Coherent project culture is one of the biggest factors in attaining project velocity, as then everyone is working towards the same goals, with the same standards.

The OpenStack Gate

The OpenStack project has a really impressive continuous integration system, which is one of its core strengths as a project. Every proposed change to our gerrit review system is subjected to a battery of tests on each commit, which has grown dramatically with time, and after formal review by core contributors, we run them all again before the merge.

These tests take on the order of 1 hour to run on a commit, which would make you immediately think the most code that OpenStack could merge in a day would be 24 commits. So how did Nova itself manage to merge 94 changes since Monday (not to mention all the other projects, which adds up to ~200 in 3 days)? The magic of this is Zuul, the gatekeeper.

Zuul is a queuing system for CI jobs, written and maintained by the OpenStack infrastructure team. It does many cool things, but what I want to focus on is the gate queue. When the gate queue is empty (yes, it does happen sometimes), the job is simple: add a new commit, run the tests, and we’re off. What happens if there are already 5 jobs ahead of you in the gate? Let’s take a concrete example of nova.

Speculative Merge

By the time a commit has gotten this far, it’s already passed the test suites at least once, and has had at least 2 core contributors sign off on the change in code review. So Zuul assumes everything ahead of the change in the gate will succeed, and starts the tests immediately, cherry picking this change on top of everything that’s ahead of it in the queue.

zuul-working

That means that merge time on the gate is O(1), that is merging 10 changes takes the same time as 1 change. If the queue gets too big, we do eventually run out of devstack nodes, so the ability to run tests is not strictly constant time. On the run up to grizzly-3 both the cloud providers (HP and Rackspace) which contribute these VMs provided some extra quota to the OpenStack team to help keep things moving. So we had an elastic burst of OpenStack CI onto additional OpenStack public cloud resources, which is just fun to think about.
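A toy Python model of the speculation may help. This is only a sketch of the idea described here, not Zuul’s code, and the change names and the function are invented for the example. Each queued change gets a test run built from the branch tip plus every change ahead of it.

```python
def speculative_states(branch_tip, gate_queue):
    """For each queued change, list what its test run is built from.

    Every change is tested on top of the branch tip plus all changes
    ahead of it in the queue, on the assumption that they will all merge.
    """
    return [[branch_tip] + gate_queue[:position + 1]
            for position in range(len(gate_queue))]


queue = ["change-1", "change-2", "change-3"]
states = speculative_states("master", queue)

# All three runs start at once, so wall-clock time is about one test run
# (roughly an hour here) rather than three runs back to back.
serial_hours = len(queue) * 1
parallel_hours = 1
```

The last entry in states is what the branch will look like if every speculation holds, which is why nothing needs to be retested when the changes ahead do merge.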

Speculation Can Fail

Of course, speculation can fail. Maybe change 3 doesn’t merge because something goes wrong in the tests. If that happens we then kick the change out of the queue, and then all the changes behind it have to be reset to pull change 3 out of the speculation. This is the dreaded gate reset, because when gate resets happen, all the time spent on speculative tests behind the failure is lost, and the jobs need to restart.
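The cost of a reset can be seen in a small simulation. Again this is a simplified sketch and not Zuul’s implementation: it assumes a change passes or fails on its own, regardless of what is ahead of it, and the function and change names are hypothetical.

```python
def run_gate(queue, passes):
    """Simulate the gate; passes maps change name -> whether its tests pass.

    Returns (merged, rejected, test_runs).
    """
    merged, rejected, test_runs = [], [], 0
    pending = list(queue)
    while pending:
        # Start one speculative run per pending change.
        test_runs += len(pending)
        failed_at = next(
            (i for i, change in enumerate(pending) if not passes[change]),
            None)
        if failed_at is None:
            merged.extend(pending)
            break
        # Everything ahead of the failure merges as speculated.
        merged.extend(pending[:failed_at])
        rejected.append(pending[failed_at])
        # Gate reset: runs behind the failure are discarded and restarted.
        pending = pending[failed_at + 1:]
    return merged, rejected, test_runs


queue = ["c1", "c2", "c3", "c4", "c5"]
passes = {"c1": True, "c2": True, "c3": False, "c4": True, "c5": True}
merged, rejected, test_runs = run_gate(queue, passes)
```

With the third of five changes failing, the two changes behind it are tested twice, so seven runs are spent where five would have done, and those two changes wait through a second full test cycle.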

zuul-reset

Speculation failures largely fall into a few core classes:

Jenkins crashes – it doesn’t happen often, but Jenkins is software too, and OpenStack CI tends to drive software really hard, so we force out edge cases everywhere.

Upstream service failures – we try to isolate ourselves from upstream failures as much as possible. Our git trees pull from our gerrit, not directly from github. Our apt repository is a Rackspace local mirror, not generically upstream. And the majority of pip python packages come from our own proxy server. But if someone adds a new python dependency, or a version of one updates and we don’t yet have it cached, we pass through to pypi for that pip install. On Tuesday pypi converted from HTTP to HTTPS, and didn’t fully grok the load implications, which broke OpenStack CI (as well as lots of other python developers) for a few hours when pypi effectively was down from load.

Transient OpenStack bugs – OpenStack is complicated software, 7 core components interacting with each other asynchronously over REST web services. Each core component being a collection of daemons that interact with each other asynchronously. Sometimes, something goes wrong. It’s a real bug, but only shows up under very specific timing and state conditions. Because OpenStack CI runs so many tests every day (OpenStack CI may be one of the largest creators of OpenStack guests in the world every day), very obscure edge and race conditions can be exposed in the system. We try to track these as recheck bugs, and are making them high priority to address. By definition they are hard to track down (they expose themselves on maybe 1 out of 1000 or fewer test runs), so the logs captured in OpenStack CI are the tools to get to the bottom of these.

Towards an Even Better Gate

In my year working on OpenStack I’ve found the unofficial motto of the project to be “always try to make everything better”. Continuous improvement is not just left to the code, and the tests, but the infrastructure as well.

We’re trying to get more urgency and eyes on the transient failures, coming up with ways to discover the patterns from the 1 in 1000 fails. After you get two or three that fail in the same way it helps triangulate the core issue. Core developers from all the projects are making these high priority items to fix.

On the upstream service failures the OpenStack infrastructure team already has proxies sitting in front of many of the services, but the pypi outage showed we probably need something even more robust to handle that upstream service outage, possibly rotating between pypi mirrors on the fall-through case, or a better proxy model. The team is already actively exploring solutions to prevent that from happening again.
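The fall-through idea mentioned above could look roughly like this in Python. It is purely a sketch of one possible approach, not what the infrastructure team built. The source names are placeholders, and the fetch function is passed in so the example makes no claims about any real mirror or API.

```python
def fetch_with_fallback(package, sources, fetch):
    """Try each source in order; return (source, result) from the first that works.

    fetch(source, package) should return the package data or raise OSError.
    """
    errors = []
    for source in sources:
        try:
            return source, fetch(source, package)
        except OSError as exc:
            # Remember the failure and move on to the next source.
            errors.append((source, exc))
    raise OSError("all sources failed for %s: %r" % (package, errors))


def fake_fetch(source, package):
    """Stand-in for a real download: only 'mirror-b' is up."""
    if source != "mirror-b":
        raise OSError("%s is down" % source)
    return "%s from %s" % (package, source)


result = fetch_with_fallback(
    "six", ["local-proxy", "mirror-a", "mirror-b"], fake_fetch)
```

Ordering the list with the local proxy first keeps the common case fast, and the outage case only costs a few failed attempts before a working mirror answers.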

As always, everyone is welcome to come help us make everything better. Take a look at the recheck bugs and help us solve them. Join us on #openstack-infra and help with Zuul. Check out what the live Zuul queue looks like. All the code for this system is open source, and available under either the openstack or openstack-infra github accounts. Patches are always welcome!

Two Solitudes – CS and Software

Via channels I can’t now remember, I came across this presentation about the very unsolved issue of how Computer Science as a field of study relates in any way to creating software.

With so many colleges in the area, and having a number of friends that are CS professors, and other IT staff at colleges, it continues to amaze me how disconnected these worlds are. All made the stranger by coming into the field sideways from a physics degree.

Plus, I love the term software carpentry.

Wise words about software

From a former Firefox developer, truer words were never spoken:

Software companies would do well to learn this lesson: anything with the phrase “users love our product” in it isn’t a strategy, it’s wishful thinking. Your users do not “love” your software. Your users are temporarily tolerating your software because it’s the least horrible option they have — for now — to meet some need. Developers have an emotional connection to the project; users don’t.

All software sucks. Users would be a lot happier if they had to use a lot less of it. They may be putting up with yours for now, but assume they will ditch you the moment something 1% better comes along — or the moment you make your product 1% worse.

I gave up Firefox as my main browser when Chrome decided to support WebGL on Linux, and Firefox kept back burnering it. Mozilla is now deprecating Thunderbird, which is the only product of theirs that I still use regularly. Guess that means I’m going to have to get used to GMail’s web interface eventually.

Sad to see something as critical as Mozilla, the beast that cracked the IE hegemony, become an also-ran.

Drupal Meetup Events Module

I just released version 1.1 of the meetup_events module for Drupal. I started building this about 6 months ago when we started using Meetup for the Mid Hudson Astronomical Association to draw in new members.

I hate data entry. I find entering the same data twice into a computer one of the most soul sucking things I could do. So having both events on our website, and on meetup, meant I needed to automate things.

Meetup events was thus born. The model is simple: your Drupal site is considered the authoritative resource. You select which node types (which have date fields) you want to sync to meetup, and whenever you create or update an event of that type a meetup entry is made (or updated) accordingly. The body content that is synced is tokenized (and I added a few tokens for getting nodereference content). Venues are a little hokey right now, but I just provide a selector for a numeric field either on the main type or on a nodereference which maps to it.

meetup_events settings

Once you set up the module, including your API key and group url, you pretty much forget it is there. Then you just edit as per normal. Any time you save a type that you are syncing you’ll notice this:

Which includes a link to the meetup event that was just saved.

One of the more recent things I worked on was integration with views so that there is now a Meetup Events: Meetup Link view field for nodes. If you add that to a node listing (and you’ve registered for an oauth key) you’ll get the meetup dynamic RSVP button for your event.

That required a bit of tweaking of the rsvp javascript to make it play nice with Drupal, which was possible because the meetup team was kind enough to release that code as open source.

There is plenty of work remaining here, ways this could be nicer, but I’m pretty pleased with the results so far. It’s made me learn a bunch about Drupal’s ctools module, more than I ever wanted to know about the drupal settings system, and how to pull in incompatible versions of jquery in custom tools.

I’ve also been really happy with my experience with the Meetup team. They’ve added multiple API calls for me to make my life easier. Thanks guys. If every developer facing team acted like that, the world would be a much better place.

The meetup platform is turning out to be a really great way for our Astronomy club to draw in new people, and I’ve started using it for MHVLUG now as well. Now that I’ve got meetup_events, I can do that seamlessly without data duplication, or degrading our native experience. If you are interested in doing the same, check out the module. Bugs and patches welcomed.