Category Archives: OpenStack

OpenStack CI by the numbers

For the week of Monday Oct 20th to Sunday Oct 27th (partial, it’s still Sunday morning).

  • 34894 – # of test jobs run by the CI system
  • 25316 – # of devstack clouds created by the CI system
    • 8254 – # of large ops runs (devstack + fake virt driver + limited tempest test to drive it)
    • 940 – # of swift functional runs (devstack + swift functional tests)
    • 16122 – # of runs that do some level of devstack + tempest with libvirt qemu
  • 508536 – # of qemu guests started (successfully) in the devstack clouds
  • 128 – Max # of libvirt qemu guests started (successfully) in a single test run

Things that you figure out when you are playing with elastic search on a Sunday morning. One of the surprises for me was how much we use devstack outside of the base tempest run use case, and that our max # of guests spawned in a run is now over 100.

Update: Clark Boylan correctly pointed out that our guests are 2nd level, on platforms that don’t support nested kvm, and thus are libvirt/qemu guests not libvirt/kvm guests.

OpenStack Havana – the Quality Perspective

Like a lot of others, I’m currently trying to catch my breath after the incredible OpenStack Havana release. One of the key reasons that OpenStack is able to evolve as fast as it does, without the whole thing falling apart, is the incredible preemptive integration gate that we have (think continuous integration++).

In Havana, beyond just increasing the number of tests we run, we made some changes in the nature of what we do in the gate. These changes are easy to overlook, so I wanted to highlight some of my favorites, and give a perspective in everything that’s going on behind the scenes when you try to land code in OpenStack.

Parallel Test Runner

Every proposed commit to an OpenStack project needs to survive being integrated into a single node devstack install, and hit with 1300 API & integration tests from Tempest, but until Havana, these were run serially. Right before the havana-3 milestone we merged parallel tempest testing for most of our jobs. This dropped their run time in half, but more importantly it meant all our testing was defaulting to 4 simultaneous requests, as well as running every test under tenant isolation, where a separate tenant is created for every test group. Every time you ratchet up testing like this you expose new race conditions, which is exactly what we saw. That made for a rough RC phase (the gate was a sad panda for many days), but everyone buckled down to fix these new issues, which had previously been visible only to large OpenStack installations. The result: everyone wins.

This work was a long time coming, and had been started in the Grizzly cycle by Chris Yeoh, and spearheaded to completion by Matt Treinish.

Large Ops Testing

A really clever idea was spawned this summer by Joe Gordon: could we run Tempest tests against a devstack with a fake virt driver that would always “succeed”, and do so instantaneously? In doing so we could turn up the pressure on the control plane in OpenStack, without the overhead of real virt drivers slowing down control plane execution enough that bugs could hide. Again, the first time we cranked this to 11, lots of interesting results fell out, including some timeout and deadlock situations. All hands went on deck, the issues were addressed, and now Large Ops Testing is part of our arsenal, run on every single proposed commit.

Upgrade Testing

Most people familiar with OpenStack are familiar with Devstack, the opinionated installer for OpenStack from upstream git. Devstack actually forms the base of our QA system, because it can build a single node environment from git trees. Lesser known is its sister tool, Grenade. Grenade uses 2 devstack trees (the last stable and master) to build an OpenStack at the previous version, inject some data, then shut everything down, and try to restart it with the latest version of OpenStack. This ensures config files roll forward smoothly (or have specific minimal upgrade scripts in Grenade), database schemas roll forward smoothly, and that we don’t violate certain deprecation guarantees.

Grenade was created by Dean Troyer, I did a lot of work towards the end of Grizzly to get it in the gate, and Adalberto Medeiros took it the final mile in Havana and got this to be something running on every proposed commit.

New Tools for an Asynchronous World

September was the 30th anniversary of the GNU project. I remember some time in the late 90s reading or watching something about Richard Stallman and GNU Hurd. The biggest challenge of building a system with dozens of daemons sending asynchronous messages, is having any idea what broke when something goes wrong. They just didn’t have the tools or methods to make consistent forward progress. Linux emerged with a simpler model which could make progress, and the rest is history.

If you zoom back on OpenStack, this is exactly what we are building: a data center OS micro kernel. And as I can attest, debugging is often “interesting”. Without the preemptive integration system, we’d never be able to keep up our rate of change. However, as the number of integrated projects has increased we’ve definitely seen emergent behavior that is not straightforward to track down.

Jobs in our gate will fail, seemingly at random. People unfamiliar with the situation will complain about “flakey tests” or a “flakey gate”, and just recheck their patch and see it pass on the second attempt. Most of the time neither the gate nor the tests are to blame, but the core of OpenStack itself. We managed to trigger a race condition, that maybe shows up 1% of the time in our configuration. We have moved to a world where test results aren’t binary, pass or fail, but better classified with a race percentage.

This is a problem we’ve been mulling over for nearly a year, and the solution which has been created is ElasticRecheck, a toolchain that uses Elastic Search on our test logs to check new failures against known failures. While finding a “fingerprint” for a failure is still a manual step, it was still of dramatic benefit for the release process. It got us out of thinking that there were only a couple of race conditions we were hitting, and realizing there were dozens of very specific races, each with their own fix. It also gave us a systematic way of determining which race conditions were most impacting us, so they could be prioritized and fixed.

This work was spearheaded by Joe Gordon and Matt Treinish, and leveraged some background work that Clark Boylan and I had done early in the cycle. ElasticRecheck is exciting enough technology that it deserves its own detailed dive. But that is for another day.

And many more…

These are just some of the sexiest highlights from the Havana release on the quality front.

The number of tests in Tempest that we run on every proposed patch has risen from 800 to 1300 during the cycle. This included new scenarios and a massive enhancement on coverage in all our services. 100 different developers contributed to Tempest during the Havana release (up from 60 in the Grizzly release), enhancing our integration suite. We’ve got a new stress framework which can provide load generation to burn in your cloud, which I expect will make an appearance in our gate during Icehouse.

The point being, lots of people, from lots of places, contributed heavily to make the Havana release the most solid release we’ve ever had from OpenStack. They did this not just with new features that make for good press releases, they also did this with contributions to the overall system that validates our software not once a day, not even once an hour, but on every single proposed patch.

So to everyone that contributed in this extraordinary effort: THANK YOU!

And I look forward, excitedly, to what we’ll create for the Icehouse release.


Gerrit queries to avoid OpenStack review overload

As with many OpenStack core reviewers, my review queue can be completely overwhelming, often 300 – 400 active reviews that I have +2 / -2 authority on. It’s really easy to get discouraged on a list that big. Fortunately there are ways to trim that down.

Gerrit provides a simple query language to select which reviews you see, using the query bar in the top right of the page:

The way this works is by adding criteria into the search box, which by default is ANDed together to get the final results. In the process these queries change the URL for Gerrit, so you can bookmark the resultant queries for easy access later.

Restricting to Single Project (and pulling your own stuff)

This query is basically what you get when you click on a project link:

status:open project:openstack/tempest

Nothing special, but you can go one step further by removing yourself from the list of reviews:

status:open project:openstack/tempest -owner:self

This also demonstrates that we can have both positive criteria and negative criteria.

Little Lost Projects (don’t lose the little ones)

In addition to having +2 on nova, devstack, tempest, I’ve got it on a bunch of smaller projects, which I often forget I need to go review. You can build a single query that has all your little lost projects in a single list:

status:open (project:openstack-dev/hacking OR project:openstack-dev/grenade)

No Objections

You can also filter based on votes in the various columns. It’s not nearly as detailed as I’d like, but it is still useful. I have a basic query for No Objections on most projects that I review which looks something like this:

status:open project:openstack/tempest -Verified-1 -CodeReview-1 -CodeReview-2

This removes all reviews that currently have a -1 in the Verified column, or a -1 or -2 in the CodeReview column, so patches with negative feedback are dropped from view. The top of your review list may contain patches that haven’t cleared CI yet, but that’s easy to see. There might also be Jenkins -2 reviews in this list, but gate-failed merges can usually use extra eyes.

I consider this a base list of patches that I have no reason not to review.

Potential Merges

I’m typically up and at my computer at 7am EST, which is often a very slow time for zuul. So one of the things I look for is code that only requires one more +2 to go to merge on projects like Nova. Many of these are easy to review fixes, and clear the decks before the queue gets busy in the afternoon.

status:open -Verified-1 CodeReview+2 -CodeReview-1 -CodeReview-2 (project:openstack/nova OR project:openstack/python-novaclient)

Like the last one, we are filtering out all patches with negative feedback, but also requiring that there is an active +2 on the patch. I also make sure to do this for both nova and python-novaclient, which often gets lost in the noise.

Lost Patches

Especially in Nova it’s easy for a patch to get lost, as there are so many of them. I define lost as a patch that’s passed CI, but has no feedback in code review.

status:open -Verified-1 Verified+1 -CodeReview+2 -CodeReview+1 -CodeReview-1 -CodeReview-2 (project:openstack/nova OR project:openstack/python-novaclient) branch:master

These patches are often from newer folks on the project, and as such often need more time, so I typically only go after lost patches if I know I can set aside a solid hour for them. However, I try hard to get to this query at least once a week, to make sure things don’t get fully lost: a -1 gives the patch originator feedback to work on, and a +2 makes it far more likely to get the attention of other core reviewers when they are looking for mergeable code.

Experimenting with your own

The gerrit query language is somewhat limited (full docs are online), so it can’t do everything I’d like, but even just these few slices make it easier to get into a particular mindset for reviewing different slices of code. I have a toolbar folder full of bookmarks for these slices on different projects to do just that.

If you have other gerrit queries you regularly use, please leave a comment. Would love to see the ways other folks optimize gerrit for their workload.

OpenStack Infrastructure Bootcamp

It was a cool week for OpenStack gatherings. Down in Washington DC an OpenStack Security Book Sprint was happening, while up in New York City, 20 of us were gathered for an OpenStack Infrastructure Bootcamp.


Why do an infrastructure bootcamp? OpenStack, as a project, is really breaking some interesting new ground when it comes to software process flow and continuous integration. Unlike other projects, that test after code has landed in upstream master, we’ve got this incredible pre-merge test system that ensures that upstream master won’t be broken. It’s a system you need when you have over 550 contributors during a six month cycle. This is something beyond Continuous Integration as people normally think about it, though we realized we’re still quite lacking the words to describe it concisely.

This bootcamp was a great chance to go through that, in detail, and expose some of the areas where more contributors are needed to accelerate the project even further. We had all the “coremudgeons” of OpenStack infrastructure (Monty, Jim, Clark, and Jeremy), folks like myself that have landed some patches, or helped with specific efforts, and folks that were new to the whole thing, and just wanted to learn. Some of this I’d seen before, other bits I saw for the first time, and the whole system now makes more sense in my own head.

There were dinner and drinks after day one (the only day I could attend, sadly), and further ideas for improving the whole system flowed over beer, wine, food, and good company. I was struck again, during all of this, by just what an amazing community OpenStack is. We got 20 people together not to discuss or plan out features in OpenStack, but features and improvements for the systems that facilitate OpenStack development. The kinds of things we’re working towards are as advanced as semi-automatic failure correlation on build logs, to find statistically infrequent race conditions upstream, instead of ever letting them hit a user in production. Awesome stuff.

Extra special thanks to Monty Taylor for pulling this together. It was no small task, and this wouldn’t have been possible without all his hard work on logistics to make it happen.

How an Idea becomes a Commit in OpenStack

My talk from the OpenStack summit is now up on youtube, where I walked people through the process of getting your idea into OpenStack. A big part of the explanation is what’s going on behind the scenes with code reviews and our continuous integration system.

I’m hoping it pulls away some of the mystery of the process, and provides a gentler on-ramp to everything for new contributors. I’ll probably be giving some version of this again at future events, so feedback (here or on youtube) is appreciated.

The OpenStack Gate

The OpenStack project has a really impressive continuous integration system, which is one of its core strengths as a project. Every proposed change to our gerrit review system is subjected to a battery of tests, which has grown dramatically with time, and after formal review by core contributors, we run them all again before the merge.

These tests take on the order of 1 hour to run on a commit, which would make you think the most OpenStack could merge in a day is 24 commits. So how did Nova itself manage to merge 94 changes since Monday (not to mention all the other projects, which adds up to ~200 in 3 days)? The magic of this is Zuul, the gatekeeper.

Zuul is a queuing system for CI jobs, written and maintained by the OpenStack infrastructure team. It does many cool things, but what I want to focus on is the gate queue. When the gate queue is empty (yes it does happen some times), the job is simple: add a new commit, run the tests, and we’re off. What happens if there are already 5 jobs ahead of you in the gate? Let’s take a concrete example of nova.

Speculative Merge

By the time a commit has gotten this far, it’s already passed the test suites at least once, and has had at least 2 core contributors sign off on the change in code review. So Zuul assumes everything ahead of the change in the gate will succeed, and starts the tests immediately, cherry-picking this change on top of everything that’s ahead of it in the queue.

That means that merge time on the gate is O(1), that is merging 10 changes takes the same time as 1 change. If the queue gets too big, we do eventually run out of devstack nodes, so the ability to run tests is not strictly constant time. On the run up to grizzly-3 both the cloud providers (HP and Rackspace) which contribute these VMs provided some extra quota to the OpenStack team to help keep things moving. So we had an elastic burst of OpenStack CI onto additional OpenStack public cloud resources, which is just fun to think about.

Speculation Can Fail

Of course, speculation can fail. Maybe change 3 doesn’t merge because something goes wrong in the tests. If that happens we then kick the change out of the queue, and then all the changes behind it have to be reset to pull change 3 out of the speculation. This is the dreaded gate reset, because when gate resets happen, all the time spent on speculative tests behind the failure is lost, and the jobs need to restart.

Speculation failures largely fall into a few core classes:

Jenkins crashes – it doesn’t happen often, but Jenkins is software too, and OpenStack CI tends to drive software really hard, so we force out edge cases everywhere.

Upstream service failures – we try to isolate ourselves from upstream failures as much as possible. Our git trees pull from our gerrit, not directly from github. Our apt repository is a Rackspace local mirror, not generically upstream. And the majority of pip python packages come from our own proxy server. But if someone adds a new python dependency, or a version of one updates and we don’t yet have it cached, we pass through to pypi for that pip install. On Tuesday pypi converted from HTTP to HTTPS, and didn’t fully grok the load implications, which broke OpenStack CI (as well as lots of other python developers) for a few hours when pypi effectively was down from load.

Transient OpenStack bugs – OpenStack is complicated software, 7 core components interacting with each other asynchronously over REST web services. Each core component being a collection of daemons that interact with each other asynchronously. Sometimes, something goes wrong. It’s a real bug, but only shows up under very specific timing and state conditions. Because OpenStack CI runs so many tests every day (OpenStack CI may be one of the largest creators of OpenStack guests in the world every day), very obscure edge and race conditions can be exposed in the system. We try to track these as recheck bugs, and are making them high priority to address. By definition they are hard to track down (they expose themselves on maybe 1 out of 1000 or fewer test runs), so the logs captured in OpenStack CI are the tools to get to the bottom of these.

Towards an Even Better Gate

In my year working on OpenStack I’ve found the unofficial motto of the project to be “always try to make everything better”. Continuous improvement is not just left to the code, and the tests, but the infrastructure as well.

We’re trying to get more urgency and eyes on the transient failures, coming up with ways to discover the patterns from the 1 in 1000 fails. After you get two or three that fail in the same way it helps triangulate the core issue. Core developers from all the projects are making these high priority items to fix.

On the upstream service failures the OpenStack infrastructure team already has proxies sitting in front of many of the services, but the pypi outage showed we probably need something even more robust to handle that upstream service outage, possibly rotating between pypi mirrors on the fall-through case, or a better proxy model. The team is already actively exploring solutions to prevent that from happening again.

As always, everyone is welcome to come help us make everything better. Take a look at the recheck bugs and help us solve them. Join us on #openstack-infra and help with Zuul. Check out what the live Zuul queue looks like. All the code for this system is open source, and available under either the openstack, or openstack-infra github accounts. Patches are always welcome!

Software Engineering Talk at Vassar College

While I’ve been giving talks at conferences and user groups for the last decade, I leveled up a little on Friday and was an invited speaker on the Vassar College Computer Science Asprey Lecture Series. The topic was Software Engineering at Scale, using the OpenStack project as an example.

I gave the folks there a glimpse of what’s behind a successful project that is able to integrate code from over 400 unique developers in 5 months time. I talked about planning, the design summits, the contribution and code review tools we use. But, as with every time I talk about OpenStack, the thing that really wows people is the testing infrastructure we’ve got. It was equally latched onto by the students and CIS staff in the room.

On every code submission we run style checks, unit tests (5000 of them in Nova now), and spin up a full OpenStack install and hit it with a nearly 700 test integration suite, before the first humans start looking at the code for manual review. It’s an incredibly empowering system, that means developers have a high bar to submit working code that doesn’t alter the behavior of the system. And it means that by the time the expert eyes do code review, the kinds of problems they are looking for are much more interesting.

Just this morning it meant I could look through a new proposed extension in gerrit and focus on some of the functional behavior, including understanding which kinds of code the test system has a harder time touching. The confidence that gives you as a reviewer, knowing that everything isn’t on the verge of breaking all the time, is enormous.

I’ve submitted a similar talk to the OpenStack summit, with a slightly different perspective: educating new developers on the process by which an idea becomes code landing in the OpenStack tree. Hoping that gets selected, as it should be a good talk, and give me an excuse to polish some of my code flow diagrams a bit more.