The Nova API in Kilo and Beyond

Over the past couple of years we've been trying to find a path forward with the Nova API. The Nova v2.0 API defined a very small core set of interfaces that were basically unchangeable, copied from Rackspace early in the history of the project. We then added a way to extend it with extensions, both in the upstream community and as vendor extensions out of tree.

This created a user experience that was... suboptimal. We had 80+ extensions in tree, of varying quality and documentation. Example: floating ips were an extension (officially out of the API), but were used so extensively that they became a de facto part of the core API. Except, of course, they weren't, so people trying to use the API had a few things they could count on, and a ton of things that might or might not be there.

This was a disaster for the promise of interoperable clouds.

Even figuring out what a cloud could do was pretty terrible. You could approximate it by listing the extensions of the API, then having a bunch of logic in your code to work out which extensions turned certain features on or off, or added new data to payloads.
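To make that concrete, here's a minimal sketch of what that discovery dance looked like in Python (the endpoint URL and token are placeholders; the "alias" field is how extensions identify themselves):

    import requests

    ENDPOINT = "https://cloud.example.com/v2/mytenant"  # placeholder
    HEADERS = {"X-Auth-Token": "..."}  # placeholder token

    # Ask the cloud which extensions it has loaded.
    resp = requests.get(ENDPOINT + "/extensions", headers=HEADERS)
    aliases = {ext["alias"] for ext in resp.json()["extensions"]}

    # ...and then sprinkle checks like this throughout your application.
    if "os-floating-ips" in aliases:
        # Probably safe to use the floating ip calls. Probably.
        pass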

We took a very long journey to come up with a better way, with lots of wrong turns and dead ends. That's a story for another day. Today I'd just like to explain where we got to, and why.

A User-First Perspective

Let's take a step back and think about some stakeholders involved in OpenStack, and what they need and want out of it.

Jackson the Absent

Last year Jackson wrote an application that works against OpenStack; it's been deployed in production and is part of the workflow at his organization.

Jackson has decided to leave tech and take up goat farming. His application should continue to work without changes even after the OpenStack cloud it's running against has been upgraded multiple times.

Emma the Active

Emma is the kind of person that loves new features and eagerly awaits her OpenStack cloud upgrades to see the new things she can do. She should be able to have her application work fine across upgrades, and then be able to modify it to take advantage of new features exposed in the API.

Sophia the Multi-Cloud Integrator

Sophia has an application that spans multiple OpenStack clouds run by different organizations. These clouds will be at different versions of OpenStack, and thus will expose different API features.

Sophia needs to be able to talk to all these clouds, know what features they expose, and have a single program that can talk to them all simultaneously.

(Note: this is what our own nodepool, which runs all the OpenStack upstream tests, does.)

Aiden the Cloud Operator

Aiden knows he has a lot of users of the OpenStack API at his location. He'd really like to know who the Jacksons and Emmas of the world are, so that he can keep an eye on whether, in the far future, it will ever be safe to disable really old features in his cloud.

Olivia the Contributor

Olivia wants to get her feature added to Nova which needs exposing through the API. She'd like to be able to get that landed during a release, and not have to wait 3 years for an eventual API rewrite.


Considering the needs of these various users helps to determine the key requirements for an API for something like OpenStack: an Open Source project that's deployed in many different companies and environments, with possibly years of difference in the version of code deployed at any of these locations.

  • The API should be as common as possible between deploys. Every optional feature is a feature that can't be depended on by an application writer. Or worse: if it's used, it's a lock-in to that cloud. That means software has to be rewritten for every cloud, or written with a horrible kluge layer.
  • It needs to be really clear exactly what a particular cloud's API supports.
  • Older applications must not be broken by new features, or need to be rewritten after their OpenStack cloud is upgraded.
  • We have to have a way to get new features out in a timely manner.
  • We have to be able to evolve the API one piece at a time, as the Nova API is sufficiently large that a major version bump is no longer possible (we learned this the hard way).

The Backwards Compatibility Fallacy

Nearly every conversation with a developer around this issue starts with "why can't we just add more data to structures in backwards compatible ways". This is how service providers like Amazon, Meetup, and others work.

The problem is we aren't a proprietary company with one revision of our API stack in the wild at a time. We are an Open Source project with thousands of deployments all over the world, all on different code revisions and upgrade cadences.

Let's play out the exercise where we assume additive is good enough.

A great example exists today: ipv6 filtering on server lists. The Nova client erroneously says it's supported. It's not; it's actually completely ignored on the server. A suggestion was that adding it would be a backwards compatible change, so we should just do it, and we wouldn't need to signal to the user that this was an API change.

However, that assumes that time is monotonically moving forward; it's not. Sophia might run across a cloud that had gotten to this version of the code and realize she could filter by ipv6 address. Great! She writes her code to use and depend on that feature.

Then she runs her code against another cloud, which runs a version of Nova that predates this change. She's now effectively gone back in time. Her code now returns thousands of records instead of one, and she's terribly confused why. She also has no way to figure out if random cloud Z is going to support this feature or not. So the only safe thing to do is implement the filtering client side instead, which means the server side filtering actually gained her very little. It's not something she can ever determine will work ahead of time. It's an API that is untrustworthy, so it's best avoided.
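In practice the safe fallback looks something like this sketch (the response shape follows Nova's detailed server listing; the endpoint and headers are placeholders):

    import requests

    def servers_with_ip(endpoint, headers, wanted_addr):
        # Server-side ip6 filtering can't be trusted to exist, so pull
        # the full detailed server list and filter locally.
        resp = requests.get(endpoint + "/servers/detail", headers=headers)
        matches = []
        for server in resp.json()["servers"]:
            addrs = [a["addr"]
                     for net in server.get("addresses", {}).values()
                     for a in net]
            if wanted_addr in addrs:
                matches.append(server)
        return matches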

We've seen this problem before on the World Wide Web. JavaScript APIs differ between browsers, and while parts of them converge towards standardization over time, you end up writing a lot of shim compatibility layers to take advantage of new features without breaking on old implementations. This is the way of decentralized API evolution and deployment. Once you get past one implementation controlled by a single organization, you have to deal with it.

API Microversions

There are some amazing things in the HTTP specification, some really great ideas that I am amazed were thought about back in the early days of the web. One of them is Content Negotiation. A resource is addressed by a URL (Uniform Resource Locator). However, that resource might be available in multiple representations: there might be a text version, an HTML version, and a PDF version. The HTTP spec provides a header that allows you to tell the server what kind of representation you would like for your resource. The server can say "that's not possible", and then you try again with something different, but it gives you as the client a lot of control over what you are going to get.
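For example, asking for a different representation of the same resource is a one-header change (the URL here is illustrative):

    import requests

    # Same URL, different representation: the Accept header asks the
    # server for a PDF rendering of the resource.
    resp = requests.get("https://example.com/reports/42",
                        headers={"Accept": "application/pdf"})
    # If the server can't honor it, it answers 406 Not Acceptable, and
    # the client is free to retry with a different Accept value.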

What if APIs worked like that? The resource is still a server, but I'd really like the 2.253 representation of it, which has some fields that are really handy for me.

Microversions are like Content Negotiation for the API.

Like Content Negotiation, the requested Microversion is passed as an HTTP header. Unlike Content Negotiation we don't support ranges, as the complexity to client programming gets out of control. Like Content Negotiation, if nothing is provided, we do a sane thing: send the minimum supported version.
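Concretely, a request for a specific microversion is just one extra header; a sketch (the endpoint and token are placeholders; X-OpenStack-Nova-API-Version is the header Nova uses):

    import requests

    resp = requests.get(
        "https://cloud.example.com/compute/servers/detail",  # placeholder
        headers={
            "X-Auth-Token": "...",  # placeholder
            # Ask for the 2.3 representation of the resources.
            "X-OpenStack-Nova-API-Version": "2.3",
        })
    # Drop the version header entirely and the server falls back to
    # the minimum supported version.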

Nova v2.1

Nova v2.1 is a new, cleaner backend implementation of the Nova v2.0 API. The one thing it adds is consistent input validation, so we catch bad requests at the API layer and return a sane error to the user. This is much more straightforward than our old model of trying to translate a stack trace (possibly triggered by a database violation) into a meaningful error message for the user.
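That validation layer is JSON-Schema based; here's a toy sketch of the idea (the schema below is illustrative, not Nova's actual create-server schema):

    import jsonschema

    # Illustrative schema: what a valid "create server" body must look
    # like. Nova's real schemas are much richer than this.
    server_create = {
        "type": "object",
        "properties": {
            "server": {
                "type": "object",
                "properties": {
                    "name": {"type": "string",
                             "minLength": 1, "maxLength": 255},
                },
                "required": ["name"],
            },
        },
        "required": ["server"],
    }

    def validate_request(body):
        try:
            jsonschema.validate(body, server_create)
        except jsonschema.exceptions.ValidationError as err:
            # Turn the failure into a clean 400 for the user instead of
            # letting a bad value blow up deep in the compute layer.
            return 400, err.message
        return 200, "ok"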

Applications that work on v2.0 can be pointed to v2.1, and will just work. It should be transparent enough to the application authors that they'll never notice the transition.

And onto this Nova v2.1 API endpoint, we start adding microversion features. If you want features in the 2.3 microversion, you specify that fact in your header. Then you'll get the v2.3 versions of all the resources.

If you specify nothing, you get the minimum supported version, v2.1, which is the same as the v2.0 API. So all existing applications just work without doing anything. Only when an application wants to opt into new features does it need to make changes.

Solving for Stakeholders

Let's look at how this solves things for our stakeholders:

  • Jackson: his application keeps running, v2.1 is v2.0. His application needed to make 0 changes to run as it did before.
  • Emma: she can poll the Versions endpoint and discover that her cloud now supports 2.4. So she can start coding to those features in her application and put a 2.4 version request into all of her code.
  • Sophia: she can now probe all of the clouds she's working with to find out what feature levels they support, based on the information provided by the Versions endpoint (see the sketch after this list). As the request version is per request, she can either figure out some API version that intersects all her clouds and write to that, or she can write client-side paths based on different versions she's sure she can support and has tested (a 2.1 path, a 2.4 path, a 2.52 path) and dynamically use the best path supported on a particular cloud. This approach works even after BleedingEdgeCo cloud has set a minimum supported version of 2.50, even though ImSlowWithUpgradesCo cloud is still only up to 2.4. Sophia's job was never going to be fun, but it's now possible, without building a giant autoconf-like system to probe and detect what clouds actually support, or worse: trying to piece it together from a range of service provider and product documentation.
  • Aiden: he's now collecting client request version information on inbound requests, which means that he can figure out which of his users are still using older code. That provides the ability to know when, if ever, it's safe to move forward. Or even be able to have a chat with folks using Jackson's ancient tools to figure out what their long term support strategy should be.
  • Olivia: she can now help evolve the Nova API in a safe way, knowing that she's not going to break existing users, but will still be able to support new things needed by OpenStack.
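Here's a hedged sketch of Sophia's probing step. The versions document on the API root advertises each endpoint's supported microversion range in its min_version and version fields (URL handling is simplified):

    import requests

    def microversion_range(api_root):
        # A GET on the API root returns the versions document, which
        # includes the microversion range the v2.1 endpoint supports.
        doc = requests.get(api_root).json()
        v21 = next(v for v in doc["versions"] if v["id"] == "v2.1")
        return v21["min_version"], v21["version"]

    def as_tuple(ver):
        # Microversions compare numerically per segment: 2.10 > 2.9.
        major, minor = ver.split(".")
        return int(major), int(minor)

    # Code paths Sophia has actually written and tested, newest first.
    TESTED_PATHS = ["2.52", "2.4", "2.1"]

    def best_path(api_root):
        lo, hi = microversion_range(api_root)
        for candidate in TESTED_PATHS:
            if as_tuple(lo) <= as_tuple(candidate) <= as_tuple(hi):
                return candidate
        raise RuntimeError("no tested code path fits this cloud")

Against BleedingEdgeCo (minimum 2.50) this picks the 2.52 path; against ImSlowWithUpgradesCo (maximum 2.4) it picks the 2.4 path.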

Nova v2.1/v2.0 Forever (nearly)

There are some details about how we implement microversions internally in Nova which mean our assumption is that we're supporting the base v2.1 API forever. We have the facility to raise the minimum version; however, we've stated that the first time we're even going to have that conversation is in Barcelona in the fall of 2016. That doesn't mean we'll raise the minimum in Orzo, but we'll have our first conversation, with lots of data from operators and application developers, to see how things are going and what's a realistic path moving forward.

One API - no more extensions

There were multiple ways we could have gone about microversioning; one of the original suggestions was versions per resource. But the moment you start thinking about what the client code would look like to talk to that, you want to throw up a little bit. It was clear that to provide the best user experience we needed to draw a line in the sand and stop thinking about the Nova API as a toolkit to extend, and declare it as a solid thing that all users can expect from their clouds.

The extensions mechanism is deprecated. All the extensions in the Nova tree are now in the Nova API. Over the next couple of cycles we'll be consolidating some of the code to make this more consistent, and eventually remove the possibility of out-of-tree extensions working at all. This allows the API to have a meaningful monotonically increasing API version that will mean the same thing across all deploys.

This is also a signal to people working on Nova that "innovating on the API out-of-tree" is a thing we not only don't find valuable, but consider fundamentally hostile to the creation of an application ecosystem for OpenStack.

If you need new things, come talk to us. Let's figure out how to do it together, in tree, in an interop-friendly way. And yes, this means some features won't be welcomed, or will be delayed as we consider a way to make them something that could work for a majority of deployers / hypervisors, and that could be a contract we could support long-term.

Never another API endpoint

You should never expect another API endpoint from Nova. Our versioning mechanism is no longer about the endpoint name, it's the Nova API with a Microversion header in the request. Applications will never need to think about a big rewrite because they have to jump API endpoints. They can instead take advantage of new features on a time table that makes sense to them.

Next Steps

Microversioning is a new thing, but it's already shown quite a bit of promise. It's also been implemented by other projects like Ironic to address the same kinds of concerns that we saw. It's very exciting to see this spread across OpenStack.

This mechanism will also let us bring in things like JSON Home to expose the resource tree in Nova as a resource itself. And work on concepts like the Tasks API to provide a better workflow for creating multi-part resources in Nova (a server with networks and volumes that should get built as an atomic unit).

Discoverability is not yet fully solved, as the policy that applies to a user is still hidden. We hope that with some of the upcoming work on Dynamic Policy in Keystone we can get that built into the API as well. That will give us a complete story where an application can know what it can do against a given cloud before it gets started (as a combination of supported microversion, and exposed policy).

And there is a lot of internal cleanup, documentation, and testing still to do. The bulk of the Liberty cycle API work is going to be there, to put some polish on what we've got.

A big thanks to Chris Yeoh

The journey that got us here was a long and hard one. It started 5 cycles ago at the Grizzly summit. A year ago no one on the team was convinced we had a path forward. Really hard things are sometimes really hard.

Chris Yeoh was the point person for the API work through this whole time, and without his amazing patience and perseverance as we realized some terrible corners we'd painted ourselves into, and work that had to be dropped on the floor, we probably wouldn't have come out the other side with anything nearly as productive as what we now have. We lost him this spring at far too young an age. This work is part of his legacy, and will be foundational for much of OpenStack for a long time to come.

Thank you Chris, and we miss you.

Do you want to learn more?

If you somehow made it this far and want even more details about what's going on, the following is great follow-up reading:

Facebook PGP Email

Facebook just added optional PGP support for all their email notifications to users:

To enhance the privacy of this email content, today we are gradually rolling out an experimental new feature that enables people to add OpenPGP public keys to their profile; these keys can be used to "end-to-end" encrypt notification emails sent from Facebook to your preferred email accounts. People may also choose to share OpenPGP keys from their profile, with or without enabling encrypted notifications.

Source: Securing Email Communications from Facebook

Serious kudos to Facebook for building PGP into their infrastructure. The fact that email is transported in the clear is something that most people forget. I tried it this morning, and it works well.

The Biggest Concerns About GMO Food Aren't Really About GMOs

Everyone from Chipotle to the Food Babe rails against genetically modified ingredients, and laws to label GMO foods are making progress in some states. But the laser focus on GMOs is misguided, because most of the concerns people raise about them aren’t really about GMOs.

“GMO” is the buzzword for genetically modified crops where the plant’s DNA has been changed in the lab, typically by inserting a gene from another species. Technically there are other types of genetically modified organisms (living things), but no GMO animals are used in our food, and GMO bacteria are widespread but not controversial.

Source: The Biggest Concerns About GMO Food Aren't Really About GMOs

This whole article is a must read for anyone interested in the current state of how the modern food system works. It takes an in-depth look at a slew of mechanisms used to hybridize our food, and shows that the GMO label actually only applies to a very narrow slice of them: some of the most well controlled, using bacterial gene transfer. That's a mechanism which was recently discovered to have happened naturally, thousands of years ago, with the sweet potato.

Also, the comments on that article are incredibly thoughtful and nuanced. It's one of the few internet conversations I've seen recently where people were legitimately curious and thought-provoking.


Clean Up Your Mess - A Guide to Visual Design for Everyone

If you're like most people, you feel like a baby when it comes to visual design. You sometimes have a vague sense of what you want, but can't articulate it or make it come about. All you can do is point and cry. This guide will help you communicate with conscious skill. It will show you how to create designs that are easy to understand and attractive.

Beyond giving you practical tools, I hope this guide inspires you. One of my favorite quotes is, "I open my eyes and I see paradise." What a great gift vision is! What an incredible way to connect to the world around us and to each other. My hope is that this guide will allow you to communicate with more creativity and more control – and that you'll want to learn more.

Source: Clean Up Your Mess - A Guide to Visual Design for Everyone

A great design primer. If you learn nothing else from it, learn this: never center justify text. That alone will immediately make everything you do better. Center justification is for people that can't commit. Go hard left or hard right justification and take a stand.

This Is How Fast America Changes Its Mind

As the Supreme Court considers extending same-sex marriage rights to all Americans, we look at the patterns of social change that have transformed the nation.

Source: This Is How Fast America Changes Its Mind | Bloomberg Business

Really interesting visualization and look at how social issues hit a tipping point far faster than one might expect, and then the federal government just steps in and unifies things. Love looking at data sets like this.

Ebola behind a paywall

MONROVIA, Liberia — The conventional wisdom among public health authorities is that the Ebola virus, which killed at least 10,000 people in Liberia, Sierra Leone and Guinea, was a new phenomenon, not seen in West Africa before 2013. (The one exception was an anomalous case in Ivory Coast in 1994, when a Swiss primatologist was infected after performing an autopsy on a chimpanzee.)

The conventional wisdom is wrong. We were stunned recently when we stumbled across an article by European researchers in Annals of Virology: “The results seem to indicate that Liberia has to be included in the Ebola virus endemic zone.” In the future, the authors asserted, “medical personnel in Liberian health centers should be aware of the possibility that they may come across active cases and thus be prepared to avoid nosocomial epidemics,” referring to hospital-acquired infection.

What triggered our dismay was not the words, but when they were written: The paper was published in 1982.

via Yes, We Were Warned About Ebola

The information that Ebola existed in Liberia was out there. However, it was trapped behind a research journal paywall, so it didn't end up in the hands of the people who really could have used it. That was a contributing factor to why this Ebola outbreak got so out of control.

Choose Boring Technology

But of course, the baggage exists. We call the baggage "operations" and to a lesser extent "cognitive overhead." You have to monitor the thing. You have to figure out unit tests. You need to know the first thing about it to hack on it. You need an init script. I could go on for days here, and all of this adds up fast.


The problem with "best tool for the job" thinking is that it takes a myopic view of the words "best" and "job." Your job is keeping the company in business, god damn it. And the "best" tool is the one that occupies the "least worst" position for as many of your problems as possible.

It is basically always the case that the long-term costs of keeping a system working reliably vastly exceed any inconveniences you encounter while building it. Mature and productive developers understand this.

via Dan McKinley :: Choose Boring Technology.

I've had way too many conversations recently that had some variation of "people should just get over it and learn new things". New has a lot of cost. Even if it's good, new brings load. If you exceed people's digestion rate for new technology, they hit a wall, give up, and go home mad.
