Over the past couple of years we've been trying to find a path forward with the Nova API. The Nova v2.0 API defined a very small core set of interfaces that were basically unchangeable and copied from Rackspace early in the history of the project. We then added a way to extend it with extensions, both in the upstream community and as vendor extensions out of tree.
This created a user experience that was... suboptimal. We had 80+ extensions in tree, of varying quality and documentation. Example: floating ips were an extension (officially out of the API), but were used so extensively that they were a de facto part of the core API. Except, of course, they weren't, so people trying to use the API had a few things they could count on, and a ton of things that might or might not be there.
This was a disaster for the promise of interoperable clouds.
Even figuring out what a cloud could do was pretty terrible. You could approximate it by listing the extensions of the API, then having a bunch of logic in your code to realize which extensions turned on or off certain features, or added new data to payloads.
We took a very long journey to come up with a better way, with lots of wrong turns and dead ends. That's a story for another day. Today I'd just like to explain where we got to, and why.
A User-First Perspective
Let's take a step back and think about some stakeholders involved in OpenStack, and what they need and want out of it.
Jackson the Absent
Last year Jackson wrote an application that works against OpenStack; it's been deployed in production and is part of the workflow at his organization.
Jackson has decided to leave tech and take up goat farming. His application should continue to work without changes even after the OpenStack cloud it's running against has been upgraded multiple times.
Emma the Active
Emma is the kind of person who loves new features and eagerly awaits her OpenStack cloud upgrades to see the new things she can do. She should be able to have her application work fine across upgrades, and then be able to modify it to take advantage of new features exposed in the API.
Sophia the Multi-Cloud Integrator
Sophia has an application that spans multiple OpenStack clouds run by different organizations. These clouds will be at different versions of OpenStack, and thus will expose different API features.
Sophia needs to be able to talk to all these clouds, know what features they expose, and have a single program that can talk to them all simultaneously.
(Note: this is what our own nodepool, which runs all the OpenStack upstream tests, does.)
Aiden the Cloud Operator
Aiden knows he has a lot of users of the OpenStack API at his location. He'd really like to know who the Jacksons and Emmas of the world are, so that he can keep an eye on the far future of whether it's ever safe to disable really old features in his cloud.
Olivia the Contributor
Olivia wants to get her feature added to Nova which needs exposing through the API. She'd like to be able to get that landed during a release, and not have to wait 3 years for an eventual API rewrite.
Considering the needs of these various users helps to determine the key requirements for an API for something like OpenStack: an Open Source project that's deployed in many different companies and environments, with possibly years of difference in the version of code deployed at any of these locations.
- The API should be as common as possible between deploys. Every optional feature is a feature that can't be depended on by an application writer. Or worse: if it's used, it's a lock-in to that cloud. That means software has to be rewritten for every cloud, or written with a horrible kluge layer.
- It needs to be really clear exactly what a particular cloud's API supports.
- Older applications must not be broken by new features, or need to be rewritten after their OpenStack cloud is upgraded.
- We have to have a way to get new features out on a timely basis.
- We have to be able to evolve the API one piece at a time, as the Nova API is sufficiently large that a major version bump is no longer possible (we learned this the hard way).
The Backwards Compatibility Fallacy
Nearly every conversation with a developer around this issue starts with "why can't we just add more data to structures in backwards compatible ways". This is how service providers like Amazon, Meetup, and others work.
The problem is we aren't a proprietary company with 1 revision of our API stack in the wild at a time. We are an Open Source project with thousands of deployments all over the world, all on different code revisions and upgrade cadence.
Let's play the exercise where we thought additive was good enough.
A great example currently exists: ipv6 filtering on server lists. Currently, the Nova client erroneously says it's supported. It's not; it's actually completely ignored on the server. A suggestion is that this is a backwards compatible addition, so we should just do it, and we don't need to signal to the user that this was an API change.
However, that assumes that time is monotonically moving forward; it's not. Sophia might run across a cloud that had gotten to this version of the code and realized she could filter by ipv6 address. Great! She writes her code to use and depend on that feature.
Then she runs her code against another cloud, which runs a version of Nova that predates this change. She's now effectively gone back in time. Her code now returns thousands of records instead of 1, and she's terribly confused why. She also has no way to figure out if random cloud Z is going to support this feature or not. So the only safe thing to do is implement the filtering client side instead, which means the server side filtering actually gained her very little. It's not something she can ever determine will work ahead of time. It's an API that is untrustworthy, so it's something that's best avoided.
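To make Sophia's fallback concrete, here is a minimal sketch of the client-side filtering she ends up writing anyway. The function name and the shape of the server records (a dict with an "addresses" map of network name to address entries) are assumptions for illustration, not a definitive description of what any particular cloud returns.

```python
import ipaddress


def filter_by_ip6(servers, ip6):
    """Keep only the servers that carry the given IPv6 address.

    Assumed (hypothetical) input shape, one dict per server:
    {"name": "web-1", "addresses": {"private": [{"addr": "...", "version": 6}]}}
    """
    wanted = ipaddress.ip_address(ip6)
    matches = []
    for server in servers:
        for entries in server.get("addresses", {}).values():
            # Compare parsed addresses so "2001:DB8::1" and "2001:db8::1" match.
            if any(
                entry.get("version") == 6
                and ipaddress.ip_address(entry["addr"]) == wanted
                for entry in entries
            ):
                matches.append(server)
                break  # one hit per server is enough
    return matches
```

Because she can't tell whether the server honored the filter, she has to run something like this over every response, whether or not the cloud already did the work.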
There are some amazing things in the HTTP specification, some really great ideas that I am amazed were thought about back in the early days of the web. One of these is Content Negotiation. A resource is addressed by a URL (Uniform Resource Locator). However, that resource might be available in multiple representations: there might be a text version, an html version, and a pdf version. The HTTP spec provides a header that allows you to tell the server what kind of representation you would like for your resource. The server can say "that's not possible" and then you try again with something different, but it gives you as the client a lot of control in what you are going to get.
What if APIs worked like that? It's always a server, but I'd really like the 2.253 representation of it, which has some fields that are really handy for me.
Microversions are like Content Negotiation for the API.
Like Content Negotiation, the requested Microversion is passed as an HTTP header. Unlike Content Negotiation we don't support ranges, as the complexity to client programming gets out of control. Like Content Negotiation, if nothing is provided, we do a sane thing: send the minimum supported version.
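As a rough sketch of what that looks like from the client side: the helper below builds the headers for a request, adding a microversion only when one is asked for. The header name is an assumption on my part (it's the name I understand Nova to have used when microversions first shipped; check your cloud's API reference), and `microversion_headers` is a hypothetical helper, not part of any client library.

```python
import re

# Assumption: header name used by Nova for microversion requests.
MICROVERSION_HEADER = "X-OpenStack-Nova-API-Version"

# A single "major.minor" version; ranges such as "2.1-2.4" are rejected.
_VERSION_RE = re.compile(r"^[1-9]\d*\.(0|[1-9]\d*)$")


def microversion_headers(version=None):
    """Build request headers, optionally asking for one exact microversion."""
    headers = {"Accept": "application/json"}
    if version is None:
        # No header at all: the server falls back to its minimum version.
        return headers
    if not _VERSION_RE.match(version):
        raise ValueError(f"not a single microversion: {version!r}")
    headers[MICROVERSION_HEADER] = version
    return headers
```

Calling `microversion_headers("2.3")` gives you a dict you can hand to whatever HTTP library you're using; calling it with no argument leaves the microversion header out entirely.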
Nova v2.1 is a new, cleaner backend implementation of the Nova v2.0 API. The one thing it adds is consistent input validation, so we catch bad requests at the API layer and return a sane error to the user. This is much more straightforward than our old model of trying to translate a stack trace (possibly triggered by a database violation) into a meaningful error message to the user.
Applications that work on v2.0 can be pointed to v2.1, and will just work. It should be transparent enough to the application authors that they'll never notice the transition.
And onto this Nova v2.1 API endpoint, we start adding microversion features. If you want features in the 2.3 microversion, you specify that fact in your header. Then you'll get the v2.3 versions of all the resources.
If you specify nothing, you get the minimum supported version, which rolls back to v2.1, which is the same as the v2.0 API. So all existing applications just work without doing anything. Only when an application wants to opt into new features does it need to make changes.
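The rule the server applies can be sketched in a few lines. This is an illustration of the behavior described above, not Nova's actual implementation: the function names and the example maximum of 2.4 are made up, and a real server would answer with an HTTP error rather than raise a Python exception.

```python
def parse_version(text):
    """Turn "2.10" into (2, 10) so versions compare numerically."""
    major, minor = text.split(".")
    return int(major), int(minor)


def resolve_version(requested, minimum="2.1", maximum="2.4"):
    """Pick the microversion used to serve one request."""
    if requested is None:
        # Existing applications send no header and get the minimum.
        return minimum
    if parse_version(minimum) <= parse_version(requested) <= parse_version(maximum):
        return requested
    # The server's way of saying "that's not possible".
    raise ValueError(
        f"version {requested} is outside the supported range {minimum} to {maximum}"
    )
```

Note that versions are compared as pairs of integers: treated as decimal numbers, 2.10 would sort below 2.9, which is not what we want.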
Solving for Stakeholders
Let's look at how this solves things for our stakeholders:
- Jackson: his application keeps running, v2.1 is v2.0. His application needed to make 0 changes to run as it did before.
- Emma: she can poll the Versions endpoint and discover that her cloud now supports 2.4. So she can start coding to those features in her application and put a 2.4 version request into all of her code.
- Sophia: she can now probe all of the clouds she's working with to find out what feature levels they support, based on the information provided by the Versions endpoint. As the request version is per request, she can either figure out some API version that intersects all her clouds and write to that, or she can write client-side paths based on different versions she's sure she can support and has tested (a 2.1 path, a 2.4 path, a 2.52 path) and dynamically use the best path supported on a particular cloud. This approach works even after BleedingEdgeCo cloud has set a minimum supported version at 2.50, even though ImSlowWithUpgradesCo cloud is still only up to 2.4. Sophia's job was never going to be fun, but it's now possible, without building a giant autoconf-like system to probe and detect what clouds actually support, or worse: trying to piece it together from a range of service provider and product documentation.
- Aiden: he's now collecting client request version information on inbound requests, which means that he can figure out which of his users are still using older code. That provides the ability to know when, if ever, it's safe to move forward. Or even be able to have a chat with folks using Jackson's ancient tools to figure out what their long term support strategy should be.
- Olivia: she can now help evolve the Nova API in a safe way, knowing that she's not going to break existing users, but will still be able to support new things needed by OpenStack.
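Sophia's two options can be sketched as follows. This assumes she has already read each cloud's minimum and maximum supported version from its Versions endpoint; the function names are hypothetical, and the 2.60 maximum for BleedingEdgeCo is invented for the example.

```python
def parse_version(text):
    """Turn "2.52" into (2, 52) so versions compare numerically."""
    major, minor = text.split(".")
    return int(major), int(minor)


def best_path(client_paths, cloud_min, cloud_max):
    """Highest client-tested version that this cloud will accept, or None."""
    lo, hi = parse_version(cloud_min), parse_version(cloud_max)
    usable = [p for p in client_paths if lo <= parse_version(p) <= hi]
    return max(usable, key=parse_version) if usable else None


def common_range(clouds):
    """(min, max) accepted by every cloud, or None if the ranges don't overlap."""
    lo = max((cmin for cmin, _ in clouds.values()), key=parse_version)
    hi = min((cmax for _, cmax in clouds.values()), key=parse_version)
    return (lo, hi) if parse_version(lo) <= parse_version(hi) else None


# Versions Sophia has written and tested code paths for.
PATHS = ["2.1", "2.4", "2.52"]

# (minimum, maximum) per cloud, as reported by each Versions endpoint.
CLOUDS = {
    "BleedingEdgeCo": ("2.50", "2.60"),
    "ImSlowWithUpgradesCo": ("2.1", "2.4"),
}

chosen = {name: best_path(PATHS, cmin, cmax) for name, (cmin, cmax) in CLOUDS.items()}
```

Here `chosen` ends up with the 2.52 path for BleedingEdgeCo and the 2.4 path for ImSlowWithUpgradesCo, while `common_range(CLOUDS)` is `None`: once one cloud's minimum passes another's maximum, there is no single version to write to, and per-cloud paths are the only option.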
Nova v2.1/v2.0 Forever (nearly)
There are some details about how we implement microversions internally in Nova, which means our assumption is that we're supporting the base v2.1 API forever. We have the facility to raise the minimum version; however we've stated the first time we're even going to have that conversation is in Barcelona in Fall of 2016. That doesn't mean we'll raise the minimum in Orzo, but we'll have our first conversation, with lots of data from operators and application developers to see how things are going, and what's a realistic path moving forward.
One API - no more extensions
There were multiple ways we could have gone about microversioning; one of the original suggestions was versions per resource. But the moment you start thinking about what the client code would look like to talk to that, you want to throw up a little bit. It was clear that to provide the best user experience we needed to draw a line in the sand and stop thinking about the Nova API as a toolkit to extend, and declare it as a solid thing that all users can expect from their clouds.
The extensions mechanism is deprecated. All the extensions in the Nova tree are now in the Nova API. Over the next couple of cycles we'll be consolidating some of the code to make this more consistent, and eventually remove the possibility of out-of-tree extensions working at all. This allows the API to have a meaningful monotonically increasing API version that will mean the same thing across all deploys.
This is also a signal to people working on Nova that "innovating on the API out-of-tree" is a thing we not only don't find valuable, but consider fundamentally hostile to the creation of an application ecosystem for OpenStack.
If you need new things, come talk to us. Let's figure out how to do it together, in tree, in an interop-friendly way. And yes, this means some features won't be welcomed, or will be delayed as we consider a way to make them something that could work for a majority of deployers / hypervisors, and that could be a contract we could support long-term.
Never another API endpoint
You should never expect another API endpoint from Nova. Our versioning mechanism is no longer about the endpoint name, it's the Nova API with a Microversion header in the request. Applications will never need to think about a big rewrite because they have to jump API endpoints. They can instead take advantage of new features on a time table that makes sense to them.
Microversioning is a new thing, but it's already shown quite a bit of promise. It's also been implemented by other projects like Ironic to address the same kinds of concerns that we saw. It's very exciting to see this spread across OpenStack.
This mechanism will also let us bring in things like JSON Home to expose the resource tree in Nova as a resource itself. And work on concepts like the Tasks API to provide a better workflow for creating multi-part resources in Nova (a server with networks and volumes that should get built as an atomic unit).
Discoverability is not yet fully solved, as the policy that applies to a user is still hidden. We hope that with some of the upcoming work on Dynamic Policy in Keystone we can get that built into the API as well. That will give us a complete story where an application can know what it can do against a given cloud before it gets started (as a combination of supported microversion, and exposed policy).
And there is a lot of internal cleaning up, documentation, and testing still to do. The bulk of the Liberty cycle API work is going to be there to put some polish on what we've got.
A big thanks to Chris Yeoh
The journey that got us here was a long and hard one. It started 5 cycles ago at the Grizzly summit. A year ago no one on the team was convinced we had a path forward. Really hard things are sometimes really hard.
Chris Yeoh was the point person for the API work through this whole time, and without his amazing patience and perseverance as we realized some terrible corners we'd painted ourselves into, and work that had to be dropped on the floor, we probably wouldn't have come out the other side with anything nearly as productive as what we now have. We lost him this spring at far too young an age. This work is part of his legacy, and will be foundational for much of OpenStack for a long time to come.
Thank you Chris, and we miss you.
Do you want to learn more?
If you somehow made it this far, and want even more details about what's going on, the following is great follow-up reading: