REST API Microversions

This is the version of the talk I gave at API Strat back in November about OpenStack's API microversions implementation, mostly trying to frame the problem statement. The talk was only 20 minutes long, so you can only get so far into explaining the ramifications of the solution.

REST_API_MV-0

This talk dives into an interesting issue we ran into with OpenStack, an open source project that is designed to be deployed by public and private clouds, and exposes a REST API that users will consume directly. But in order to understand why we had to do something new, it's first important to understand basic assumptions on REST API versioning, and where those break down.

REST_API_MV-1

There are some generally agreed-upon rules for REST API versioning. The main one is that additive operations, like adding a key, shouldn't break clients, because clients shouldn't care about extra data that they get back. If you add new code that adds a new attribute, like description, you can roll that change out to users; they get an extra thing and life is good.

This mostly works as long as you have a single instance, so new attributes show up "all at once" for users, and as long as you don't roll back an API change after it's been in production for any real length of time.
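As a small illustration of why additive changes are usually safe, here is a sketch of a client that reads only the keys it knows about. The payloads and the `summarize_server` helper are made up for this example; they are not taken from any real OpenStack response.

```python
def summarize_server(payload: dict) -> str:
    """Read only the keys this client knows about and ignore the rest."""
    return f"{payload['name']} ({payload['status']})"


# Hypothetical responses before and after the server adds "description".
old_response = {"name": "web-1", "status": "ACTIVE"}
new_response = {"name": "web-1", "status": "ACTIVE", "description": "front end"}

# The extra key does not change what this client produces.
print(summarize_server(old_response))
print(summarize_server(new_response))
```

Both calls produce the same summary, which is the whole point of the "additive is safe" rule.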

REST_API_MV-2

This is itself additive: you can keep adding more attributes over time. Lots of public web services use this approach. And this mostly just works.

REST_API_MV-3

It's worth thinking for a moment about the workflow that supports this kind of approach. There is some master branch where features are being written and enabled, and at key points in time these features are pushed to production, where those features show up for users. Basic version control, nothing too exciting here.

REST_API_MV-4

But what does this look like in an Open Source project? In open source we've got a master upstream branch, and perhaps stable releases that happen from time to time. Here we'll get specific and show the OpenStack release structure. OpenStack creates a stable release every 6 months, and main development continues on in git master. These are named with letters of the alphabet, so Liberty, Mitaka, Newton, Ocata, Pike being the last 5 releases.

REST_API_MV-5

But releasing stable code doesn't get it into the hands of users. Clouds have to deploy that code. Some may do that right away, others may take a while. Exactly when a release gets into the users' hands is an unknown. This gets even more tricky when you consider private clouds that are consuming something like a Linux distro's version of OpenStack. There is a delay getting into the distro, then another delay before they decide to upgrade.

REST_API_MV-6

End users have no visibility into the versions of OpenStack that operators have deployed; they only know about the API. So when they are viewing the world across clouds at a specific point in time (T1, T2, or T3) they will experience different versions of the API.

REST_API_MV-7

Let's take that T3 example. If a user starts by writing their software to Cloud A, the description field is there. They are going to assume that's just part of the base API. They then connect their software to Cloud B, and all is fine. But, when later in the week they point it at Cloud C, the API has magically "removed" attributes. Removing attributes in the server is never considered safe.

This might blow up immediately, or it might just carry a null value that fails very late in the processing of the data.
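Both failure modes are easy to reproduce. The sketch below uses two invented payloads standing in for Cloud A and Cloud C; the function names are hypothetical and only illustrate the two ways client code tends to be written.

```python
# Hypothetical responses: Cloud A runs newer code, Cloud C runs older code.
cloud_a = {"name": "web-1", "description": "front end"}
cloud_c = {"name": "web-1"}


def label_strict(server: dict) -> str:
    # Indexing directly blows up immediately with KeyError on Cloud C.
    return server["description"].upper()


def label_lenient(server: dict):
    # .get() hides the problem: Cloud C yields None, which only fails
    # later when some other code assumes it is a string.
    return server.get("description")


print(label_strict(cloud_a))
print(label_lenient(cloud_c))
```

The strict version at least fails close to the cause. The lenient version pushes the failure somewhere far away from the API call.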

REST_API_MV-8

The lesson here is that the assumed "good enough" rules don't account for one team developing the software and a different team deploying it. Sure, you say, that's not good DevOps, of course it's not supported. But be careful with what you are saying there, because we deploy software we don't develop all the time: 3rd party open source. And if you've come down firmly that open source is not part of DevOps, I think a lot of people would look at you a bit funny.

REST_API_MV-9

I'm responsible for two names that took off in OpenStack; one is Microversions. Like any name that catches on, you regret it later because it misses the important subtlety of what is going on. Don't ask me to name things.

But besides the name, what are microversions?

REST_API_MV-10

Let's look at that example again. If we experience the world at time T3, with Clouds A, B, and C, the real issue is that hitting Cloud C appears to make time "go backwards". We've gone back in time and experienced the software at a much earlier version. How do we avoid going backwards?

REST_API_MV-11

We introduce a header for the "microversion" we want. If we don't pass one, we get the minimum version the server supports. Everything after that we have to opt into. If we ask for a microversion the server doesn't support we get a hard 400 fail on the request. This lets us fail early, which is more developer friendly than giving back unexpected data which might corrupt things much later in the system.
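A minimal sketch of that negotiation rule, under some loud assumptions: the header name `X-Example-API-Version` and the version bounds are invented for illustration, and this is not the actual OpenStack code. The status codes follow the description above.

```python
# Hypothetical bounds for an example server.
MIN_VERSION = (2, 0)
MAX_VERSION = (2, 53)

# Hypothetical header name, not the one any real service uses.
VERSION_HEADER = "X-Example-API-Version"


def parse_version(text):
    """Turn "2.5" into (2, 5); raises ValueError on malformed input."""
    major, minor = text.split(".")
    return int(major), int(minor)


def negotiate(headers):
    """Return (http_status, version) for a request's headers."""
    requested = headers.get(VERSION_HEADER)
    if requested is None:
        # No header: the client gets the minimum supported version.
        return 200, MIN_VERSION
    try:
        version = parse_version(requested)
    except ValueError:
        return 400, None
    if not (MIN_VERSION <= version <= MAX_VERSION):
        # Fail early instead of returning data the client did not ask for.
        return 400, None
    return 200, version


print(negotiate({}))
print(negotiate({VERSION_HEADER: "2.53"}))
print(negotiate({VERSION_HEADER: "2.54"}))
```

The important design choice is the default: a client that never heard of microversions keeps getting the oldest behavior, so nothing changes under it.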

REST_API_MV-12

Roughly, microversions are inspired by HTTP content negotiation, where you can ask for different types of content from the same URL, and the server will give you the best it can (you define "best" with a quality value). Because most developers implementing REST clients aren't deeply knowledgeable about low-level HTTP details, we went for simplicity and did this with a dedicated header. For simplicity we also made this a globally incrementing value across all resources, instead of per resource. We wanted there to be no confusion about what version 2.5 was.
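To make the "one global number" idea concrete, here is a sketch in which a single change log decides the shape of every resource. The resources, attributes, and version numbers are all hypothetical. It also assumes versions are compared as (major, minor) integer pairs rather than as decimals, so that 2.10 sorts after 2.9.

```python
# Hypothetical change log: (microversion, resource, attribute added).
CHANGELOG = [
    ((2, 3), "server", "description"),
    ((2, 5), "flavor", "notes"),
    ((2, 10), "server", "tags"),
]

# Hypothetical attributes present at the minimum version.
BASE_FIELDS = {"server": ["id", "name"], "flavor": ["id", "name"]}


def parse_version(text):
    """Turn "2.10" into (2, 10) so it compares as integers, not a float."""
    major, minor = text.split(".")
    return int(major), int(minor)


def fields_at(resource, version_text):
    """List the attributes a resource exposes at a given microversion."""
    version = parse_version(version_text)
    fields = list(BASE_FIELDS[resource])
    for introduced, changed_resource, attribute in CHANGELOG:
        if changed_resource == resource and introduced <= version:
            fields.append(attribute)
    return fields


print(fields_at("server", "2.5"))
print(fields_at("flavor", "2.5"))
print(fields_at("server", "2.10"))
```

Because the number is global, "2.5" pins down the shape of servers and flavors at the same time, with no per-resource bookkeeping on the client.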

The versions that a service supports are discoverable by hitting the root document of the API service. The other important note is that services are expected to continue to support old versions for a very long time. In Nova today we're up to about 2.53, and we still support everything back down to 2.0. That represents about 2.5 years of API changes.
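A client can use that root document to pick a version before making real requests. The sketch below assumes a document with `min_version` and `version` keys; treat those field names, the document shape, and the `pick_version` helper as assumptions for illustration rather than a statement of the exact format.

```python
import json

# Hypothetical root document; the field names are assumptions.
ROOT_DOCUMENT = json.loads("""
{"versions": [
    {"id": "v2", "status": "CURRENT", "min_version": "2.0", "version": "2.53"}
]}
""")


def parse_version(text):
    major, minor = text.split(".")
    return int(major), int(minor)


def pick_version(document, client_min, client_max):
    """Pick the highest microversion both the client and server support."""
    entry = document["versions"][0]
    low = max(parse_version(entry["min_version"]), parse_version(client_min))
    high = min(parse_version(entry["version"]), parse_version(client_max))
    if low > high:
        raise ValueError("client and server share no microversion")
    return f"{high[0]}.{high[1]}"


# A client written against 2.10 through 2.60 talking to this server.
print(pick_version(ROOT_DOCUMENT, "2.10", "2.60"))
```

Picking the highest shared version is only one possible policy. A cautious client might instead pin the exact version it was tested against and fail if the server can't provide it.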

There are a lot more details on the justification for the approach, but not enough time today to go into them. If you want to learn more, I've got a blog writeup from when we first did this that dives in pretty deep, including showing the personas you'd expect to interact with this system.

REST_API_MV-13

Thus far this has worked pretty well. About 1/2 the base services in OpenStack have implemented a version of this. Most are pretty happy with the results. That version document I talked about can be seen here.

There are open questions for the future, mostly around raising minimum versions. No one has done it yet, though there are some thoughts about how you do that.

REST_API_MV-14

Since I'm here, and there are a lot of OpenAPI experts around, I wanted to talk just briefly about OpenStack and OpenAPI.

REST_API_MV-15

The OpenStack APIs date back to about 2010, when the state of the art for doing REST APIs was WADL, a now long dead proposed specification by Sun Microsystems. Lots of XML. But those were the constraints at the time, which are different constraints than OpenAPI's.

One of those issues is our actions API, where we use the same URL with very different payloads to do non-RESTy function calls, like rebooting a server. The other is Microversions, which don't have any real way to map to OpenAPI without using vendor extensions, at which point you lose most of the interesting tooling in the ecosystem.
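For readers who haven't seen an actions-style API, here is a rough sketch of the pattern: one URL, with the request body's single top-level key choosing the operation. The handler names and payload shapes are illustrative assumptions, not the real OpenStack code.

```python
def reboot(server, args):
    return f"rebooting {server} ({args.get('type', 'SOFT')})"


def pause(server, args):
    return f"pausing {server}"


# Hypothetical table of actions, all served from one URL.
ACTIONS = {"reboot": reboot, "pause": pause}


def handle_action(server, body):
    """Handle a POST to a single hypothetical /servers/{id}/action URL."""
    if len(body) != 1:
        raise ValueError("expected exactly one action key")
    name, args = next(iter(body.items()))
    if name not in ACTIONS:
        raise ValueError(f"unknown action: {name}")
    # Some actions carry no arguments, so treat a null payload as empty.
    return ACTIONS[name](server, args or {})


print(handle_action("web-1", {"reboot": {"type": "HARD"}}))
print(handle_action("web-1", {"pause": None}))
```

One path and one method carry many unrelated request schemas, which is what makes this awkward to describe in a format organized around path and method pairs.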

There is an open question in my mind about whether the microversion approach is interesting enough that it's something we could consider for OpenAPI. OpenStack could easily microversion itself out of the actions API to something more OpenAPI friendly, but without microversion support there isn't much point.

REST_API_MV-16

There was a talk yesterday about "Never have a breaking API change again", which followed about 80% of our story, but didn't need something like microversions because it was Azure, and they controlled when code got deployed to users.

There are very specific challenges for Open Source projects that expose a public web services API, and expect to be deployed directly by cloud providers. We're all used to open source behind the scenes, and plumbing to our services. But Open Source is growing into more areas. It is our services now. With things like OpenStack, Kubernetes, OpenWhisk... Open Source projects now are defining that user consumable API. If we don't come up with common patterns for how to handle it then we're putting Open Source at a disadvantage.

I've been involved in Open Source for close to two decades, and I strongly believe we should make Open Source able to play on the same playing field as proprietary services. The only way we can do this is to think about whether our tools and standards support Open Source all the way out to the edge.

Questions

Question 1: How long do you expect to have to keep the old code around, and how bad is it to manage that?

Answer: Definitely a long time, minimum a few years. The implementation of all of this is in Python, and we've got a bunch of pretty good decorators and documentation that make it pretty easy to compartmentalize the code. No one has yet lifted a minimum version, as the amount of work to support the old code hasn't really been burdensome so far. We'll see how that changes in the future.
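To give a feel for how decorators can compartmentalize versioned code, here is a hedged sketch. The `VersionedMethod` class, its `api_version` decorator, and the handlers are hypothetical names written for this example; the real OpenStack decorators differ in detail.

```python
def parse_version(text):
    major, minor = text.split(".")
    return int(major), int(minor)


class VersionedMethod:
    """Collects implementations of one API method, each for a version range."""

    def __init__(self):
        self.implementations = []

    def api_version(self, minimum, maximum=None):
        """Decorator registering a handler for [minimum, maximum]."""
        def register(func):
            low = parse_version(minimum)
            high = parse_version(maximum) if maximum else None
            self.implementations.append((low, high, func))
            return func
        return register

    def __call__(self, requested, *args, **kwargs):
        version = parse_version(requested)
        for low, high, func in self.implementations:
            if low <= version and (high is None or version <= high):
                return func(*args, **kwargs)
        raise LookupError(f"no handler for version {requested}")


show_server = VersionedMethod()


@show_server.api_version("2.0", "2.2")
def _show_without_description(server):
    return {"id": server["id"], "name": server["name"]}


@show_server.api_version("2.3")
def _show_with_description(server):
    return {"id": server["id"], "name": server["name"],
            "description": server.get("description")}


record = {"id": "1", "name": "web-1", "description": "front end"}
print(show_server("2.1", record))
print(show_server("2.10", record))
```

The old behavior lives in its own small function, so keeping it around costs little, and lifting a minimum version would amount to deleting the handlers below that version.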

Question 2: Does GraphQL solve this problem in a way that you wouldn't need microversions?

Answer: That's a very interesting question. When we got started GraphQL was pretty nascent so not really on our radar. I've spent some time looking at GraphQL recently, and I think the answer is "yes, in theory, no in practice", and this is why.

Our experience with the OpenStack API over the last 7 years is that almost no one consumes your API directly. They almost always use some 3rd party SDK in their programming language of choice that gives them nice bindings and feels native to that language. GraphQL is great when you are going to think about your interaction with the service in a really low level way, and ask only for the minimal data you need to do your job. But these SDK writers don't know what you need, so when they build their object models, they just do so by putting everything in. At which point you are pretty much back to where we started.

I think GraphQL is going to work out really well for really popular services (like GitHub) where people are willing to take the hit to go super low level. Or where you know the backend details well enough to understand the cost differential of asking for different attributes. But I don't think it obviates the need to come up with server side versioning for APIs in Open Source.

 

 

Notes from API Strat

Back in November I had the pleasure to attend API Strat for the first time. It was 2 days of short (20 minute) sessions running in 3 tracks with people discussing web service API design, practice, and related topics. My interest was to get wider exposure to the API Microversions work that we did in OpenStack, and get out of that bubble to see what else was going on in the space.

Events on the Web

Event technologies being used by different web services

There were lots of talks that brought up the problem of getting real time events back to clients. Clients talking to servers is a pretty solved problem with RESTful interfaces. But the other direction is far from solved. The 5 leading contenders are Webhooks (over HTTP), HTTP long polling, Web Sockets, AMQP, and MQTT. Each has its boosters, and its place, but this is going to be a messy space for the next few years.

OpenAPI's version 3 specification includes webhooks, which is a boost in that direction, though no tooling launched alongside version 3, so it will take some time before people build implementations around it. Nginx is adding MQTT proxy support, which is a boost in that direction.

Webhooks vs. Serverless

Speaking of webhooks, the keynote from Glenn Block of Auth0 brought up an interesting point: serverless effectively lives in the eventing space as well.

Webhooks are all fine and good for making your platform efficient and scalable. But if clients now have to run their own redundant, highly available services to catch events, that's a lot of work, and many will just skip it. They found that once they built out a serverless platform where they could host their clients' code, they got much more uptake on their event API. And, more importantly, they found that their power user customers were actually building out important features of their platform. He made a good case that every online service should really be considering an embedded serverless environment.

API Microversions

I was ostensibly there to talk about API Microversions, an approach we took in OpenStack to handle the fact that deployments of OpenStack upgrade at very different cadences. The talk went pretty well.

20 minutes was a challenge to explain something that took us all 6 months to get our heads around. I do think I managed to communicate the key challenge: when you build an open source system with a user facing API, how do users control what they get? A lot of previous "good enough" rules fall down.

Darrel Miller had given a talk "How to never make another breaking API change". His first 5 minutes were really similar to mine, and then, because this was about Azure, with a single controlled API instance, the solution veered in a different direction. It was solid reinforcement for the fact that we were on the right path here, and that the open source solution has an additional constraint.

One of the key questions I got in Q&A is one I'd been thinking about. Does GraphQL make this all obsolete? GraphQL was invented by Facebook to get away from the HTTP GET/POST model of passing data around, and let you specify a pretty structured query about the data you need from the server. On paper, it solves a similar problem to microversions, because if you are really careful with your GraphQL you can ask for the minimum data you need, and are unlikely to get caught by things coming and going in the API. However, in practice, I'm not convinced it would work. In OpenStack we saw that most API usage was not raw API calls; it was through an SDK provided by someone in the community. If you are an SDK writer, it's a lot harder to make assumptions about what parts of objects people want, so you'd tend to return everything. And that puts you right back with the same problem we have in REST in OpenStack.

API Documentation

There were many talks on better approaches for documentation, which resonated with me after the great OpenStack docs migration.

Taylor Barnett's talk "Things I Wish People Told Me About Writing Docs" was one of my favorites. It included real user studies on what people actually read in documentation. It turns out that people don't read your API documentation, they skim hard. They will read your code snippets as long as they aren't too long. But they won't read the paragraph before it, so if there is something really critical about the code, make it a comment in the code snippet itself. There was also a great cautionary tale to stop using phrases like "can be easily done". People furiously hunting around your site trying to get code working are ramping up quickly. Words like "easy" make them feel dumb and frustrated when they don't get it working on the first try. Having a little more empathy for the state of mind of the user when they show up goes a long way towards building a relationship with them, and making them more bought into your platform.

Making New Friends

I also managed to have an incredible dinner the first night I was in town, set up by my friend Chris Aedo. Both the food and conversation were amazing, and I learned about Wordnik, distributed data systems, and that you can lose a year of research because ferrets bred for specific traits might be too dumb to be trained.

Definitely a lovely conference, and one I hope to make it back to next year.