
REST API Microversions

This is the version of the talk I gave at API Strat back in November about OpenStack’s API microversions implementation, mostly trying to frame the problem statement. The talk was only 20 minutes long, so you can only get so far into explaining the ramifications of the solution.

[Slide 0]

This talk dives into an interesting issue we ran into with OpenStack, an open source project that is designed to be deployed by public and private clouds, and exposes a REST API that users will consume directly. But in order to understand why we had to do something new, it’s first important to understand basic assumptions on REST API versioning, and where those break down.

[Slide 1]

There are some generally agreed-upon rules for REST API versioning. Chief among them: additive operations, like adding a key, shouldn't break clients, because clients shouldn't care about extra data they get back. If you write new code that adds a new attribute, like description, you can make that change, roll it out to users, they get an extra thing, and life is good.

This mostly works as long as you have a single deployment, so new attributes show up "all at once" for users, and as long as you don't roll back an API change after it's been in production for any real length of time.
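To make the additive rule concrete, here's a minimal sketch of the client side of that bargain (the URL and the description field are just the illustrative examples from above):

```python
import requests

# Hypothetical compute API; "description" is the newly added attribute.
server = requests.get("https://cloud-a.example.com/servers/1234").json()

# Safe: a client that only reads the keys it knows about silently
# ignores any new attributes the server adds.
name = server["name"]

# Risky: assuming the new attribute exists works against an upgraded
# server, but raises KeyError against one running older code...
description = server["description"]

# ...and .get() just trades the immediate KeyError for a None that can
# fail much later in processing.
description = server.get("description")
```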

[Slide 2]

This itself is additive: you can keep adding more attributes over time. Lots of public web services use this approach, and it mostly just works.

[Slide 3]

It’s worth thinking for a moment about the workflow that supports this kind of approach. There is some master branch where features are being written and enabled, and at key points in time these features are pushed to production, where those features show up for users. Basic version control, nothing too exciting here.

[Slide 4]

But what does this look like in an Open Source project? In open source we've got a master upstream branch, and perhaps stable releases that happen from time to time. Here we'll get specific and show the OpenStack release structure. OpenStack creates a stable release every 6 months, while main development continues in git master. Releases are named alphabetically: Liberty, Mitaka, Newton, Ocata, and Pike are the last 5.

[Slide 5]

But releasing stable code doesn't get it into the hands of users. Clouds have to deploy that code. Some may do that right away; others may take a while. Exactly when a release gets into users' hands is an unknown. This gets even more tricky when you consider private clouds that consume something like a Linux distro's version of OpenStack. There is a delay getting into the distro, then another delay before the operator decides to upgrade.

[Slide 6]

End users have no visibility into the versions of OpenStack that operators have deployed, they only know about the API. So when they are viewing the world across clouds at a specific point in time (T1, T2, or T3) they will experience different versions of the API.

[Slide 7]

Let’s take that T3 example. If a user starts by writing their software to Cloud A, the description field is there. They are going to assume that’s just part of the base API. They then connect their software to Cloud B, and all is fine. But, when later in the week they point it at Cloud C, the API has magically “removed” attributes. Removing attributes in the server is never considered safe.

This might blow up immediately, or it might just carry a null value that fails very late in the processing of the data.

[Slide 8]

The lesson here is that the assumed good-enough rules don't account for the software being developed by a different team than the one deploying it. Sure, you say, that's not good DevOps, of course it's not supported. But be careful what you're saying there, because we deploy software we don't develop all the time: third-party open source. And if you've come down firmly on the side that open source is not part of DevOps, I think a lot of people would look at you a bit funny.

[Slide 9]

I'm responsible for two names that took off in OpenStack; one is Microversions. Like any name that catches on, you regret it later because it misses the important subtlety of what is going on. Don't ask me to name things.

But besides the name, what are microversions?

[Slide 10]

Let’s look at that example again, if we experience the world at time T3, with Clouds A, B, and C, the real issue is that hitting Cloud C appears to make time “go backwards”. We’ve gone back in time and experienced the software at a much earlier version. How do we avoid going backwards?

[Slide 11]

We introduce a header for the “microversion” we want. If we don’t pass one, we get the minimum version the server supports. Everything after that we have to opt into. If we ask for a microversion the server doesn’t support we get a hard 400 fail on the request. This lets us fail early, which is more developer friendly than giving back unexpected data which might corrupt things much later in the system.
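As a sketch of what that looks like from the client side: the OpenStack-API-Version header shown here is the standardized one newer OpenStack services accept (Nova historically also used X-OpenStack-Nova-API-Version); the URL is made up.

```python
import requests

# Opt in to a specific microversion. With no header at all, the server
# behaves at its minimum supported version.
resp = requests.get(
    "https://cloud-c.example.com/compute/v2.1/servers",
    headers={"OpenStack-API-Version": "compute 2.53"},
)

# A server that doesn't support the requested version rejects the
# request outright, so the client fails early instead of consuming
# data that's missing fields it expects.
if resp.status_code >= 400:
    raise RuntimeError("requested microversion not supported")

servers = resp.json()["servers"]
```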

[Slide 12]

Roughly, microversions are inspired by HTTP content negotiation, where you can ask for different types of content from the same URL, and the server will give you the best it can (you define "best" with a quality value). Because most developers implementing REST clients aren't deeply knowledgeable about low-level HTTP details, we went for simplicity and did this with a dedicated header. For simplicity we also made this a globally incrementing value across all resources, instead of per resource; we wanted there to be no confusion about what version 2.5 was.

The versions that a service supports are discoverable by hitting the root document of the API service. The other important note is that services are expected to continue to support old versions for a very long time. In Nova today we’re up to about 2.53, and we still support everything back down to 2.0. That represents about 2.5 years of API changes.
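Discovery is just a GET against the root of the service. A sketch of roughly the shape of the response (not the exact payload):

```python
import requests

versions = requests.get("https://cloud-c.example.com/compute/").json()

# Shaped something like:
# {"versions": [{"id": "v2.1",
#                "status": "CURRENT",
#                "min_version": "2.1",
#                "version": "2.53"}]}
for v in versions["versions"]:
    print(v["id"], v.get("min_version"), v.get("version"))
```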

There are a lot more details on the justification for this approach, but not enough time today to go into them. If you want to learn more, I've got a blog writeup from when we first did this that dives in pretty deep, including showing the personas you'd expect to interact with this system.

[Slide 13]

Thus far this has worked pretty well. About half the base services in OpenStack have implemented a version of this, and most are pretty happy with the results. That version document I talked about can be seen here.

There are open questions for the future, mostly around raising minimum versions. No one has done it yet, though there are some thoughts about how you do that.

[Slide 14]

Since I’m here, and there are a lot of OpenAPI experts around, I wanted to talk just briefly about OpenStack and OpenAPI.

[Slide 15]

The OpenStack APIs date back to about 2010, when the state of the art for describing REST APIs was WADL, a now long-dead proposed specification from Sun Microsystems. Lots of XML. But those were the constraints of the time, and they are different constraints than OpenAPI's.

One of those issues is our actions API, where we use the same URL with very different payloads to do non-RESTy function calls, like rebooting a server. The other is microversions, which have no real way to map to OpenAPI without using vendor extensions, at which point you lose most of the interesting tooling in the ecosystem.
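To make the actions problem concrete: the payloads below follow the pattern of Nova's server actions, where one URL multiplexes many unrelated request schemas (the snippet itself is just a sketch):

```python
import requests

url = "https://cloud.example.com/compute/v2.1/servers/1234/action"

# Reboot a server...
requests.post(url, json={"reboot": {"type": "SOFT"}})

# ...or resize it: same URL, completely different body. Stock OpenAPI
# has no good way to express "one URL, many unrelated request schemas".
requests.post(url, json={"resize": {"flavorRef": "2"}})
```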

There is an open question in my mind about whether the microversion approach is interesting enough that it’s something we could consider for OpenAPI. OpenStack could easily microversion itself out of the actions API to something more OpenAPI friendly, but without microversion support there isn’t much point.

[Slide 16]

There was a talk yesterday about “Never have a breaking API change again”, which followed about 80% of our story, but didn’t need something like microversions because it was Azure, and they controlled when code got deployed to users.

There are very specific challenges for Open Source projects that expose a public web services API, and expect to be deployed directly by cloud providers. We're all used to open source behind the scenes, as the plumbing of our services. But Open Source is growing into more areas; it is our services now. With things like OpenStack, Kubernetes, and OpenWhisk, Open Source projects are now defining the user-consumable API. If we don't come up with common patterns for how to handle that, then we're putting Open Source at a disadvantage.

I've been involved in Open Source for close to two decades, and I strongly believe we should make Open Source able to play on the same playing field as proprietary services. The only way we can do that is to think about whether our tools and standards support Open Source all the way out to the edge.

Questions

Question 1: How long do you expect to have to keep the old code around, and how bad is it to manage that?

Answer: Definitely a long time, minimum a few years. The implementation of all of this is in Python, and we've got a bunch of pretty good decorators and documentation that make it pretty easy to compartmentalize the code. No one has lifted a minimum version yet, as the work of supporting the old code hasn't really been burdensome so far. We'll see how that changes in the future.
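For a flavor of that compartmentalization: Nova's real decorators register multiple versioned implementations of the same controller method, but the core idea reduces to gating each version's changes behind a comparison. A much-simplified sketch, with hypothetical names throughout:

```python
# Hypothetical, much-simplified sketch; not Nova's actual code.

class ServersController:
    def show(self, request, server):
        body = {"id": server["id"], "name": server["name"]}
        # Each microversion's additions sit behind one comparison, so
        # old behavior stays compartmentalized and cheap to keep alive.
        if request.microversion >= (2, 19):
            body["description"] = server.get("description")
        return body
```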

Question 2: Does GraphQL solve this problem in a way that you wouldn’t need microversions?

Answer: That's a very interesting question. When we got started, GraphQL was pretty nascent, so it wasn't really on our radar. I've spent some time looking at GraphQL recently, and I think the answer is "yes in theory, no in practice", and this is why.

Our experience with the OpenStack API over the last 7 years is that no one consumes your API directly. They almost always use some 3rd-party SDK that gives them nice bindings and feels native to their programming language of choice. GraphQL is great when you are going to think about your interaction with the service in a really low-level way, and ask only for the minimal data you need to do your job. But SDK writers don't know what you need, so when they build their object models they just put everything in, at which point you are pretty much back to where we started.
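To put that in sketch form (the GraphQL schema and endpoint here are invented for illustration):

```python
import requests

# A careful hand-written query asks for exactly the fields this caller
# needs, so attributes appearing elsewhere in the API don't affect it.
query = """
{
  server(id: "1234") {
    name
    status
  }
}
"""
requests.post("https://cloud.example.com/graphql", json={"query": query})

# An SDK writer doesn't know which fields you'll need, so a generated
# client model queries every field of the object, and you're back to
# the same exposure as a plain REST GET.
```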

I think GraphQL is going to work out really well for really popular services (like github) where people are willing to take the hit to go super low level. Or where you know the backend details well enough to understand the cost differential of asking for different attributes. But I don’t think it obviates trying to come up with a server side versioning for APIs in Open Source.


Notes from North Bay Python

North Bay Python marquee at the Mystic Theatre in Petaluma, CA

I had the pleasure of attending the first North Bay Python conference in Petaluma, CA this past weekend. IBM was a sponsor, and I gave a few quick remarks about doing python serverless actions on OpenWhisk. My two days there were full of wonderful talks and interactions with a slice of the python community.

One of the reasons I love low-cost (to the attendee) regional conferences like North Bay Python is that they make technology conferences more accessible. For 40% of the 250 attendees, this was the first technology conference they'd ever gone to. Not everyone lives in New York City or San Francisco (or wants to), and having local events is critical to expanding the range of voices in the technology community.

There were tons of great talks; you can watch them all here. But I'll highlight a few moments that I'll keep with me for a while.

Fortran on stage

For a single track Python conference, we actually got to see FORTRAN in 2 different talks. It’s probably more FORTRAN than I’ve ever read before.

Catherine Moroney is part of the team that does analysis of satellite images from the LandSat program. They've got a lot of optimized FORTRAN and C code for processing these images. But FORTRAN and C aren't great languages for writing new features to orchestrate these lower-level transforms. She uses Python to do this work, and can seamlessly pass data back and forth from Python to FORTRAN for the data crunching. It was great to see how and when a hybrid approach like this makes developers much more effective.
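I don't know her team's exact toolchain, but the standard way to get that kind of seamless hand-off in the Python world is NumPy's f2py, which compiles Fortran routines into importable Python modules. A minimal sketch, with hypothetical module and routine names:

```python
# Build a Python extension module from existing Fortran, e.g.:
#   python -m numpy.f2py -c -m transforms transforms.f90
# NumPy arrays then pass straight through to the compiled routines.
import numpy as np
import transforms  # hypothetical module built by f2py above

pixels = np.zeros((1024, 1024), dtype=np.float32)
calibrated = transforms.calibrate(pixels)  # hypothetical Fortran routine
```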

Christopher Swenson tackled FORTRAN from the other side. He hacked together a FORTRAN IV interpreter in Python so that he could run Colossal Cave Adventure (originally written for the PDP-10) as a text message game using the Twilio API. His talk wandered through some of the interesting quirks of now extinct programming languages, and the systems they were written for. This is a world before ASCII as we know it became a standard, and the idea of 32-bit integers really hadn't emerged: 36-bit words were used to store five 7-bit characters, which were later assembled into text streams.

Throughout, he showed snippets of FORTRAN whose meaning he had to guess at, and was really frank about the shortcuts he took to get things working. With essentially no other FORTRAN IV code left in the world, this didn't have to be a perfect emulator; it just had to run this one FORTRAN IV program well enough to connect it to the internet.

You can play this by texting to +1 (669) 238-3683 right now if you want to see it in action.

Twitter Bots

Tweet: "What is Machine Learning? Easy! Machine Learning is how you automate your biases so you can think about them even less."

My vote for most hilarious talk goes to Benno Rice's dive into writing twitter bots. He started with pretty easy template-based bots, like one producing plausible plot lines for Midsomer Murders. This is just a stack of well crafted arrays and a random number generator.

Things got more interesting when he started diving into Markov chain bots, especially ones that take content from a bunch of different sources. It's really easy for that to become word salad at worst, or just confusing and "meh". He found you had to keep playing with the content mix as well as the overlap parameters to get the bots to generate something that's amusing at least some of the time. He doesn't let his bots post directly; content is generated offline, and he pushes the good ones after manual review.
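For reference, the core of a Markov chain text bot is tiny; the knobs Benno describes tuning are essentially the content mix and the order of the chain. A minimal sketch (the corpus file name is hypothetical):

```python
import random
from collections import defaultdict

def build_chain(words, order=2):
    # Map each `order`-word prefix to the words observed following it.
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return chain

def generate(chain, order=2, length=30):
    out = list(random.choice(list(chain)))
    for _ in range(length):
        followers = chain.get(tuple(out[-order:]))
        if not followers:  # dead end: this prefix never continues
            break
        out.append(random.choice(followers))
    return " ".join(out)

# Low order tends toward word salad; high order just quotes the sources
# verbatim. Mixing corpora shifts where that balance lands.
corpus = open("mixed_sources.txt").read().split()  # hypothetical corpus
print(generate(build_chain(corpus)))
```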

Benno also went down the path of trying machine learning to make these better, but mostly got worse results in his experiments. But the story of that not working out was funny all by itself. The real lesson here is that playfulness is useful in learning new things, and that Twitter bots are a lot of fun to build.

Search First Documentation

My vote for most takeaways that I'll personally use goes to Heidi Waterhouse for "Search-First Writing for Developers". Recently there was a mass migration of OpenStack documentation from a dedicated docs team to all the development teams.

The heart of her message is that, to a first approximation, no one reads your documentation. Users end up at your documentation when they have a problem with your software, so they are showing up a) grumpy, and b) through whatever Google terms they could guess for their problem. They are not coming through your carefully curated table of contents; they are coming from Google, and then they are skimming to find their answer. They won't follow links to other pages: this is where they are.

What that means is you need to treat every page as the only page that the user will ever see, you need to optimize your content for skimming, and you need to answer problems people actually have, not the ones you think they might have. Getting real analytics on how folks are reading your docs, and the search terms they are coming in with, is an important part of this.

Hearing all these harsh and practical words from someone that spent 15 years as a technical content author was really enlightening. I’ll definitely have to watch this talk again and digest more of Heidi’s lessons.

Safe Spaces

Reporting guidelines for Safety Incidents

One of the welcome trends that I’ve seen at tech conferences over the last 5 years is a real focus on a strict Code of Conduct, clear reporting guidelines, and making sure that folks feel safe at these events. North Bay Python did a great job on that front, and that commitment definitely was reflected in a pretty diverse speaker lineup and attendee base.

The effort they went to was highlighted further by Seán Hanson's talk on Quiet Developers. We've long known that while diversity in Tech is much lower than national averages, it's even worse in Open Source Software. A big reason for this is that members of traditionally marginalized communities really don't feel safe in these environments, or may not have the spare time to devote outside of their normal day jobs. It doesn't mean they aren't great developers; it's just that current systems are optimized for loudness as much as talent. Seán's whole talk was about ways to engage and get the most out of your quiet developers, and give them what they need to really succeed. While I needed to leave about the time this talk started, I stuck around and watched from the balcony. His message was really powerful, and really important to how we all evolve the tech community going forward.

Double A Plus, Would Come Again

North Bay Python was definitely worth the trip. It had a few of the normal scheduling quirks of a first-time conference. And being Petaluma, the theatre didn't actually have heat, so the first few hours of the first day were a bit cold in there. But it warmed up pretty quickly with 250 bodies. The biggest issue in my mind was that there wasn't much common space outside the theatre, so a hallway track wasn't really a thing. It would have been nice to have a bit more milling-about time to get to know folks and ask follow-up questions of speakers.

But all in all a great time. Looking forward to seeing how they do next year.


Notes from API Strat

Back in November I had the pleasure to attend API Strat for the first time. It was 2 days of short (20 minute) sessions running in 3 tracks with people discussing web service API design, practice, and related topics. My interest was to get wider exposure to the API Microversions work that we did in OpenStack, and get out of that bubble to see what else was going on in the space.

Events on the Web

Event technologies being used by different web services

There were lots of talks that brought up the problem of getting real time events back to clients. Clients talking to servers is a pretty solved problem with RESTful interfaces. But the other direction is far from solved. The 5 leading contenders are Webhooks (over HTTP), HTTP long polling, WebSockets, AMQP, and MQTT. Each has its boosters and its place, but this is going to be a messy space for the next few years.

OpenAPI's version 3 specification includes webhooks, though version 3 shipped without simultaneously launched tooling, so it will take some time before people build implementations around it. That's a boost in one direction. Nginx is adding MQTT proxy support. That's a boost in another.

Webhooks vs. Serverless

Speaking of webhooks, the keynote from Glenn Block of Auth0 brought up an interesting point: serverless effectively lives in the eventing space as well.

Webhooks are all fine and good for making your platform efficient and scalable, but if clients now have to run their own redundant, highly available services to catch events, that's a lot of work, and many will just skip it. They found that once they built out a serverless platform where they could host their clients' code, they got much more uptake on their event API. And, more importantly, they found that their power-user customers were actually building out important features of their platform. He made a good case that every online service should really be considering an embedded serverless environment.

API Microversions

I was ostensibly there to talk about API Microversions, an approach we did in OpenStack to handle the fact that deployments of OpenStack upgrade at very different cadences. The talk went pretty well.

20 minutes was a challenge to explain something that took us all 6 months to get our heads around. I do think I managed to communicate the key challenge: when you build an open source system with a user-facing API, how do users control what they get? A lot of previous "good enough" rules fall down.

Darrel Miller had given a talk, "How to never make another breaking API change". His first 5 minutes were really similar to mine, and then, because this was about Azure, with a single controlled API instance, the solution veered in a different direction. It was solid reinforcement of the fact that we were on the right path here, and that the open source case carries a different, additional constraint.

One of the key questions I got in Q&A is one I'd been thinking about: does GraphQL make this all obsolete? GraphQL was invented by Facebook to get away from the HTTP GET/POST model of passing data around, and lets you specify a pretty structured query about the data you need from the server. On paper it solves a similar problem to microversions, because if you are really careful with your GraphQL you can ask for only the minimum data you need, and are unlikely to get caught out by things coming and going in the API. However, in practice, I'm not convinced it would work. In OpenStack we saw that most API usage was not raw API calls; it was through an SDK provided by someone in the community. If you are an SDK writer, it's a lot harder to make assumptions about what parts of objects people want, so you'd tend to return everything. And that puts you right back with the same problem we have with REST in OpenStack.

API Documentation

There were many talks on better approaches to documentation, which resonated with me after the great OpenStack docs migration.

Taylor Barnett's talk "Things I Wish People Told Me About Writing Docs" was one of my favorites. It included real user studies on what people actually read in documentation. It turns out that people don't read your API documentation; they skim hard. They will read your code snippets as long as they aren't too long, but they won't read the paragraph before them, so if there is something really critical about the code, make it a comment in the snippet itself. There was also a great cautionary tale about phrases like "can be easily done". People furiously hunting around your site trying to get code working are already wound up, and words like "easy" make them feel dumb and frustrated when they don't get it working on the first try. Having a little more empathy for the user's state of mind when they show up goes a long way towards building a relationship with them, and making them more bought into your platform.

Making New Friends

I also managed to have an incredible dinner the first night I was in town, set up by my friend Chris Aedo. Both the food and conversation were amazing, and I learned about Wordnik, distributed data systems, and that you can lose a year of research because ferrets bred for specific traits might be too dumb to be trained.

Definitely a lovely conference, and one I hope to make it back to next year.