Syncing Sieve Rules in Fastmail, the hard way

I've been hosting my email over at Fastmail for years, and for the most part the service is great. The company understands privacy, contributes back to open source, and is incredibly reliable. One of the main reasons I moved off of Gmail was that its mail filtering system was not fine-grained enough to deal with my email stream (especially open source project emails). Fastmail supports Sieve, which lets you write quite complex filtering rules. There was only one problem: syncing those rules.

My sieve rules are currently just north of 700 lines. Anything that complex is something I like to manage in git, so that if I mess something up, it's easy to revert to a known good state.

No API for Sieve

Fastmail does not support any kind of API for syncing sieve rules. There is an official standard for this, called MANAGESIEVE, but the technology stack Fastmail uses doesn't support it. I've opened tickets over the years that mostly got filed away as future features.

When I first joined Fastmail, their website was entirely classic HTML forms. Being no slouch, I wrote a Python mechanize script that would log in as me, navigate to the upload form, and submit it. This worked well for years. I had a workflow where I'd make a sieve change, sync via the script, see that it generated no errors, then commit. I have 77 commits to my sieve rules repository going back to 2013.

But a couple of years ago the Fastmail team refreshed their user interface to a JavaScript-based UI (built on their own OvertureJS toolkit). It's a much nicer UI, but it only works in a JavaScript-enabled browser. Getting to the form box where I can upload my sieve rules is about 6 clicks. I stopped really tweaking the rules regularly because of the friction of updating them through clear / copy / paste.

Using Selenium for unintended purposes

Selenium is a pretty amazing web testing tool. It gives you an API to drive a web browser remotely. With recent versions of Chrome, there is even a headless Chrome driver, so you can do this without popping up a graphics window. You can drive it all from Python (or your language of choice).

An offhand comment by Nibz about using Selenium for something no one intended got me thinking: could I get it to do my synchronization?

The answer: yes. Also, this is one of the goofiest bits of code I've ever written.

Basic Flow

I won't do a line by line explanation, but there are a few concepts that make the whole thing fall into place.

The first is the use of WebDriverWait. This is an OvertureJS application, which means that clicking parts of the screen triggers an ajax interaction, and it may be some time before the screen "repaints". That repaint could be a new page, a change to the existing page, or an element becoming visible. Find a thing, click a thing, wait for the next thing. There is a 5-click interaction before I get to the sieve edit form, then a save button click to finish it off.
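
A single find / click / wait step looks roughly like this (a minimal sketch with Selenium's Python bindings; the CSS class is a hypothetical stand-in for whatever stable-looking class the inspector turns up):

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://www.fastmail.com/")

# Give each ajax "repaint" up to 20 seconds to settle.
wait = WebDriverWait(driver, 20)

# Find a thing, click a thing, wait for the next thing.
# ".v-SettingsLink" is made up; the real class comes from the inspector.
settings = wait.until(
    EC.element_to_be_clickable((By.CSS_SELECTOR, ".v-SettingsLink")))
settings.click()
```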

Finding things is important, and sometimes hard. Being an OvertureJS application, div ids are pretty much useless. So I stared a lot at the Chrome inspector, hunting for what looked like stable classes to find the right things to click on. All of those could change with new versions of the UI, so this is fragile at best. Sometimes you just have to count, like finding the last textarea on the Rules page. Sometimes you have to inspect elements, like looking through all the buttons on a page to find the one that says "Save".

Filling out forms is done with sendKeys, which approximates typing by sending one character every few milliseconds. If you run non-headless it makes for an amusing animation. My sieve file is close to 20,000 characters, so it takes more than a full minute to put that content in one character at a time. But at least it's a machine, so no typos.
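
Counting elements, scanning button text, and the slow typing look something like this in practice (a continuation of the sketch above; the selectors and labels are assumptions):

```python
# The sieve rules live in the last textarea on the Rules page:
# sometimes you just have to count.
textareas = driver.find_elements(By.TAG_NAME, "textarea")
rules_box = textareas[-1]

rules_box.clear()
with open("rules.sieve") as f:
    rules_box.send_keys(f.read())  # ~20,000 characters, one at a time

# No useful ids, so scan every button for the one labeled "Save".
save = next(b for b in driver.find_elements(By.TAG_NAME, "button")
            if b.text == "Save")
save.click()
```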

The Good and the Bad

The good news is that this all seems to work pretty reliably. I've been running it for the last week and all my changes are getting saved correctly.

The bad news is that you can't have two-factor auth enabled and use this, because unlike IMAP, where you can provision an app password for Fastmail, this is really logging in and pretending to be you, clicking through the website and typing. There are no limited users for that.

It's also slow. A full run takes a few minutes, most of which is spent retyping the whole sieve file one character at a time.

It's definitely fragile; I'm sure an update to their site is going to break it. And then I'll be back in the Chrome inspector figuring out how to make it work again.

But on the upside, this let me learn a more general-purpose set of tools for crawling and automating the modern web (which requires JavaScript). I've used this technique on a few sites now, and it's a good one to add to your bag of tricks.

The Future

Right now this script lives in the same repo as my rules. It also requires setting up the Selenium environment and headless Chrome, which I've not really documented. I will take some time to split this out on GitHub so others can use it.

I would love it if Fastmail would support MANAGESIEVE, or provide an HTTP API to fetch / store sieve rules; anything where I could use a limited app user instead of my full user. I really want to delete this code and never speak of it again, but a couple of years and closed support tickets later, this is the best I've got.

If you know someone in Fastmail engineering and can ask them about having a supported path to programmatically update sieve rules, that would be wonderful. I know a number of software developers that have considered the switch to Fastmail, but stopped when they discovered that updating sieve rules can only be done in the web UI.

Updated (12/15/2017): via Twitter the Fastmail team corrected me that it's not Angular, but their own JS toolkit called OvertureJS. The article has been corrected to reflect that.

 

REST API Microversions

This is the version of the talk I gave at API Strat back in November about OpenStack's API microversions implementation, mostly trying to frame the problem statement. The talk was only 20 minutes long, so you can only get so far into explaining the ramifications of the solution.

[Slide: REST_API_MV-0]

This talk dives into an interesting issue we ran into with OpenStack, an open source project that is designed to be deployed by public and private clouds, and exposes a REST API that users will consume directly. But in order to understand why we had to do something new, it's first important to understand basic assumptions on REST API versioning, and where those break down.

[Slide: REST_API_MV-1]

There are some generally agreed-upon rules for REST API versioning. Additive operations, like adding a key, shouldn't break clients, because clients shouldn't care about extra data they get back. If you add new code that adds a new attribute, like description, you can make these changes, roll them out to the users, they get an extra thing, and life is good.
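
As a toy sketch of why the additive rule works (not OpenStack code, just an illustration):

```python
# A client that only reads the keys it knows about is unaffected when
# the server starts returning an extra "description" attribute.
old_response = {"id": "42", "name": "my-server"}
new_response = {"id": "42", "name": "my-server",
                "description": "web tier"}  # the additive change

def show(server):
    print(server["id"], server["name"])  # extra keys are simply ignored

show(old_response)  # works
show(new_response)  # still works
```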

This mostly works as long as you have a single instance, so new attributes show up "all at once" for users, and as long as you don't roll back an API change after it's been in production for any real length of time.

[Slide: REST_API_MV-2]

This is itself additive; you can keep adding more attributes over time. Lots of public web services use this approach. And it mostly just works.

[Slide: REST_API_MV-3]

It's worth thinking for a moment about the workflow that supports this kind of approach. There is some master branch where features are written and enabled, and at key points in time those features are pushed to production, where they show up for users. Basic version control, nothing too exciting here.

[Slide: REST_API_MV-4]

But what does this look like in an open source project? In open source we've got a master upstream branch, and perhaps stable releases that happen from time to time. Here we'll get specific and show the OpenStack release structure. OpenStack creates a stable release every 6 months, and main development continues on in git master. These are named with letters of the alphabet: Liberty, Mitaka, Newton, Ocata, and Pike are the last 5 releases.

[Slide: REST_API_MV-5]

But releasing stable code doesn't get it into the hands of users. Clouds have to deploy that code. Some may do that right away; others may take a while. Exactly when a release gets into users' hands is an unknown. This gets even trickier when you consider private clouds that are consuming something like a Linux distro's version of OpenStack. There is a delay getting into the distro, then another delay before they decide to upgrade.

[Slide: REST_API_MV-6]

End users have no visibility into the versions of OpenStack that operators have deployed; they only know about the API. So when they are viewing the world across clouds at a specific point in time (T1, T2, or T3), they will experience different versions of the API.

[Slide: REST_API_MV-7]

Let's take that T3 example. If a user starts by writing their software against Cloud A, the description field is there. They are going to assume that's just part of the base API. They then connect their software to Cloud B, and all is fine. But when later in the week they point it at Cloud C, the API has magically "removed" attributes. Removing attributes on the server is never considered safe.

This might blow up immediately, or it might just carry a null value that fails very late in the processing of the data.
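
In the toy terms of the earlier sketch, the failure looks like this:

```python
cloud_a = {"id": "42", "name": "my-server", "description": "web tier"}
cloud_c = {"id": "42", "name": "my-server"}  # older code: no description

def tag(server):
    return server["description"].lower()    # blows up immediately on Cloud C

def tag_lazy(server):
    desc = server.get("description")        # None sneaks through here...
    return desc.lower() if desc else None   # ...and fails far downstream

tag(cloud_a)    # fine
# tag(cloud_c)  # raises KeyError: 'description'
```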

[Slide: REST_API_MV-8]

The lesson here is that the assumed good-enough rules don't account for a different team developing the software than deploying it. Sure, you say, that's not good DevOps, of course it's not supported. But be careful with what you are saying there, because we deploy software we don't develop all the time: third-party open source. And if you've come down firmly that open source is not part of DevOps, I think a lot of people would look at you a bit funny.

[Slide: REST_API_MV-9]

I'm responsible for two names that took off in OpenStack; one of them is Microversions. Like any name that catches on, you regret it later because it misses the important subtlety of what is going on. Don't ask me to name things.

But besides the name, what are microversions?

[Slide: REST_API_MV-10]

Let's look at that example again. If we experience the world at time T3, with Clouds A, B, and C, the real issue is that hitting Cloud C appears to make time "go backwards". We've gone back in time and experienced the software at a much earlier version. How do we avoid going backwards?

[Slide: REST_API_MV-11]

We introduce a header for the "microversion" we want. If we don't pass one, we get the minimum version the server supports. Everything after that we have to opt into. If we ask for a microversion the server doesn't support, we get a hard 400 failure on the request. This lets us fail early, which is more developer-friendly than handing back unexpected data that might corrupt things much later in the system.
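
From the client side it looks something like this (a sketch using python-requests against a hypothetical endpoint, auth omitted; Nova used the X-OpenStack-Nova-API-Version header, and newer services use an "OpenStack-API-Version: compute 2.53" style header):

```python
import requests

resp = requests.get(
    "https://cloud.example.com/compute/v2.1/servers",
    headers={"X-OpenStack-Nova-API-Version": "2.53"},  # opt in explicitly
)
if resp.status_code == 400:
    # This cloud doesn't speak 2.53: fail early, before unexpected
    # data can leak deep into the system.
    raise RuntimeError("cloud too old for the features we need")
```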

[Slide: REST_API_MV-12]

Roughly, microversions are inspired by HTTP content negotiation, where you can ask for different types of content at the same URL, and the server will give you the best it can (you define "best" with a quality value). Because most developers implementing REST clients aren't deeply knowledgeable about low-level HTTP details, we went for simplicity and did this with a dedicated header. For simplicity we also made this a globally incrementing value across all resources, instead of per resource. We wanted there to be no confusion about what version 2.5 was.

The versions that a service supports are discoverable by hitting the root document of the API service. The other important note is that services are expected to continue supporting old versions for a very long time. In Nova today we're up to about 2.53, and we still support everything back down to 2.0. That represents about 2.5 years of API changes.
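
Discovery is just a GET on the root document. A client sketch, with field names modeled on Nova's version document (treat the exact shape as an assumption):

```python
import requests

root = requests.get("https://cloud.example.com/compute/").json()
v21 = next(v for v in root["versions"] if v["id"] == "v2.1")
print(v21["min_version"], v21["version"])  # e.g. "2.1" and "2.53"
```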

There are a lot more details on the justification for this approach, but not enough time today to go into them. If you want to learn more, I've got a blog writeup from when we first did this that dives in pretty deep, including the personas you'd expect to interact with this system.

[Slide: REST_API_MV-13]

Thus far this has worked pretty well. About half of the base services in OpenStack have implemented a version of this, and most are pretty happy with the results. That version document I talked about can be seen here.

There are open questions for the future, mostly around raising minimum versions. No one has done it yet, though there are some thoughts about how you do that.

[Slide: REST_API_MV-14]

Since I'm here, and there are a lot of OpenAPI experts around, I wanted to talk just briefly about OpenStack and OpenAPI.

[Slide: REST_API_MV-15]

The OpenStack APIs date back to about 2010, when the state of the art for REST APIs was WADL, a now long-dead proposed specification by Sun Microsystems. Lots of XML. But those were the constraints at the time, and they are different constraints than OpenAPI's.

One of those issues is our actions API, where we use the same URL with very different payloads to make non-RESTy function calls, like rebooting a server. The other is microversions, which have no real way to map to OpenAPI without using vendor extensions, at which point you lose most of the interesting tooling in the ecosystem.
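
The actions pattern looks roughly like this (a sketch; the reboot body follows Nova's documented action format, while the host and id are placeholders):

```python
import requests

server_id = "a1b2c3"  # placeholder
requests.post(
    f"https://cloud.example.com/compute/v2.1/servers/{server_id}/action",
    json={"reboot": {"type": "SOFT"}},  # same URL, different payload per action
)
```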

There is an open question in my mind about whether the microversion approach is interesting enough that it's something we could consider for OpenAPI. OpenStack could easily microversion itself out of the actions API to something more OpenAPI friendly, but without microversion support there isn't much point.

[Slide: REST_API_MV-16]

There was a talk yesterday about "Never have a breaking API change again", which followed about 80% of our story, but didn't need something like microversions because it was Azure, and they controlled when code got deployed to users.

There are very specific challenges for Open Source projects that expose a public web services API and expect to be deployed directly by cloud providers. We're all used to open source behind the scenes, as plumbing for our services. But Open Source is growing into more areas. It is our services now. With things like OpenStack, Kubernetes, and OpenWhisk, Open Source projects are now defining the user-consumable API. If we don't come up with common patterns for how to handle that, we're putting Open Source at a disadvantage.

I've been involved in Open Source for close to two decades, and I strongly believe we should make Open Source able to play on the same playing field as proprietary services. The only way we can do that is to think about whether our tools and standards support Open Source all the way out to the edge.

Questions

Question 1: How long do you expect to have to keep the old code around, and how bad is it to manage that?

Answer: Definitely a long time, minimum a few years. The implementation of all of this is in Python, and we've got a bunch of pretty good decorators and documentation that make it pretty easy to compartmentalize the code. No one has lifted a minimum version yet, as the amount of work to support the old code hasn't really been burdensome so far. We'll see how that changes in the future.

Question 2: Does GraphQL solve this problem in a way that you wouldn't need microversions?

Answer: That's a very interesting question. When we got started, GraphQL was pretty nascent, so it wasn't really on our radar. I've spent some time looking at GraphQL recently, and I think the answer is "yes in theory, no in practice", and this is why.

Our experience with the OpenStack API over the last 7 years is that no one consumes your API directly. They almost always use some third-party SDK in their programming language of choice that gives them nice language bindings and feels like their language of choice. GraphQL is great when you are going to think about your interaction with the service in a really low-level way, and ask only for the minimal data you need to do your job. But these SDK writers don't know what you need, so when they build their object models, they just put everything in. At which point you are pretty much back where we started.

I think GraphQL is going to work out really well for really popular services (like GitHub) where people are willing to take the hit of going super low level, or where you know the backend details well enough to understand the cost differential of asking for different attributes. But I don't think it obviates trying to come up with server-side versioning for APIs in Open Source.

 

 

Notes from North Bay Python

North Bay Python marquee at the Mystic Theatre in Petaluma, CA

I had the pleasure of attending the first North Bay Python conference in Petaluma, CA this past weekend. IBM was a sponsor, and I gave a few quick remarks about doing python serverless actions on OpenWhisk. My two days there were full of wonderful talks and interactions with a slice of the python community.

One of the reasons I love low cost (to attendee) regional conferences like North Bay Python is that it makes technology conferences more accessible. For 40% of the 250 attendees, this was the first technology conference they'd ever gone to. Not everyone lives in New York City or San Francisco (or wants to), and having local events is critical to expanding the range of voices in the technology community.

There were tons of great talks; you can watch them all here. But I'll highlight a few moments that I'll keep with me for a while.

Fortran on stage

For a single-track Python conference, we actually got to see FORTRAN in 2 different talks. It's probably more FORTRAN than I've ever read before.

Catherine Moroney is part of the team that does analysis of satellite images from the LandSat program. They've got a lot of optimized FORTRAN and C code for processing these images. But FORTRAN and C aren't great languages for writing new features to orchestrate these lower-level transforms. She uses Python to do this work, and can seamlessly pass data back and forth from Python to FORTRAN for data crunching. It was great to see how and when a hybrid approach like this makes developers much more effective.

Christopher Swenson tackled FORTRAN from the other side. He hacked together a FORTRAN IV interpreter in Python, so that he could run Colossal Cave Adventure (originally written for the PDP-10) as a text message game using the Twilio API. His talk wandered through some of the interesting quirks of now-extinct programming languages, and the systems they were written for. This is a world before ASCII as we know it became a standard, and before the idea of 32-bit integers had really emerged. 36-bit integers were used to store five 7-bit characters, which were later assembled into text streams.

Through the whole thing he showed snippets of FORTRAN that he had to make guesses about, and was really frank about the shortcuts he took to get things working. There is no more FORTRAN IV code in the world; this didn't have to be a perfect emulator, it just had to run this one single FORTRAN IV program well enough to connect it to the internet.

You can play this by texting to +1 (669) 238-3683 right now if you want to see it in action.

Twitter Bots

Tweet: "What is Machine Learning? Easy! Machine Learning is how you automate your biases so you can think about them even less."

My vote for most hilarious talk goes to Benno Rice's dive into writing Twitter bots. He started with pretty easy template-based bots, like one producing plausible plot lines for Midsomer Murders. This is just a stack of well-crafted arrays and a random number generator.

Things got more interesting when he started diving into Markov chain bots, especially ones that take content from a bunch of different sources. It's really easy for that to become word salad at worst, or just confusing and "meh". He found you had to keep playing with the content mix as well as the overlap parameters to get the bots to generate something that's amusing at least some of the time. He doesn't let his bots post directly; content is generated offline, and he pushes the good ones after manual review.

Benno also took a dive down the path of using machine learning to make these better, but mostly got worse results in his experiments. But the story of that not working out was funny all by itself. The real lesson here is that playfulness is useful in learning new things, and that Twitter bots are a lot of fun to build.

Search First Documentation

My vote for most takeaways that I'll personally use goes to Heidi Waterhouse for "Search-First Writing for Developers". Recently there was a mass migration of OpenStack documentation from a dedicated docs team to all the development teams.

The heart of her message is that, to a first approximation, no one reads your documentation. Users end up at your documentation when they have a problem with your software, so they are showing up a) grumpy, and b) through whatever Google terms they could guess for their problem. They are not coming through your carefully curated table of contents; they are coming from Google, and then they are skimming to find their answer. They won't follow links to other pages; this is where they are.

What that means is you need to treat every page as the only page that the user will ever see, you need to optimize your content for skimming, and you need to answer problems people actually have, not the ones you think they might have. Getting real analytics on how folks are reading your docs, and the search terms they are coming in with, is an important part of this.

Hearing all these harsh and practical words from someone that spent 15 years as a technical content author was really enlightening. I'll definitely have to watch this talk again and digest more of Heidi's lessons.

Safe Spaces

Reporting guidelines for Safety Incidents

One of the welcome trends that I've seen at tech conferences over the last 5 years is a real focus on a strict Code of Conduct, clear reporting guidelines, and making sure that folks feel safe at these events. North Bay Python did a great job on that front, and that commitment definitely was reflected in a pretty diverse speaker lineup and attendee base.

The effort they went to was highlighted further by Seán Hanson's talk on Quiet Developers. We've long known that while diversity in tech is much lower than national averages, it's even worse in open source software. A big reason for this is that members of traditionally marginalized communities really don't feel safe in these environments, or may not have the spare time to devote outside of their normal day jobs. It doesn't mean they aren't great developers; it's just that current systems are optimized for loudness as much as talent. Seán's whole talk was about ways to engage and get the most out of your quiet developers, and give them what they need to really succeed. While I needed to leave about the time this talk started, I stuck around and watched from the balcony instead. His message was really powerful, and really important to how we all evolve the tech community going forward.

Double A Plus, Would Come Again

North Bay Python was definitely worth the trip. It had a few of the normal scheduling quirks of a first-time conference. Being Petaluma, the theatre didn't actually have heat, so the first few hours of the first day were a bit cold in there. But it warmed up pretty quickly with 250 bodies. The biggest issue in my mind was that there wasn't much common space outside of the theatre, so a hallway track wasn't really a thing. It would have been nice to have a bit more milling-about time to get to know folks and ask follow-up questions of speakers.

But all in all a great time. Looking forward to seeing how they do next year.

 

Notes from API Strat

Back in November I had the pleasure to attend API Strat for the first time. It was 2 days of short (20 minute) sessions running in 3 tracks with people discussing web service API design, practice, and related topics. My interest was to get wider exposure to the API Microversions work that we did in OpenStack, and get out of that bubble to see what else was going on in the space.

Events on the Web

Event technologies being used by different web services

There were lots of talks that brought up the problem of getting real-time events back to clients. Clients talking to servers is a pretty solved problem with RESTful interfaces, but the other direction is far from solved. The 5 leading contenders are Webhooks (over HTTP), HTTP long polling, WebSockets, AMQP, and MQTT. Each has its boosters and its place, but this is going to be a messy space for the next few years.

OpenAPI's version 3 specification includes webhooks, though version 3 launched without accompanying tooling, and it will take some time before people build implementations around it. That's a boost in one direction. Nginx is adding MQTT proxy support. That's a boost in another.

Webhooks vs. Serverless

Speaking of webhooks, the keynote from Glenn Block of Auth0 brought up an interesting point: serverless effectively lives in the eventing space as well.

Webhooks are all fine and good for making your platform efficient and scalable, but if clients now have to run their own redundant, highly available services to catch events, that's a lot of work, and many will just skip it. Auth0 found that once they built out a serverless platform where they could host their clients' code, they got much more uptake on their event API. More importantly, they found that their power-user customers were actually building out important features of their platform. He made a good case that every online service should really be considering an embedded serverless environment.

API Microversions

I was ostensibly there to talk about API microversions, an approach we developed in OpenStack to handle the fact that deployments of OpenStack upgrade at very different cadences. The talk went pretty well.

20 minutes was a challenge for explaining something that took us all 6 months to get our heads around. I do think I managed to communicate the key challenge: when you build an open source system with a user-facing API, how do users control what they get? A lot of previously "good enough" rules fall down.

Darrel Miller had given a talk, "How to never make another breaking API change". His first 5 minutes were really similar to mine, and then, because this was about Azure, with a single controlled API instance, the solution veered in a different direction. It was solid reinforcement of the fact that we were on the right path here, and that the open source solution has a different, additional constraint.

One of the key questions I got in Q&A is one I'd been thinking about. Does GraphQL make this all obsolete? GraphQL was invented by Facebook to get away from the HTTP GET/POST model of passing data around, and let you specify a pretty structured query about the data you need from the server. On paper, it solves a similar problem to microversions, because if you are really careful with your GraphQL you can ask for the minimum data you need, and are unlikely to get caught by things coming and going in the API. In practice, however, I'm not convinced it would work. In OpenStack we saw that most API usage was not raw API calls; it was through an SDK provided by someone in the community. If you are an SDK writer, it's a lot harder to make assumptions about what parts of objects people want, so you'd tend to return everything. And that puts you right back with the same problem we have in REST in OpenStack.

API Documentation

There were many talks on better approaches to documentation, which resonated with me after the great OpenStack docs migration.

Taylor Barnett's talk "Things I Wish People Told Me About Writing Docs" was one of my favorites. It included real user studies on what people actually read in documentation. It turns out that people don't read your API documentation; they skim hard. They will read your code snippets as long as they aren't too long, but they won't read the paragraph before them, so if there is something really critical about the code, make it a comment in the code snippet itself. There was also a great cautionary tale to stop using phrases like "can be easily done". People furiously hunting around your site trying to get code working are ramping up quickly. Words like "easy" make them feel dumb and frustrated when they don't get it working on the first try. Having a little more empathy for the state of mind of users when they show up goes a long way towards building a relationship with them, and making them more bought into your platform.

Making New Friends

I also managed to have an incredible dinner the first night I was in town, set up by my friend Chris Aedo. Both the food and the conversation were amazing, and I learned about Wordnik, distributed data systems, and the fact that you can lose a year of research because ferrets bred for specific traits might be too dumb to be trained.

Definitely a lovely conference, and one I hope to make it back to next year.

Getting Chevy Bolt Charge Data with Python

Filed under: kind of insane code, be careful about doing this at home.

Recently we went electric, and got a Chevy Bolt to replace our 12-year-old Toyota Prius (which has been, and continues to be, a workhorse). I had a spot in line for a Tesla Model 3, but due to many factors, we decided to go test drive and ultimately purchase the Bolt. It's a week in and so far so good.

One of the things GM does far worse than Tesla is making its data available to owners. There is quite a lot of telemetry captured by the Bolt through OnStar, which you can see by logging into their website or app. But there is no API (or at least no clear path to get access to the API).

However, it's the 21st century. That means we can do ridiculous things with software, like use Python to start a full web browser, log into their web application, and scrape out data... so I did that.

The Code

This uses Selenium, a tool for testing websites automatically. To get started you have to install the Selenium Python bindings, as well as the Chrome web driver. I'll leave those as an exercise for the reader.

After that, the process looks a little like one might expect. Start with the login screen, find the fields for user/password, send_keys (which literally acts like typing), and submit.

The My Chevrolet site is an AngularJS site, which seems to have no stateful caching of the telemetry data for the car. Instead, once you log in you are presented with an overview of your car, and it makes an async call through the OnStar network back to your car to get its data: charge level, charge state, estimated range. The OnStar network is a CDMA network with a proprietary protocol, and the call ends up taking at least 60 seconds to return.

This means that you can't just pull data out of the page once you've logged in, because the data isn't there yet; there is a spinner instead. Selenium provides a WebDriverWait class for exactly this, which will wait until an element shows up in the DOM. We can just wait for the status-box element to arrive, then dump its text.
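
Condensed, the whole script looks roughly like this (a sketch: the login field ids are guesses, and only the status-box class is taken from the description above, so the real page may differ):

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = webdriver.ChromeOptions()
# options.add_argument("--headless")  # uncomment to run without a window

driver = webdriver.Chrome(options=options)
driver.get("https://my.chevrolet.com/login")

# Log in just like a human would: find the fields, "type", submit.
driver.find_element(By.ID, "Login_Username").send_keys("you@example.com")
driver.find_element(By.ID, "Login_Password").send_keys("s3cret")
driver.find_element(By.ID, "Login_Button").click()

# The page shows a spinner while OnStar phones the car, which can take
# a minute or more, so wait generously for the status box to appear.
status = WebDriverWait(driver, 120).until(
    EC.presence_of_element_located((By.CLASS_NAME, "status-box")))
print(status.text)
driver.quit()
```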

The output from this script is just the text of that status box (charge level, charge state, and estimated range), which was enough for what I was hoping to get back.

The Future

Honestly, I really didn't want to write any of this code. I would much rather get access to the GM API and do this the right way. Ideally I'd like to make the Chevy Bolt as easy to use in Home Assistant as a Tesla. With the Chrome inspector, I can see that the inner call is actually returning a very nice JSON structure back to the Angular app. I've sent an email to the GM developer program to try to get real access; thus far, black hole.

Lots of caveats on this code. The OnStar link and the My Chevrolet site are sometimes flaky, I don't know why, so running something like this in a busy loop is probably not a thing you want to do. For about 2 hours last night I just got "there is no OnStar account associated with this vehicle", which then magically went away. I'd honestly not run it more than hourly. I make no claims about the integrity of data fetched this way.

Once you see the thing working, it can be run headless by uncommenting the headless option. Then it could be run on any Linux system, even one without graphics.

Again, this is one of the more ridiculous pieces of code I've ever written. It is definitely in a "currently seems to work for me" state, so don't expect it to be robust. I make no claims about whether or not it might damage anything in the process, though if logging into a website damages your car, GM has bigger issues.