Tag Archives: code

Syncing Sieve Rules in Fastmail, the hard way

I’ve been hosting my email over at Fastmail for years, and for the most part the service is great. The company understands privacy, contributes back to open source, and is incredibly reliable. One of the main reasons I moved off of Gmail was that its mail filtering system was not fine-grained enough to deal with my email stream (especially open source project emails). Fastmail supports sieve, which lets you write quite complex filtering rules. There was only one problem: syncing those rules.

My sieve rules are currently just north of 700 lines. Anything that complex is something that I like to manage in git, so that if I mess something up, it’s easy to revert to a known good state.

No API for Sieve

Fastmail does not support any kind of API for syncing Sieve rules. There is an official standard for this, called MANAGESIEVE (RFC 5804), but the technology stack Fastmail uses doesn’t support it. I’ve filed tickets over the years that mostly got filed away as future features.

When I first joined Fastmail, their website was entirely classic HTML forms. Being no slouch, I had a Python mechanize script that would log in as me, navigate to the upload form, and submit it. This worked well for years. I had a workflow where I’d make a sieve change, sync via script, see that it generated no errors, then commit. I have 77 commits to my sieve rules repository going back to 2013.

But a couple of years ago the Fastmail team refreshed their user interface to a JavaScript-based UI (called Overture). It’s a much nicer UI, but it means it only works with a JavaScript-enabled browser. Getting to the form box where I can upload my sieve rules takes about 6 clicks. I stopped tweaking the rules regularly because of the friction of updating them through clear / copy / paste.

Using Selenium for unintended purposes

Selenium is a pretty amazing web testing tool. It gives you an API to drive a web browser remotely. With recent versions of Chrome, there is even a headless Chrome driver, so you can do this without popping up a graphical window. You can drive this all from Python (or your language of choice).

An offhand comment by Nibz about using Selenium for something no one intended got me thinking: could I manage to get this to do my synchronization?

Answer: yes. Also, this is one of the goofiest bits of code that I’ve ever written.

#!/usr/bin/env python3

import configparser
import os
import sys

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

config = configparser.ConfigParser()
config.read("config.ini")

chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(executable_path=os.path.abspath("/usr/local/bin/chromedriver"),
                          chrome_options=chrome_options)

driver.get("https://fastmail.fm")

timeout = 120
try:
    element_present = EC.presence_of_element_located((By.NAME, 'username'))
    WebDriverWait(driver, timeout).until(element_present)

    # Send login information

    user = driver.find_element_by_name("username")
    passwd = driver.find_element_by_name("password")
    user.send_keys(config["default"]["user"])
    passwd.send_keys(config["default"]["pass"])
    driver.find_element_by_class_name("v-Button").click()

    print("Logged in")

    # wait for login to complete
    element_present = EC.presence_of_element_located((By.CLASS_NAME, 'v-MainNavToolbar'))
    WebDriverWait(driver, timeout).until(element_present)

    # click settings menu to make elements visible
    driver.find_element_by_class_name("v-MainNavToolbar").click()

    # And follow to settings page
    driver.find_element_by_link_text("Settings").click()

    # Wait for settings page to render, oh Javascript
    element_present = EC.presence_of_element_located((By.LINK_TEXT, 'Rules'))
    WebDriverWait(driver, timeout).until(element_present)

    # Click on Rules link
    driver.find_element_by_link_text("Rules").click()

    # Click on edit custom sieve code
    element_present = EC.presence_of_element_located((By.LINK_TEXT, 'Edit custom sieve code'))
    WebDriverWait(driver, timeout).until(element_present)
    driver.find_element_by_link_text("Edit custom sieve code").click()

    print("Editing")

    # This is super unstable, I hate that we have to go by webid
    element_present = EC.presence_of_element_located((By.CLASS_NAME, 'v-EditSieve-rules'))
    WebDriverWait(driver, timeout).until(element_present)

    print("Find form")
    elements = driver.find_elements_by_css_selector("textarea.v-Text-input")
    element = elements[-1]

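    # Find the submit button: a button whose text contains "Save"
    submit = None
    for e in driver.find_elements_by_css_selector("button"):
        if "Save" in e.text:
            submit = e
    if submit is None:
        # Fail with a clear message instead of a NameError further down
        sys.exit("Could not find the Save button")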

    print("Found form")
    # And replace the contents
    element.clear()

    with open("rules.txt") as f:
        element.send_keys(f.read())

    # Click the Save button, then report success
    submit.click()
    print("Submitted!")

except TimeoutException as e:
    print(e)
    print("Timed out waiting for page to load")
    # Shut down the headless browser and report failure to the caller
    driver.quit()
    sys.exit(1)

print("Done!")
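For reference, the script only reads two values from config.ini: the user and pass keys of a [default] section. Here is a minimal sketch of that layout, parsed from a string so it is easy to try out. The address and password are placeholders, and load_credentials is a hypothetical helper, not part of the script above. The sketch passes interpolation=None, which the original script does not do: with configparser's default interpolation, a password containing a % character would raise an error when read.

```python
import configparser

# Placeholder values only; a real config.ini would hold your own login.
SAMPLE_CONFIG = """\
[default]
user = someone@example.com
pass = pl%ceholder-password
"""


def load_credentials(text):
    """Return (user, password) from config text shaped like config.ini."""
    # interpolation=None keeps a literal % in the password from being
    # treated as the start of an interpolation reference.
    config = configparser.ConfigParser(interpolation=None)
    config.read_string(text)
    section = config["default"]
    return section["user"], section["pass"]


if __name__ == "__main__":
    print(load_credentials(SAMPLE_CONFIG)[0])
```

Since this file holds a full account password, it is worth keeping it out of the git repository that tracks the rules.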

Basic Flow

I won’t do a line-by-line explanation, but there are a few concepts that make the whole thing fall into place.

The first is the use of WebDriverWait. This is an OvertureJS application, which means that clicking parts of the screen triggers an Ajax interaction, and it may be some time before the screen “repaints”. The result could be a new page, a change to the existing page, or an element becoming visible. So the pattern is: find a thing, click a thing, wait for the next thing. There is a 5 click interaction before I get to the sieve edit form, then a save button click to finish it off.
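To make the waiting idea concrete, here is a rough standard-library sketch of what a call like WebDriverWait(driver, timeout).until(condition) amounts to: call the condition repeatedly until it returns something truthy, and give up once the timeout has passed. The names wait_until and WaitTimeout are hypothetical stand-ins for illustration; this is not Selenium's own implementation.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never became truthy (hypothetical name)."""


def wait_until(condition, timeout=5.0, poll=0.5):
    """Call condition() every `poll` seconds until it returns a truthy value.

    Returns that value, or raises WaitTimeout once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} seconds")
        time.sleep(poll)
```

In the script, the condition is "this element is now present on the page", and the returned value is the element itself, which is why each click is preceded by a wait for the next thing to appear.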

Finding things is important, and sometimes hard. Because this is an OvertureJS application, div ids are pretty much useless. So I stared a lot in Chrome inspector at what looked like stable classes to find the right things to click on. All of those could change with new versions of the UI, so this is fragile at best. Sometimes you just have to count, like finding the last textarea on the Rules page. Sometimes you have to inspect elements, like looking through all the buttons on a page to find the one that says “Save”.
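The two lookup tricks described here, counting and inspecting text, can be pulled into small helpers that fail loudly when the page no longer looks the way the script expects. This is only a sketch: pick_last and pick_by_text are hypothetical names, and the helpers work on any objects with a .text attribute, which is an assumption that happens to match Selenium's elements. Unlike the loop in the script, pick_by_text returns the first match rather than the last.

```python
def pick_last(elements):
    """Return the last element in the list, e.g. the last textarea on a page."""
    if not elements:
        raise LookupError("no matching elements on the page")
    return elements[-1]


def pick_by_text(elements, label):
    """Return the first element whose visible text contains `label`."""
    for element in elements:
        if label in element.text:
            return element
    raise LookupError(f"no element with text containing {label!r}")
```

Raising an error as soon as a lookup comes back empty makes a UI change show up as a clear message instead of a confusing failure several steps later.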

Filling out forms is done with send_keys, which approximates typing by sending 1 character every few milliseconds. If you run non-headless it makes for an amusing animation. My sieve file is close to 20,000 characters, so it takes more than a full minute to put that content in one character at a time. But at least it’s a machine, so no typos.
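As a back-of-the-envelope check on that timing, here is a tiny sketch. The per-character delays are assumptions picked to match "every few milliseconds", not measurements, and typing_seconds is a hypothetical helper.

```python
def typing_seconds(char_count, ms_per_char):
    """Estimate how long typing takes at a fixed per-character delay."""
    return char_count * ms_per_char / 1000.0


# Assumed delays of 3 ms and 5 ms per character for a 20,000 character file.
print(typing_seconds(20000, 3))  # 60.0 seconds
print(typing_seconds(20000, 5))  # 100.0 seconds
```

So anything at or above roughly 3 ms per character is enough to push a file of that size past the one minute mark.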

The Good and the Bad

The good thing is this all seems to work, pretty reliably. I’ve been running it for the last week and all my changes are getting saved correctly.

The bad thing is that you can’t have two-factor authentication enabled and use this. Unlike IMAP, where you can provision an app password for Fastmail, this is really logging in and pretending to be you, clicking through the website and typing. There are no limited users for that.

It’s also slow. A full run takes

It’s definitely fragile; I’m sure an update to their site is going to break it. And then I’ll be in Chrome inspector again to figure out how to make this work.

But, on the upside, this let me learn a more general purpose set of tools for crawling and automating the modern web (which requires JavaScript). I’ve used this technique for a few sites now, and it’s a good one to add to your bag of tricks.

The Future

Right now this script is in the same repo as my rules. It also requires setting up the Selenium environment and headless Chrome, which I’ve not really documented. I will take some time to split this out on GitHub so others can use it.

I would love it if Fastmail would support MANAGESIEVE, or have an HTTP API to fetch / store sieve rules. Anything where I could use a limited app user instead of my full user. I really want to delete this code and never speak of it again, but a couple of years and closed support tickets later, and this is the best I’ve got.

If you know someone in Fastmail engineering and can ask them about having a supported path to programmatically update sieve rules, that would be wonderful. I know a number of software developers who have considered the switch to Fastmail, but stopped when they discovered that updating sieve can only be done in the web UI.

Updated (12/15/2017): via Twitter the Fastmail team corrected me that it’s not Angular, but their own JS toolkit called OvertureJS. The article has been corrected to reflect that.

 

Reading Code

When you are learning how to program, you think that most of your time is going to be spent writing code. The reality is most of your coding time is actually spent reading code. Other people’s code. Code with comments that lie. Code with bizarre shortcuts. Code whose original authors are long gone or unresponsive.

And that’s the real skill of a good programmer, the ability to read this kind of code, and make sense of it. Maybe even make it a little better as you come across it.

Relearned Linear Algebra

After nearly a month of tinkering with code, nearly giving up twice, and realizing that I was going to actually need to relearn my linear algebra to get a real solution, I managed to create this graph. It is the position of the moons of Jupiter relative to the planet as seen from Earth.

Thanks to Thor for helping me get to the realization that straight up geometry wasn’t going to be good enough, and for helping bootstrap my relearning of vector math. Once I started using real linear algebra I didn’t even have to cheat on generating the sign. Next step… JNI.

Catching bad links with jquery

We’re 1 step closer to the launch of the new Poughkeepsie Farm Project website, so it’s down to some final edits before it gets flipped live. While I was looking over the test site the other day, I realized we still had some links and images that referred to the existing site, which would break once we did the final domain switcheroo.

I came up with the following snippet of jQuery to highlight bad links and images client side so that editors would realize they needed to do something about them:

function highlight_bad() {
    // Every address the site has had; absolute URLs to these break after the domain switch
    var hosts = ["farmproject.org", "www.farmproject.org", "test.farmproject.org",
                 "pfp.dague.org", "farm.dague.org"];
    $.each(hosts, function (i, host) {
        var url = "http://" + host;
        // Outline images and links inside the content block that start with this URL
        $("div[id='content'] img[src^='" + url + "'], div[id='content'] a[href^='" + url + "']")
            .css("border", "9px solid red");
    });
}

So every time we find a link or image inside the content block that starts with an absolute URL to one of the addresses the site has had, we highlight it. This has been incredibly effective so far in catching some things I didn’t even realize were an issue. This, combined with Drupal’s internal broken link detector, is helping us ensure the content is consistent prior to launch.
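The same prefix check can also be run offline against saved page HTML, which gives a list to work through instead of red borders. The following is a sketch of that different technique in Python using only the standard library. StaleLinkFinder and find_stale_links are hypothetical names, and unlike the jQuery version this sketch scans the whole document rather than only the content div.

```python
from html.parser import HTMLParser

# The addresses the site has had, taken from the jQuery snippet.
OLD_HOSTS = (
    "farmproject.org",
    "www.farmproject.org",
    "test.farmproject.org",
    "pfp.dague.org",
    "farm.dague.org",
)


class StaleLinkFinder(HTMLParser):
    """Collect a[href] and img[src] values that start with an old absolute URL."""

    def __init__(self, hosts=OLD_HOSTS):
        super().__init__()
        self.prefixes = tuple("http://" + host for host in hosts)
        self.stale = []

    def handle_starttag(self, tag, attrs):
        # Only links and images are checked, mirroring the jQuery selectors.
        wanted = {"a": "href", "img": "src"}.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value and value.startswith(self.prefixes):
                self.stale.append((tag, value))


def find_stale_links(html):
    """Return (tag, url) pairs for every stale link or image in the HTML."""
    finder = StaleLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.stale
```

Relative URLs and links to other sites are left alone, since only values beginning with one of the old addresses are reported.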

Ruby Snippet – Tagging mp3 files

For the NPR shows that don’t podcast, I use icecream to save them off for my own time shifting. The files end up with names like “car_talk_2008_01_17.mp3”. Until recently, that was good enough, but the new Sandisk players that both my wife and I have only function on tags, not on filenames. Last night I wrote this small Ruby script to fix that:

#!/usr/bin/ruby

require "date"
require "rubygems"
require "mp3info"

ARGV.each do |file|
  # Split "car_talk_2008_01_17.mp3" into the show name and the date part
  title, datestr = file.scan(/(\w+)_(\d+_\d+_\d+)\.mp3/)[0]
  if title and datestr
    date = DateTime.parse(datestr.gsub(/_/,"-"))
    title.gsub!(/_/, " ")
    title = title.split.map {|a| a.capitalize}.join(" ")
    puts date
    puts title
    Mp3Info.open(file) do |mp3|
      if not mp3.tag.album == "#{title} #{date.strftime("%Y")}"
        mp3.tag.album = "#{title} #{date.strftime("%Y")}"
        mp3.tag.artist = "WAMC Recordings"
        mp3.tag.title = date.strftime("%Y %m %d - #{title}")
      end
    end
  end
end

What’s going on should be pretty clear, but I’ll highlight a few things.  First we are iterating over ARGV, so this takes a list of files on the command line.  DateTime has a parser, which is actually pretty good.  Anything that looks like a standard date can be converted back to one.

Ruby methods always return object instances, which lets you chain calls like

title.split.map {|a| a.capitalize}.join(" ")

where you split on white space, capitalize the components in the array, and join it back into a string.

And to wrap it all up, we’ve got a great Mp3Info library as a gem. Wondering where the save call is? Well, that’s one of the wonderful things about Ruby do blocks: the save is implicit when the block ends, because Mp3Info.open closes the file, writing out any changed tags, once the block returns. No need to make sure you clean up those resources or sync manually, because by doing the action in the do block all the setup / teardown is handled by the library. I used to be confused about do blocks; now I love them for this very reason.
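For comparison, here is the filename parsing step from the script sketched in Python. It only computes the tag values and does not touch any mp3 file. The function name tags_from_filename is hypothetical, and the sketch assumes the date part is always year_month_day, as in the example filename.

```python
import re
from datetime import datetime

# Show name, then a year_month_day date, then the .mp3 extension.
FILENAME_RE = re.compile(r"(\w+)_(\d+_\d+_\d+)\.mp3$")


def tags_from_filename(filename):
    """Derive album, artist and title tags from a name like car_talk_2008_01_17.mp3."""
    match = FILENAME_RE.search(filename)
    if match is None:
        return None
    raw_title, raw_date = match.groups()
    date = datetime.strptime(raw_date, "%Y_%m_%d")
    # Same split / capitalize / join chain as the Ruby version.
    title = " ".join(word.capitalize() for word in raw_title.replace("_", " ").split())
    return {
        "album": f"{title} {date:%Y}",
        "artist": "WAMC Recordings",
        "title": f"{date:%Y %m %d} - {title}",
    }
```

Files that do not match the pattern return None, which mirrors the Ruby script skipping them.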