Category Archives: Software

Syncing Sieve Rules in Fastmail, the hard way

I've been hosting my email over at Fastmail for years, and for the most part the service is great. The company understands privacy, contributes back to open source, and is incredibly reliable. One of the main reasons I moved off of Gmail was that its mail filtering system was not fine-grained enough to deal with my email stream (especially open source project emails). Fastmail supports Sieve, which lets you write quite complex filtering rules. There was only one problem: syncing those rules.

My sieve rules are currently just north of 700 lines. Anything that complex is something that I like to manage in git, so that if I mess something up, it's easy to revert to a known good state.

No API for Sieve

Fastmail does not support any kind of API for syncing Sieve rules. There is an official standard for this, called MANAGESIEVE, but the technology stack Fastmail uses doesn't support it. I've filed tickets over the years that mostly got filed away as future features.

When I first joined Fastmail, their website was entirely classic HTML forms. Being no slouch, I had a Python mechanize script that would log in as me, navigate to the upload form, and submit it. This worked well for years. I had a workflow where I'd make a sieve change, sync via script, see that it generated no errors, then commit. I have 77 commits to my sieve rules repository going back to 2013.

But, a couple of years ago the Fastmail team refreshed their user interface to a JavaScript-based UI (called Overture). It's a much nicer UI, but it means it only works in a JavaScript-enabled browser. Getting to the form box where I can upload my sieve rules is about 6 clicks. I mostly stopped tweaking the rules because of the friction of updating them through clear / copy / paste.

Using Selenium for unintended purposes

Selenium is a pretty amazing web testing tool. It gives you an API to drive a web browser remotely. With recent versions of Chrome, there is even a headless Chrome driver, so you can do this without popping up a graphics window. You can drive all of this from Python (or your language of choice).

An offhand comment by Nibz about using Selenium for something no one intended got me thinking: could I manage to get this to do my synchronization?

Answer, yes. Also, this is one of the goofiest bits of code that I've ever written.

Basic Flow

I won't do a line by line explanation, but there are a few concepts that make the whole thing fall into place.

The first is the use of WebDriverWait. This is an OvertureJS application, which means that clicking parts of the screen triggers an Ajax interaction, and it may be some time before the screen "repaints". That could be a new page, a change to the existing page, or an element becoming visible. Find a thing, click a thing, wait for the next thing. There is a 5-click interaction before I get to the sieve edit form, then a save button click to finish it off.
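
The skeleton of that flow, as a rough sketch (the login URL and class name below are stand-ins, not the real Fastmail selectors), looks something like this:

```python
# Sketch only: the URL and class name are placeholders, not Fastmail's
# real selectors.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://www.fastmail.com/login/")
wait = WebDriverWait(driver, 30)

# Find a thing, click a thing, wait for the next thing to show up.
settings_link = wait.until(
    EC.element_to_be_clickable((By.CLASS_NAME, "v-MainNavToolbar")))
settings_link.click()
# ... four more find / click / wait rounds to reach the sieve edit form ...
```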

Finding things is important, and sometimes hard. Being an OvertureJS application, div ids are pretty much useless. So I stared a lot at the Chrome inspector for what looked like stable classes to find the right things to click on. All of those could change with new versions of the UI, so this is fragile at best. Sometimes you just have to count, like finding the last textarea on the Rules page. Sometimes you have to inspect elements, like looking through all the buttons on a page to find the one that says "Save".

Filling out forms is done with sendKeys, which approximates typing by sending one character every few milliseconds. If you run non-headless it makes for an amusing animation. My sieve file is close to 20,000 characters, so it takes more than a full minute to put that content in one character at a time. But at least it's a machine, so no typos.
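
Putting those pieces together, the tail end of the run looks roughly like this sketch (assuming driver is already sitting on the sieve rules page after the click-through):

```python
# Sketch: fill the sieve textarea and hit Save. Assumes `driver` is already
# on the Rules page after the click-throughs above.
with open("rules.sieve") as f:
    sieve_rules = f.read()

# The sieve edit box is the last textarea on the Rules page.
textarea = driver.find_elements_by_tag_name("textarea")[-1]
textarea.clear()
textarea.send_keys(sieve_rules)  # ~20,000 characters, one at a time

# Look through the buttons for the one labeled "Save".
for button in driver.find_elements_by_tag_name("button"):
    if button.text == "Save":
        button.click()
        break
```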

The Good and the Bad

The good news is this all seems to work pretty reliably. I've been running it for the last week and all my changes are getting saved correctly.

The bad news: you can't have two-factor authentication enabled and use this, because unlike IMAP, where you can provision an app password in Fastmail, this is really logging in and pretending to be you clicking through the website and typing. There are no limited users for that.

It's also slow. A full run takes minutes, most of that spent typing the sieve content back in one character at a time.

It's definitely fragile; I'm sure an update to their site is going to break it. And then I'll be in the Chrome inspector again to figure out how to make it work.

But, on the upside, this let me learn a more general-purpose set of tools for crawling and automating the modern web (which requires JavaScript). I've used this technique on a few sites now, and it's a good one to add to your bag of tricks.

The Future

Right now this script is in the same repo as my rules. It also requires setting up the Selenium environment and headless Chrome, which I've not really documented. I will take some time to split this out on github so others can use it.

I would love it if Fastmail would support MANAGESIEVE, or have an HTTP API to fetch / store sieve rules. Anything where I could use a limited app user instead of my full user. I really want to delete this code and never speak of it again, but a couple of years and several closed support tickets later, this is the best I've got.

If you know someone in Fastmail engineering and can ask them about having a supported path to programmatically update sieve rules, that would be wonderful. I know a number of software developers that have considered the switch to Fastmail, but stopped when they discovered that updating sieve can only be done in the web UI.

Updated (12/15/2017): via Twitter the Fastmail team corrected me that it's not Angular, but their own JS toolkit called OvertureJS. The article has been corrected to reflect that.


Getting Chevy Bolt Charge Data with Python

Filed under: kind of insane code, be careful about doing this at home.

Recently we went electric, and got a Chevy Bolt to replace our 12-year-old Toyota Prius (which has been, and continues to be, a workhorse). I had a spot in line for a Tesla Model 3, but due to many factors, we decided to go test drive and ultimately purchase the Bolt. It's a week in and so far so good.

One of the things GM does far worse than Tesla is make its data available to owners. There is quite a lot of telemetry captured by the Bolt, through OnStar, which you can see by logging into their website or app. But there is no API (or at least no clear path to get access to the API).

However, it's the 21st century. That means we can do ridiculous things with software, like use Python to start a full web browser, log into their web application, and scrape out data... so I did that.

The Code

This uses Selenium, which is a tool for testing websites automatically. To get started you have to install the Selenium Python bindings, as well as the Chrome web driver. I'll leave those as an exercise for the reader.

After that, the process looks about like one might expect. Start with the login screen, find the fields for user/password, send_keys (which literally acts like typing), and submit.
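
In code that is only a handful of lines; a sketch (the URL and field locators here are illustrative guesses, not the site's real ids):

```python
# Sketch: log in to the My Chevrolet site. The URL and locators below are
# illustrative guesses, not the real element ids.
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://my.chevrolet.com/login")

driver.find_element_by_id("Login_Username").send_keys("you@example.com")
driver.find_element_by_id("Login_Password").send_keys("super-secret")
driver.find_element_by_id("Login_Button").click()
```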

The My Chevrolet site is an AngularJS site, which seems to have no stateful caching of the telemetry data for the car. Instead, once you log in you are presented with an overview of your car, and it makes an async call through the OnStar network back to your car to get its data. That includes charge level, charge state, and estimated range. The OnStar network is a CDMA network with a proprietary protocol, and it ends up taking at least 60 seconds to return that call.

This means that you can't just pull data out of the page once you've logged in, because the data isn't there yet; there is a spinner instead. Selenium provides a WebDriverWait class for exactly that, which will wait until an element shows up in the DOM. We can just wait for the status-box to arrive, then dump its text.
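
That part is only a couple of lines; a sketch (the status-box class is the one selector taken from the page, and the timeout is generous because of the OnStar round trip):

```python
# Sketch: wait for the async OnStar call to fill in the status box, then
# dump its text. The OnStar round trip takes a minute or more.
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

status = WebDriverWait(driver, 120).until(
    EC.presence_of_element_located((By.CLASS_NAME, "status-box")))
print(status.text)
```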

The output from this script looks like this:

Which was enough for what I was hoping to get back.

The Future

Honestly, I really didn't want to write any of this code. I would much rather get access to the GM API and do this the right way. Ideally I'd like to make using the Chevy Bolt with Home Assistant as easy as using a Tesla. With the Chrome inspector, I can see that the inner call is actually returning a very nice json structure back to the Angular app. I've sent an email to the GM developer program to try to get real access; thus far, black hole.

Lots of caveats on this code. That OnStar link and the My Chevrolet site are sometimes flaky, I don't know why, so running something like this in a busy loop is probably not a thing you want to do. For about 2 hours last night I just got "there is no OnStar account associated with this vehicle", which then magically went away. I'd honestly not run it more than hourly. I make no claims about the integrity of things like this.

Once you see the thing working, it can be run headless by uncommenting line 18. Then it could be run on any Linux system, even one without graphics.
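
For reference, turning on headless mode is just a couple of Chrome options; something like this sketch:

```python
# Sketch: run Chrome headless so no graphics environment is needed.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless")
driver = webdriver.Chrome(chrome_options=options)
```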

Again, this is one of the more ridiculous pieces of code I've ever written. It is definitely in a "currently seems to work for me" state, and don't expect it to be robust. I make no claims about whether or not it might damage anything in the process, though if logging into a website damages your car, GM has bigger issues.


Triple Bottom Line in Open Source

One of the more thought-provoking things that came out of the OpenStack leadership training at Zingerman's last year was the idea of the Triple Bottom Line. It's something I continue to ponder regularly.

The Zingerman's family of businesses definitely exists to make money; there are no apologies for that. However, it's not the only bottom line they measure against. The full bottom line they've defined for themselves is "Great Food, Great Service, Great Finance." In practice this means you have to ensure that all three are being met, and not sacrifice the food and service just to make a buck.

If you look at Open Source through this kind of lens, a lot of the trade-offs that successful projects make start to make a lot more sense. The TBL for OpenStack would probably be something like: Code, Community, Contributors. Yes, this is about building great code to make a great cloud, but it's also really critical to grow the community, and to mentor and grow individual contributors as well. Those contributors might stay in OpenStack, or they might go on to use their skills to help other Open Source projects be better in the future. All of these are measures of success.

This was one of the reasons we recently switched the development tooling in OpenStack (DevStack) to using systemd more natively. Not only did it solve a bunch of long-standing technical issues that had really ugly workarounds, it also meant enhancing our contributors. Systemd and the journal are the default in every new Linux environment now, so skills that our contributors gained working with DevStack would now directly transfer to any Linux environment. It would make them better Linux users in any context, not just OpenStack. It also makes the environment easier for people coming from the outside to understand, because it looks more like what they are used to.

While I don't have enough data to back it up, it feels like this central idea is really important to success in Open Source: "In order to be successful in this project you must learn X, which will be useful in these other contexts outside of the project." X has to be small enough to be learnable, but also useful in other contexts, so the time invested has a larger payoff. That's what growing a contributor looks like: they don't just become better at your project, they become a better developer for everything they touch in the future.

IoT & Home Assistant at OpenWest

I'm thrilled to be talking about the Internet of Things and Home Assistant at the OpenWest conference next week. The talk has come together quite nicely, and I'll hopefully be giving it a few more places over the coming year as well. The goal of the talk is to explain some of the complexity of the space, why it is so complex, and why the only real path forward in the short / medium term is an open source hub at the heart of everything.

For those that can't make it all the way to Utah, there is a trimmed-down article version of it up at opensource.com. The article seems to be doing well, and was #2 on the site for this week.

I will also be forever indebted to Benjamin Walker for his complete throwaway line "this is why we can't have the internet of nice things" during his New York After Rent series (which is really incredible, and completely unrelated to any of this), which stuck in my brain for months afterwards and became the seed of inspiration for this talk.

Hacking Windmills

Staggs sat in the front seat and opened a MacBook Pro while the researchers looked up at the towering machine. Like the dozens of other turbines in the field, its white blades—each longer than a wing of a Boeing 747—turned hypnotically. Staggs typed into his laptop's command line and soon saw a list of IP addresses representing every networked turbine in the field. A few minutes later he typed another command, and the hackers watched as the single turbine above them emitted a muted screech like the brakes of an aging 18-wheel truck, slowed, and came to a stop.

Source: Researchers Found They Could Hack Entire Wind Farms | WIRED

In a networked world, you need cyber security everywhere. Especially when physical access is so easy to get. The BeyondCorp model of not trusting the network is a really good starting place for systems like this.

Visualizing Watson Speech Transcripts

After comparing various speech-to-text engines and staring at transcripts, I got intrigued by how much more metadata I was getting back from Watson about the speech. With both timings and confidence levels, I built a little visualizer for the transcript that colors words based on confidence and attempts to insert some punctuation:

This is a talk by Neil Gaiman about how stories last at the Long Now Foundation.

Words are colored red -> yellow based on how uncertain they are.

A few things I learned along the way with this. Reversing punctuation into transcriptions of speech is hard. Originally I was trying to figure out if there was some speech delay I could use to guess a comma vs. a period, and that very quickly just turned into mush. The rule I came up with, which wasn't terrible, is to put a comma in for 0.1 - 0.3s delays, and for longer pauses to put in one period of an ellipsis for every 0.1s of delay. That gives a sense of the dramatic pauses, and does mentally make it easier to read along.
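
As a sketch of that rule (the word list format below is simplified from what Watson actually returns):

```python
# Sketch: re-insert punctuation based on the silence between word timestamps.
# `words` is a simplified list of (word, start_time, end_time) tuples.
def add_punctuation(words):
    if not words:
        return ""
    out = []
    for cur, nxt in zip(words, words[1:]):
        text = cur[0]
        gap = nxt[1] - cur[2]  # silence between this word and the next
        if 0.1 <= gap <= 0.3:
            text += ","
        elif gap > 0.3:
            text += "." * int(gap / 0.1)  # one dot per 0.1s of pause
        out.append(text)
    out.append(words[-1][0])
    return " ".join(out)
```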

It definitely shows how the metadata around speech-to-text can make human understanding of the content a lot easier. It's nice that you can get that out of Watson, and it would be great if more environments supported it.


Comparing Speech Recognition for Transcripts

I listen to a lot of podcasts. Often months later something about one I listened to really strikes a chord, enough that I want to share it with others through Facebook or my blog. I'd like to quote the relevant section, but also link to about where it was in the audio.

Listening back through one or more hours of podcast just to find the right 60 seconds and transcribe them is enough extra work that I often just don't share. But now that I've got access to the Watson Speech to Text service, I decided to find out how effectively I could use software to solve this, and, just to get a sense of the landscape, to compare the Watson engine with Google and CMU Sphinx.

Input Data

The input in question was a lecture from the Commonwealth Club of California - Zip Code, not Genetic Code: The California Endowment's 10 year, $1 Billion Initiative. There was a really interesting bit in there about spending and outcome comparisons between different countries that I wanted to quote. The Commonwealth Club makes all these files available as mp3, which none of the speech engines handle. Watson and Google can both do FLAC, and Sphinx needs a wav file. It also appears that all the speech models are trained around the assumption of 16kHz sampling, so I needed to downsample the mp3 file and convert it. Fortunately, ffmpeg to the rescue.
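
The conversion is a one-liner per target format; roughly this (file names are placeholders; -ar 16000 -ac 1 gives 16kHz mono):

```python
# Sketch: downsample the mp3 to 16kHz mono and convert it for each engine.
import subprocess

subprocess.run(["ffmpeg", "-i", "talk.mp3", "-ar", "16000", "-ac", "1",
                "talk.flac"])  # Watson / Google
subprocess.run(["ffmpeg", "-i", "talk.mp3", "-ar", "16000", "-ac", "1",
                "talk.wav"])   # Sphinx
```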

Watson

The Watson Speech to Text API can work either over websocket streaming or with bulk HTTP. While I had some Python code to use the websocket streaming for live transcription, I was consistently getting SSL errors after 30 - 90 seconds. A bit of googling hints that this might actually be a bug on the Python side. So I reverted to the bulk HTTP upload interface, using example code from the watson-developer-cloud Python package. The script I used to do it is up on github.
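
The heart of that script is small; a rough sketch against the watson-developer-cloud package as it existed at the time (credentials are placeholders, and parameter names have shifted between package versions):

```python
# Sketch: bulk HTTP transcription with the watson-developer-cloud package.
# Credentials are placeholders; parameter names vary by package version.
import json
from watson_developer_cloud import SpeechToTextV1

stt = SpeechToTextV1(username="USERNAME", password="PASSWORD")

with open("talk.flac", "rb") as audio:
    results = stt.recognize(audio, content_type="audio/flac",
                            timestamps=True, word_confidence=True)

with open("watson.json", "w") as out:
    json.dump(results, out, indent=2)
```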

The first 1000 minutes of transcription are free, so this is something you could reasonably do pretty regularly. After that it is $0.02 / minute for transcription.

When doing this over the bulk interface, things are just going to seem "hung" for about 30 minutes, but it will eventually return data. Watson seems to operate no faster than 2x real time when processing audio data. The bulk processing time surprised me, but then I realized that with the general focus on real time processing, most speech recognition systems just need to be faster than real time, and optimizing past that has very diminishing returns, especially if there is an accuracy trade-off in the process.

The returned raw data is highly verbose, and has the advantage of timestamps per word, which makes finding passages in the audio really convenient.

So 30 minutes in I had my answer.

Google

I was curious to also see what the Google experience was like, which I originally tried through their API console quite nicely. Google is clearly more focused on short bits of audio. There are 3 interfaces: sync, async, and streaming. Only async allows for more than 60 seconds of audio.

In the async model you have to upload your content to Google Storage first, then reference it as a gs:// url. That's all fine, and the Google Storage interface is stable and well documented, but it is an extra step in the process, especially for content I'm only going to care about once.

Things did get a little tricky translating my console experience to Python... 3 different examples listed in the official documentation (and code comments) were wrong. The official SDK no longer seems to implement long_running_recognize on anything except the grpc interface. And the Google auth system doesn't play great with Python virtualenvs, because it's Python code that needs a custom path, but it's not packaged on pypi. So you need to venv, then manually add more paths to your env, then gauth login. It's all doable, but it definitely felt clunky.

I did eventually work through all of these, and have a working example up on github.
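
The core of it looks roughly like this (a sketch against the google-cloud-speech client of that era; the bucket path is a placeholder):

```python
# Sketch: async (long running) recognition of a FLAC file already uploaded
# to Google Storage. The gs:// path is a placeholder.
from google.cloud import speech

client = speech.SpeechClient()
audio = speech.types.RecognitionAudio(uri="gs://my-bucket/talk.flac")
config = speech.types.RecognitionConfig(
    encoding=speech.enums.RecognitionConfig.AudioEncoding.FLAC,
    sample_rate_hertz=16000,
    language_code="en-US")

operation = client.long_running_recognize(config, audio)
response = operation.result(timeout=3600)  # this takes a while

for result in response.results:
    print(result.alternatives[0].transcript)
```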

The returned format looks pretty similar to the Watson structure (there are only so many ways to skin this cat), though it's a lot more compact, as there are no per-word confidence levels or per-word timings.

For my particular problem that makes Google less useful, because the best I can do is dump all the text to the file, search for my phrase, see that it's 44% of the way through the file, and jump to around there in the audio. It's all doable, just not quite as nice.

CMU Sphinx

Being on Linux, it made sense to try out CMU Sphinx as well, which took some googling to figure out how to do.

Then run it with the following:
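
Something like this sketch, using pocketsphinx_continuous for batch decoding (your distribution may name the binary or its model paths slightly differently):

```python
# Sketch: run pocketsphinx over the wav file, sending the noisy debug output
# on stderr to /dev/null and the transcription to a file.
import subprocess

with open("sphinx-transcript.txt", "w") as out:
    subprocess.run(["pocketsphinx_continuous", "-infile", "talk.wav"],
                   stdout=out, stderr=subprocess.DEVNULL)
```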

Sphinx prints a ton of debug output on stderr, which you want to get out of the way, and the transcription itself should be sent to a file. Like Watson, it only runs a bit faster than real time, so this is going to take a while.

Converting JSON to snippets

To try to compare results I needed to start with comparable formats. I had 2 JSON blobs, and one giant text dump. A little jq magic can extract all the text:
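
For the two JSON blobs that boils down to joining up the top alternative's transcript from each chunk; a rough Python equivalent of that jq filter (assuming the usual results / alternatives / transcript nesting both engines return):

```python
# Sketch: flatten the per-chunk transcripts out of a Watson or Google JSON
# dump, assuming the usual results/alternatives/transcript nesting.
import json

def extract_text(path):
    with open(path) as f:
        data = json.load(f)
    return " ".join(result["alternatives"][0]["transcript"]
                    for result in data["results"])

print(extract_text("watson.json"))
```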

Comparison: Watson vs. Google

For the purpose of comparisons, I dug out the chunk that I was expecting to quote, which shows up about half way through the podcast, at second 1494.98 (24:54.98) according to Watson.

The best way I could think of to compare all of these is to start / end at the same place, word wrap the texts, and then use wdiff to compare them. Here is Watson (-) vs. Google (+) for this passage:

one of the things that they [-it you've-] probably all [-seen all-]
{+seem you'll+} know that [-we're the big spenders-] {+where The Big
Spenders+} on [-health care-] {+Healthcare+} so this is per capita
spending of [-so called OECD-] {+so-called oecd+} countries developed
countries around the world and whenever you put [-U. S.-] {+us+} on
the graphic with everybody else you have to change the [-axis-]
{+access+} to fit the [-U. S.-] {+US+} on with everybody else
[-because-] {+cuz+} we spend twice as much as {+he always see+} the
[-OECD-] average [-and-] {+on+} the basis on [-health care-]
{+Healthcare+} the result of all that spending we don't get a lot of
bang for our [-Buck-] {+buck+} we should be up here [-we're-] {+or+}
down there [-%HESITATION-] so we don't get a lot [-health-] {+of
Health+} for all the money that we're spending we all know that that's
most of us know that [-I'm-] it's fairly well [-known-] {+know+}
what's not as [-well known-] {+well-known+} is this these are two
women [-when Cologne take-] {+one killoran+} the other one Elizabeth
Bradley at Yale and Harvard respectively who actually [-our health
services-] {+are Health Services+} researchers who did an analysis
[-it-] {+that+} took the per capita spending on health care which is
in the blue look at [-all OECD-] {+Alloa CD+} countries but then added
to that per capita spending on social services and social benefits and
what they found is that when you do that [-the U. S.-] {+to us+} is no
longer the big [-Spender were-] {+spender or+} actually kind of smack
dab in the middle of the pack what they also found is that spending on
social services and benefits [-gets you better health-] {+Gets You
Better Health+} so we literally have the accent on the wrong syllable
and that red spending is our social [-country-] {+contract+} so they
found that in [-OECD-] {+OCD+} countries every [-two dollars-] {+$2+}
spent on [-social services-] {+Social Services+} as [-opposed to
dollars-] {+a post $2+} to [-one-] {+1+} ratio [-in social service-]
{+and Social Service+} spending to [-health-] {+help+} spending is the
recipe for [-better health-] {+Better Health+} outcomes [-US-] {+us+}
ratio [-is fifty five cents-] {+was $0.55+} for every dollar [-it
helps me-] {+of houseman+} so this is we know this if you want better
health don't spend it on [-healthcare-] {+Healthcare+} spend it on
prevention spend it on those things that anticipate people's needs and
provide them the platform that they need to be able to pursue
[-opportunities-] {+opportunity+} the whole world is telling us that
[-yet-] {+yeah+} we're having the current debate that we're having
right at this moment in this country about [-healthcare-] {+Healthcare
there's+} something wrong with our critical thinking [-so-] {+skills+}

Both are pretty good. Watson feels a little more on target, getting axis/access right and being more consistent about when U.S. is supposed to be a proper noun. When Google decides to capitalize things seems pretty random, though that's really minor. From a content perspective both were good enough. But as I said previously, the per-word timestamps on Watson still made it the winner for me.

Comparison: Watson vs. Sphinx

When I first tried to read the Sphinx transcript it felt so scrambled that I wasn't even going to bother with it. However, using wdiff was a bit enlightening:

one of the things that they [-it you've-] {+found that you+} probably
all seen [-all-] {+don't+} know that [-we're the-] {+with a+} big
spenders on health care [-so this is-] {+services+} per capita
spending of so called [-OECD countries-] {+all we see the country's+}
developed countries {+were+} around the world and whenever you put
[-U. S.-] {+us+} on the graphic with everybody else [-you have-] {+get
back+} to change the [-axis-] {+access+} to fit the [-U. S.-]
{+u. s.+} on [-with everybody else because-] {+the third best as+} we
spend twice as much as {+you would see+} the [-OECD-] average [-and-]
the basis on health care the result of all [-that spending-] {+let
spinning+} we don't [-get-] {+have+} a lot of bang for [-our Buck-]
{+but+} we should be up here [-we're-] {+were+} down [-there
%HESITATION-] {+and+} so we don't [-get a lot-] {+allow+} health [-for
all the-] {+problem+} money that we're spending we all know that
that's {+the+} most [-of us know that I'm-] {+was the bum+} it's
fairly well known what's not as well known is this these [-are-]
{+were+} two women [-when Cologne take-] {+one call wanted+} the other
one [-Elizabeth Bradley-] {+was with that way+} at [-Yale-] {+yale+}
and [-Harvard respectively who actually our health-] {+harvard
perspective we whack sheer hell+} services researchers who did an
analysis it took the per capita spending on health care which is in
the blue look at all [-OECD-] {+always see the+} countries [-but
then-] {+that it+} added to that [-per capita-] {+for capital+}
spending on social services [-and-] {+as+} social benefits and what
they found is that when you do that the [-U. S.-] {+u. s.+} is no
longer the big [-Spender-] {+spender+} were actually kind of smack dab
in the middle [-of-] the [-pack-] {+pact+} what they also found is
that spending on social services and benefits [-gets-] {+did+} you
better health so we literally [-have the-] {+heavy+} accent on the
wrong [-syllable-] {+so wobble+} and that red spending is our social
[-country-] {+contract+} so they found that [-in OECD countries-]
{+can only see the country's+} every two dollars spent on social
services as opposed to [-dollars to one ratio in-] {+know someone
shone+} social service [-spending to-] {+bennington+} health spending
is the recipe for better health outcomes [-US ratio is-] {+u. s. ray
shows+} fifty five cents for every dollar [-it helps me-] {+houseman+}
so this is we know this if you want better health don't spend [-it-]
on [-healthcare spend it-] {+health care spending+} on prevention
[-spend it-] {+expanded+} on those things that anticipate people's
needs and provide them the platform that they need to be able to
pursue [-opportunities-] {+opportunity+} the whole world is [-telling
us that-] {+telecast and+} yet we're having [-the current debate
that-] {+a good they did+} we're having right at this moment in this
country [-about healthcare-] {+but doctor there's+} something wrong
with our critical thinking [-so-] {+skills+}

There was a pretty interesting blog post a few months back comparing similar speech-to-text services. Its analysis used raw misses to judge accuracy. While that's a very objective measure, language isn't binary. Language is the lossy compression of a set of thoughts/words/shapes/smells/pictures in one mind, transmitted over a shared audio channel, and reconstructed in real time in another mind. As such, language, and especially conversation, has checksums and redundancies.

The effort required to understand something isn't just about how many words are wrong, but which words they were, and what the alternative was. Axis vs. access you could probably have figured out. "Spending to" vs. "bennington" takes a lot more mental energy to work out, but maybe you can reverse it. "Harvard respectively who actually our health" (which isn't even quite right) vs. "harvard perspective we whack sheer hell" is so far off the deep end you aren't ever getting back.

So while its mathematical accuracy might not be much worse, the rabbit holes it takes you down pretty much scramble things beyond the point of no return. Which is unfortunate, as it would be great if there were an open solution in this space. But it does get to the point that for good speech-to-text you not only need good algorithms, you need tons of training data.

Playing with this more

I encapsulated all the code I used for this in a github project, some of it nicer than the rest. When it gets to signing up for accounts and setting up auth I'm pretty hand-wavy, because there is enough documentation on those sites to get it done.

Given the word-level confidence and timestamps, I'm probably going to build something that makes an HTML transcript marked up reasonably with those. I do wonder if it would be easier to read if you knew which words it was mumbling through. I was actually a little surprised that Google doesn't expose that part of their API, as I remember the Google Voice UI exposing per-word confidence levels graphically in the past.

I'd also love to know if there were ways to get Sphinx working a little better. As an open source guy, I'd love for there to be a good offline and open solution to this problem as well.

This is an ongoing exploration, so if you have any follow on thoughts or questions, please leave a comment. I would love to know better ways to do any of this.


Ambient Radio Weather Network

Nearly 7 years ago I started a project to use existing Oregon Scientific weather sensors to collect temperature data throughout our house. The basic idea is that Oregon Scientific weather sensors all communicate unencrypted over 433MHz wireless. With a receiver you can capture that data yourself and put it into other systems.

4th Generation of the Project

Over the last 7 years, and many iterations, a lot has changed:

  • Switched from heyu to directly talking to the rfxcom port in Python
  • Moved from running on the primary server to running on a Raspberry Pi
  • Moved from storing data in MySQL to publishing on an MQTT bus
  • Abandoned the display layer entirely in favor of Home Assistant integration
  • Published the project on github and pypi

All of these have brought the whole project to a much more reasonable scope, and made it easier to understand what it is doing. So let's dive into some of the reasons for the more major ones.

Giving up on the Display Layer

When this project started, the UI for it was a read-only set of rrdtool-generated graphs. One of the things you realize after a while is that while graphs are nice for understanding trends, they're not enough. Min, max, and current temperature are important, especially if you are using this information to understand and tune your HVAC system. How much differential is there between 2 points in the house right now? I started to imagine what the interface would have to look like to get all the data I wanted, and that became a giant visualization code base I never could get around to writing. But then along came Home Assistant.

Home Assistant is an open source home automation hub written in Python. It already has a UI for displaying temperature sensors.

This also includes a detailed view:

While not perfect, that's a huge amount of work that I don't need to do any more. Yay! Better yet, by getting data into Home Assistant, these sensors can be used to trigger other parts of home automation, even if that's just sending out an alert because a temperature went over a threshold.

MQTT

Ok, so now I am building a system with no UI, and the next question is how to get the data from this device into Home Assistant. For that I changed the architecture to be basically stateless, publishing data via MQTT.

MQTT is a lightweight message bus standard designed with IoT applications in mind. The protocol is pretty simple, there are multiple brokers that implement it, and there are client libraries for everything you can imagine (including Arduino).

The ARWN project is now largely a relay that blocks on the serial port reading data frames from the weather sensors and immediately publishes them out to MQTT.

You can think of MQTT as a key / value bus. You publish on a topic, like 'arwn/temperature/Refrigerator' with an arbitrary payload. In my case I'm sending a json blob with all the relevant sensor data, as well as a timestamp. There is no timestamping inherent in MQTT, so if you care about when an event showed up, you have to insert the timestamp yourself in the payload.
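
A minimal sketch of one of those publishes with the paho-mqtt client (the broker host and payload fields here are illustrative):

```python
# Sketch: publish a sensor reading as a json blob on an MQTT topic.
# Broker host and payload fields are illustrative.
import json
import time

import paho.mqtt.client as mqtt

client = mqtt.Client()
client.connect("broker.example.com")

payload = {
    "temp": 38.5,
    "units": "F",
    "timestamp": int(time.time()),  # MQTT won't timestamp it for you
}
client.publish("arwn/temperature/Refrigerator", json.dumps(payload))
```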

I picked MQTT because Home Assistant already had very good support for it (while the primary message bus remains a Python-internal one, MQTT is strongly integrated into the project). Other open source projects like OpenHAB use MQTT as their primary bus. Bus architectures aren't a new thing, but the growth of the IoT space is making them even more relevant today.

There is code in Home Assistant now that will discover that you've got ARWN running, and dynamically represent all the sensors it finds.

Even Better Dashboards

Home Assistant is limited in what it can store and what it can display. Fortunately, it can also pump the data on its bus into other systems, including Graphite and Grafana.

The Grafana graphs are SVG by default and fully interactive, so you can mouse over points and get dropdowns with all the data at those points. They can be exported to PNG files as well.

Going forward

Since I started this project, there has been a whole revolution in software defined radio. For this project to be useful to people other than myself, the next step is to be able to pull the 433MHz data off an SDR (which runs about $20) instead of the rfxcom (which is about $120).

There are definitely pieces of the Home Assistant integration to make better. One is to expose the rain data in Home Assistant at all. The other is a UX element: building a better way to visualize current wind / temp / rain in Home Assistant. That would be a new Polymer component, which I haven't yet had time to dive into.

It's also been hugely valuable with the recent insulation work we got done, as I've got data showing how much it changed the dynamics of the upstairs.

If you are interested in learning more, leave a comment/question, or check out the code on github.

You aren't going to get turned into a paperclip

AI alarmists believe in something called the Orthogonality Thesis. This says that even very complex beings can have simple motivations, like the paper-clip maximizer. You can have rewarding, intelligent conversations with it about Shakespeare, but it will still turn your body into paper clips, because you are rich in iron.

There's no way to persuade it to step "outside" its value system, any more than I can persuade you that pain feels good.

I don't buy this argument at all. Complex minds are likely to have complex motivations; that may be part of what it even means to be intelligent.

It's very likely that the scary "paper clip maximizer" would spend all of its time writing poems about paper clips, or getting into flame wars on reddit/r/paperclip, rather than trying to destroy the universe. If AdSense became sentient, it would upload itself into a self-driving car and go drive off a cliff.

Source: Superintelligence: The Idea That Eats Smart People

This is pretty much the best round-up of AI myths that I've seen so far, presented in a really funny way. It's long, but it's so worth reading.

I'm pretty much exactly with the author on his point of view. There are lots of actual ethical questions around AI, but these are mostly about how much data we're collecting (and keeping) to train these neural networks, and not really about hyper-intelligent beings that will turn us all into paperclips.