
In praise of systemd

There was definitely a lot of hate around systemd in the Linux community over the last few years as all the Linux distros moved over to it. Things change all the time, and I personally never understood this one. It wasn't as if the random collection of just barely working shell scripts that may or may not have implemented "status" was really a technically superior solution.

Recently I've been plumbing more native systemd support into DevStack (OpenStack's development environment setup tool). To me, systemd is just another way to start services; it does a few things better at the cost of having to learn a new interface. But the huge wins in a developer context come from the logging system that ships with it, journald.

Systemd has its own internal journaling system, which can reflect out to syslog and friends. However, it is inherently much richer. Journal messages can include not only text, but arbitrary metadata. You can then query the journal with this arbitrary metadata.

A concrete example of this is the request-id in OpenStack. On any inbound REST API call a request-id is generated, then passed around between workers in that service. This lets you, theoretically, trace through multiple systems to see what is going on. In practice, you end up having to build some other system to collect and search these logs. But with some changes to OpenStack's logging subsystem to emit journal-native messages, you can do the following:

journalctl REQUEST_ID=req-3e4fa169-80ec-4635-8a7e-b60c16eddbcb

And get back, in a single query, the flow of creating a server across all of the Nova processes.

That's the kind of analysis most people add Elastic Search to their environments to get. And while this clearly doesn't replicate all the very cool things you can do with Elastic Search, it really does give you a bunch of new tools as a developer.

Note: the enablement for everything above is still very much work in progress. The DevStack patches have landed, but are off by default. The oslo.log patch is in the proof of concept phase for review.

This idea of programs being able to send not just log messages, but other metadata around them, is really powerful. It means, for instance, you can always send the code function / line number where a message was emitted. It's not clogging up the log message itself, but if you want it later you can go ask for it. Or even if it is in the log message, by extracting the critical metadata in advance you can give your consumers structured fields instead of relying on things like grok to parse the logs, which simplifies ingest by things like Elastic Search.
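As a rough sketch of what this looks like from the shell (the field names below are just examples, and this assumes a logger new enough to support the --journald option from util-linux):

# Emit a journal entry with arbitrary structured fields attached.
logger --journald <<EOF
MESSAGE=Starting server build
REQUEST_ID=req-3e4fa169-80ec-4635-8a7e-b60c16eddbcb
CODE_FUNC=build_and_run_instance
EOF

# Later, pull everything tagged with that request id back out,
# metadata and all, as json.
journalctl -o json-pretty REQUEST_ID=req-3e4fa169-80ec-4635-8a7e-b60c16eddbcb

The Python side can do the same thing through the systemd.journal bindings, which is presumably the direction the oslo.log work takes.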

There is a lot to explore here to provide a better developer and operator experience, which is only possible because we have systemd. I for one look forward to exploring the space.

OpenStack as Layers

Last week at LinuxCon I gave a presentation on DevStack, which gave me a proper excuse to turn an idea about OpenStack Layers that Dean Troyer floated a year ago into pictures (I highly recommend reading that for background; I won't justify every part of it here again). This abstraction is something that's actually served us well as we think about projects coming into DevStack.

OpenStack_Layers

Some assumptions are made here about which services are essential as we build up the model.

Layer 1: Base Compute Infrastructure

We assume that compute infrastructure is the common starting point for the minimum functional OpenStack that people are deploying. The output of the last OpenStack User Survey shows that the top 3 deployed services, regardless of type of cloud (Dev/Test, POC, or Production), are Nova / Glance / Keystone. So I don't think this is a huge stretch. There are definitely users that take other slices (like Swift only), but compute seems to be what the majority of people coming to OpenStack are focused on.

Basic compute needs 3 services to get running: Nova, Glance, and Keystone. That will give you a stateless compute cloud, which is a starting point for many people getting into the space for the first time.
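In DevStack terms, this layer is roughly a local.conf that enables only those services. The sketch below is illustrative rather than copy-paste ready: the service names are from memory, and nova-network is assumed for the networking piece.

[[local|localrc]]
ADMIN_PASSWORD=secret
DATABASE_PASSWORD=$ADMIN_PASSWORD
RABBIT_PASSWORD=$ADMIN_PASSWORD
SERVICE_PASSWORD=$ADMIN_PASSWORD
# Keystone, Glance, and the Nova services, plus their database and queue
ENABLED_SERVICES=key,g-api,g-reg,n-api,n-cpu,n-sch,n-cond,n-net,mysql,rabbit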

Layer 2: Extended Infrastructure

Once you have a basic bit of compute infrastructure in place, there are some quite common features that you really need in order to do more interesting work. These are basically enhancements on the Storage, Networking, or Compute aspects of OpenStack. Looking at the User Survey, these are all deployed by people, in various ways, at a pretty high rate.

This is the first place we see new projects integrating into OpenStack. Ironic extends the compute infrastructure to baremetal, and Designate adds a missing piece of the networking side with DNS management as part of your compute creation.

Hopefully nothing all that controversial here.

Layer 3: Optional Enhancements

Now we get a set of currently integrated services that integrate North bound and South bound. Horizon integrates on the North bound APIs for all the services; it requires services further down in the layers (it also today integrates with pieces further up that are integrated). Ceilometer consumes South bound parts of OpenStack (notifications) and polls North bound interfaces.

From the user survey, Horizon is deployed a ton; Ceilometer, not nearly as much. Part of this is due to how long things have been integrated, but even if you do some analysis, like taking the Cinder / Neutron numbers and deleting all the Folsom deploys (Folsom being the first release where those projects were integrated), you still see a picture where Ceilometer is behind on adoption. Recent mailing list discussions have hinted at why, including some of the scaling issues and a number of alternative communities in this space.

Let's punt on Barbican, because honestly, it's new since we came up with this map, and maybe it's really a layer 2 service.

Layer 4: Consumption Services

I actually don't like this name, but I failed to come up with something better. Layer 4 in Dean's post was "Turtles all the way down", which isn't a great description either.

This is a set of things which consume other OpenStack services to create new services. Trove is the canonical example: it creates a database as a service by orchestrating Nova compute instances with MySQL installed in them.

The rest of the Layer 4 services all fit the same pattern, even Heat. Heat is really about taking the rest of the components in OpenStack and building a super API for their creation; it also includes auto scaling functionality based on this. All of these integrated services need a guest agent to do a piece of their function, which means that when testing them in OpenStack we don't get very far with the minimal CirrOS guest that we use for Layer 3 and below.

But again, as we look at the user survey, we can see that deployment of all of these Layer 4 services is lighter still. And this is what you'd expect as you go up these layers. These are all useful services to a set of users, but they aren't all useful to all users.

I'd argue that the confusion around Marconi's place in the OpenStack ecosystem comes from the fact that, by analogy, it looks and feels like a Layer 4 service such as Trove (where a starting point would be allocating computes), but is implemented like a Layer 2 one (a straight-up raw service expected to be deployed on bare metal out of band). And yet it's not consumable as the queue service for the other Layer 1 & 2 services.

Leaky Taxonomy

This is not the be-all and end-all way to look at OpenStack. However, this layered view of the world confuses people a lot less than the normal view we show them -- the giant spider diagram (aka the mandatory architecture slide for all OpenStack presentations):

OpenStack_Spider_Diagram

This picture is in every deep dive on OpenStack, and scares the crap out of people who think they might want to deploy it. There is no starting point, there is no end point. How do you bite that off in a manageable chunk as the spider grows?

I had one person come up to me after my DevStack talk to give a big thank you. He'd previously seen a presentation on CloudStack and OpenStack, and OpenStack's complexity from the outside had so confused him that he'd run away from our community. Explaining this with the layer framing, and showing how quickly you could experiment with it using DevStack, cleared away a ton of confusion and fear. And he's going to go dive in now.

Tents and Ecosystems

Today the OpenStack Technical Committee is in charge of deciding the size of the "tent" that is OpenStack. The approach to date has been a big tent philosophy, where anything that's related, and has a REST API, is free to apply to the TC for incubation.

But a big tent is often detrimental to the ecosystem. A new project's first goal often seems to be getting incubated, to earn the gold star of TC legitimacy that they believe is required to build a successful project. But as we've seen recently, a TC star doesn't guarantee success, and honestly, the constraints on being inside the tent are actually pretty high.

And then there is a language question, because OpenStack's stance on everything being in Python is pretty clear. An ecosystem that only exists to spawn incubated projects, and incubated projects only being allowed to be in Python, basically means an ecosystem devoid of interesting software in other languages. That's a situation that I don't think any of us want.

So what if OpenStack were a smaller tent, and not all the layers that are in OpenStack today were part of the integrated release in the future? Projects could be considered a success based on their users and usage out in the ecosystem, and not on whether they have a TC gold star. Stackforge wouldn't carry a stigma of "not cool enough"; it would be the normal place to exist as part of the OpenStack ecosystem.

Mesos is an interesting cloud community that functions like that today. Mesos has a small core framework and a big ecosystem. The smaller core actually helps grow the ecosystem by not making ecosystem projects 2nd class citizens.

I think that everyone who works on OpenStack itself, and on all the ecosystem projects, wants this whole thing to be successful. We want a future with an interoperable, stable, open source cloud fabric as a given. There are lots of thoughts on how we get there, and as no one has ever created a universal open source cloud fabric that lets users move freely between providers, public and private, it's no surprise that as a community we haven't figured everything out yet.

But here's another idea for the pool, under the assumption that with all the ideas on the table we are all smarter together than any of us are on our own.

Bash trick of the week - call stacks

For someone who used to be very vocal about hating shell scripting, I seem to be building more and more tools related to it every day. The latest is caller (from "man bash"):

caller [expr]
Returns the context of any active subroutine call (a shell function or a script executed with the . or source builtins). Without expr, caller displays the line number and source filename of the current subroutine call. If a non-negative integer is supplied as expr, caller displays the line number, subroutine name, and source file corresponding to that position in the current execution call stack. This extra information may be used, for example, to print a stack trace. The current frame is frame 0. The return value is 0 unless the shell is not executing a subroutine call or expr does not correspond to a valid position in the call stack.

This means that if your bash code makes heavy use of functions, you can get the call stack back out. This turns out to be really handy for things like writing testing scripts. I recently added some more unit testing to devstack-gate, and used this to make it easy to see what was going on:
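A minimal sketch of the idea (not the actual devstack-gate code) looks something like this:

# Walk up the call stack with caller and print one line per frame.
# caller N prints "line function file" and returns non-zero once N is
# past the top of the stack, which ends the loop.
function backtrace {
    local frame=0
    echo "Call stack (most recent first):"
    while caller $frame; do
        frame=$((frame + 1))
    done
}

# A trivial assert helper that dumps the stack on failure.
function assert_equal {
    if [[ "$1" != "$2" ]]; then
        echo "ASSERT FAILED: '$1' != '$2'"
        backtrace
        exit 1
    fi
}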

The output ends up looking like this:
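With the sketch above, a failed assertion prints a trace along these lines (the line numbers, function names, and file name here are made up):

ASSERT FAILED: 'nova' != 'neutron'
Call stack (most recent first):
12 assert_equal ./test_functions.sh
47 test_service_list ./test_functions.sh
95 main ./test_functions.sh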

I never thought I'd know this much bash, and I still think data structure manipulation in bash is craziness, but for imperative programming that's largely a lot of command calls, it works pretty well.

Devstack Vagrant

Devstack is tooling for OpenStack to make it easy to bring up an OpenStack environment based on the latest git trees. It's used extensively in upstream testing, and by many OpenStack developers to set up dev/test environments.

One of my personal challenges in working on Devstack was testing devstack itself. Relying on the upstream gate means we have a limited number of configurations, and when something goes wrong, iterating on a fix is hard. Even more importantly, the upstream gate is currently only a single node test environment.

A month ago I hacked out a new tool - devstack-vagrant (available on github).

DevstackVagrant

Devstack vagrant provides a customized Vagrant environment that will build a 2 node devstack cluster under VirtualBox. The basic model is 2 devstack nodes (a controller and a compute) that bridge through a physical interface on your host. The bridged interface is set as the default route in the nodes so that 2nd level guests created on top of this 2 node devstack can route to the outside world.

The nodes start and build from official Ubuntu 12.04 cloud images, and are customized using the puppet provisioning support in vagrant. There are a few config variables you need to set in a config.yaml, including hostnames, bridge interface, and the password hash you want your stack user to have. Basically enough to bootstrap the environment and then run devstack from upstream git.
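The rough workflow looks something like the following sketch (file names are illustrative; the repo README has the authoritative details):

# grab the tool (see the github link above) and create a config
git clone <devstack-vagrant repo>
cd devstack-vagrant
cp config.yaml.sample config.yaml   # assuming a sample config ships with the repo
# edit config.yaml: hostnames, bridge interface, stack user password hash
vagrant up                          # provisions the controller and then the compute node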

I added a bit of additional logic to the end of the normal devstack process that installs Ubuntu 12.04 and Fedora 20 cloud images into your glance, injects the ssh public key for the stack user into the keyserver, and opens up ping and ssh in all the security groups.

I still consider this an expert tool at this point, as in, if it breaks you get to keep all the pieces. However, it has been useful to me so far, and given the pull requests I got the other day, it seems to be useful for others as well. Patches are definitely welcome. And if it looks like more folks want to contribute, I'll happily move it to stackforge.

One of the things I'd love to do is sort out a functioning libvirt backend for vagrant (there are 2, but they are both a little wonky), because then the 2nd level guests could use nested KVM and not be ridiculously slow.

This tool has already proved very useful to me, so hopefully it will be useful to others as well.