Salt Multi-Master Bug in 2014.7

A word of warning about the Salt 2014.7 series if you run in multi-master mode. This past week, I tried rolling out Salt 2014.7.1 (aka Helium) to our production environment at work. The 2014.7 line has a lot of exciting new features and fixes, so we’ve been eager to get it out. Having been bitten by bugs in the past, though, we wanted to wait for the first point release to land. That recently dropped, and after a few days kicking the tires in our test environment, I was confident that the upgrade would go smoothly.

Sadly, it was not to be. We run a check in our Zabbix monitoring system periodically to make sure that every master is able to make a simple connection to every minion. Our production setup uses four masters in multi-master mode for redundancy. Shortly after completing the upgrade of all masters and minions, this check began to fail. The failures weren’t consistent: each master had a different subset of minions it could not reach. Normally, when a minion loses touch with a master, simply restarting the salt-minion service fixes it, but that did not work here. It just turned into a game of whack-a-mole, where a restarted minion would become unreachable from a different master instead.
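The Zabbix item itself isn’t reproduced here, but the logic behind a check like this is easy to sketch. Assuming you already collect, for each master, the set of minion IDs that answered a trivial command (test.ping is the usual choice), the hypothetical helper below reports which minions each master failed to reach. The master and minion names are made up; a report where each master has its own different, non-empty list is exactly the pattern described above.

```python
from typing import Dict, Iterable, List, Set


def unreachable_minions(
    expected: Iterable[str], responded: Dict[str, Set[str]]
) -> Dict[str, List[str]]:
    """Map each master to the sorted minion IDs that did not answer it.

    Masters that reached every expected minion are left out of the report.
    """
    expected_set = set(expected)
    report: Dict[str, List[str]] = {}
    for master, answered in sorted(responded.items()):
        missing = sorted(expected_set - set(answered))
        if missing:
            report[master] = missing
    return report


if __name__ == "__main__":
    # Hypothetical results: which minions answered each of four masters.
    minions = ["web1", "web2", "db1"]
    answered_by = {
        "master1": {"web1", "web2", "db1"},
        "master2": {"web1", "db1"},
        "master3": {"web2", "db1"},
        "master4": {"web1", "web2"},
    }
    print(unreachable_minions(minions, answered_by))
    # {'master2': ['web2'], 'master3': ['web1'], 'master4': ['db1']}
```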

Diving into the Salt issue tracker, I came upon issues 18322 and 19932, which were filed against 2014.7 and sounded very familiar. Both describe minions failing to respond to commands from masters, seemingly at random, and the common thread was multi-master mode. One user suggested a workaround of setting multiprocessing: False in the minion config. I found that it improved matters (fewer minions were randomly failing to respond) but did not fix the problem completely. This issue also appears to be fixed in the upcoming 2015.2 “Lithium” release, but that is a long way off and the fix is not easily backported.

Multi-master mode also answered the nagging question of why testing had gone off without a hitch. Our test environment uses only one master, so it would never have triggered the bug(s). So, shame on me for that! It’s a best practice to test in a configuration as close to production as possible, and I will be fixing that soon. In a virtualized world, there’s minimal cost to spinning up a second master, and the time spent will certainly be less than what I spent chasing down this bug.

In the end, I was forced to roll back the upgrade on all of our servers, which was not a fun job. But at the end of it, Salt was running smoothly once again and our monitoring system was clear. I’ll continue to work with the SaltStack team to find a fix. They’re great folks and very committed to both their product and the community, so I am confident it will happen sooner rather than later.

Please leave a comment if you’ve encountered this issue, or have a workaround.

Introduction to Salt-cloud (Part 2)

In part 1 of this series, we got a 10,000-foot view of salt-cloud: what it is, why you might want to use it, and the pieces that make it up. Now it’s time to get our hands dirty and boot some VMs.

The salt-cloud Command

Once you’ve installed the appropriate packages for your operating system, you should have the salt-cloud utility available. This CLI app is your interface to salt-cloud. For some examples of what it can do, check out the abridged version of the help output below (from salt-cloud 2014.7.1 on OS X):

jhenry:~ jhenry$ salt-cloud -h
Usage: salt-cloud

  -c CONFIG_DIR, --config-dir=CONFIG_DIR
                        Pass in an alternative configuration directory.
                        Default: /etc/salt

  Execution Options:
    -p PROFILE, --profile=PROFILE
                        Create an instance using the specified profile.
    -m MAP, --map=MAP   Specify a cloud map file to use for deployment. This
                        option may be used alone, or in conjunction with -Q,
                        -F, -S or -d.
    -d, --destroy       Destroy the specified instance(s).
    -P, --parallel      Build all of the specified instances in parallel.
    -u, --update-bootstrap
                        Update salt-bootstrap to the latest develop version on
                        GitHub.

  Query Options:
    -Q, --query         Execute a query and return some information about the
                        nodes running on configured cloud providers
    -F, --full-query    Execute a query and return all information about the
                        nodes running on configured cloud providers
    --list-providers    Display a list of configured providers.

  Cloud Providers Listings:
    --list-locations=LIST_LOCATIONS
                        Display a list of locations available in configured
                        cloud providers. Pass the cloud provider that
                        available locations are desired on, aka "linode", or
                        pass "all" to list locations for all configured cloud
                        providers
    --list-images=LIST_IMAGES
                        Display a list of images available in configured cloud
                        providers. Pass the cloud provider that available
                        images are desired on, aka "linode", or pass "all" to
                        list images for all configured cloud providers
    --list-sizes=LIST_SIZES
                        Display a list of sizes available in configured cloud
                        providers. Pass the cloud provider that available
                        sizes are desired on, aka "AWS", or pass "all" to list
                        sizes for all configured cloud providers

I’ve trimmed out some poorly documented options to focus on what we’ll use in this post (dumpster diving through the source code to determine what some of those options do may turn into a future article).

As you can see, most salt-cloud actions require either a profile or a map (remember those from part 1?) to execute. Given nothing but a profile (-p) or map (-m), salt-cloud will attempt to boot the named instance(s) in the associated provider’s cloud. Paired with destroy (-d), it will–wait for it–terminate the instance. With -Q or -F, it will query the provider for running instances that match the profile or map and return information about their state. The final set of --list options may be used to view the various regions, images and instance sizes available from a given provider. Handy if you regularly work with several different vendors and can’t keep them all straight.

Configuring a Provider

Time for some concrete examples. Let’s set up Amazon EC2 as a salt-cloud provider, using a config very much like the one that booted the instance where my blog lives.

ec2-dealwithit:
  id: 'Your IAM ID'
  key: 'Your IAM key'
  keyname: centos
  private_key: ~/.ssh/centos.pem
  securitygroup: www
  provider: ec2
  del_root_vol_on_destroy: True
  del_all_vols_on_destroy: True

I’ve stripped out a couple of advanced options, but that’s the gist. It’s plain YAML syntax, like all Salt config. To break it down:

ec2-dealwithit: This is an arbitrary ID that serves as the name of your provider. You’ll reference this in other configs, such as profiles (see next section).

id and key: your AWS credentials, specifically an IAM user’s access key ID and secret access key. Pretty self-explanatory.

keyname and private_key: The name of an SSH keypair you have previously configured at EC2, and the local path to the private key for that same keypair. This is what allows salt-cloud to log into your freshly booted instance and perform some bootstrapping.

securitygroup: controls which security group (sort of a simple edge firewall, if you are not familiar with EC2) your instances should automatically join.

provider: maps to one of salt-cloud’s supported cloud vendors, so it knows which API to speak.

del_root_vol_on_destroy and del_all_vols_on_destroy: determine what should happen to any EBS volumes created alongside your instances. In my case, I want them cleaned up when my instances die so I don’t end up paying for them forever. But YMMV: be sure you’re not going to be storing any critical data on these volumes before you configure them to self-destruct! Confusingly, you need to specify both if you want all EBS volumes to be destroyed. Some instances, such as the newer t2.micro, automatically create an EBS root volume on boot. Setting del_all_vols_on_destroy does not destroy this root volume; it only destroys any other volumes you may later attach. So again, consider the behavior you want and set these appropriately. The default behavior depends on which AMI you’re using for your instance, so it’s best to set these explicitly.
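Because the two flags are easy to mix up, here is a tiny model of the rules as described in this post: del_root_vol_on_destroy governs only the root volume, del_all_vols_on_destroy governs only the other volumes, and a flag you leave unset leaves the outcome to the AMI’s default. It illustrates those rules; it is not salt-cloud’s code, and the Volume type and destroy_plan function are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Volume:
    device: str
    is_root: bool


def destroy_plan(
    volumes: List[Volume],
    del_root_vol_on_destroy: Optional[bool] = None,
    del_all_vols_on_destroy: Optional[bool] = None,
) -> Dict[str, str]:
    """Predict what happens to each EBS volume when its instance is destroyed.

    None means the flag was not set in the provider config, so the outcome is
    left to the AMI's default, as described in the post.
    """
    plan: Dict[str, str] = {}
    for vol in volumes:
        # Each flag only covers its own kind of volume.
        flag = del_root_vol_on_destroy if vol.is_root else del_all_vols_on_destroy
        if flag is None:
            plan[vol.device] = "AMI default"
        else:
            plan[vol.device] = "delete" if flag else "keep"
    return plan


if __name__ == "__main__":
    vols = [Volume("/dev/sda1", is_root=True), Volume("/dev/sdf", is_root=False)]
    # Setting only the "all volumes" flag leaves the root volume undecided.
    print(destroy_plan(vols, del_all_vols_on_destroy=True))
    # {'/dev/sda1': 'AMI default', '/dev/sdf': 'delete'}
    print(destroy_plan(vols, True, True))
    # {'/dev/sda1': 'delete', '/dev/sdf': 'delete'}
```

The second call mirrors the provider config above, where both flags are set to True.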

Configuring a Profile

Armed with your provider config, it’s time to create a profile. This builds on the provider and describes the details of an individual VM.

ec2-www:
  provider: ec2-dealwithit
  image: ami-96a818fe
  size: t2.micro
  ssh_username:
    - centos
  location: us-east-1
  availability_zone: us-east-1b
  block_device_mappings:
    - DeviceName: /dev/sda1
      Ebs.VolumeSize: 30
      Ebs.VolumeType: gp2

Once again, a fairly straightforward YAML file.

ec2-www: An arbitrary identifier used to reference your profile in other configs or from the CLI.

provider: The name of a provider you’ve previously defined in /etc/salt/cloud.providers.d/. In this case, the one we set up earlier.

image: An AMI image ID which will be the basis for your VM.

size: The size or “flavor” for your instance. You can print a list of available sizes for a given provider with a command like this: salt-cloud --list-sizes ec2-dealwithit

ssh_username: The user that the salt-bootstrap code should use to connect to your instance, using the SSH keypair you defined earlier in the provider config. This is baked into your AMI image. If you work with several images that use different default users, you can list them all and salt-cloud will try them one by one.

location and availability_zone: The region and AZ where your instance will live (if you care). You can print a list of locations for a provider with salt-cloud --list-locations ec2-dealwithit.

block_device_mappings: Create or modify an EBS volume to attach to your instance. In my case, I’m using a t2.micro instance, which comes with a very small (~6GB) root volume. The AWS free tier allows up to 30GB of EBS storage for free, so I opted to resize the disk to take advantage of that. I also used the gp2 (General Purpose SSD) volume type for better performance. You can map as many EBS volumes as you like, or leave this setting off entirely if it’s not relevant to you.
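The dotted key names such as Ebs.VolumeSize look odd in YAML, but they match the parameter names of EC2’s RunInstances query API, where each mapping is numbered starting at 1 (BlockDeviceMapping.1.DeviceName and so on). The sketch below flattens the mapping from the profile above into that form to show the correspondence. It illustrates the naming only; flatten_block_device_mappings is hypothetical, and salt-cloud’s EC2 driver may assemble its request differently.

```python
from typing import Dict, List


def flatten_block_device_mappings(mappings: List[Dict[str, object]]) -> Dict[str, str]:
    """Flatten profile-style mappings into EC2 query-style parameter names.

    EC2's query API numbers list entries from 1, e.g. BlockDeviceMapping.1.DeviceName.
    """
    params: Dict[str, str] = {}
    for index, mapping in enumerate(mappings, start=1):
        for key, value in mapping.items():
            # Keys like "Ebs.VolumeSize" are appended as-is after the index.
            params[f"BlockDeviceMapping.{index}.{key}"] = str(value)
    return params


if __name__ == "__main__":
    # The block_device_mappings entry from the ec2-www profile, as parsed data.
    profile_mappings = [
        {"DeviceName": "/dev/sda1", "Ebs.VolumeSize": 30, "Ebs.VolumeType": "gp2"},
    ]
    for name, value in sorted(flatten_block_device_mappings(profile_mappings).items()):
        print(name, "=", value)
    # BlockDeviceMapping.1.DeviceName = /dev/sda1
    # BlockDeviceMapping.1.Ebs.VolumeSize = 30
    # BlockDeviceMapping.1.Ebs.VolumeType = gp2
```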

Configuring a Map

The final config file I want to touch on, which is optional, is a map. Remember, a map lays out multiple instances belonging to one or more profiles, allowing you to boot a full application stack with one command. Here’s a quick example:

ec2-www:
  - web1
  - web2
  - staging:

ec2-www: This is the name of a profile that you’ve previously defined. Here, I’m using the ec2-www profile that we created above.

web1, web2, ...: These are the names of individual instances that will be booted based on the parent profile.

staging: Here, I’m defining an instance and overriding some default settings. Because I can! Specifically, I changed the minion config that salt-bootstrap will drop onto the newly booted host in /etc/salt/minion. For example, you could set up a staging server where you test code before deploying it fully. This server might be pointed at a different salt-master to keep it segregated from production. Nearly any setting from the Core, Provider and Profile level can be overridden to suit your needs.
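Once the YAML is parsed, a map is just nested data: each profile name points at a list whose entries are either bare instance names or a one-key mapping from a name to its overrides. The sketch below assumes the map has already been loaded into Python dicts and lists (for example with a YAML library) and flattens it into one record per instance. expand_map is a hypothetical helper rather than part of Salt, and the staging override shown is made up, since its contents aren’t shown here.

```python
from typing import Any, Dict, List, Tuple, Union

# A map entry is either a bare instance name or {name: overrides}.
MapEntry = Union[str, Dict[str, Any]]


def expand_map(cloud_map: Dict[str, List[MapEntry]]) -> List[Tuple[str, str, Dict[str, Any]]]:
    """Flatten a parsed map into (profile, instance name, overrides) tuples."""
    instances: List[Tuple[str, str, Dict[str, Any]]] = []
    for profile, entries in cloud_map.items():
        for entry in entries:
            if isinstance(entry, str):
                instances.append((profile, entry, {}))
            else:
                for name, overrides in entry.items():
                    # "- staging:" with nothing under it parses as None.
                    instances.append((profile, name, overrides or {}))
    return instances


if __name__ == "__main__":
    # Hypothetical override: the hostname below is an example, not from the post.
    cloud_map = {
        "ec2-www": [
            "web1",
            "web2",
            {"staging": {"minion": {"master": "salt-staging.example.com"}}},
        ]
    }
    for profile, name, overrides in expand_map(cloud_map):
        print(profile, name, overrides)
    # ec2-www web1 {}
    # ec2-www web2 {}
    # ec2-www staging {'minion': {'master': 'salt-staging.example.com'}}
```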

Making It Rain

Ok, I had to get one bad cloud joke in. Lighten up. Anyway, now that we’ve laid out our config files, we can go about the business of actually managing our cloud(s).

salt-cloud -p ec2-www web1

Boom! You just booted a VM named web1 based on the ec2-www profile we created earlier. If it seems like it’s taking a long time, that’s because the salt-bootstrap deploy script runs on first boot, loading salt onto the new minion for management. Depending on the log level you’ve configured in the core config (/etc/salt/cloud by default), salt-cloud will either sit silently and eventually report success, or spam your console with excruciating detail about its progress. But either way, when it’s done, you’ll get a nice YAML-formatted report about your new VM.

salt-cloud -a reboot web1
[INFO    ] salt-cloud starting
The following virtual machines are set to be actioned with "reboot":

Proceed? [N/y] y
... proceeding
[INFO    ] Complete

In this example, we’re using the -a (action) option to reboot the instance we just created. Salt-cloud loops through all of your providers, querying them for an instance with the name you provide. Once found, it sends the proper API call to the cloud vendor to reboot the instance.
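Stripped of the API calls, that lookup is a search across every configured provider’s list of nodes. In this sketch the node lists are in-memory stand-ins for real provider queries, and find_provider and the linode-example provider name are hypothetical; salt-cloud’s actual code also has to talk to each vendor’s API and handle its errors.

```python
from typing import Dict, Iterable, Optional


def find_provider(instance: str, nodes_by_provider: Dict[str, Iterable[str]]) -> Optional[str]:
    """Return the first provider whose node list contains the instance, else None."""
    for provider, nodes in nodes_by_provider.items():
        if instance in nodes:
            return provider
    return None


if __name__ == "__main__":
    # Hypothetical inventory; in reality each list would come from a provider API query.
    inventory = {
        "ec2-dealwithit": ["web1", "web2"],
        "linode-example": ["db1"],
    }
    print(find_provider("web1", inventory))   # ec2-dealwithit
    print(find_provider("mail1", inventory))  # None
```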

salt-cloud -p ec2-www -d web1
[INFO    ] salt-cloud starting
The following virtual machines are set to be destroyed:

Proceed? [N/y] y
... proceeding
[INFO    ] Destroying in non-parallel mode.
[INFO    ] [{'instanceId': 'i-e7800116', 'currentState': {'code': '48', 'name': 'terminated'}, 'previousState': {'code': '80', 'name': 'stopped'}}]

Now that we’re done playing, I’ve deleted the instance we just booted. Easy come, easy go.

salt-cloud -m /etc/salt/cloud.maps.d/ -P

In this last example, we’re booting the map we created earlier. This should bring up three VMs: web1, web2, and staging. The -P option makes this happen in parallel rather than one at a time. The whole point of working in the cloud is speed, so why wait around?
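Why -P helps is easy to see with a simulated boot: when each boot spends most of its time waiting on the cloud API and the bootstrap script, running them concurrently means the total wait is roughly the slowest single boot instead of the sum. The sketch below uses a fake boot function that just sleeps (boot_all and fake_boot are hypothetical). salt-cloud’s parallel mode is its own implementation, so treat this as an analogy, not a description of its internals.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def boot_all(names: List[str], boot: Callable[[str], str], parallel: bool) -> Dict[str, str]:
    """Boot each named instance and collect its result, serially or concurrently."""
    if not parallel:
        return {name: boot(name) for name in names}
    # One worker per instance, so every boot starts right away.
    with ThreadPoolExecutor(max_workers=max(len(names), 1)) as pool:
        return dict(zip(names, pool.map(boot, names)))


def fake_boot(name: str) -> str:
    """Hypothetical stand-in for a real boot: just waits, as if on the cloud API."""
    time.sleep(0.3)
    return "running"


if __name__ == "__main__":
    instances = ["web1", "web2", "staging"]
    for parallel in (False, True):
        start = time.monotonic()
        boot_all(instances, fake_boot, parallel)
        print(f"parallel={parallel}: {time.monotonic() - start:.1f}s")
```

Expect the serial run to take about three times as long as the parallel one here, since each fake boot waits the same amount of time.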

Wrapping Up

That pretty well covers the basics of salt-cloud: what it is, how to configure it, and how to turn those configs into real, live VMs at your cloud vendor(s) of choice. There’s certainly more to salt-cloud than what I’ve covered so far, and the official docs could stand some improvement, to put it mildly. So I definitely plan to revisit salt-cloud in future posts. I’m already planning one about deploy scripts such as the default salt-bootstrap.

If you’re wondering “why go to all this trouble writing configs just to boot a dang VM?”, it’s a fair point. But there are reasons! One major benefit of salt-cloud is the way it abstracts away vendor details. You write your configs once, and then use the same CLI syntax to manage your VMs wherever they may live. It also gives you the advantages of infrastructure as code. You can keep these configs in version control systems like git. You can see at a glance which VMs should exist, and how they should be configured. It gives you a level of consistency and repeatability you don’t get from ad-hoc work at the command line or in a web GUI. These are all basic tenets of good, modern system administration.

I hope that this series was helpful! Please feel free to leave a comment with any questions, corrections or discussion.

Introduction to Salt-cloud (Part 1)

I’ll come right out with it: I’m a big fan of SaltStack (or Salt, for short). Salt is an open-source configuration management and remote execution tool that plays in the same sandbox as products like Puppet, Chef and Ansible. Written in Python, Salt actually started out purely as a tool for remote execution. Think of the infamous “SSH in a for-loop” that every sysadmin has written to automate repetitive tasks, but on steroids. Config management was only added later as demand for those features grew. Because of that heritage, Salt has always excelled at orchestration and administration tasks.

One lesser-known member of the Salt family is salt-cloud, a tool for provisioning new VMs that abstracts away the differences between vendors. This makes it easy to deal with multiple cloud providers without having to stop and learn a new API for each one. Write a short YAML configuration containing your credentials and detailing how many and what type of instances you want to boot, and salt-cloud will make it happen.

This is the first post in a short series on salt-cloud, and assumes some basic familiarity with Salt, such as how to write YAML states and execute simple commands from a CLI. If you need a refresher, the official documentation and tutorials are a great place to start.

Enter Salt-cloud

Salt-cloud is a relative newcomer to the Salt ecosystem, although it has been in development for a couple of years now. It started out as a separate project, but was rolled into the main Salt release bundle for version 2014.1, aka “Hydrogen”. Salt-cloud’s humble mission is to take Salt’s config management and execution capabilities and scale them up to managing the instances that make up your cloud infrastructure. Instead of editing files and starting services on individual machines, salt-cloud defines which machines should exist at all, specifies their hardware profile, and lets you boot, reboot or terminate them at will. This takes infrastructure as code to a new level.

Like all Salt tools, salt-cloud runs from a CLI and takes its configuration from simple, YAML-formatted files. This config is made up of “providers”, “profiles”, and optional “maps” and “deploy scripts”. Let’s take a deeper look at each of these components.

Getting Started With Salt-Cloud

To play with salt-cloud, you’ll need a recent build of Salt on your machine. I’m working on Mac OS X, using the excellent Homebrew package manager, so in my case a simple brew install saltstack was all it took. Several Linux distributions make Salt available out of the box, but it’s typically an ancient version, so you will want to use a third-party repo. Ubuntu users can take advantage of SaltStack’s official PPA repo, while RHEL/CentOS folks can get it from EPEL (you may need to enable epel-testing to get the very latest and greatest). salt-cloud has its own package, but it depends on salt-master to function, so you must install both.

Configuring Your Cloud

By default, salt-cloud expects to find config files underneath /etc/salt/, although you can point it anywhere you like with the -c parameter. The Linux packages create this directory by default; Homebrew does not. Because I’d prefer to be able to edit these configs without constantly running sudo, I chose to mirror them in my home directory. You will need to store sensitive credentials in these files, so do what makes sense for your environment.

mkdir -p ~/salt/{cloud.conf.d,cloud.deploy.d,cloud.maps.d,cloud.profiles.d,cloud.providers.d}

There’s a mouthful. Let’s take a minute to chew.

Config Elements

Core Config contains a handful of top-level settings common to all Providers, Profiles and Maps. This is the place to put your default master and minion configs, and miscellaneous customizations like where salt-cloud should write log files. This is read from /etc/salt/cloud and /etc/salt/cloud.conf.d/*.conf by default.

Providers define top-level settings for a given cloud vendor (Amazon, Digital Ocean, OpenStack, Rackspace, and many more): things like credentials, security groups, and common settings you want to apply to all VMs you create at this provider. Any *.conf files underneath cloud.providers.d/ will automatically be parsed by salt-cloud. That pattern continues for the other config elements below.

Profiles are linked to a provider. They define an individual VM, and include settings such as the instance size, which region the VM should boot into, and what image or template it should be based upon.

Maps are an optional feature that let you string together a number of profiles to build a full-blown application stack. Say you’ve defined a small www profile and a second, beefier db profile. With a map, you can ask for three www servers and one db in Amazon US-East-1, with the same in US-West-2, and then have salt-cloud spin the whole bunch up with one command.

Deploy scripts are another optional piece. By default, Salt loads itself onto any cloud VMs you boot so that you can manage and configure them with no additional work. Which is awesome. This is done using a torturous 5,000-line Bash script (seriously!) named salt-bootstrap. If you need functionality that the built-in script does not provide, you can write your own deploy script instead.
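If you mirror the config tree in your home directory and point salt-cloud at it with -c, it helps to know which files will actually be read. The sketch below lists them following the conventions described above: the core file named cloud, then *.conf files in the conf, providers and profiles directories. It follows this post’s description rather than salt-cloud’s own loader, so the read order (and anything the real loader reads beyond these) is an assumption; discover_config_files is a hypothetical helper.

```python
from pathlib import Path
from typing import List

# Subdirectories whose *.conf files are read, per the description above.
# The order here is an assumption made for illustration.
CONF_DIRS = ["cloud.conf.d", "cloud.providers.d", "cloud.profiles.d"]


def discover_config_files(config_dir: str = "/etc/salt") -> List[Path]:
    """List the core config file and every *.conf file under the known .d directories."""
    base = Path(config_dir).expanduser()
    found: List[Path] = []
    core = base / "cloud"
    if core.is_file():
        found.append(core)
    for sub in CONF_DIRS:
        directory = base / sub
        if directory.is_dir():
            # Only *.conf files count; notes or backups with other extensions are skipped.
            found.extend(sorted(p for p in directory.glob("*.conf") if p.is_file()))
    return found


if __name__ == "__main__":
    # For example, a ~/salt mirror like the one created with mkdir -p above.
    for path in discover_config_files("~/salt"):
        print(path)
```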

Many configuration options can be passed at any of these levels (core, provider, profile, map), which is both a little confusing and very powerful. For example, at the Core level you can provide a custom minion configuration that all of your VMs will automatically boot with. You can then override it on an individual basis in a profile or even a map, if you so choose.
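To make the layering concrete, here is one way to model it: apply settings from most general to most specific (core, provider, profile, map), letting later layers win. Whether salt-cloud merges nested settings such as the minion block key by key, as this sketch does, or replaces them wholesale is an assumption here, so check the documentation for the option you care about. deep_merge and effective_settings are hypothetical helpers, and the hostnames are made up.

```python
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict where override wins; nested dicts are merged key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def effective_settings(*layers: Dict[str, Any]) -> Dict[str, Any]:
    """Apply layers from most general to most specific (core, provider, profile, map)."""
    result: Dict[str, Any] = {}
    for layer in layers:
        result = deep_merge(result, layer)
    return result


if __name__ == "__main__":
    core = {"minion": {"master": "salt.example.com", "log_level": "warning"}}
    provider: Dict[str, Any] = {}
    profile = {"size": "t2.micro"}
    map_entry = {"minion": {"master": "salt-staging.example.com"}}
    print(effective_settings(core, provider, profile, map_entry))
    # {'minion': {'master': 'salt-staging.example.com', 'log_level': 'warning'}, 'size': 't2.micro'}
```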

That’s Great, But How Do I Actually Use It?

So, there’s an overview of the pieces that make up salt-cloud. In part 2 of this series, we’ll get into some concrete examples of how to actually write a config and boot your cloud.