The Advisory Boar

Parallel task execution in Ansible

November 12, 2015

At work, I have a playbook that uses the Ansible ec2 module to provision a number of EC2 instances. The task in question looks something like this:

- name: Set up EC2 instances
  ec2:
    region: "{{ item.region }}"
    instance_type: "{{ item.type }}"
    …
    wait: yes
  with_items: instances
  register: ec2_instances

Later tasks use instance ids and other provisioning data, so each iteration must wait until its instance has been created; but provisioning instances can take a long time (up to several minutes for spot instances), so creating a 32-node cluster this way is painfully slow. The obvious solution is to create the instances in parallel.

Ansible will, of course, dispatch tasks to multiple hosts in parallel, but in this case all the tasks must run against localhost. Besides, although each iteration of a loop is executed separately, it's not possible to dispatch them in parallel. Multiple hosts can be made to execute the entire loop in parallel, but it's not possible to hand off one iteration to one host and another to a different host in parallel.

You can get close with “delegate_to: {{item}}”, but each step of the loop will be completed before the next is executed (with Ansible 2, it's possible that a custom strategy plugin could dispatch delegated loop iterations in parallel, but the included free execution strategy doesn't work this way). The solution is to use “fire-and-forget” asynchronous tasks and wait for them to complete:

- name: Set up EC2 instances
  ec2:
    …
    wait: yes
  with_items: instances
  register: ec2_instances
  async: 7200
  poll: 0

- name: Wait for instance creation to complete
  async_status: jid={{ item.ansible_job_id }}
  register: ec2_jobs
  until: ec2_jobs.finished
  retries: 300
  with_items: ec2_instances.results

This will move on immediately from each iteration without waiting for the task to complete, and separately wait for the tasks to complete using async_status. The 7200 and 300 are arbitrary “longer than it could possibly take” choices. Note that we are polling the completion status one by one, so we'll start polling for the completion of iteration #2 only after #1 is complete, no matter how long either task takes. But in this case, since I have to wait for all of the tasks to complete anyway, it doesn't matter.
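The same start-everything-then-wait pattern can be sketched outside Ansible. This is only a Python analogy, not how Ansible implements async tasks: provision() is a hypothetical stand-in for one slow provisioning call, and the names and sleep durations are made up. Every job is started first and the results are then collected one by one, in order, so the total time is close to that of the slowest job rather than the sum of all of them.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def provision(name, seconds):
    """Hypothetical stand-in for one slow provisioning call."""
    time.sleep(seconds)
    return name

def provision_all(jobs):
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        # Fire and forget: start every job without waiting for any of them.
        futures = [pool.submit(provision, name, secs) for name, secs in jobs]
        # Then wait for the jobs one by one, in submission order.
        return [future.result() for future in futures]

jobs = [("a", 0.4), ("b", 0.1), ("c", 0.2)]
start = time.monotonic()
results = provision_all(jobs)
elapsed = time.monotonic() - start
print(results, round(elapsed, 1))
```

Waiting for "a" first costs nothing extra here, because "b" and "c" keep running in the background while we wait.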

The Black-browed Reed Warbler that wasn't

November 10, 2015

I am a big fan of written descriptions of field sightings.

Forcing myself to write down my observations and present them in an organised manner has helped me to learn to make better use of however little time I get with a bird in the field. Unless I did this consciously, it was all too easy to spend time looking at birds without seeing very much.

Written descriptions are not always reliable, and the reliable ones not always conclusive. A photograph, regardless of the quality, can often serve to clear up an incomplete description; but photographs can't be the beginning and end of identification, because they come with their own problems.

Here's an old thread from delhibirdpix that shows how photographs can mislead even a succession of expert observers. The ingredients were all in place: a location that has an extraordinary (and well-deserved) reputation for being a vagrant-trap, a small warbler with an unmistakable black brow, and only one species anywhere in the region matching that description.

But two and two did not add up to Black-browed Reed Warbler in this case. The identification hinged entirely on the black brow, which turned out to be an artifact introduced while lightening a dark photograph, as shown in the comparison above. The bird was (probably) a Booted Warbler with an entirely unremarkable pale brow.

Here's another post (from about the same time) about a similar problem.

Strange cryptographic decisions in Ansible vault

November 09, 2015

I wrote about some useful changes to ansible-vault in Ansible 2 in an earlier post. Unfortunately, another significant change to the vault internals was rejected for Ansible 2.

Vault cryptography

The VaultAES256 class implements encryption and decryption. It uses sensible building blocks: PBKDF2 for key generation with a random salt, AES-CTR for encryption, and HMAC-SHA-256 for authentication (used in encrypt-then-mac fashion). This is a major improvement over the earlier VaultAES class, which used homebrew key generation and an SHA-256 digest alone for “verification”.
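To make the building blocks concrete, here is a minimal sketch of PBKDF2 key derivation and encrypt-then-MAC verification using only the Python standard library. It is not the Vault code: the function names, the 32-byte salt, and the way the derived material is split are assumptions for illustration, and the AES-CTR step is left out entirely (the ciphertext is just an opaque byte string).

```python
import hashlib
import hmac
import os

def derive_keys(password, salt, iterations=10000):
    """Derive separate 32-byte cipher and MAC keys from one password."""
    material = hashlib.pbkdf2_hmac("sha256", password, salt, iterations, dklen=64)
    return material[:32], material[32:]

def make_tag(mac_key, ciphertext):
    """Encrypt-then-MAC: the tag is computed over the ciphertext."""
    return hmac.new(mac_key, ciphertext, hashlib.sha256).digest()

def verify(mac_key, ciphertext, tag):
    """Compare tags in constant time before attempting any decryption."""
    return hmac.compare_digest(make_tag(mac_key, ciphertext), tag)

salt = os.urandom(32)  # assumed salt length, for illustration only
cipher_key, mac_key = derive_keys(b"vault password", salt)
ciphertext = b"opaque bytes standing in for AES-CTR output"
tag = make_tag(mac_key, ciphertext)
```

The point of encrypt-then-MAC is that a tampered ciphertext is rejected by verify() before the cipher ever sees it.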

Nevertheless, the code has some embarrassing oversights. They are not vulnerabilities, but they show that the code was written with… rather less familiarity with cryptography than one might wish:

Plaintext is padded to the AES block size, but this is unnecessary because AES-CTR is used as a stream cipher.
An extra 32-byte block of PBKDF2 (10,000 iterations) output is derived to initialise the 16-byte IV, and the other half discarded; but this is unnecessary because the IV can be 0 (the salt ensures that we do not use the same key to encrypt the same plaintext).

Finally, the ciphertext is passed through hexlify() twice, thereby inflating it to 4x the size (instead of using, say, Base64). This is the least significant and yet the most annoying problem.
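The size inflation is easy to check. This short sketch uses random bytes as a stand-in for real ciphertext and compares double hexlify() with Base64:

```python
import base64
import os
from binascii import hexlify

ciphertext = os.urandom(300)         # stand-in for encrypted data
hex_once = hexlify(ciphertext)       # 2 characters per byte
hex_twice = hexlify(hex_once)        # 4 characters per original byte
b64 = base64.b64encode(ciphertext)   # 4 characters per 3 bytes

print(len(ciphertext), len(hex_once), len(hex_twice), len(b64))
# prints: 300 600 1200 400
```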

The most visible effects of the over-enthusiastic PBKDF2 use were mitigated by a pull request to use an optimised PBKDF2 implementation. This reduced the startup time by an order of magnitude for setups that loaded many vault-encrypted files from group_vars and host_vars.

All of these problems were solved by PR #12130, which saw several rounds of changes and was slated for inclusion in Ansible 2, but was eventually rejected by the maintainers because there wasn't “anyone in-house to review it for security problems and it's late to be adding it for v2”.

Other changes that didn't make it

A couple of other often-requested Vault changes fell by the wayside en route to Ansible 2:

GPG support for the vault was submitted as a PR over a year ago, but the code is now outdated after an initial rebase to the v2 codebase.
Lookup support (with the file lookup plugin, and also with the copy module) was partly implemented but never completed and merged.

Many people left +1 comments on GitHub to indicate their support for these features. I hope someone wants them enough to work on them for v2.1, and that they have better luck getting this work merged than I did.

My first Wallcreeper

November 09, 2015

Late in December 2009, as a birthday present to myself, I went on a solo trek to Dayara Bugyal, a high-altitude alpine meadow in Garhwal. I meant to write about the week I spent in the mountains, but upon my return, I found the experience too overwhelming to try to describe all at once.

Some six months after the trek, I posted a photograph from my first campsite. Nearly a year later, I wrote about my decision to forgo a field guide on the trek; that's where the paragraph quoted above comes from. It's been nearly five years since then, and I've typed that first sentence a dozen more times, but I never got much further.

One of my most enduring memories of the trip is of a small grey bird crawling up the face of a rock cliff just below Barsu village. I was driving back to Uttarkashi in the late afternoon after the trek, and I caught a flicker of movement on the cliff from the corner of my eye. I knew instantly what it was—a Wallcreeper, a bird I had been hoping to find for the past five years. I had barely a minute to admire it, but I'll never forget the sudden flash of scarlet when it flew away.

It's almost Wallcreeper season where I live now. They're a familiar sight in passage to lower altitudes in early winter, but that first sighting will always be the most precious.

SSH configuration in Ansible 2

November 07, 2015

The ability to use “jump hosts” with Ansible is another often-requested feature. It has been discussed repeatedly on the mailing list and on Stack Overflow, a number of howto articles have been written about it, and multiple independent implementations have been submitted as pull requests to Ansible.

The recommended solution was to set a ProxyCommand in ~/.ssh/config. This meant duplicating inventory data and keeping two sources of connection information in sync. It worked, but grew rapidly less manageable with a larger inventory. Similarly, the ssh_config inventory plugin was a makeshift solution at best.

This post describes the general mechanism provided in Ansible 2 (not yet released) to make SSH configuration changes—including jump hosts—without depending on any data external to Ansible.

SSH configuration

The ssh_args setting in the ssh_connection section of ansible.cfg is a global setting whose contents are prepended to every command-line for ssh/scp/sftp. This behaviour has been retained unmodified for backwards compatibility, but I don't recommend its use, because it overrides the default persistence settings.

In addition to the above, the new ansible_ssh_common_args inventory variable is appended to every command-line for ssh/scp/sftp. This can be set in the inventory (for a group or a host) or in a playbook (for a play, or block, or task). This is the place to configure any ProxyCommand you want to use.

[gatewayed_hosts:vars]
ansible_ssh_common_args='-o ProxyCommand="ssh -W %h:%p someuser@jumphost.example.com"'

In addition to that, the new ansible_ssh_extra_args variable is appended only to command-lines for ssh. There are analogous ansible_scp_extra_args and ansible_sftp_extra_args variables to change scp and sftp command-lines. This allows you to do truly odd things like open a reverse-tunnel to the control node with -R (which is an option only ssh accepts, not scp or sftp).
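How the three kinds of settings combine can be pictured with a small sketch. This is not the connection plugin's code: build_command() is a hypothetical helper, the -R value is made up, and placing the common arguments before the per-program ones is my assumption. The real plugin also adds many other options of its own.

```python
import shlex

def build_command(binary, ssh_args="", common_args="", extra_args=None):
    """Assemble an ssh/scp/sftp argument list (illustrative only).

    extra_args maps a program name to its own extra arguments, mirroring
    ansible_ssh_extra_args, ansible_scp_extra_args and ansible_sftp_extra_args.
    """
    extra_args = extra_args or {}
    cmd = [binary]
    cmd += shlex.split(ssh_args)                    # global setting, goes first
    cmd += shlex.split(common_args)                 # shared by ssh, scp and sftp
    cmd += shlex.split(extra_args.get(binary, ""))  # only for this program
    return cmd

common = '-o ProxyCommand="ssh -W %h:%p someuser@jumphost.example.com"'
extra = {"ssh": "-R 2222:localhost:22"}  # hypothetical reverse tunnel

ssh_cmd = build_command("ssh", common_args=common, extra_args=extra)
scp_cmd = build_command("scp", common_args=common, extra_args=extra)
print(ssh_cmd)
print(scp_cmd)
```

Both commands get the ProxyCommand, but only the ssh command gets -R.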

The --ssh-common-args command-line option is useful when debugging (there's also --ssh-extra-args, --scp-extra-args, and --sftp-extra-args). Note that any values you set on the command-line will be overridden by the inventory or playbook settings described above (which seems backwards, but that's how Ansible handles other command-line options too).

Also note that ansible_user, ansible_host, and ansible_port are now preferred to the old ansible_ssh_* versions.

Internal changes

Once again, the modest user-visible changes are accompanied by major changes internally. The SSH connection plugin was rewritten to be more maintainable, and an entire class of “my connection just hangs” and other bugs (especially around privilege escalation) were fixed in the process.

Understanding sudoers(5) syntax

November 07, 2015

This straightforward guide to configuring sudo is for anyone who didn't expect to see “Don't despair” and a “Quick guide to EBNF” in the sudoers(5) manpage.

Sudo (su "do") allows a system administrator to delegate authority to give certain users (or groups of users) the ability to run some (or all) commands as root or another user while providing an audit trail of the commands and their arguments.

This guide is intended to supplement the manpage. The various environment, security, and logging options are not covered; the explanations in the manpage are easy to follow.

The first 90%

It's possible that the only sudo explanation you will ever need is:

%adm ALL=(ALL) NOPASSWD: ALL

This means “any user in the adm group on any host may run any command as any user without a password”. The first ALL refers to hosts, the second to target users, and the last to allowed commands. A password will be required if you leave out the "NOPASSWD:".
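The fields of that line can be picked apart with a small regular expression. This is a sketch that handles only this simple one-line shape; the real sudoers grammar (aliases, lists, multiple tags, an optional run-as part) is far richer, and parse_rule() and its field names are hypothetical.

```python
import re

RULE = re.compile(
    r"^(?P<who>\S+)\s+"            # user, or %group
    r"(?P<hosts>[^=\s]+)\s*=\s*"   # hosts the rule applies on
    r"\((?P<runas>[^)]*)\)\s*"     # target users
    r"(?:(?P<tag>[A-Z]+):\s*)?"    # optional tag such as NOPASSWD
    r"(?P<commands>.+)$"           # allowed commands
)

def parse_rule(line):
    """Split a simple sudoers rule into its named fields."""
    match = RULE.match(line.strip())
    if match is None:
        raise ValueError("not a simple one-line rule: %r" % line)
    return match.groupdict()

rule = parse_rule("%adm ALL=(ALL) NOPASSWD: ALL")
print(rule)
```

Without the "NOPASSWD:" part, the tag field comes back as None, which corresponds to the case where a password is required.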

Avoiding asynchronous callback hell in Archiveopteryx

November 06, 2015

I've read many mentions of “callback hell” recently, especially in discussions about Javascript programming. This is the problem of deeply nested code when trying to manage a sequence of asynchronous actions. The many suggested remedies range from splitting the code up into many small named functions to async to promises to futures (and probably other things besides; I haven't tried to keep up).

FutureJS, for example, is described as helping to tame your wild, async code.

I have no opinion about any of these solutions. I don't work with any complex Javascript codebases, and asynchronous actions in my Mojolicious applications have been easy to isolate so far. But I do have opinions about writing asynchronous code, and this post is about why I'm not used to treating it as though it needed “taming”.

Host names and patterns in Ansible 2

November 06, 2015

Nearly lost among the many significant changes in Ansible 2 (not yet released) are a number of related changes to how hostnames and host patterns are handled.

Host patterns

Ansible uses patterns like foo* to target managed nodes; one could match multiple patterns by separating them with colons, semicolons, or commas, e.g., foo*:bar*. The use of colons is now discouraged (and will eventually be deprecated) because of the conflict with IPv6 addresses, and the (undocumented) use of semicolons attracts a deprecation warning. Ansible 2 recommends only the comma: foo*,bar*.

This usage applies wherever a list of target hosts is given: the hosts for a play, the host pattern argument to the ansible command, and the argument to ansible-playbook --limit.
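As a rough illustration of comma-separated patterns, here is a sketch that matches hostnames against each glob in turn. It is not Ansible's matcher: match_hosts() is a hypothetical helper, the host list is made up, and groups, exclusions, and subscripts are ignored.

```python
from fnmatch import fnmatchcase

def match_hosts(pattern, hosts):
    """Return the hosts that match any comma-separated glob in pattern."""
    globs = [part.strip() for part in pattern.split(",") if part.strip()]
    return [host for host in hosts if any(fnmatchcase(host, g) for g in globs)]

hosts = ["foo1", "foo2", "bar1", "baz1"]
matched = match_hosts("foo*,bar*", hosts)
print(matched)
```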

The groupname[x-y] syntax is no longer supported. Use groupname[0:2] to match the first three hosts in a group. The first host is g[0], the last is g[-1], and g[1:] matches all hosts except g[0].
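The subscript behaviour described above can be sketched as follows. Note that the end index is inclusive, unlike a Python slice, which is how g[0:2] comes to mean the first three hosts. apply_subscript() is a hypothetical helper written for this illustration, not the inventory code.

```python
def apply_subscript(hosts, subscript):
    """Apply "N", "N:M" or "N:" to a list of hosts (end index inclusive)."""
    if ":" not in subscript:
        return [hosts[int(subscript)]]
    start, end = subscript.split(":", 1)
    start = int(start) if start else 0
    if not end or int(end) == -1:
        return hosts[start:]
    return hosts[start:int(end) + 1]

g = ["h0", "h1", "h2", "h3", "h4"]
print(apply_subscript(g, "0"))    # the first host
print(apply_subscript(g, "-1"))   # the last host
print(apply_subscript(g, "1:"))   # everything except the first host
print(apply_subscript(g, "0:2"))  # the first three hosts
```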

Inventory hostnames

Ansible 2 requires inventory hostnames to be valid IPv4/IPv6 addresses or hostnames (i.e., x.example.com or x, but not x..example.com or x--). As an extension, it accepts Unicode word characters in hostname labels. Any mistakes result in specific parsing errors, not mysterious failures during execution.

Inventory hostnames may also use alphabetic or numeric ranges to define more than one host. For example, foo[1:3] defines foo1 through foo3, while foo[x:z:2] expands to foox and fooz. Addresses may use numeric ranges: 192.0.2.[3:42].
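A minimal sketch of this kind of range expansion follows. It is not the inventory parser: expand() is a hypothetical helper that handles a single range per name, decimal numbers, and single ASCII letters only, and the db[a:e:2] example is made up.

```python
import re
import string

RANGE = re.compile(r"\[(\w+):(\w+)(?::(\d+))?\]")

def expand(pattern):
    """Expand the first [start:end] or [start:end:step] range in pattern."""
    match = RANGE.search(pattern)
    if match is None:
        return [pattern]
    start, end, step = match.group(1), match.group(2), int(match.group(3) or 1)
    if start.isdigit():
        # Numeric range, with both ends included.
        values = [str(n) for n in range(int(start), int(end) + 1, step)]
    else:
        # Alphabetic range over single ASCII letters, both ends included.
        letters = string.ascii_letters
        values = list(letters[letters.index(start):letters.index(end) + 1:step])
    head, tail = pattern[:match.start()], pattern[match.end():]
    return [head + value + tail for value in values]

print(expand("foo[1:3]"))
print(expand("db[a:e:2]"))
print(len(expand("192.0.2.[3:42]")))
```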

IPv6 addresses

A number of problems with the parsing of IPv6 addresses have also been fixed, and their behaviour has been made consistent across the inventory (.ini files) and in playbooks (e.g., in hosts: lines and with add_host).

All of the recommended IPv6 address notations (from spelling out all 128 bits to the various compressed forms) are supported. Addresses with port numbers must be written as [addr]:port. One can also use hexadecimal ranges to define multiple hosts in inventory files, e.g. 9876::[a:f]:2.
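To illustrate the [addr]:port notation, here is a sketch built on Python's ipaddress module. It is not Ansible's address parser: parse_address() is a hypothetical helper, the addresses are examples, and a port is recognised only in the bracketed form.

```python
import ipaddress
import re

BRACKETED = re.compile(r"^\[(?P<addr>[^\]]+)\](?::(?P<port>\d+))?$")

def parse_address(text):
    """Split "[addr]:port" or a bare address into (address, port)."""
    match = BRACKETED.match(text)
    if match:
        addr, port = match.group("addr"), match.group("port")
    else:
        addr, port = text, None
    ip = ipaddress.ip_address(addr)  # raises ValueError for invalid addresses
    return str(ip), (int(port) if port else None)

print(parse_address("[2001:db8::1]:2222"))
print(parse_address("2001:0db8:0000:0000:0000:0000:0000:0001"))

# The hexadecimal range 9876::[a:f]:2 written out by hand.
hex_hosts = ["9876::%x:2" % n for n in range(0xa, 0xf + 1)]
print(hex_hosts)
```

The fully spelled-out address and the compressed one both come back in the same compressed form, because ipaddress normalises them.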

A couple of small but necessary bugfixes go hand-in-hand with the parsing changes, and fix problems with passing IPv6 addresses to ssh and to rsync. Taken together, these changes make it possible to use IPv6 in practice with Ansible.

Bigger on the inside

The changes described above merit only a couple of lines in the 2.0 changelog, but the implementation involved a complete rewrite of the inventory file parser and the address parser. A variety of incidental bugs were fixed along the way.

The upshot is that the code—for the first time—now imposes syntactic requirements on host names, addresses, and patterns in a systematic, documented, testable way.

On Sarah Sharp leaving Linux development

November 05, 2015

A month ago, Sarah Sharp posted to say “I'm not a Linux kernel developer any more” and “I am no longer a part of the Linux kernel community”.

Improvements to ansible-vault in Ansible 2

November 03, 2015

ansible-vault is used to encrypt variable definitions, keys, and other sensitive data so that they can be securely accessed from a playbook. Ansible 2 (not yet released) has some useful security improvements to the ansible-vault command-line interface.

Don't write plaintext to disk

Earlier, there was no way to use ansible-vault without writing sensitive plaintext to disk (either by design, or as an editor byproduct). Now one can use “ansible-vault encrypt” and “ansible-vault decrypt” as filters to read plaintext from stdin or write it to stdout using the new --output option.

# Interactive use: stdin → x (like gpg)
$ ansible-vault encrypt --output x

# Non-interactive use, for scripting
$ pwgen -1|ansible-vault encrypt --output newpass

# Decrypt to stdout
$ ansible-vault decrypt vpnc.conf --output -|vpnc -

These changes retain backwards compatibility with earlier invocations of ansible-vault and make it possible to securely automate the creation and use of vault data. In every case, the input or output file can be set to “-” to use stdin or stdout.

A related change: “ansible-vault view” now feeds plaintext to the pager directly on stdin and never writes plaintext to disk. (But “ansible-vault edit” still writes plaintext to disk.)

Automated rekeying

ansible-vault accepts a --vault-password-file option, which can be specified to avoid the interactive password prompt and confirmation.

With Ansible 2, “ansible-vault rekey” accepts a --new-vault-password-file option that behaves the same way, so it's possible to rekey an already-encrypted vault file automatically, if you pass in a script that writes a new vault password to its stdout. (This operation also doesn't leak plaintext to disk.)

An incidental bugfix also makes it possible to pass multiple filenames to ansible-vault subcommands (i.e., it's now possible to encrypt, decrypt, and rekey more than one file at once; this behaviour was documented, but didn't work).

(Unfortunately, many more important vault changes didn't make it to this release.)