The Advisory Boar

By Abhijit Menon-Sen

More control over SSH pipelining in Ansible 2

SSH pipelining is an Ansible feature to reduce the number of connections to a host.

Ansible will normally create a temporary directory under ~/.ansible (via ssh), then for each task, copy the module source to the directory (using sftp or scp) and execute the module (ssh again).

With pipelining enabled, Ansible will connect only once per task using ssh to execute python, and write the module source to its stdin. Even with persistent ssh connections enabled, it's a noticeable improvement to make only one ssh connection per task.
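To illustrate the idea, here is a minimal Python sketch that starts an interpreter and writes a program to its stdin instead of copying a file first. It is only a local stand-in: sys.executable plays the role of the remote python, the module_source string is invented for the example, and Ansible itself would run the interpreter on the managed host over ssh.

```python
import subprocess
import sys

# Hypothetical stand-in for a module's source code.
module_source = 'print("module ran")\n'

# Start the interpreter with no script argument. Because stdin is a pipe
# (not a tty), it reads the whole program from stdin and then runs it.
result = subprocess.run(
    [sys.executable],
    input=module_source,
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout, end="")
```

If stdin were a (pseudo) tty rather than a pipe, the same interpreter would start an interactive session instead, which is the quirk described next.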

Unfortunately, pipelining is disabled by default because it is incompatible with sudo's requiretty setting (or su, which always requires a tty). This is because of a quirk of the Python interpreter, which enters interactive mode automatically when you pipe in data from a (pseudo) tty.

Update 2015-11-18: I've submitted a pull request to make pipelining work with requiretty. The rest of this post still remains true, but if the PR is merged, the underlying problem will just go away.

Pipelining can be enabled globally by setting “pipelining=True” in the ssh_connection section of ansible.cfg, or setting “ANSIBLE_SSH_PIPELINING=1” in the environment.

With Ansible 2 (not yet released), you can also set ansible_ssh_pipelining in the inventory or in a playbook. You can leave it enabled in ansible.cfg, but turn it off for some hosts (where requiretty must remain enabled), or even write a play with pipelining disabled in order to remove requiretty from /etc/sudoers.

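# Remove requiretty from sudoers. Pipelining is turned off for this one
# task, because sudo still demands a tty until the line is gone.
- lineinfile:
    dest: /etc/sudoers
    line: 'Defaults requiretty'
    state: absent
  sudo_user: root
  vars:
    ansible_ssh_pipelining: no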

The above lineinfile recipe is simplistic, but it shows that it's now possible to disable requiretty, even if it's by replacing /etc/sudoers altogether.

Note the use of another Ansible 2 feature above: vars can also be set for individual tasks (and blocks), not only plays.

SSH configuration in Ansible 2

The ability to use “jump hosts” with Ansible is another often-requested feature. This has been discussed repeatedly on the mailing list and on Stack Overflow, has had a number of howto articles written about it, and multiple independent implementations have been submitted as pull requests to Ansible.

The recommended solution was to set a ProxyCommand in ~/.ssh/config. This meant duplicating inventory data and keeping two sources of connection information in sync. It worked, but grew rapidly less manageable with a larger inventory. Similarly, the ssh_config inventory plugin was a makeshift solution at best.

This post describes the general mechanism provided in Ansible 2 (not yet released) to make SSH configuration changes—including jump hosts—without depending on any data external to Ansible.

SSH configuration

The ssh_args setting in the ssh_connection section of ansible.cfg is a global setting whose contents are prepended to every command-line for ssh/scp/sftp. This behaviour has been retained unmodified for backwards compatibility, but I don't recommend its use, because it overrides the default persistence settings.

In addition to the above, the new ansible_ssh_common_args inventory variable is appended to every command-line for ssh/scp/sftp. This can be set in the inventory (for a group or a host) or in a playbook (for a play, or block, or task). This is the place to configure any ProxyCommand you want to use.

[gatewayed_hosts:vars]
ansible_ssh_common_args='-o ProxyCommand="ssh -W %h:%p someuser@jumphost.example.com"'

In addition to that, the new ansible_ssh_extra_args variable is appended only to command-lines for ssh. There are analogous ansible_scp_extra_args and ansible_sftp_extra_args variables to change scp and sftp command-lines. This allows you to do truly odd things like open a reverse-tunnel to the control node with -R (which is an option only ssh accepts, not scp or sftp).
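As a rough model of which setting reaches which program, the Python sketch below assembles argument lists in the order described above: ssh_args first, then the common args, then the extra args for that one program. The function name build_argv, the sample ssh_args value, and the -R tunnel specification are all made up for illustration, and Ansible's real connection plugin adds more to the command line than is shown here (the target host, for a start).

```python
import shlex

def build_argv(binary, ssh_args, common_args, extra_args):
    """Model the argument order: ssh_args, then common args, then per-binary extras."""
    argv = [binary]
    argv += shlex.split(ssh_args)                     # global setting from ansible.cfg
    argv += shlex.split(common_args)                  # ansible_ssh_common_args
    argv += shlex.split(extra_args.get(binary, ""))   # ansible_{ssh,scp,sftp}_extra_args
    return argv

common = '-o ProxyCommand="ssh -W %h:%p someuser@jumphost.example.com"'
extra = {"ssh": "-R 2222:localhost:22"}  # hypothetical reverse tunnel, ssh only

ssh_argv = build_argv("ssh", "-o ControlMaster=auto", common, extra)
scp_argv = build_argv("scp", "-o ControlMaster=auto", common, extra)
print(ssh_argv)
print(scp_argv)
```

Both lists carry the ProxyCommand, but only the ssh one ends with the -R option.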

The --ssh-common-args command-line option is useful when debugging (there's also --ssh-extra-args, --scp-extra-args, and --sftp-extra-args). Note that any values you set on the command-line will be overridden by the inventory or playbook settings described above (which seems backwards, but that's how Ansible handles other command-line options too).

Also note that ansible_user, ansible_host, and ansible_port are now preferred to the old ansible_ssh_* versions.

Internal changes

Once again, the modest user-visible changes are accompanied by major changes internally. The SSH connection plugin was rewritten to be more maintainable, and an entire class of “my connection just hangs” and other bugs (especially around privilege escalation) was fixed in the process.

Host names and patterns in Ansible 2

Nearly lost among the many significant changes in Ansible 2 (not yet released) are a number of related changes to how hostnames and host patterns are handled.

Host patterns

Ansible uses patterns like foo* to target managed nodes; one could match multiple patterns by separating them with colons, semicolons, or commas, e.g., foo*:bar*. The use of colons is now discouraged (and will eventually be deprecated) because of the conflict with IPv6 addresses, and the (undocumented) use of semicolons attracts a deprecation warning. Ansible 2 recommends only the comma: foo*,bar*.

This applies wherever a list of target hosts is specified: the hosts: line of a play, the host pattern argument to the ansible command, and the argument to ansible-playbook --limit.

The groupname[x-y] syntax is no longer supported. Use groupname[0:2] to match the first three hosts in a group. The first host is g[0], the last is g[-1], and g[1:] matches all hosts except g[0].
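The subscript rules can be modelled in a few lines of Python. This is a sketch of the behaviour described above, not Ansible's own code; the function name apply_subscript and the host names are invented. The point to notice is that the upper bound is inclusive, unlike a Python slice.

```python
def apply_subscript(hosts, subscript):
    """Select hosts with an 'N' or 'N:M' subscript; M is included in the result."""
    if ":" not in subscript:
        return [hosts[int(subscript)]]
    start, end = subscript.split(":")
    start = int(start) if start else 0
    end = int(end) if end else len(hosts) - 1
    if end < 0:
        end += len(hosts)
    return hosts[start:end + 1]

g = ["h0", "h1", "h2", "h3", "h4"]  # hypothetical group members, in order
print(apply_subscript(g, "0:2"))    # first three hosts
print(apply_subscript(g, "-1"))     # last host
print(apply_subscript(g, "1:"))     # everything except the first host
```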

Inventory hostnames

Ansible 2 requires inventory hostnames to be valid IPv4/IPv6 addresses or hostnames (i.e., x.example.com or x, but not x..example.com or x--). As an extension, it accepts Unicode word characters in hostname labels. Any mistakes result in specific parsing errors, not mysterious failures during execution.

Inventory hostnames may also use alphabetic or numeric ranges to define more than one host. For example, foo[1:3] defines foo1 through foo3, while foo[x:z:2] expands to foox and fooz. Addresses may use numeric ranges: 192.0.2.[3:42].
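The expansion itself is simple enough to model in Python. The sketch below handles a single [start:end] or [start:end:step] range with inclusive bounds; it is an illustration under those assumptions rather than Ansible's parser, and expand_range is a made-up name.

```python
import string

def expand_range(pattern):
    """Expand one [start:end] or [start:end:step] range; both ends are inclusive."""
    head, rest = pattern.split("[", 1)
    spec, tail = rest.split("]", 1)
    parts = spec.split(":")
    start, end = parts[0], parts[1]
    step = int(parts[2]) if len(parts) == 3 else 1
    if start.isdigit():
        # Numeric range: count from start to end.
        values = [str(n) for n in range(int(start), int(end) + 1, step)]
    else:
        # Alphabetic range: walk through the letters from start to end.
        letters = string.ascii_letters
        values = list(letters[letters.index(start):letters.index(end) + 1:step])
    return [head + value + tail for value in values]

print(expand_range("foo[1:3]"))
print(expand_range("fo[x:z:2]"))
print(len(expand_range("192.0.2.[3:42]")))
```

The three calls print ['foo1', 'foo2', 'foo3'], then ['fox', 'foz'], then 40 (one address for each value from 3 to 42).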

IPv6 addresses

A number of problems with the parsing of IPv6 addresses have also been fixed, and their behaviour has been made consistent across the inventory (.ini files) and in playbooks (e.g., in hosts: lines and with add_host).

All of the recommended IPv6 address notations (from spelling out all 128 bits to the various compressed forms) are supported. Addresses with port numbers must be written as [addr]:port. One can also use hexadecimal ranges to define multiple hosts in inventory files, e.g. 9876::[a:f]:2.

A couple of small but necessary bugfixes go hand-in-hand with the parsing changes, and fix problems with passing IPv6 addresses to ssh and to rsync. Taken together, these changes make it possible to use IPv6 in practice with Ansible.

Bigger on the inside

The changes described above merit only a couple of lines in the 2.0 changelog, but the implementation involved a complete rewrite of the inventory file parser and the address parser. A variety of incidental bugs were fixed along the way.

The upshot is that the code—for the first time—now imposes syntactic requirements on host names, addresses, and patterns in a systematic, documented, testable way.

Improvements to ansible-vault in Ansible 2

ansible-vault is used to encrypt variable definitions, keys, and other sensitive data so that they can be securely accessed from a playbook. Ansible 2 (not yet released) has some useful security improvements to the ansible-vault command-line interface.

Don't write plaintext to disk

Earlier, there was no way to use ansible-vault without writing sensitive plaintext to disk (either by design, or as an editor byproduct). Now one can use “ansible-vault encrypt” and “ansible-vault decrypt” as filters to read plaintext from stdin or write it to stdout using the new --output option.

# Interactive use: stdin → x (like gpg)
$ ansible-vault encrypt --output x

# Non-interactive use, for scripting
$ pwgen -1|ansible-vault encrypt --output newpass

# Decrypt to stdout
$ ansible-vault decrypt vpnc.conf --output -|vpnc -

These changes retain backwards compatibility with earlier invocations of ansible-vault and make it possible to securely automate the creation and use of vault data. In every case, the input or output file can be set to “-” to use stdin or stdout.

A related change: “ansible-vault view” now feeds plaintext to the pager directly on stdin and never writes plaintext to disk. (But “ansible-vault edit” still writes plaintext to disk.)

Automated rekeying

ansible-vault accepts a --vault-password-file option, which avoids the interactive password prompt and confirmation.

With Ansible 2, “ansible-vault rekey” accepts a --new-vault-password-file option that behaves the same way, so it's possible to rekey an already-encrypted vault file automatically, if you pass in a script that writes a new vault password to its stdout. (This operation also doesn't leak plaintext to disk.)

An incidental bugfix also makes it possible to pass multiple filenames to ansible-vault subcommands (i.e., it's now possible to encrypt, decrypt, and rekey more than one file at once; this behaviour was documented, but didn't work).

(Unfortunately, many more important vault changes didn't make it to this release.)

Use the ‘combine’ filter to merge hashes in Ansible 2

One of the most often-requested features in Ansible was a way to merge hashes. This has been discussed many times on the mailing lists, on IRC, and on Stack Overflow, and implemented in at least five different pull requests submitted to Ansible, and in who knows how many private filter plugins.

Ansible 2 (currently in β2) finally includes a way to do this: the ‘combine’ filter. The filter documentation has examples of its use, but here's the basic idea:

{'a':1, 'b':2}|combine({'b':3})
    → {'a':1, 'b':3}
{'a':{'x':1}}|combine({'a':{'y':2}}, recursive=True)
    → {'a':{'x':1, 'y':2}}
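For readers who think in Python, the two behaviours can be modelled as below. This is a sketch of the semantics shown above, not the filter's actual source, and the function is written from scratch for the example.

```python
def combine(base, override, recursive=False):
    """Return a new dict in which keys from override win over keys from base."""
    result = dict(base)
    for key, value in override.items():
        if recursive and isinstance(result.get(key), dict) and isinstance(value, dict):
            # Both sides hold a dict under this key: merge them instead of replacing.
            result[key] = combine(result[key], value, recursive=True)
        else:
            result[key] = value
    return result

shallow = combine({'a': 1, 'b': 2}, {'b': 3})
deep = combine({'a': {'x': 1}}, {'a': {'y': 2}}, recursive=True)
replaced = combine({'a': {'x': 1}}, {'a': {'y': 2}})
print(shallow)
print(deep)
print(replaced)
```

The last call shows why recursive=True matters: without it, the nested hash under 'a' is replaced wholesale and 'x' is lost.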

The “hash_behaviour=merge” configuration setting offers similar (recursive-only) functionality, but it's a global setting, and not convenient to use.

The new combine filter makes it possible to build up hashes using set_fact. Note the use of default({}) to address the possibility that x is not defined.

# x → {'a': 111, 'b': 222, 'c': 333}
- set_fact:
    x: "{{ x|default({})|combine({item.0: item.1}) }}"
  with_together:
    - ['a', 'b', 'c']
    - [111, 222, 333]

Thanks to the union filter, you can do the same with lists. Combining these techniques makes it possible to build up complex data structures dynamically.

# y → [{'a': 111}, {'b': 222}, {'c': 333}]
- set_fact:
    y: "{{ y|default([])|union([{item.0: item.1}]) }}"
  with_together:
    - ['a', 'b', 'c']
    - [111, 222, 333]
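What the two set_fact loops do on each iteration can be written out in plain Python. This is only a model: the union function below is a stand-in written for the example, on the assumption that the filter concatenates the lists and drops duplicates, and zip() matches with_together here because both lists have the same length.

```python
def union(a, b):
    """Concatenate two lists, keeping only the first copy of each item."""
    result = []
    for item in list(a) + list(b):
        if item not in result:
            result.append(item)
    return result

keys = ['a', 'b', 'c']
values = [111, 222, 333]

x = {}  # plays the role of x|default({})
y = []  # plays the role of y|default([])
for key, value in zip(keys, values):
    x = {**x, key: value}          # like x|combine({item.0: item.1})
    y = union(y, [{key: value}])   # like y|union([{item.0: item.1}])

print(x)
print(y)
```

One consequence of the duplicate handling: adding an item that is already in the list leaves the list unchanged, so union is not a pure append.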