I've written about the various
SSH improvements
in Ansible 2, including a rewrite of the connection plugin.
Unfortunately, the problem that originally motivated the rewrite
currently remains unsolved.
Competing prompts
If you ssh to a host for which your known_hosts file has no entry, you
are shown the host's key fingerprint and prompted with “Are you
sure you want to continue connecting (yes/no)?”. If you run ansible
against multiple unknown hosts, however, the host key prompts will just
stack up:
The authenticity of host 'magpie (a.b.c.d)' can't be established.
ECDSA key fingerprint is 2a:5a:4c:4b:e0:40:de:8b:9b:e6:0f:90:45:68:89:fc.
Are you sure you want to continue connecting (yes/no)? The authenticity of host 'hobby (e.f.g.h)' can't be established.
RSA key fingerprint is 61:84:90:47:f7:0f:7b:a2:d5:09:98:6f:bb:3c:50:d9.
Are you sure you want to continue connecting (yes/no)? The authenticity of host 'raven (i.j.k.l)' can't be established.
RSA key fingerprint is ab:97:c2:7d:b6:8e:c3:ab:78:a2:20:04:af:9c:6f:2b.
Are you sure you want to continue connecting (yes/no)?
The processes compete for input, so typing “yes” may or may not work:
Please type 'yes' or 'no': yes
Please type 'yes' or 'no': yes
Please type 'yes' or 'no':
Worse still, if some of the targeted hosts are known, output from their
tasks may cause the prompts to scroll off the screen, and ansible will
appear to hang.
Inter-ssh locking
The solution is to acquire a lock before executing ssh and release it
once the host key prompt (if any) is negotiated. Ansible 2 had some code
copied from 1.9 to implement this, but it was agonisingly broken. It
would not always have acquired or released the lock correctly, and the
actual locking was commented out anyway because of lower-level changes,
so all it did was scan known_hosts twice for every connection. Even if the
locking had worked, the lock would have been held until ssh exited.
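To make the idea concrete, here is a minimal sketch of "hold a lock only until the host key is dealt with", using an advisory file lock. It is not the code from Ansible or from my patches; host_key_lock, connect, start_ssh and wait_for_host_key_check are all hypothetical names, and the two callables stand in for whatever actually launches ssh and detects that the prompt (if any) has been answered.

```python
import contextlib
import fcntl
import os


@contextlib.contextmanager
def host_key_lock(lock_path):
    """Hold an exclusive advisory lock on lock_path while the block runs."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Blocks until no other process holds the lock.
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


def connect(host, lock_path, start_ssh, wait_for_host_key_check):
    """Start ssh under the lock, but release it once the host key is verified.

    start_ssh and wait_for_host_key_check are hypothetical callables
    supplied by the caller; they are assumptions for this sketch.
    """
    with host_key_lock(lock_path):
        proc = start_ssh(host)
        wait_for_host_key_check(proc)
    # The lock is released here, so the rest of the session runs in
    # parallel with other connections.
    return proc
```

The point of the design is the scope of the with block: only one process at a time can be showing a host key prompt, but nothing waits for an entire ssh session to finish.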
I submitted patches (12195, 12212, 12236, 12276)
to add a connection locking infrastructure and use it to hold a lock
only until ssh had verified the host key (not until it finished).
Although most of the changes were merged, the actual ssh locking was
rejected because it would (unavoidably)
wait for ssh to time out
while trying to connect to unreachable hosts.
One of the maintainers recently
said they may reconsider
this (because it's painful to deal with any number of newly provisioned
hosts otherwise), so I have
opened a new PR,
but it has not yet been merged.
Update: The maintainers went with
a different approach
to solve the problem. Instead of using locking inside the connection
plugin, this checks the host key as a separate step at the strategy
level, at the expense of having to parse the known_hosts file to check
if a host's key is already known. I think that's a fragile solution, but
it does eliminate the locking concerns and improve upon the status quo.
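For a sense of what "parse the known_hosts file" involves, here is a rough sketch of such a check. It is not the code from the commit; is_known and host_matches are hypothetical names, and the sketch assumes the OpenSSH format in which a hashed entry looks like |1|salt|hash, with the hash being an HMAC-SHA1 of the hostname keyed by the salt. It deliberately ignores wildcards, negated patterns, [host]:port entries, and @cert-authority or @revoked marker lines.

```python
import base64
import hashlib
import hmac


def host_matches(pattern_field, host):
    """Return True if host appears in the first field of a known_hosts line."""
    for pattern in pattern_field.split(","):
        if pattern.startswith("|1|"):
            # Hashed entry: |1|<base64 salt>|<base64 HMAC-SHA1 of hostname>
            _, _, salt_b64, hash_b64 = pattern.split("|")
            salt = base64.b64decode(salt_b64)
            expected = base64.b64decode(hash_b64)
            actual = hmac.new(salt, host.encode(), hashlib.sha1).digest()
            if hmac.compare_digest(actual, expected):
                return True
        elif pattern == host:
            return True
    return False


def is_known(known_hosts_text, host):
    """Return True if any plain or hashed entry names host exactly."""
    for line in known_hosts_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Skip marker lines and anything without host, key type and key.
        if fields[0].startswith("@") or len(fields) < 3:
            continue
        if host_matches(fields[0], host):
            return True
    return False
```

Every case the sketch skips is a case where its answer can disagree with what ssh itself would decide, which is why I call this approach fragile.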
Another update: The commit referenced above was reverted later
the same day, for some reason the maintainers did not see fit to record
in the commit message. So we're right back to the broken starting point.