Handling SSH host key prompts in Ansible
; updated
I've written about the various SSH improvements in Ansible 2, including a rewrite of the connection plugin. Unfortunately, the problem that originally motivated the rewrite currently remains unsolved.
Competing prompts
If you ssh to a host for which your known_hosts file has no entry, you
are shown the host's key fingerprint and are prompted with Are you
sure you want to continue connecting (yes/no)?
. If you run ansible
against multiple unknown hosts, however, the host key prompts will just
stack up:
The authenticity of host 'magpie (a.b.c.d)' can't be established. ECDSA key fingerprint is 2a:5a:4c:4b:e0:40:de:8b:9b:e6:0f:90:45:68:89:fc. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'hobby (e.f.g.h)' can't be established. RSA key fingerprint is 61:84:90:47:f7:0f:7b:a2:d5:09:98:6f:bb:3c:50:d9. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'raven (i.j.k.l)' can't be established. RSA key fingerprint is ab:97:c2:7d:b6:8e:c3:ab:78:a2:20:04:af:9c:6f:2b. Are you sure you want to continue connecting (yes/no)?
The processes compete for input, so typing “yes” may or may not work:
Please type 'yes' or 'no': yes Please type 'yes' or 'no': yes Please type 'yes' or 'no':
Worse still, if some of the targeted hosts are known, output from their tasks may cause the prompts to scroll off the screen, and ansible will appear to hang.
Inter-ssh locking
The solution is to acquire a lock before executing ssh and releasing it once the host key prompt (if any) is negotiated. Ansible 2 had some code copied from 1.9 to implement this, but it was agonisingly broken. It wouldn't have always acquired the lock or released it correctly, but the actual locking was commented out anyway because of lower-level changes, so it just scanned known_hosts twice for every connection. Even if the locking had worked, the lock would have been held until ssh exited.
I submitted patches (12195, 12212, 12236, 12276) to add a connection locking infrastructure and use it to hold a lock only until ssh had verified the host key (not until it finished). Although most of the changes were merged, the actual ssh locking was rejected because it would (unavoidably) wait for ssh to timeout while trying to connect to unreachable hosts.
One of the maintainers recently said they may reconsider this (because it's painful to deal with any number of newly provisioned hosts otherwise), so I have opened a new PR, but it has not yet been merged.
Update: The maintainers went with a different approach to solve the problem. Instead of using locking inside the connection plugin, this checks the host key as a separate step at the strategy level, at the expense of having to parse the known_hosts file to check if a host's key is already known. I think that's a fragile solution, but it does eliminate the locking concerns and improve upon the status quo.
Another update: The commit referenced above was reverted later the same day, for some reason the maintainers did not see fit to record in the commit message. So we're right back to the broken starting point.