The Advisory Boar

By Abhijit Menon-Sen

Thoughts on specimen collection


I have now lost count of the number of forwarded copies of the “A scientist found a bird that hadn’t been seen in half a century, then killed it” mail that people have sent me. The collection of a Bougainville Moustached Kingfisher specimen by an AMNH team in Guadalcanal has drawn intense criticism and reignited the debate about whether scientific collection is justified or even necessary.

Moustached Kingfisher [Illustration: J G Keulemans (1842–1912), Novitates Zoologicae]

I don't have anything new or insightful to add to this debate.

As a bird-watcher, of course I prefer live birds to dead ones, so I appreciate and can relate to the passion with which many have argued that we should, above all, do no harm, and that we should be working harder than ever to keep them alive.

As a realist, I must admit that most of what I know about live birds comes in one way or another from dead ones (and the people who killed them), perhaps especially from ones that died long before the ethics of collecting specimens were debated at all.

I am not a conservationist or a biologist, and I have to live with this uneasy equilibrium, lacking enough perspective to pick one side and stick with it, and certainly not knowing enough about any specific incident to either condemn or condone it.

I do know a thing or two about data, however, and I've heard suggestions that we can take some definitive set of measurements from a live bird in lieu of a specimen—or that we should at least be working to develop such an ability—and that once we have it, we won't need specimens any more.

I think we're far from being able to take a complete “snapshot” of a bird good enough to consider as a primary source of data about a new species, but of course we can improve the nature and quantity of information that can be collected from live birds without causing much distress, and I'm sure that would be good news for everyone.

For any given research topic, one could imagine a set of observations that would answer the question without a specimen. For example, the oft-cited research that established a link between DDT and a thinning of egg shells in various raptors could have been done if the eggs had been measured and their thickness recorded at various points in some non-destructive manner.

But at any given time, there will always be some measurements we can't take and questions we can't imagine. In representing a live bird as data, we will always be limited by the current state of the art. Unless we never ask new questions or improve our instruments of measurement, we will always lose something in the summarisation. (Some information is lost even in the current specimen preservation process. I don't think we're in a position to research, say, trends in the heart size of raptors over time.)

We can choose to forgo the preservation of a type specimen, and perhaps in future our snapshots will improve to a point where this decision becomes easier, but it will always involve a compromise one way or another.

(Please ignore this link if you don't know what git is)

Handling SSH host key prompts in Ansible

2015-11-25, updated 2015-12-17

I've written about the various SSH improvements in Ansible 2, including a rewrite of the connection plugin. Unfortunately, the problem that originally motivated the rewrite currently remains unsolved.

Competing prompts

If you ssh to a host for which your known_hosts file has no entry, you are shown the host's key fingerprint and are prompted with “Are you sure you want to continue connecting (yes/no)?”. If you run ansible against multiple unknown hosts, however, the host key prompts will just stack up:

The authenticity of host 'magpie (a.b.c.d)' can't be established.
ECDSA key fingerprint is 2a:5a:4c:4b:e0:40:de:8b:9b:e6:0f:90:45:68:89:fc.
Are you sure you want to continue connecting (yes/no)? The authenticity of host 'hobby (e.f.g.h)' can't be established.
RSA key fingerprint is 61:84:90:47:f7:0f:7b:a2:d5:09:98:6f:bb:3c:50:d9.
Are you sure you want to continue connecting (yes/no)? The authenticity of host 'raven (i.j.k.l)' can't be established.
RSA key fingerprint is ab:97:c2:7d:b6:8e:c3:ab:78:a2:20:04:af:9c:6f:2b.
Are you sure you want to continue connecting (yes/no)?

The processes compete for input, so typing “yes” may or may not work:

Please type 'yes' or 'no': yes
Please type 'yes' or 'no': yes
Please type 'yes' or 'no': 

Worse still, if some of the targeted hosts are known, output from their tasks may cause the prompts to scroll off the screen, and ansible will appear to hang.

Inter-ssh locking

The solution is to acquire a lock before executing ssh and release it once the host key prompt (if any) has been negotiated. Ansible 2 had some code copied from 1.9 to implement this, but it was agonisingly broken. It would not always have acquired or released the lock correctly, but the actual locking was commented out anyway because of lower-level changes, so all it did was scan known_hosts twice for every connection. Even if the locking had worked, the lock would have been held until ssh exited.

I submitted patches (12195, 12212, 12236, 12276) to add a connection locking infrastructure and use it to hold a lock only until ssh had verified the host key (not until it finished). Although most of the changes were merged, the actual ssh locking was rejected because it would (unavoidably) wait for ssh to time out while trying to connect to unreachable hosts.
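The narrower locking described above can be sketched in Python. This is not the code from those patches: host_key_lock, connect, and the two callbacks are hypothetical names, and fcntl.flock on a shared lock file is an assumed choice of primitive (Unix only).

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def host_key_lock(lock_path):
    """Hold an exclusive advisory lock on lock_path for the duration of the block."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        # Blocks until no other open file description holds the lock.
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield fd
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


def connect(host, lock_path, negotiate_host_key, run_session):
    """Serialise only the host key step; the rest of the session runs unlocked.

    negotiate_host_key and run_session are hypothetical callbacks standing in
    for "ssh until the host key is verified" and "everything after that".
    """
    with host_key_lock(lock_path):
        negotiate_host_key(host)
    # The lock has been released here, so other connections can proceed
    # with their own host key prompts while this session does its work.
    return run_session(host)
```

The point of the split is that only one process at a time can be showing a host key prompt, while long-running sessions do not hold anyone else up.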

One of the maintainers recently said they may reconsider this (because it's painful to deal with any number of newly provisioned hosts otherwise), so I have opened a new PR, but it has not yet been merged.

Update: The maintainers went with a different approach to solve the problem. Instead of using locking inside the connection plugin, this checks the host key as a separate step at the strategy level, at the expense of having to parse the known_hosts file to check if a host's key is already known. I think that's a fragile solution, but it does eliminate the locking concerns and improve upon the status quo.
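To give a sense of why parsing known_hosts by hand is fragile, here is a minimal Python sketch of such a check. It is not the code from the commit in question; entry_matches and host_is_known are hypothetical names. It handles only plain comma-separated host names and hashed entries, and deliberately ignores wildcards, negated patterns, [host]:port forms, and marker lines, which a real check would also have to get right.

```python
import base64
import hashlib
import hmac


def entry_matches(host_field, host):
    """Return True if one known_hosts host field names `host`."""
    if host_field.startswith("|1|"):
        # Hashed entry: |1|base64(salt)|base64(HMAC-SHA1(key=salt, msg=host))
        try:
            _, _, salt_b64, hash_b64 = host_field.split("|")
            salt = base64.b64decode(salt_b64)
            expected = base64.b64decode(hash_b64)
        except ValueError:
            return False  # malformed entry
        actual = hmac.new(salt, host.encode(), hashlib.sha1).digest()
        return hmac.compare_digest(actual, expected)
    # Plain entry: comma-separated names and addresses (patterns not handled).
    return host in host_field.split(",")


def host_is_known(known_hosts_lines, host):
    """Simplified lookup over the lines of a known_hosts file."""
    for line in known_hosts_lines:
        line = line.strip()
        # Simplification: skip blanks, comments, and @cert-authority/@revoked lines.
        if not line or line.startswith("#") or line.startswith("@"):
            continue
        if entry_matches(line.split()[0], host):
            return True
    return False
```

Even this much is more than a substring search, and it still covers only part of the file format. Asking OpenSSH itself (for example with ssh-keygen -F) avoids reimplementing the rules.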

Another update: The commit referenced above was reverted later the same day, for some reason the maintainers did not see fit to record in the commit message. So we're right back to the broken starting point.

Strange pricing for books on the Amazon marketplace


Lars Svensson's classic “Identification Guide to European Passerines” was first published a few decades ago. It is no longer available from Amazon, but I have been keeping an eye on copies from other sellers on the Amazon marketplace, and I am increasingly puzzled by their proposed prices.

Amazon pricing screenshot

The absurdly high price isn't because the book is new, because there's a used copy for sale at $1847.20. Even the cheapest used copy right now is priced at $458.60, and that's still far more than I can imagine anyone wanting to pay.

The sellers don't look shady at first glance, and many are highly rated over a significant period. Maybe they didn't notice that the book is available elsewhere for twenty-odd euro? But no, it's probably an “algorithm” (note: those are scare quotes) at work.

A crocodile selfie


I was at the UP Bird Festival in Chambal, tempted mostly by the memory of seeing many crocodiles. There were very few crocodiles this time, but we found a medium-sized Mugger Crocodylus palustris sunning itself on the bank towards the end of our trip.

I had promised Hassath that I would take a crocodile selfie but alas, I managed to omit the actual crocodile. The crocodile-shaped object up on the bank is (what else?) a log.

Failed crocodile selfie

The crocodile was there, though, just beyond the edge of the frame.

Actual crocodile

Big brother, the tourist attraction


Every morning, children stream past our house in both directions on their way to school. There are the nearly grown-up, very self-conscious young ladies on the way to the inter-college, dressed in blue and white with neatly plaited and be-ribboned hair. There are groups of brown-and-white children, always squabbling over some snack. There are tiny red-and-blue primary school kids who drift past like tumbleweed—so easily distracted that it's a marvel that they ever make it to school.

And then there are the troublemakers, the wretched blue-and-brown boys who derive entertainment from pinching the valve-caps off our car tyres, or snapping off the occasional windshield wiper. We stuck a webcam in the window overlooking the car to keep an eye on these miscreants. It worked pretty well. A few of the smaller children still write their names on the windows when the car is dusty, but we haven't lost any more valve caps.

Webcam view

But now the webcam has become a local attraction, and we hear children of every colour walking past talking about the “CCTV”, bringing their friends around to point it out, and waving or posing (or dancing!) for the camera. A blue-and-white pair, not yet as serious as their elder sisters, recently made faces at it and ran away horrified but giggling when I replied with a cheerful “Hi”.

Ubiquitous surveillance? What fun!

Alpen Wings ED binoculars


Two years ago, I bought a pair of Alpen Wings 8x42 ED binoculars. They are among the least expensive mid-range birding binoculars, but at three times the price of my earlier Nikon Trailblazer 8x42, they are a big step up.

Even so, I didn't expect them to be so much better than anything I had used before. The view is addictively bright and clear, and I use the 2.5m close-focus capabilities much more than I thought I would. The build quality is excellent, the adjustments are smooth and precise, and these binoculars feel reassuringly solid in the hand. The hard carrying case is also welcome.

On paper, the specifications are very similar to the Trailblazer: same magnification, similar field of view, waterproof and fogproof, slightly less eye relief, a bit smaller but a few grams heavier. I expected only a modest improvement in optics and better build quality, but they're in an altogether different league. Two years later, I'm still as happy and impressed with them as I was in the first five minutes.

(I also use an Alpen spotting scope, which I will review someday; suffice it to say that Alpen optics deserve their excellent reputation.)

Enabling SSH pipelining by default in Ansible


While writing about ansible_ssh_pipelining earlier, it occurred to me that pipelining could be made to work with requiretty, thus saving having to edit /etc/sudoers, and even making it possible to use su (which always requires a tty). This would mean pipelining could be enabled by default, for a noticeable performance boost.

Here's a working implementation (see the commit message for gory details) that I've submitted as a PR for Ansible 2. Let's hope it's merged soon.

More control over SSH pipelining in Ansible 2

2015-11-04, updated 2015-11-18

SSH pipelining is an Ansible feature to reduce the number of connections to a host.

Ansible will normally create a temporary directory under ~/.ansible (via ssh), then for each task, copy the module source to the directory (using sftp or scp) and execute the module (ssh again).

With pipelining enabled, Ansible will connect only once per task using ssh to execute python, and write the module source to its stdin. Even with persistent ssh connections enabled, it's a noticeable improvement to make only one ssh connection per task.

Unfortunately, pipelining is disabled by default because it is incompatible with sudo's requiretty setting (or su, which always requires a tty). This is because of a quirk of the Python interpreter, which enters interactive mode automatically when you pipe in data from a (pseudo) tty.
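The distinction at the heart of this quirk is whether a file descriptor is a terminal. A small Python sketch of just that test follows. It is an illustration only: looks_interactive and demo are hypothetical names, and the interpreter's real startup logic involves more than this one check.

```python
import os
import pty


def looks_interactive(fd):
    """The kind of test involved: is this descriptor attached to a terminal?"""
    return os.isatty(fd)


def demo():
    """Compare an ordinary pipe with the slave end of a pseudo-terminal."""
    pipe_read, pipe_write = os.pipe()
    master, slave = pty.openpty()
    try:
        return {
            "pipe": looks_interactive(pipe_read),  # what stdin is without a tty
            "pty": looks_interactive(slave),       # what stdin is when a tty is allocated
        }
    finally:
        for fd in (pipe_read, pipe_write, master, slave):
            os.close(fd)
```

Data written to a plain pipe is read as a script; the same data arriving on a pseudo-terminal looks like a person typing, which is why requiretty and pipelining do not mix.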

Update 2015-11-18: I've submitted a pull request to make pipelining work with requiretty. The rest of this post still remains true, but if the PR is merged, the underlying problem will just go away.

Pipelining can be enabled globally by setting “pipelining=True” in the ssh_connection section of ansible.cfg, or setting “ANSIBLE_SSH_PIPELINING=1” in the environment.

With Ansible 2 (not yet released), you can also set ansible_ssh_pipelining in the inventory or in a playbook. You can leave it enabled in ansible.cfg, but turn it off for some hosts (where requiretty must remain enabled), or even write a play with pipelining disabled in order to remove requiretty from /etc/sudoers.

- lineinfile:
    dest: /etc/sudoers
    line: 'Defaults requiretty'
    state: absent
  sudo_user: root
  vars:
    ansible_ssh_pipelining: no

The above lineinfile recipe is simplistic, but it shows that it's now possible to disable requiretty, even if it's by replacing /etc/sudoers altogether.

Note the use of another Ansible 2 feature above: vars can also be set for individual tasks (and blocks), not only plays.

Parallel task execution in Ansible


At work, I have a playbook that uses the Ansible ec2 module to provision a number of EC2 instances. The task in question looks something like this:

- name: Set up EC2 instances
  ec2:
    region: "{{ item.region }}"
    instance_type: "{{ item.type }}"
    wait: yes
  with_items: instances
  register: ec2_instances

Later tasks use instance ids and other provisioning data, so each task must wait until it's completed; but provisioning instances can take a long time—up to several minutes for spot instances—so creating a 32-node cluster this way is painfully slow. The obvious solution is to create the instances in parallel.

Ansible will, of course, dispatch tasks to multiple hosts in parallel, but in this case all the tasks must run against localhost. Besides, although each iteration of a loop is executed separately, it's not possible to dispatch them in parallel. Multiple hosts can be made to execute the entire loop in parallel, but it's not possible to hand off one iteration to one host and another to a different host in parallel.

You can get close with “delegate_to: {{item}}”, but each step of the loop will be completed before the next is executed (with Ansible 2, it's possible that a custom strategy plugin could dispatch delegated loop iterations in parallel, but the included free execution strategy doesn't work this way). The solution is to use “fire-and-forget” asynchronous tasks and wait for them to complete:

- name: Set up EC2 instances
  ec2:
    # (same parameters as before)
    wait: yes
  with_items: instances
  register: ec2_instances
  async: 7200
  poll: 0

- name: Wait for instance creation to complete
  async_status: jid={{ item.ansible_job_id }}
  register: ec2_jobs
  until: ec2_jobs.finished
  retries: 300
  with_items: ec2_instances.results

This will move on immediately from each iteration without waiting for the task to complete, and separately wait for the tasks to complete using async_status. The 7200 and 300 are arbitrary “longer than it could possibly take” choices. Note that we are polling the completion status one by one, so we'll start polling for the completion of iteration #2 only after #1 is complete, no matter how long either task takes. But in this case, since I have to wait for all of the tasks to complete anyway, it doesn't matter.
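The same fire-and-forget pattern can be sketched outside Ansible. This Python analogy is not how Ansible implements async tasks; provision and provision_all are hypothetical names, a thread pool is an assumed stand-in for background jobs, and time.sleep stands in for the slow provisioning call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def provision(name, seconds):
    """Stand-in for a slow provisioning call."""
    time.sleep(seconds)
    return {"instance": name, "took": seconds}


def provision_all(specs):
    """Start every job without waiting, then collect results in submission order."""
    with ThreadPoolExecutor(max_workers=max(1, len(specs))) as pool:
        # Fire and forget: submit() returns immediately for each item.
        jobs = [pool.submit(provision, name, secs) for name, secs in specs]
        # Wait one by one: job 2 is checked only after job 1 has finished,
        # but all of them have been running in the background the whole time.
        return [job.result() for job in jobs]
```

Because every job is already running while the first one is being waited for, the total wait is governed by the slowest job rather than by the sum of all of them, which is why the one-by-one polling does not matter here.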

The Black-browed Reed Warbler that wasn't


I am a big fan of written descriptions of field sightings.

Forcing myself to write down my observations and present them in an organised manner has helped me to learn to make better use of however little time I get with a bird in the field. Unless I did this consciously, it was all too easy to spend time looking at birds without seeing very much.

Written descriptions are not always reliable, and the reliable ones not always conclusive. A photograph, regardless of the quality, can often serve to clear up an incomplete description; but photographs can't be the beginning and end of identification, because they come with their own problems.

Here's an old thread from delhibirdpix that shows how photographs can mislead even a succession of expert observers. The ingredients were all in place: a location that has an extraordinary (and well-deserved) reputation for being a vagrant-trap, a small warbler with an unmistakable black brow, and only one species anywhere in the region matching that description.

Mystery warbler

But two and two did not add up to Black-browed Reed Warbler in this case. The identification hinged entirely on the black brow, which turned out to be an artifact introduced while lightening a dark photograph, as shown in the comparison above. The bird was (probably) a Booted Warbler with an entirely unremarkable pale brow.

Here's another post (from about the same time) about a similar problem.