The Advisory Boar

By Abhijit Menon-Sen

How are Unix pipes implemented?


This article is about how pipes are implemented in the Unix kernel. I was a little disappointed that a recent article titled “How do Unix pipes work?” was not about the internals, and curious enough to go digging in some old sources to try to answer the question.

What are we talking about?

Pipes are “perhaps the single most striking invention in Unix” — a defining characteristic of the Unix philosophy of composing small programs together, and a familiar sight in the Unix shell:

$ echo hello | wc -c
6

This functionality depends on the kernel providing a system call named pipe, which is described by the pipe(7) and pipe(2) manual pages:

Pipes provide a unidirectional interprocess communication channel. A pipe has a read end and a write end. Data written to the write end of a pipe can be read from the read end of the pipe.

A pipe is created using pipe(2), which returns two file descriptors, one referring to the read end of the pipe, the other referring to the write end.

Tracing the execution of the command above shows the creation of the pipe and the flow of data through it from one process to another:

$ strace -qf -e execve,pipe,dup2,read,write \
    sh -c 'echo hello | wc -c'
execve("/bin/sh", ["sh", "-c", "echo hello | wc -c"], …)
pipe([3, 4])                            = 0
[pid 2604795] dup2(4, 1)                = 1
[pid 2604795] write(1, "hello\n", 6)    = 6
[pid 2604796] dup2(3, 0)                = 0
[pid 2604796] execve("/usr/bin/wc", ["wc", "-c"], …)
[pid 2604796] read(0, "hello\n", 16384) = 6
[pid 2604796] write(1, "6\n", 2)        = 2

The parent process calls pipe() to obtain a pair of connected fds; one child writes to one fd and the other reads the same data from the other fd. (The shell uses dup2 to “rename” fds 3 and 4 to match stdin and stdout.)
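The same plumbing can be written out in C. Here is a minimal sketch (error handling mostly omitted, and the helper name pipe_demo is mine, not from any manual): the parent plays the part of echo, and the child plays the part of wc -c, reporting the number of bytes it read through its exit status.

```c
#include <unistd.h>
#include <sys/wait.h>

/* A minimal sketch of "echo hello | wc -c" plumbing.  The parent
   writes into the pipe; the child reads from it and reports the
   byte count via its exit status. */
static int pipe_demo(void)
{
    int fd[2];
    if (pipe(fd) < 0)                 /* fd[0]: read end, fd[1]: write end */
        return -1;

    pid_t pid = fork();
    if (pid == 0) {                   /* child: the reading end */
        close(fd[1]);                 /* close the unused write end */
        char buf[64];
        ssize_t n = read(fd[0], buf, sizeof buf);
        _exit(n < 0 ? 255 : (int)n);
    }

    close(fd[0]);                     /* parent: the writing end */
    write(fd[1], "hello\n", 6);
    close(fd[1]);                     /* reader sees EOF after this */

    int status;
    waitpid(pid, &status, 0);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Run this way, pipe_demo() should return 6, matching the wc -c output in the trace above.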

Without pipes, the shell would have to write the output from one process to a file and arrange for the next process to read from it, at the expense of some overhead and some disk space — but pipes provide more than just the convenience of avoiding temporary files:

If a process attempts to read from an empty pipe, then read(2) will block until data is available. If a process attempts to write to a full pipe (see below), then write(2) blocks until sufficient data has been read from the pipe to allow the write to complete.

This important property, along with the POSIX requirement that writing up to PIPE_BUF (at least 512) bytes into a pipe must be atomic, allows processes to communicate with each other over a pipe in a way that regular files (which have no such guarantees) could not support.

With a regular file, one process could write all its output and hand the file over to the next process, or the two could operate in lock-step, using an external signalling mechanism (like a semaphore) to let each other know when they were done reading or writing. Pipes eliminate all this complexity.

What are we looking for?

A little hand-waving helps to imagine how pipes might work. You would need to allocate a memory buffer and some state, have functions to add and remove data from the buffer and some glue to invoke them through read and write operations on file descriptors, and sprinkle some locking on top to implement the special behaviour described above.
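To make the hand-waving slightly more concrete, here is a toy, single-process sketch of that buffer-and-state core (the names and layout are my own invention, not from any kernel). Two monotonically increasing counters track how much has been written and read; their difference is the amount of buffered data, and the buffer is indexed modulo its size.

```c
#include <stddef.h>

/* A toy sketch of the state a pipe needs: a fixed buffer plus
   read/write counters.  In a real kernel, pipe_write() would sleep
   when the buffer is full and pipe_read() when it is empty; here
   they simply stop and return a short count. */
#define PIPESIZE 512

struct toy_pipe {
    char   data[PIPESIZE];
    size_t nread;     /* total bytes read so far    */
    size_t nwrite;    /* total bytes written so far */
};

static size_t pipe_write(struct toy_pipe *p, const char *buf, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (p->nwrite - p->nread == PIPESIZE)
            break;                           /* full: a kernel would sleep */
        p->data[p->nwrite++ % PIPESIZE] = buf[i];
    }
    return i;
}

static size_t pipe_read(struct toy_pipe *p, char *buf, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (p->nread == p->nwrite)
            break;                           /* empty: a kernel would sleep */
        buf[i] = p->data[p->nread++ % PIPESIZE];
    }
    return i;
}
```

The places where a real kernel would block are marked; the glue that routes read(2) and write(2) on a pipe fd to functions like these, and the locking, are what the rest of this article digs into.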

We are now equipped to interrogate the kernel source under bright lights to confirm or deny this vague mental model, while staying prepared for surprises along the way.

Where do we look?

I'm not sure where my (reprinted) copy of the famous “Lions book” with the source code of Unix 6ed is, but thanks to The Unix Heritage Society, we can start by looking at even older Unix sources online.

(Browsing the TUHS archives is like being in a museum — not only does it provide a rare glimpse at our shared history, it also makes me appreciate the years of effort that went into recovering all of these materials bit by bit from old tapes and printed documentation, and acutely conscious of the parts that are still lost.)

Once we've satisfied our curiosity about the ancient history of pipes, we can take a look at some modern kernels just for the contrast.

Incidentally, pipe is system call number 42 in the sysent[] table in Unix. Coincidence?

Traditional Unix kernels (1970–1974)

I can't find any trace of pipe(2) in PDP-7 Unix from January 1970, nor in First edition Unix from November 1971, nor in the incomplete sources of Second edition Unix from June 1972.

TUHS confirms that Third edition Unix from February 1973 was the first version to include pipes:

The third edition of Unix was the last version with a kernel still written in assembly code, but is the first version to include pipes. For much of 1973, the existing Third Edition was maintained and improved, while the kernel was rewritten in C to become the Fourth Edition of Unix.

Update: A reader on HN was kind enough to dig up a scanned copy of the 1964 memo in which Doug McIlroy proposed the idea of “coupling programs like garden hose”.

McIlroy pipe memo

Brian Kernighan's book, “Unix: A History and a Memoir”, also talks about the history of pipes and mentions this memo “…that hung on the wall in my office at Bell Labs for 30 years”. Here's an interview with McIlroy, and some more history from a 2014 paper by McIlroy:

When Unix came to be, my fascination with coroutines led me to badger its author, Ken Thompson, to allow writes in one process to go not only to devices but also to matching reads in another process. Ken saw that was possible. As a minimalist, though, he wanted every system feature to carry significant weight. Did direct writes between processes offer a really major advantage over writing to a temporary file in one process and then reading it in the other? Not until I made a specific proposal with a catchy name, “pipe”, and shell syntax to connect processes via pipes, did Ken finally exclaim, “I'll do it!”

And he did do it. In one feverish evening Ken modified both kernel and shell, and fixed several standard programs so they could take input from standard input (potentially piped) as well as named files. The next day brought a heady explosion of applications. By the end of the week, department secretaries were using pipes to send text-processor output to the printer spooler. Not long after, Ken replaced the original API and shell syntax for pipes with cleaner conventions that have been used ever since.

Unfortunately, the source code for the 3E kernel is no longer available, and although we do have the C source code for a version of the Fourth edition Unix kernel from November 1973, it predates the official release by a few months, and is missing the pipe implementation. It is sad that the origin of such an iconic Unix feature is lost to us, perhaps forever.

We do have the pipe(2) manpages from both releases, so we can start by looking at the documentation from 3E (with certain words underlined “by hand” with a string of literal ^H backspaces followed by underscores!). This proto-pipe(2) written in assembly returned only one file descriptor, but provided the basic functionality that we expect:

The pipe system call creates an I/O mechanism called a pipe. The file descriptor returned can be used in both read and write operations. When the pipe is written, the data is buffered up to 504 bytes at which time the writing process is suspended. A read on the pipe will pick up the buffered data.

By next year, along with a rewrite of the kernel into C, the 4E pipe(2) had assumed its modern form, with a prototype of "pipe(fildes)":

The pipe system call creates an I/O mechanism called a pipe. The file descriptors returned can be used in read and write operations. When the pipe is written using the descriptor returned in r1 (resp. fildes[1]), up to 4096 bytes of data are buffered before the writing process is suspended. A read using the descriptor returned in r0 (resp. fildes[0]) will pick up the data.

It is assumed that after the pipe has been set up, two (or more) cooperating processes (created by subsequent fork calls) will pass data through the pipe with read and write calls.

The shell has a syntax to set up a linear array of processes connected by pipes.

Read calls on an empty pipe (no buffered data) with only one end (all write file descriptors closed) return an end-of-file. Write calls under similar conditions are ignored.
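The end-of-file behaviour for reads described here survives unchanged on modern systems and is easy to demonstrate; the helper below is hypothetical, and note that writes to a pipe with no readers now raise SIGPIPE rather than being silently ignored (we'll see where that change came from shortly).

```c
#include <unistd.h>

/* Demonstrates EOF on a pipe: once every write descriptor is closed,
   a read() on the drained pipe returns 0.  Returns 0 on success. */
static int eof_demo(void)
{
    int fd[2];
    char buf[8];

    if (pipe(fd) < 0)
        return -1;

    write(fd[1], "hi", 2);
    close(fd[1]);                       /* close the only write end */

    ssize_t first  = read(fd[0], buf, sizeof buf);  /* the buffered "hi" */
    ssize_t second = read(fd[0], buf, sizeof buf);  /* end-of-file */
    close(fd[0]);

    return (first == 2 && second == 0) ? 0 : -1;
}
```

Both reads happen in a single process here; the pipe does not care which process holds which end.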

So the earliest surviving pipe implementation is from Fifth Edition Unix in June 1974, but it's nearly identical to what was in the next release, which also added some comments, so we might as well skip 5E too.

Sixth Edition Unix (1975)

Like so many others before us, we start reading the Unix code with Sixth Edition Unix from May 1975. Sixth Edition sources are much more widely available than those of earlier versions, thanks largely to the Lions book:

For many years, the Lions book was the only Unix kernel documentation available outside Bell Labs. Although the license of 6th Edition allowed classroom use of the source code, the license of 7th Edition specifically excluded such use, so the book spread through illegal copy machine reproductions.

You can still buy a reprint of the book, featuring cover art with shifty-looking students at a photocopier, but thanks to Warren Toomey (who went on to start TUHS), you can also download a PDF of the 6E source. I can't resist a quick glimpse at the effort that went into making this PDF available:

Over 15 years ago I had produced a replica of the Lions' source code listing since I was unhappy with the quality of my n-th generation copy. There was no TUHS and I had no access to any old source code. But in 1988 I discovered an old 9-track tape being discarded of a PDP11 backup. It was hard to determine what it was running, but it did have an intact /usr/src/ tree of which most of the files were timestamped 1979; even at that time it seemed ancient. So it was either 7th edition or a derivative like PWB, which I believe it was.

I used this as a basis and hand edited the source back into 6th edition form. Some code was completely the same, some required the easy edit of changing the modern += token into the archaic =+. Others needed to just remove casts, while some had to be completely retyped, but not that much.

Now, decades later, we can just read the Sixth Edition source code online at TUHS, from an archive contributed by Dennis Ritchie.

Aside: at first glance, what is most striking about this pre-K&R C code is how narrow it is. It's not often that I can include code listings without extensive editing to fit the relatively narrow body of my web site.

There's an instructive comment near the top of /usr/sys/ken/pipe.c (and yes, there's also a /usr/sys/dmr):

/*
 * Max allowable buffering per pipe.
 * This is also the max size of the
 * file created to implement the pipe.
 * If this size is bigger than 4096,
 * pipes will be implemented in LARG
 * files, which is probably not good.
 */
#define	PIPSIZ	4096

The buffer size has not changed from 4E days, but here we can already see beyond the documented public interface — ancient pipes used a file to provide the backing storage!

As for LARG files, they correspond to the ILARG inode flag used by the "large addressing algorithm" to handle indirect blocks to support larger filesystems. If Ken says it would probably not be good to use them, I'm happy to take his word for it.

Here's the actual pipe system call:

/*
 * The sys-pipe entry.
 * Allocate an inode on the root device.
 * Allocate 2 file structures.
 * Put it all together with flags.
 */
pipe()
{
    register *ip, *rf, *wf;
    int r;

    ip = ialloc(rootdev);
    if(ip == NULL)
        return;
    rf = falloc();
    if(rf == NULL) {
        iput(ip);
        return;
    }
    r = u.u_ar0[R0];
    wf = falloc();
    if(wf == NULL) {
        rf->f_count = 0;
        u.u_ofile[r] = NULL;
        iput(ip);
        return;
    }
    u.u_ar0[R1] = u.u_ar0[R0]; /* wf's fd */
    u.u_ar0[R0] = r;           /* rf's fd */
    wf->f_flag = FWRITE|FPIPE;
    wf->f_inode = ip;
    rf->f_flag = FREAD|FPIPE;
    rf->f_inode = ip;
    ip->i_count = 2;
    ip->i_flag = IACC|IUPD;
    ip->i_mode = IALLOC;
}

The comment is quite clear about what's going on, but the code takes a bit of effort to follow, partly because of how the system call parameters and return values are passed in and out through the per-process “struct user u” and the registers R0 and R1.

We try to allocate an inode on disk using ialloc() and two files in memory using falloc(). If all goes well, we set flags to identify the files as two ends of a pipe, point them at the same inode (whose reference count becomes 2), and mark the inode as modified and in-use. (Note the calls to iput() in the error paths, to decrement the reference count of the new inode.)

pipe() must return the read and write file descriptor numbers via R0/R1. falloc() returns a pointer to a file structure, but also "returns" the fd via u.u_ar0[R0]. So the code saves the read fd in r and assigns the write fd directly from u.u_ar0[R0] after the second call to falloc().

The FPIPE flag that we set during pipe creation controls the behaviour of the rdwr() function in sys2.c, which invokes the custom pipe I/O routines for pipes:

/*
 * common code for read and write calls:
 * check permissions, set base, count, and offset,
 * and switch out to readi, writei, or pipe code.
 */
rdwr(mode)
{
    register *fp, m;

    m = mode;
    fp = getf(u.u_ar0[R0]);
    /* … */

    if(fp->f_flag&FPIPE) {
        if(m==FREAD)
            readp(fp); else
            writep(fp);
    }
    /* … */
}

Next in pipe.c is the readp() function to read from a pipe, but it's easier to follow the implementation by looking at writep() first. Once again, the oddities of the argument passing convention add some complexity, but we can gloss over some of the details.

writep(fp)
{
    register *rp, *ip, c;

    rp = fp;
    ip = rp->f_inode;
    c = u.u_count;

loop:
    /* If all done, return. */

    plock(ip);
    if(c == 0) {
        prele(ip);
        u.u_count = 0;
        return;
    }

    /*
     * If there are not both read and write sides of the
     * pipe active, return error and signal too.
     */

    if(ip->i_count < 2) {
        prele(ip);
        u.u_error = EPIPE;
        psignal(u.u_procp, SIGPIPE);
        return;
    }

    /*
     * If the pipe is full, wait for reads to deplete
     * and truncate it.
     */

    if(ip->i_size1 == PIPSIZ) {
        ip->i_mode =| IWRITE;
        prele(ip);
        sleep(ip+1, PPIPE);
        goto loop;
    }

    /* Write what is possible and loop back. */

    u.u_offset[0] = 0;
    u.u_offset[1] = ip->i_size1;
    u.u_count = min(c, PIPSIZ-u.u_offset[1]);
    c =- u.u_count;
    writei(ip);
    prele(ip);
    if(ip->i_mode&IREAD) {
        ip->i_mode =& ~IREAD;
        wakeup(ip+2);
    }
    goto loop;
}

On entry, we want to write u.u_count bytes into the pipe. First we acquire a lock on the inode (see below for more about plock/prele).

Next, we check the inode's reference count. So long as the two ends of the pipe remain open, the reference count must remain 2. We hold one reference (from rp->f_inode), so if the count is less than 2, it must mean that the reader closed their end. In other words, we're trying to write into a closed pipe, which is an error. 6E Unix is when the EPIPE error code and SIGPIPE signal were first introduced.

Even if the pipe is still open, it may be full. In that case, we release the lock and go to sleep hoping that another process will create space in the pipe by reading from it. When we are woken up, we loop back to the start and try to acquire the lock and start another write cycle.

If the pipe does have free space, we use writei() to write out some data. The inode's i_size1 (which may be 0 for an empty pipe) points to the end of whatever data it already contains. If we have enough data to write, we can fill the pipe from i_size1 up to PIPSIZ. Then we release the lock and try to wake up any process that is waiting to read from the pipe. Finally, we loop back to the start to see if we managed to write out as many bytes as we wanted. If not, we start another write cycle.

The inode's i_mode is normally used to hold r/w/x permissions, but for pipes the IREAD and IWRITE bits are used to indicate that some process is waiting to read from or write to the pipe, respectively. The process sets the flag and calls sleep(), expecting a matching wakeup() to be called by some other process in future.

The real magic happens in sleep() and wakeup(), which are implemented in slp.c, the source of the infamous “You are not expected to understand this” comment. Fortunately, we don't have to understand the code, just take a quick look at some other comments:

/*
 * Give up the processor till a wakeup occurs
 * on chan, at which time the process
 * enters the scheduling queue at priority pri.
 * The most important effect of pri is that when
 * pri<0 a signal cannot disturb the sleep;
 * if pri>=0 signals will be processed.
 * Callers of this routine must be prepared for
 * premature return, and check that the reason for
 * sleeping has gone away.
 */
sleep(chan, pri) /* … */

/*
 * Wake up all processes sleeping on chan.
 */
wakeup(chan) /* … */

A process that calls sleep() for a particular channel can be woken up later by another process calling wakeup() for the same channel. writep() and readp() coordinate their operations through matching pairs of these calls. (Note that pipe.c always uses PPIPE as the priority when calling sleep, so the sleeps are interruptible by signals.)
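For readers more at home with modern concurrency primitives, the sleep()/wakeup() pairing is loosely analogous to waiting on and broadcasting a condition variable. The sketch below is entirely my own analogy, not kernel code, and the names are hypothetical:

```c
#include <pthread.h>

/* A loose modern analogue of sleep(chan)/wakeup(chan) using a condition
   variable.  As the slp.c comment warns, sleepers must be prepared for
   premature return and re-check why they slept; pthread_cond_wait() can
   likewise return spuriously, hence the while loop. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  chan = PTHREAD_COND_INITIALIZER;
static int event_pending = 0;

static void sleep_on_chan(void)            /* cf. sleep(chan, pri) */
{
    pthread_mutex_lock(&lock);
    while (!event_pending)                 /* re-check after waking */
        pthread_cond_wait(&chan, &lock);
    event_pending = 0;
    pthread_mutex_unlock(&lock);
}

static void wakeup_chan(void)              /* cf. wakeup(chan) */
{
    pthread_mutex_lock(&lock);
    event_pending = 1;
    pthread_cond_broadcast(&chan);         /* wake all sleepers */
    pthread_mutex_unlock(&lock);
}

/* Example waiter: blocks until some other thread calls wakeup_chan(). */
static void *waiter(void *arg)
{
    (void)arg;
    sleep_on_chan();
    return (void *)1;
}
```

Like wakeup(chan), pthread_cond_broadcast() wakes every waiter; and like the 6E sleepers, each waiter must re-check its condition on waking, which is why the flag is tested in a loop.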

Now we have everything we need to understand the readp() function:

readp(fp)
int *fp;
{
    register *rp, *ip;

    rp = fp;
    ip = rp->f_inode;

loop:
    /* Very conservative locking. */

    plock(ip);

    /*
     * If the head (read) has caught up with
     * the tail (write), reset both to 0.
     */

    if(rp->f_offset[1] == ip->i_size1) {
        if(rp->f_offset[1] != 0) {
            rp->f_offset[1] = 0;
            ip->i_size1 = 0;
            if(ip->i_mode&IWRITE) {
                ip->i_mode =& ~IWRITE;
                wakeup(ip+1);
            }
        }

        /*
         * If there are not both reader and
         * writer active, return without
         * satisfying read.
         */

        prele(ip);
        if(ip->i_count < 2)
            return;
        ip->i_mode =| IREAD;
        sleep(ip+2, PPIPE);
        goto loop;
    }

    /* Read and return */

    u.u_offset[0] = 0;
    u.u_offset[1] = rp->f_offset[1];
    readi(ip);
    rp->f_offset[1] = u.u_offset[1];
    prele(ip);
}

It may help to read this function from the bottom upwards. The "read and return" branch is the common case, when there is some data in the pipe. If so, we use readi() to read as much data as available from the current read f_offset onwards, and update the offset accordingly.

On a subsequent read, the pipe is empty if the read offset has reached the inode's i_size1. We reset the positions to 0 and try to wake up any process that wants to write into the pipe. We know that writep() goes to sleep on ip+1 when the pipe is full. Now that the pipe is empty, we can wake it up and allow it to resume its write loop.

If there's nothing to read, readp() can set the IREAD flag and go to sleep on ip+2. We know that writep() will wake it up when it manages to write some data into the pipe.

It's worth a quick look at the comments for readi() and writei() to realise that, other than having to pass in the parameters through “u”, we can treat them as ordinary IO functions that take a file, a position, a memory buffer, and a count of bytes to read or write.

/*
 * Read the file corresponding to
 * the inode pointed at by the argument.
 * The actual read arguments are found
 * in the variables:
 *	u_base		core address for destination
 *	u_offset	byte offset in file
 *	u_count		number of bytes to read
 *	u_segflg	read to kernel/user
 */
readi(aip)
struct inode *aip;
/* … */

/*
 * Write the file corresponding to
 * the inode pointed at by the argument.
 * The actual write arguments are found
 * in the variables:
 *	u_base		core address for source
 *	u_offset	byte offset in file
 *	u_count		number of bytes to write
 *	u_segflg	write to kernel/user
 */
writei(aip)
struct inode *aip;
/* … */

As for the "conservative" locking, readp() and writep() hold a lock on the inode until they finish or go to sleep. plock() and prele() are straightforward, with another matched set of sleep and wakeup calls, so that releasing a lock also wakes up any process that was waiting for it:

/*
 * Lock a pipe.
 * If its already locked, set the WANT bit and sleep.
 */
plock(ip)
int *ip;
{
    register *rp;

    rp = ip;
    while(rp->i_flag&ILOCK) {
        rp->i_flag =| IWANT;
        sleep(rp, PPIPE);
    }
    rp->i_flag =| ILOCK;
}

/*
 * Unlock a pipe.
 * If WANT bit is on, wakeup.
 * This routine is also used to unlock inodes in general.
 */
prele(ip)
int *ip;
{
    register *rp;

    rp = ip;
    rp->i_flag =& ~ILOCK;
    if(rp->i_flag&IWANT) {
        rp->i_flag =& ~IWANT;
        wakeup(rp);
    }
}

I could not understand at first why readp() did not call prele(ip) before calling wakeup(ip+1). The first thing writep() does in its loop is plock(ip), which would deadlock if readp() still held its lock, so the code must have the right effect somehow. A quick glance at wakeup() showed that it only marks sleeping processes as runnable, and leaves it to a future sched() to actually start the process. So readp() calls wakeup(), releases its lock, sets IREAD, and calls sleep(ip+2) all before writep() resumes its loop.

That's all there is to pipes in 6E. Simple code, far-reaching consequences.

Seventh Edition Unix from January 1979 was a major new release (after a gap of four years) with many new applications and kernel features. It also featured widespread changes due to the use of K&R C with typecasts and unions and typed pointers to structs, but the pipe code hardly changed at all. We can skip it without missing anything exciting.

Xv6, a simple Unix-like kernel

Xv6 is a kernel inspired by Sixth Edition Unix, but written in modern C to run on x86 processors. It's simple to read and understand but, unlike the Unix sources from TUHS, you can also compile, modify, and run it on something other than a PDP-11/70. For this reason, it is widely used by universities in Operating Systems courses. The source code is available on GitHub.

It has a clear and concise pipe.c implementation backed by a buffer in memory instead of an inode on disk. I reproduce only the definition of “struct pipe” and the pipealloc() function here:

#define PIPESIZE 512

struct pipe {
  struct spinlock lock;
  char data[PIPESIZE];
  uint nread;     // number of bytes read
  uint nwrite;    // number of bytes written
  int readopen;   // read fd is still open
  int writeopen;  // write fd is still open
};

int
pipealloc(struct file **f0, struct file **f1)
{
  struct pipe *p;

  p = 0;
  *f0 = *f1 = 0;
  if((*f0 = filealloc()) == 0 || (*f1 = filealloc()) == 0)
    goto bad;
  if((p = (struct pipe*)kalloc()) == 0)
    goto bad;
  p->readopen = 1;
  p->writeopen = 1;
  p->nwrite = 0;
  p->nread = 0;
  initlock(&p->lock, "pipe");
  (*f0)->type = FD_PIPE;
  (*f0)->readable = 1;
  (*f0)->writable = 0;
  (*f0)->pipe = p;
  (*f1)->type = FD_PIPE;
  (*f1)->readable = 0;
  (*f1)->writable = 1;
  (*f1)->pipe = p;
  return 0;

 bad:
  if(p)
    kfree((char*)p);
  if(*f0)
    fileclose(*f0);
  if(*f1)
    fileclose(*f1);
  return -1;
}

pipealloc() sets the stage for the rest of the implementation, which comprises the piperead(), pipewrite(), and pipeclose() functions. The actual system call, sys_pipe, is a wrapper implemented in sysfile.c. I recommend reading the code in its entirety. It's about the same level of complexity as the 6E Unix source code, but much easier to go through, and therefore more satisfying.

Linux 0.01

It's possible to find the Linux 0.01 source code, and it's instructive to look at its pipe implementation in fs/pipe.c too. It uses an inode to represent the pipe, but it's written in modern C. Having worked our way through the 6E code, it's not difficult to follow. Here's the write_pipe() function:

int write_pipe(struct m_inode * inode, char * buf, int count)
{
    char * b=buf;

    wake_up(&inode->i_wait);
    if (inode->i_count != 2) { /* no readers */
        current->signal |= (1<<(SIGPIPE-1));
        return -1;
    }
    while (count-->0) {
        while (PIPE_FULL(*inode)) {
            wake_up(&inode->i_wait);
            if (inode->i_count != 2) {
                current->signal |= (1<<(SIGPIPE-1));
                return b-buf;
            }
            sleep_on(&inode->i_wait);
        }
        ((char *)inode->i_size)[PIPE_HEAD(*inode)] =
            get_fs_byte(b++);
        INC_PIPE( PIPE_HEAD(*inode) );
        wake_up(&inode->i_wait);
    }
    wake_up(&inode->i_wait);
    return b-buf;
}

Even without looking up the structure definitions, we can understand the use of the inode reference count to check if a write should raise SIGPIPE. Other than operating byte-by-byte, it's easy to map this function to the ideas we saw earlier. Even the sleep_on/wake_up logic doesn't seem entirely foreign.

Modern kernels: Linux, FreeBSD, NetBSD, OpenBSD

I had a quick look at some modern kernels, and none of them uses a disk-backed implementation any more (not surprisingly). Linux has its own implementation, of course, but although the three modern BSD kernels all have implementations derived from code written by John S. Dyson, they too have diverged from each other over the years.

Reading through fs/pipe.c (on Linux) or sys/kern/sys_pipe.c (on *BSD) now takes real commitment. The code now needs to be concerned with performance and support for features like vectored and asynchronous IO, and the details of memory allocation, locking, and kernel configuration all vary considerably. Not at all the sort of thing a university would want for an introductory course.

Nevertheless, I took some satisfaction in being able to recognise a few ancestral patterns (raising SIGPIPE and returning EPIPE when writing to a closed pipe, for example) in each of these very different modern kernels. I may never see a PDP-11 computer, but there are still useful lessons to learn from code that predates my own existence by a few years.

“The Linux Kernel Implementation of Pipes and FIFOs” by Divye Kapoor, from 2011, is an overview of how pipefs (still) works in Linux. This recent Linux commit illustrates how pipes support a model of interaction that goes beyond what temporary files allow, as well as how much further pipes have evolved from the “very conservative locking” in the 6E Unix kernel.

What is fsck up to now?

We had unexpectedly heavy snowfall the other day and, as always, the mains power supply came back only after a few days of repairing broken power lines in the forest. Meanwhile, the days were so overcast that the solar inverter couldn't charge the batteries enough to keep up with our minimal domestic load.

Which meant that when the sun came out again, I was left staring at something like this for a long time:

root@soot:~# ps -eo pid,cmd|grep '[f]sck'
756 /lib/systemd/systemd-fsck /dev/mapper/sdb1_crypt
757 /sbin/fsck -a -T -l -M -C4 /dev/mapper/sdb1_crypt
758 /lib/systemd/systemd-fsckd
759 fsck.ext4 -a -C4 /dev/mapper/sdb1_crypt

Long enough, in fact, that I began to wonder if I could tell what it was doing. (The volume in question is exported via iSCSI from a Synology NAS and fsck is still running long after the machine has otherwise finished booting up, so I have ordinary shell access.)


More weird Hindi phrases

Not all of the strange Hindi phrases I've encountered can be traced to awkward translations. Here are some that I find baffling all by themselves.

गति से प्रथम सुरक्षा

This is a common sign on the mountain roads in Uttarakhand (right up there with “We are like you, but not your speed”). For many years, I read this—without much conscious thought—as “[Controlling your] Speed is the principal safety measure”. This is something I strongly believe, and it seemed only right and proper to see it written by the roadside.

Then one day, it suddenly occurred to me that it might actually be meant to say “Safety before speed”.

Wiktionary says प्रथम means both first and preeminent, so both interpretations seem within the realm of plausibility. प्रथम is not an obscure word, but it's most often encountered as an adjective in the context of ranking things (like students). I can't find any other uses of it as a preposition, the way “before” is used above.

So how did we come by this odd phrase, and which of the two possible interpretations was intended? Was it an overenthusiastic translation, or did it mean what I always thought it did? Who knows?

I do know, however, that if the sign had said “गति से पहले सुरक्षा”, it would have quite unambiguously meant “Safety before speed”. But पहले is a much more ordinary word than प्रथम, and nobody in the Department of Road Signs ever got a bonus for using an ordinary word where an alternative was available. Especially if that alternative happens to be a word that's not used in Urdu.

Oddly enough, Google Translate agrees with me here. It translates “गति से पहले सुरक्षा” as “Safety before speed”, but suggests “Speed first protection” for the original.

कृपया बैठे हुए कुर्सी की पेटी बांधे रखिए

Leaving road transport behind and taking to the air, the above phrase can be found on a little placard behind every airline seat on Indigo flights (and perhaps some others too). Indigo airline seatbelt label

What does it mean? Well, the English version is quite straightforward: “Please fasten seat belt while seated”. The Hindi version is also quite matter-of-fact. But what it actually says is… “Please keep the belts of seated seats tied”.

I don't know how we ended up with that. Perhaps someone started with “कृपया बैठे हुए यात्री कुर्सी की पेटी बांधे रखें” (“Seated passengers may please keep their seat belts fastened”) and someone tried a little too hard to shrink it to fit? But why change रखें to रखिए in that case? Did someone think it sounded more polite? Or why not just say “कृपया कुर्सी की पेटी बांधे रखें” (“Please keep your seat belt fastened”)? Did someone think the English and Hindi versions should look the same length to avoid giving offence? (For that matter, is it really necessary to say “while seated” even in English?)

I can't resist mentioning another Indigo annoyance here. Their announcements use the terms “Cabin Crew” and “कर्मी दल” (“worker group”), but in the Lead Cabin Attendant's introduction, she refers to herself as “मुख्य कर्मी दल”, which makes exactly as much sense as “chief worker group”.

Renault Duster AWD 2019

Renault India introduced the Duster AWD in late 2014, and Hassath and I bought one just days after it was released. We liked it immediately, and wrote a detailed review after one year. At the time of writing, our car is doing fine after five years of unrelenting offroad use.

Time to upgrade?

The Duster retains a dedicated following, but attracts fewer new buyers with each passing year. Its outstanding ride quality and surprising AWD competence are still unmatched in its segment (the Mahindra Thar is much less usable on-road, and the Jeep Compass AWD costs as much as a Duster and Thar put together); but the AWD market in India was always a niche, and the Duster is now increasingly described as “dated” and lagging its competition in terms of features and interior comforts.

Renault chose not to bring its second-generation 2017 Duster to India, and was content to release the occasional “facelift”, most recently in July 2019. Meanwhile, grim rumours began to circulate about the future of the Duster AWD in India after the April 2020 deadline for adoption of BS6 emission standards (or even Renault's future in India, depending on how grim you wanted to be).

In October 2019, the Duster AWD took top spot in Autocar India's list of “cars to buy before they die”.

Alas, the latest facelift did not endear itself to us. The AWD variant, formerly available only in the top-spec RxZ configuration, was relegated to the “RxS(O)” line and stripped of various features to reduce costs. Some of the differences we didn't care about (cosmetic changes), some we could live without (e.g., the touchscreen, electrically foldable outside rearview mirrors, reverse parking camera), and some we were willing to sacrifice (e.g., cruise control, speed limiter).

But that left us with two infuriating omissions: the rear wiper and washer, and the height-adjustable driver's seat with lumbar support. We need both and were prepared to pay more for them, but neither feature could be retro-fitted onto the RxS(O) car (and we couldn't do without AWD).

Conclusion—keep the 2014 RxZ AWD, don't buy a new one.

RxZ AWD: the last of its kind

At least, that would have been the conclusion if it hadn't been for Hassath. At her urging, someone at the dealership went to consult the inventory, and found a pre-facelift 2019 Duster RxZ AWD at the factory in Chennai. Just one.

At first, I was not a fan of buying an "older car", but Hassath asked me to enumerate specific reasons to avoid it. After much thought, the only objection I could come up with was… “But it's brown”.

And that's how we came to buy the very last pre-facelift 2019 RxZ AWD available anywhere in India. And it's actually quite a lovely brown.

Read more…

Bogus Hindi translations

My annoyance at tortured translations between Hindi and English is not confined to the strange inverted use of “until” in Hinglish.

Here are some phrases that have suffered terribly in translation in the opposite direction.

कार्य प्रगति पर है

This is the usual translation of “work in progress” on road signs, but प्रगति means “progress” in a larger sense—think “scientific progress”, not “the progress bar is stuck at 95%”. And कार्य is a very grandiose word to apply to construction work, but it's just the sort of Sanskrit-derived word that the powers that be love to slip into official signs as if they were nothing out of the ordinary.

What's worse, “पर है” means “on” in the sense of putting one thing on another. So if you were to translate the sign back into English, “The work is on the progress” wouldn't be too far off. Nowhere else is प्रगति used in a way that suggests you can put things on it (or in it).

It would be perfectly natural and unambiguous to write “काम चल रहा है”, but that's just not officious enough to satisfy anyone.

निजी अस्पताल

Hindi newspapers use this term to mean “private hospital”, but निजी would be better translated as “personal”. It doesn't convey the private vs public sense of being the opposite of a “सरकारी अस्पताल” (Government hospital). When I read about someone going to a निजी अस्पताल, I always imagine them going to their own hospital (which I would too, if I had one).

In this case, I don't know of a better way to say it.

I like volumeicon

My headphone cable has become flaky, so I've been using a small external speaker for a while. I now need to change my volume settings so often that running alsamixer in a terminal each time was beginning to annoy me. So I looked for a systray-based mixer (I use xmobar with xmonad), and I found volumeicon (packaged in Debian as volumeicon-alsa).

I really like volumeicon. The default behaviour is perfectly sensible. Click the icon to mute, click again to unmute. Hover to see the current volume level, use the scroll wheel to change it. Middle click to "Open Mixer", which runs alsamixer in a terminal. Right click to set "Preferences".

What's more, the preferences are also remarkably sensible. You can pick the ALSA device and channel to control, and there's a straightforward menu to change the appearance and behaviour of the icon.

It's been quite a while since I encountered a new program that did exactly what I wanted with so little fuss.

Resolving a conflict between manpages-dev and libbsd-dev

I use xmonad, which needs ghc.

On my Debian 9/stretch machine, the manpages-dev package is declared to break libbsd-dev, which ghc depends on. So in practice, one must choose between xmonad and manpages-dev, which has annoyed me for a long time.

Today I tracked down a related Debian bug report and happened to notice that my libbsd-dev 0.8.3 was just one minor revision too old to avoid the conflict declared by manpages-dev 4.10-2:

$ apt-cache show manpages-dev|grep Breaks
Breaks: libbsd-dev (<< 0.8.4-1), manpages (<< 4.13-3)

I also noticed that libbsd-dev 0.9.1-1 packages were available in unstable/sid, and followed the instructions to create a backport of the package to stretch.

$ dget -x http://http.debian.net/debian/pool/main/libb/libbsd/libbsd_0.9.1-1.dsc
$ cd libbsd-0.9.1
$ sudo mk-build-deps --install --remove
$ dch --local ~bpo9+ --distribution stretch-backports "Rebuild for stretch-backports."
$ dpkg-buildpackage -us -uc

I installed the updated libbsd0 and libbsd-dev packages, and was then able to install manpages-dev for the first time in… several months, at least.

I have manpages again!

Restoring a #4 plane

I, too, have an old #4 smoothing plane that my grandfather gave me.

Unlike some of the other tools I inherited from him, this plane is wholly unburdened by any pedigree—or even a manufacturer's name. My grandfather bought it in Calcutta when I was eleven years old. I knew nothing about planes, but I used it happily for a few years. Then I discovered computers, and the plane went into a box.

Now, nearly three decades later, I need a plane again.

This is my story of restoring an old plane to the point where I could use it. If there's anything to take away from this article, it is that, with sufficient effort, even a poorly-manufactured plane may eventually be made to work well. Was it worth the effort? Well, I have a working plane now. I know my grandfather would have been pleased.

Resources

As a first approximation, restoring old tools consists of taking them apart and rubbing the parts against a variety of abrasive substances (sometimes for hours) to remove rust and reshape them until they look and work better.

To learn about plane restoration, I recommend starting with this video tutorial by Paul Sellers, whose focus is to quickly restore a plane to working order. Here's a diagram of bench plane parts; the rest of this article assumes you are familiar with these terms.

General cleanup

The plane wasn't in terrible condition to begin with. Once I wiped off the dust with an oily rag, it took only a few minutes to get rid of a few spots of surface rust on the body with sandpaper. There was some rust on the plane iron too, which I sanded off after soaking briefly in a dilute vinegar solution.

Plane iron

There were two things wrong with the plane iron: it was not very good, and it had been sharpened many times by an eleven-year-old me.

Fixing the damage I had done in the past was not as difficult as I had feared. The bevel was uneven and rounded, there were nicks in the cutting edge, and it wasn't very sharp. Thanks to my diamond sharpening plates, it was just a matter of time and practice to grind and polish a new bevel, relieve the corners, and put a fine camber on the cutting edge.

The back of the iron was nowhere near flat, and the steel was very hard. It took several sessions on the extra-coarse plate to flatten the (last few centimetres of the) back, leaving only two low spots on the corners where the grinding wheel in the factory must have dug in. Removing them would have meant removing a lot of steel, so I stopped there. I did try tapping the back of the iron with a hammer to create a low spot in the centre, so that only the edges would need to be ground flat, but the steel was so hard that it had no noticeable effect.

(Normal sharpening during use subsequently removed enough steel from the cutting edge that the low spots on the back are no longer apparent.)

The leading edge of the cap iron fit poorly against the back of the plane iron. I ground it flat on the diamond plates to eliminate the gaps, and lightly polished the upper surface. I also filed off a sharp exposed point on the thread of the captive screw, which always managed to find its way under my fingernail when I was reattaching the cap iron to the sharpened plane iron.

Lever cap

The lever cap was so poorly machined that it gets its own section here. The cam on the lever was shaped wrong, so that it couldn't be released at all once it was locked down, and I had to loosen the screw on the body to remove the lever cap every time I wanted to sharpen the plane iron. Worse still, the screw didn't hold the lever cap securely—the cap would slide around as I tightened the screw, and end up crooked or off-centred. Even after the screw was tight, the lever cap would sometimes slip off the correct position and come loose.

I flattened the leading edge of the lever cap to improve contact with the cap iron. I tried to file the cam to the correct shape, but I didn't get it right, because it wasn't possible to remove the cam, and access was limited by the flat spring and the body of the lever cap. The best I could do was to make it work "backwards": to lock down in the raised position, and release when lowered. This is rather annoying, but it does seem to work.

As for the other problem, the only solution I have is to hold down the cap firmly when tightening the screw. With the modified cam, at least I don't always need a screwdriver handy to remove the plane iron for sharpening.

The right answer may be to find a used lever cap on eBay.

Body and sole

The sole of the plane was not flat. I put some #150 grit silicon carbide paper on a sheet of float glass and rubbed the plane (with the blade on, but retracted) across it a few times to identify low spots. I had to use the extra coarse diamond plate and #60 grit belt sander paper to flatten the sole just enough to bring heel, toe, throat, and edges into level. I then worked my way back up the grits to polish out the scratches. I also relieved the edges of the sole and heel on the sandpaper.

The face of the frog was not flat. I used #150 grit paper on float glass to knock down some of the high spots, but decided it wasn't worth it to pursue a couple of low spots along the edge. I cleaned up the contact surfaces along the bottom of the frog and on the body too.

The throat opening was not finished evenly, and it had some nicks in it. I used a flat file to remove some of the irregularities, but decided not to remove as much steel as it would take to make the throat straight and square and work out all the nicks. The opening was quite wide to begin with, and the remaining nicks seemed to not matter much in practice.

The wooden parts (tote and front knob) were in good condition, with no cracks or surface damage, so I didn't need to do anything to them.

Things I didn't fix

The sides of the body were not square to the sole. Sandpaper on float glass would have fixed it, but my fingers were threatening to fall off after flattening the sole and the frog, so I didn't bother. I may need to revisit this once I start using the plane on a shooting board.

There's no good way to close down the throat. The opening was wide to begin with, and I had to file it down further to remove irregularities. Advancing the frog causes chatter because the contact points between the frog and body are sketchy. Applying enough abrasives could, in theory, solve this problem, but it's not worth it.

The adjustment knob used to set the depth of cut is a bit awkward to use, partly because the lever cap needs to be cinched down so tightly. Another oddity is that it is on a screw with a normal thread; I believe it's usually reversed, so that rotating the knob clockwise advances the blade. On my plane, it's the other way around. Nothing to do about that but get used to it.

The lateral adjustment lever just doesn't work well. The mechanism has a lot of slop, isn't quite centred, and goes further on one side than the other. I don't know how to fix that.

Conclusion

My plane will never compete with the old Stanley or new Lie-Nielsen #4s of this world, but it does work. I can set it finely enough to take full-width shavings a few hundredths of a millimeter thick, leaving a mirror-smooth finish. Occasionally, I can even plane knotty cypress against the grain without tearing out or leaving gouge marks on the surface. That's far beyond anything I expected from it.

Is it a plane I will use forever? No. But it's a plane I will always remember, and that makes it a pretty special gift.

Relaying from Postfix to gmail

Here's how I configured Postfix to relay mail from x@example.com through smtp-relay.gmail.com:587 using the credentials set up for x@example.com on Google Apps.

There are three parts to this: making Postfix relay mail based on the sender address, teaching it to authenticate to gmail, and configuring gmail to accept the relayed mail. (Postfix was already configured to send outgoing mail directly.)

Sender-dependent relay

I created /etc/postfix/relay_hosts with the following contents:

@example.com smtp-relay.gmail.com:587

Then I ran «postmap /etc/postfix/relay_hosts» and set sender_dependent_relayhost_maps in /etc/postfix/main.cf:

sender_dependent_relayhost_maps =
    hash:/etc/postfix/relay_hosts

SMTP SASL authentication

I created /etc/postfix/sasl_passwords (mode 0600) with the following contents:

smtp-relay.gmail.com x@example.com:xpassword

Then I ran «postmap /etc/postfix/sasl_passwords» and added the following to /etc/postfix/main.cf:

smtp_sasl_auth_enable = yes
smtp_sasl_password_maps =
    hash:/etc/postfix/sasl_passwords
smtp_sasl_security_options = noanonymous

That enables SMTP AUTH in the Postfix SMTP client and tells Postfix where to look up the username and password for a domain.

Gmail will accept SMTP AUTH only in a TLS session, so TLS client support must be configured in Postfix (which means setting smtp_tls_security_level to "may"). But even once that's done, gmail advertises only the following authentication mechanisms:

250-AUTH LOGIN PLAIN XOAUTH2 PLAIN-CLIENTTOKEN OAUTHBEARER XOAUTH

I didn't want to worry about OAUTH, so PLAIN was the only reasonable choice left. Postfix will not use plaintext authentication mechanisms by default, so I also had to remove "noplaintext" from the default value for smtp_sasl_security_options.

As an additional precaution, I also set smtp_tls_policy_maps to change the default TLS policy from "may" to "encrypt" for smtp-relay.gmail.com.
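For the record, that policy override looks roughly like this (the file name is my choice; if I have the lookup key format right, the key must match the next-hop destination exactly, port included). After creating the file, run «postmap /etc/postfix/tls_policy»:

```
# /etc/postfix/tls_policy
smtp-relay.gmail.com:587    encrypt
```

And in /etc/postfix/main.cf:

```
smtp_tls_policy_maps =
    hash:/etc/postfix/tls_policy
```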

Gmail configuration

When I tried to send mail through the relay, Postfix wasn't able to authenticate:

SASL authentication failure: No worthy mechs found

SASL authentication failed; cannot authenticate to server smtp-relay.gmail.com[74.125.206.28]: no mechanism available

Google considers password authentication to be “less secure”, and you have to explicitly enable it on the less secure apps settings page. There are some other alternatives, but I was happy to take the path of least resistance here.

I did that and tried again, only for mail to bounce with this error:

Invalid credentials for relay [136.243.148.74]. The IP address you've registered in your G Suite SMTP Relay service doesn't match domain of the account this email is being sent from. If you are trying to relay mail from a domain that isn't registered under your G Suite account or has empty envelope-from, you must configure your mail server either to use SMTP AUTH to identify the sending domain or to present one of your domain names in the HELO or EHLO command. For more information, please visit https://support.google.com/a/answer/6140680#invalidcred

This message is misleading, as I found out by using openssl's s_client to establish a TLS session and then authenticating by hand. SMTP AUTH succeeded, but MAIL FROM was subsequently rejected. I followed the link in the message, which led me to the SMTP relay service support page.
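For anyone who wants to repeat that experiment: the argument to «AUTH PLAIN» is just a base64-encoded, NUL-separated triple of authorization identity, username, and password. A small helper (mine, not part of Postfix or openssl) to generate it:

```python
import base64

def auth_plain_token(user: str, password: str) -> str:
    """Return the base64 argument for "AUTH PLAIN" (RFC 4616).

    The plaintext form is authzid NUL authcid NUL password; the
    authorization identity is left empty here.
    """
    raw = b"\x00" + user.encode("utf-8") + b"\x00" + password.encode("utf-8")
    return base64.b64encode(raw).decode("ascii")
```

Paste the result after «AUTH PLAIN » in an «openssl s_client -connect smtp-relay.gmail.com:587 -starttls smtp» session.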

The Google Apps admin console doesn't use sensible URLs, but I followed the breadcrumb trail to an “Advanced settings” page where I was able to edit the SMTP relay service settings to set “Allowed senders” to “Only addresses in my domains”, as well as to “Require SMTP authentication” and “Require TLS encryption”. Remember to “Save” the changes.

The error I got was because the “Only accept mail from specified IP addresses” option was checked for this particular domain. I could have added the IP address of my server to the list, but SMTP authentication was what I wanted to use anyway.

Who invented the slicing-by-N CRC32 algorithm?

One of my contributions to Postgres 9.5 (back in 2015) was a two-stage optimisation of the CRC computation code: first, switching to a faster algorithm; and second, using the Intel SSE4.2 CRC instructions where available. I was delighted to have the opportunity to implement such a dramatic performance improvement (CRC computation used to be at the top of the profile on every streaming replica by some distance).

Optimising something by writing assembly (even if it was only a couple of instructions, later replaced by compiler intrinsics) is always fun, but here the algorithm change was also a substantial improvement, in that it used a lookup table to process eight input bytes at a time. This technique is known as “slicing-by-N” (where N depends on the size of the lookup table), and was originally described here:

Frank L. Berry, Michael E. Kounavis, "Novel Table Lookup-Based Algorithms for High-Performance CRC Generation", IEEE Transactions on Computers, vol. 57, pp. 1550-1560, November 2008, doi:10.1109/TC.2008.85

This paper, having been published in a prestigious IEEE journal, is of course not easily available for download (not when I looked in 2015, and apparently not today). I was able to find what I needed to implement the technique thanks to other reference materials, notably including Stephan Brumme's Fast CRC32 page (now considerably expanded since 2015), but I never actually got to read what Messrs. Kounavis and Berry had to say about their technique.

Recently, I had occasion to look at CRC32 implementations again, and I found a different paper that I had looked at briefly the last time: Cyclic Redundancy Check Generation Using Multiple Lookup Table Algorithms by Indu I. and Manu T.S. from TKM Institute of Technology, Kollam, in Kerala (my mother's home state in South India). I remember noting that there was something odd about the paper, but not having time enough to give it more than a passing glance. This time, I spent a while reading it, and it's certainly very odd.

ABSTRACT: The primary goal of this paper is to generate cyclic redundancy check (CRC) using multiple lookup table algorithms. A compact architecture of CRC algorithm (Slicing-by-N algorithm) based on multiple lookup tables (LUT) approach is proposed. This algorithm can ideally read large amounts of data at a time, while optimizing their memory requirement to meet the constraints of specific computer architectures. The focus of this paper is the comparison of two algorithms. These two algorithms are Slicing by-N-algorithm and Sarwate algorithm, in which slicing by-N-algorithm can read arbitrarily 512 bits at a time, but Sarwate algorithm, which can read only 8 bits at a time. This paper proposes the generation of CRC using slicing by 8 algorithm. In this, message bits are chunked to 8 blocks. All are processed at a time. Proposed Slicing-by-8 algorithm can read 64 bits of input data at a time and it doubles the performance of existing implementations of Sarwate algorithm.

Is this paper claiming to have invented the slicing-by-N algorithm? It's hard to tell from the blather in the abstract, but going through the remaining blather (an effort that, in retrospect, I cannot in good conscience commend to the reader) suggests that this is indeed the case.

Recently time is the major concern. So in order to process large amount of data at a time, Multiple Lookup based approach is more efficient. Multiple Lookup based approach contains five CRC algorithms, called Slicing by-N algorithm (N ϵ 4, 8, 16, 32, 64), which is used to read up to 512 bits at a time. So performance of the system should be increased. Here proposing Slicing by-8 algorithm to read 64 bits at a time. Here proposed an efficient design of CRC generator using Slicing by-N algorithm (N=8). In this algorithm, input message stream is sliced into N slices and each slice has 8 bits. So using this Slicing by-8 algorithm, it can read 64 bits at a time and it triples the performance of existing implementation of Sarwate algorithm.

Oho, so it triples the performance of existing implementations of the Sarwate algorithm, does it? Funny the abstract claims a paltry doubling in performance then. The paper goes on to describe CRC computation with block diagrams, and then has some more blather about VHDL and MATLAB and some screenshots of “simulation waveforms”, all of which seems to amount to showing that the various CRC algorithms produce the same results and that processing more input bytes at a time is faster than not doing so. I made judicious use of the fast-forward button to reach the conclusion, which begins with

The design of CRC generator using Multiple Look Up based approach is proposed. In this paper, slicing by-8 algorithm is designed, and compares this algorithm with the existing algorithms, that is, with Sarwate algorithm and LFSR method.

So yeah, they're claiming in a slightly roundabout way to have invented the slicing-by-8 CRC algorithm. However, the authors cite the Kounavis and Berry paper anyway, perhaps so that any criticism can be blamed on some sort of unfortunate misunderstanding. I didn't find any citations of this paper in the minute or two I spent on the search, but Citeseer and Researchgate link to it, and it's quite prominent in search results, so it's probably only a matter of time before someone cites it.

The paper was published in “International Journal of Modern Engineering Research” (ijmer.com) in 2012; the journal's name alone reminded me of the scammers, Academic Journals Online, whom I encountered a few years ago. IJMER does not, however, appear to be one of the AJO journals. Perhaps it's a second cousin.

Unfortunately, the authors include no contact information in the paper, so I was not able to send them a link to this page.