The Advisory Boar

By Abhijit Menon-Sen <ams@toroid.org>

Who invented the slicing-by-N CRC32 algorithm?

One of my contributions to Postgres 9.5 (back in 2015) was a two-stage optimisation of the CRC computation code: first, switching to a faster algorithm; and second, using the Intel SSE4.2 CRC instructions where available. I was delighted to have the opportunity to implement such a dramatic performance improvement (CRC computation used to be at the top of the profile on every streaming replica, by some distance).

Optimising something by writing assembly (even if it was only a couple of instructions, later replaced by compiler intrinsics) is always fun, but here the algorithm change was also a substantial improvement, in that it used a lookup table to process eight input bytes at a time. This technique is known as “slicing-by-N” (where N is the number of bytes processed per iteration, which in turn determines the size of the lookup table), and was originally described here:

Frank L. Berry, Michael E. Kounavis, "Novel Table Lookup-Based Algorithms for High-Performance CRC Generation", IEEE Transactions on Computers, vol. 57, no. 11, pp. 1550-1560, November 2008, doi:10.1109/TC.2008.85

This paper, having been published in a prestigious IEEE journal, is of course not easily available for download (not when I looked in 2015, and apparently not today). I was able to find what I needed to implement the technique thanks to other reference materials, notably including Stephan Brumme's Fast CRC32 page (now considerably expanded since 2015), but I never actually got to read what Messrs. Kounavis and Berry had to say about their technique.
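
To make the idea concrete, here is a minimal sketch in C of the table construction and inner loop for slicing-by-4 (slicing-by-8 is the same idea with eight tables and eight input bytes per iteration). It is an illustration based on the public descriptions mentioned above, not the Postgres code, which computes CRC-32C with a different polynomial:

    #include <stddef.h>
    #include <stdint.h>

    static uint32_t crc_table[4][256];

    /* Build the classic byte-at-a-time table, then derive three more
       tables so that four input bytes can be consumed per round. */
    static void crc32_init(void)
    {
        for (int i = 0; i < 256; i++) {
            uint32_t crc = i;
            for (int j = 0; j < 8; j++)
                crc = (crc >> 1) ^ ((crc & 1) ? 0xEDB88320u : 0);
            crc_table[0][i] = crc;
        }
        for (int i = 0; i < 256; i++)
            for (int t = 1; t < 4; t++)
                crc_table[t][i] = (crc_table[t - 1][i] >> 8) ^
                                  crc_table[0][crc_table[t - 1][i] & 0xff];
    }

    /* Reflected CRC-32 (zlib polynomial), four bytes per iteration. */
    static uint32_t crc32_slice4(const unsigned char *p, size_t len)
    {
        uint32_t crc = 0xFFFFFFFFu;

        while (len >= 4) {
            crc ^= (uint32_t)p[0] | (uint32_t)p[1] << 8 |
                   (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
            crc = crc_table[3][crc & 0xff] ^
                  crc_table[2][(crc >> 8) & 0xff] ^
                  crc_table[1][(crc >> 16) & 0xff] ^
                  crc_table[0][crc >> 24];
            p += 4;
            len -= 4;
        }
        while (len--)
            crc = (crc >> 8) ^ crc_table[0][(crc ^ *p++) & 0xff];

        return ~crc;
    }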

Recently, I had occasion to look at CRC32 implementations again, and I found a different paper that I had looked at briefly the last time: Cyclic Redundancy Check Generation Using Multiple Lookup Table Algorithms by Indu I. and Manu T.S. from TKM Institute of Technology, Kollam, in Kerala (my mother's home state in South India). I remember noting that there was something odd about the paper, but not having time enough to give it more than a passing glance. This time, I spent a while reading it, and it's certainly very odd.

ABSTRACT: The primary goal of this paper is to generate cyclic redundancy check (CRC) using multiple lookup table algorithms. A compact architecture of CRC algorithm (Slicing-by-N algorithm) based on multiple lookup tables (LUT) approach is proposed. This algorithm can ideally read large amounts of data at a time, while optimizing their memory requirement to meet the constraints of specific computer architectures. The focus of this paper is the comparison of two algorithms. These two algorithms are Slicing by-N-algorithm and Sarwate algorithm, in which slicing by-N-algorithm can read arbitrarily 512 bits at a time, but Sarwate algorithm, which can read only 8 bits at a time. This paper proposes the generation of CRC using slicing by 8 algorithm. In this, message bits are chunked to 8 blocks. All are processed at a time. Proposed Slicing-by-8 algorithm can read 64 bits of input data at a time and it doubles the performance of existing implementations of Sarwate algorithm.

Is this paper claiming to have invented the slicing-by-N algorithm? It's hard to tell from the blather in the abstract, but going through the remaining blather (an effort that, in retrospect, I cannot in good conscience commend to the reader) suggests that this is indeed the case.

Recently time is the major concern. So in order to process large amount of data at a time, Multiple Lookup based approach is more efficient. Multiple Lookup based approach contains five CRC algorithms, called Slicing by-N algorithm (N ϵ 4, 8, 16, 32, 64), which is used to read up to 512 bits at a time. So performance of the system should be increased. Here proposing Slicing by-8 algorithm to read 64 bits at a time. Here proposed an efficient design of CRC generator using Slicing by-N algorithm (N=8). In this algorithm, input message stream is sliced into N slices and each slice has 8 bits. So using this Slicing by-8 algorithm, it can read 64 bits at a time and it triples the performance of existing implementation of Sarwate algorithm.

Oho, so it triples the performance of existing implementations of the Sarwate algorithm, does it? Funny the abstract claims a paltry doubling in performance then. The paper goes on to describe CRC computation with block diagrams, and then has some more blather about VHDL and MATLAB and some screenshots of “simulation waveforms”, all of which seems to amount to showing that the various CRC algorithms produce the same results and that processing more input bytes at a time is faster than not doing so. I made judicious use of the fast-forward button to reach the conclusion, which begins with

The design of CRC generator using Multiple Look Up based approach is proposed. In this paper, slicing by-8 algorithm is designed, and compares this algorithm with the existing algorithms, that is, with Sarwate algorithm and LFSR method.

So yeah, they're claiming in a slightly roundabout way to have invented the slicing-by-8 CRC algorithm. However, the authors cite the Kounavis and Berry paper anyway, perhaps so that any criticism can be blamed on some sort of unfortunate misunderstanding. I didn't find any citations of this paper in the minute or two I spent on the search, but Citeseer and Researchgate link to it, and it's quite prominent in search results, so it's probably only a matter of time before someone cites it.

The paper was published in “International Journal of Modern Engineering Research” (ijmer.com) in 2012; the journal's name alone reminded me of the scammers, Academic Journals Online, whom I encountered a few years ago. IJMER does not, however, appear to be one of the AJO journals. Perhaps it's a second cousin.

Unfortunately, the authors include no contact information in the paper, so I was not able to send them a link to this page.

heaping ICMP echo requests on unsuspecting hosts

My friend Nicolai Langfeldt (author of the DNS HOWTO) works at a large ISP, where he recently needed to ping somewhere between 10,000 and 100,000 hosts every ten seconds and feed the up/down data to Nagios. He had started hacking on fping to do this, but I thought it could be done better by a smaller program written specifically for this purpose.

I wrote most of github.com/amenonsen/heaping on a flight home. It takes a list of IP addresses on the command line, writes ICMP echo requests for each address to a raw socket every ten seconds, and creates another process to print the response time (or "unreachable") for each response. It does nothing else, and does not need to allocate any memory after startup.
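
For illustration, this is roughly what the sending side amounts to, reduced to one echo request for one host (a Linux-flavoured sketch using struct icmphdr from <netinet/ip_icmp.h>; heaping itself loops over all the addresses, reuses one socket, and leaves reading the replies to the other process):

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <netinet/ip_icmp.h>    /* struct icmphdr, ICMP_ECHO (Linux) */
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* One's-complement checksum over the ICMP header (and payload, if any). */
    static uint16_t icmp_checksum(const void *data, size_t len)
    {
        const uint16_t *p = data;
        uint32_t sum = 0;

        while (len > 1) { sum += *p++; len -= 2; }
        if (len) sum += *(const uint8_t *)p;
        sum = (sum >> 16) + (sum & 0xffff);
        sum += sum >> 16;
        return (uint16_t)~sum;
    }

    int main(int argc, char **argv)
    {
        /* Raw ICMP sockets require root or CAP_NET_RAW. */
        int fd = socket(AF_INET, SOCK_RAW, IPPROTO_ICMP);
        struct sockaddr_in dst = { .sin_family = AF_INET };
        struct icmphdr icmp = { .type = ICMP_ECHO };

        if (argc != 2 || fd < 0 ||
            inet_pton(AF_INET, argv[1], &dst.sin_addr) != 1) {
            fprintf(stderr, "usage: %s dotted-quad-address (as root)\n", argv[0]);
            return 1;
        }

        icmp.un.echo.id = htons(getpid() & 0xffff);
        icmp.un.echo.sequence = htons(1);
        icmp.checksum = icmp_checksum(&icmp, sizeof icmp);

        if (sendto(fd, &icmp, sizeof icmp, 0,
                   (struct sockaddr *)&dst, sizeof dst) < 0) {
            perror("sendto");
            return 1;
        }
        return 0;
    }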

Nicolai's preliminary tests show that it can send out the 10k pings in under 100ms, which far exceeds the performance I was hoping for (though his networking colleagues may yet frown upon sending that many pings so fast). With that kind of performance, monitoring 100k hosts isn't out of the question, even if the sender has to be slowed down a bit.

The code is on github in case anyone else finds it useful. Feedback is welcome.

ping.c

While looking for something that could be adapted for this use, I found the source code of the original ping.c on the ISOC web site. I decided against modifying it because of several misleading comments, type abuse (e.g. allocating a struct sockaddr and casting its address to struct sockaddr_in *), and other K&R/BSD oddities. But it's good code overall, and it was instructive to look through it.
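
(For contrast, the idiom I would reach for today is the reverse: declare the concrete struct sockaddr_in, and cast its address to struct sockaddr * only at the call site. A two-line sketch, where fd, packet, and len stand for whatever the caller already has:)

    struct sockaddr_in to = { .sin_family = AF_INET };
    /* ... fill in to.sin_addr (and to.sin_port, for protocols that use ports) ... */
    sendto(fd, packet, len, 0, (struct sockaddr *)&to, sizeof to);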

Introducing Snippets

Since Arnt already announced snippets on rant.gulbrandsen last month, I thought I'd write a quick introduction to it here as well.

Snippets is a Jabber bot that periodically asks its subscribers what they're doing, and provides access to the answers through a simple web interface. It is meant primarily to help teams of developers to communicate with each other.

Snippets is a Perl program that depends on a few CPAN modules, and needs access to a Postgres database. You can get the source from the github page, and installation instructions are given in the README.

Feedback and feature requests are welcome. Please send them to ams@toroid.org.

Source code comments vs. unit tests

I've read many arguments about source code comments, from "comments are good, they help you understand code" to "comments are bad, they become obsolete and mislead you" and everything in between. I've seen enough good comments and bad comments to think that trying to draw a line in the sand is a waste of time. I write comments when I think they might be useful, curse them when they aren't, and find something else to occupy my attention the rest of the time.

Today, I happened across a blog post entitled Don't Waste Your Time Commenting Source Code. It's a fairly typical rehash of the usual arguments, written with much conviction and authority, but thankfully without the traditional "// Increment i" example. A couple of sentences caught my eye:

Contra: Automated unit tests explain what the program does just as well, and they never become obsolete.

I appreciate the argument that comments are unaffected by unintended code changes, while (sufficiently detailed) tests break immediately, but I have yet to see a good test suite that explained how a program worked just as well as good documentation. If the tests are a clear and straightforward explanation of how the program should work (which is not their function), they're not good enough at testing edge cases where the program breaks (which is their function); conversely, a thorough test suite has too much low-level detail to serve as a cogent exposition of the code's functioning. Not for a library, and certainly not for an application.

Besides, the "tests will break" argument is too simplistic. Test suites are also ultimately the product of fallible programmers. I've made many changes that were wrong but broke nothing. I've seen tests that blessed the entirely wrong behaviour. I've seen and made many changes that were not accompanied by enough tests, and others that came with lots of tests but were broken anyway. Every non-trivial test suite I've seen has been a complex maze of "historical reasons" and tests for long-forgotten (and always poorly-described) bugs rather than a clear explanation of anything at all.

Frankly, I would rather have developers spend time polishing their design skills than technical writing skills.

The way I see it, good design and good explanations go hand in hand. Neglecting one does no favours to the other.

Life with cpanminus is different

I recently learned about Tatsuhiko Miyagawa's cpanminus, a zero-configuration CPAN module installation program that itself needs nothing more than "wget -O bin/cpanminus http://cpanmin.us" to install.

Once I started using it ("cpanm Module::Name"), I realised that I had underestimated the extent to which CPAN.pm and friends had encouraged my reluctance to install CPAN modules (and deal with dependency hell). I haven't changed my mind, but using cpanminus is pleasant enough that I can afford to change some of my habits.

I also look forward to having a clean(er) conscience when I ask the people who use my programs to install the CPAN modules those programs depend on (recent examples: AnyEvent and AnyEvent::XMPP).

Mirroring GMail mailboxes with IMAP

Hassath wanted to download a couple of gigabytes worth of mail from her GMail account for offline storage, but was frustrated by the IMAP server closing the connection during large transfers, which made Thunderbird go into a sulk, and forced her to restart the download by hand every time. Using POP was not an option, since it had no way to preserve the labels she had assigned to the messages (which show up as mailboxes via IMAP).

I tried some other programs, but they didn't work the way I wanted. Some of them provided little or no progress information. Some would fetch all of the messages into memory before writing anything to disk. Some would fetch messages one by one, with separate command-response cycles. Most importantly, none of them coped well with the frequent disconnections.

Having exhausted my patience, and being in need of a weekend hack, I set out to write a program to download the mail. I tried the IMAP modules on CPAN, but found them lacking in one way or another. IMAP::Client parses IMAP responses in a fundamentally broken way (scanning literal data for things which look like OK responses). Net::IMAP::Client doesn't provide debugging information or allow MSN fetches. Net::IMAP doesn't know about IMAP+SSL. Most importantly, none of the modules would allow me to stream messages to disk sensibly. So I wrote one from scratch.

Read more…

Introducing Loathsxome: minimalist weblog software

I started using Blosxom a few weeks ago. I liked the basic idea (that each post is one file) very much, but I found the code hard to read and poorly documented. Plugins varied widely in quality, and I had to jump through hoops to implement some of the things I wanted. By the time I got everything working the way I liked it, I ended up with a complete rewrite.

The code is well-documented, supports tagging and pagination, generates an RSS feed, works nicely with Git, and lets me publish URLs like http://toroid.org/etc/200908 and /etc/birds+language. (Disadvantages: it isn't quite plugin-compatible with Blosxom, and it doesn't support static generation.)

I'm sure I could have used a more modern, featureful program to achieve the same effect, but I'm glad I didn't have to. I was happier to spend a couple of hours writing something worse than a few days trying to fit something better, like Wordpress, into my head.

I wrote this code for my own use with no intention of releasing it, but a few people have expressed an interest in using it, so here it is.

No longer maintained

Update (November 2013): Loathsxome never had more than a handful of users, and only two who regularly contributed code (Arnt Gulbrandsen and myself). Arnt wrote plusxome, and I wrote something else using Mojolicious without even pretending to maintain backwards compatibility.

Loathsxome is still here. It works as well as it ever did, and none of what's written below is untrue, but nobody uses it any more.

Read more…

Inspector34

Inspector34 is a transparent web proxy that records requests and responses for later playback and comparison. It is meant to help with regression testing.

It consists of three programs: i34-record is the proxy server, which keeps a journal of HTTP requests and responses; i34-replay recreates the original requests from this journal; and i34-diff compares responses from the original and replayed sessions.

Inspector34 is written in Perl, and is distributed under a BSD-style open source license. Feedback is welcome.

Read more…

pemtrans: Convert OpenSSL RSA private keys to Cryptlib keysets

OpenSSL stores private keys in an undocumented PEM format (the key data is DER-encoded and the result is ASCII-armoured), which Cryptlib does not support.

pemtrans reads an OpenSSL RSA private key and the corresponding signed public key certificate, and writes a PKCS #15 keyset that Cryptlib can use. The included manual page explains how to use the program.

Key usage

Cryptlib requires (and respects) the KEYUSAGE attribute on certificates. Some certificates do not contain this attribute, and pemtrans issues a warning when it encounters such a certificate, because Cryptlib will probably reject it when you try to use it for something, e.g. in a TLS server.

OpenSSL, by default, does not set KEYUSAGE. This can be fixed by adding a line like the following to openssl.cnf before generating the key and certificate (the attributes specified here are enough for the result to be used in a TLS server):

keyUsage = cRLSign, digitalSignature, keyEncipherment, keyCertSign
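
For example (a sketch; the section name and key size here are placeholders that depend on how your openssl.cnf is organised), the line might live in a v3_req-style extensions section:

    [ v3_req ]
    keyUsage = cRLSign, digitalSignature, keyEncipherment, keyCertSign

You would then generate the key and self-signed certificate with something like "openssl req -x509 -newkey rsa:2048 -keyout key.pem -out cert.pem -days 365 -config openssl.cnf -extensions v3_req", where -extensions selects which section's extensions end up in the certificate.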

Download

Download pemtrans.tar.gz (from github.com/amenonsen/pemtrans)

Use, modification, and distribution of pemtrans is allowed without any limitations. There is no warranty, express or implied.

Please send questions and comments to ams@toroid.org.

Better time management through profiling

This web page is about the most potent charm in my long-standing quest for better time management: a program to help me keep track of exactly where and how my time is spent.

Of course, I can't tell you how best to manage your own time, but just as profiling is the best guide to code optimisation, I found it useful (and sometimes very surprising) to begin by knowing exactly where time goes. I've been very happy with the results, and I know of others who have evolved similar approaches to the problem.

Instrumentation

The idea is that you run a program (ts) every time you change what you are doing. This sounds intrusive, and it does take a while to get used to; but ts goes out of its way to make it easy, and it's almost second nature to me now. I find the benefits worth this extra investment.

To run ts, you will need to have Perl installed along with the DBI and DBD::Pg modules. You'll also need to provide a PostgreSQL database for it to use, and you'll have to create the activities table from ts.sql.

The profiling data is stored in this very simple SQL table:

    create table activities (
        id          serial,
        category    text not NULL,
        comment     text,
        started     timestamp(0) not NULL default current_timestamp,
        stopped     timestamp(0)
                    check (stopped is NULL or stopped > started)
    );

A row in this table describes a period of time spent on some activity. The category is usually something like "work/mail" or "work/src", the latter perhaps with a comment like "Fixing Jamfile". I find it useful to organise my activities into a hierarchy, but you can decide for yourself how you want to categorise things.

When you start a new activity ("ts work/src Fixing bug 42"), its started time is set, and the stopped time of any activity that doesn't have one (any activity that is in progress, in other words) is also set. You can also specify the start and end times explicitly ("ts -s 12:00 -e 12:30 lunch").
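
In terms of the table above, that amounts to something like the following (a sketch of the effect, not the actual ts code):

    -- "ts work/src Fixing bug 42"
    update activities set stopped = current_timestamp where stopped is null;
    insert into activities (category, comment) values ('work/src', 'Fixing bug 42');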

My .zshrc contains "compctl -/ -W ~/.categories ts", and I've created directories that reflect my categorisation under ~/.categories. Now I can do things like "ts w<TAB>e<TAB>" instead of having to type the category name ("work/email") in full.

Running ts without arguments explains how to use it.

Visualisation

Once the data is available, it's up to you to figure out how to use it to your advantage ("How much time did I spend on the phone last week?" "Let me generate a time sheet for work/some-client/code/" ...). Just playing with the data in psql is often instructive.
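
For instance, the "time on the phone last week" question is a short query away (a sketch against the table above):

    select category, sum(stopped - started) as total
      from activities
     where category like 'phone%'
       and started >= current_date - 7
     group by category
     order by total desc;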

The tsc program generates sc spreadsheets from the profiling data. For a given time period (the last week by default), it presents a list of the categories matching some pattern (everything by default), and a table (with totals) of how much time was spent on each of them on each day in that period. (You need a patched version of sc to use tsc.)

I plan to write something that generates nicely-formatted time sheets from this data, but I haven't gotten around to it yet. If you have an interesting visualisation idea, please let me know.

Download

Here's a tarball of the source code. It contains the ts and tsc programs, the SQL table definition, a patch required for sc to understand the tsc output, and some documentation. (You can also download a patched version of sc.)

Please write to ams@toroid.org if you have any questions, comments, or suggestions. If this program is useful to you in some way, I'd love to hear about it.