What is fsck up to now?

By Abhijit Menon-Sen <ams@toroid.org>

2020-02-02

We had unexpectedly heavy snowfall the other day and, as always, the mains power supply came back only after a few days of repairing broken power lines in the forest. Meanwhile, the days were so overcast that our solar inverter eventually drained the batteries.

Which meant that when the sun came out again, I was left staring at something like this for a long time:

root@soot:~# ps -eo pid,cmd|grep '[f]sck'
756 /lib/systemd/systemd-fsck /dev/mapper/sdb1_crypt
757 /sbin/fsck -a -T -l -M -C4 /dev/mapper/sdb1_crypt
758 /lib/systemd/systemd-fsckd
759 fsck.ext4 -a -C4 /dev/mapper/sdb1_crypt

Long enough, in fact, that I began to wonder if I could tell what it was doing.

I could see (from the parent pid, not shown above) that systemd-fsck had started fsck, which had started fsck.ext4 in turn. I would have guessed that systemd-fsck was started by systemd-fsckd, but both processes were children of init by the time I looked.

Here's an excerpt of what strace had to say about fsck.ext4:

pread64(3, "\0\0\0\0"..., 32768, 2403706052608) = 32768
pread64(3, "\0\0\0\0"..., 32768, 2403706085376) = 32768
pread64(3, "\0\0\0\0"..., 32768, 2403706118144) = 32768
pread64(3, "\0\0\0\0"..., 16384, 2403706150912) = 16384
fadvise64(3, 2403973734400, 868352, POSIX_FADV_WILLNEED) = 0
write(4, "1 17910 59104 /dev/mapper/sdb1_c"..., 37) = 37
pread64(3, "\0\0\0\0"..., 32768, 2403839516672) = 32768
pread64(3, "\0\0\0\0"..., 32768, 2403839549440) = 32768
pread64(3, "\0\0\0\0"..., 32768, 2403839582208) = 32768
pread64(3, "\0\0\0\0"..., 32768, 2403839614976) = 32768

So it was reading from (presumably) the block device in 32KB chunks and telling the kernel's readahead mechanism that it meant to keep going past the 2TB or so it had reached. But what's that write in the middle?

$ strace -etrace=write -s 50 -p 759
write(4, "1 19120 59104 /dev/mapper/sdb1_crypt\n", 37) = 37
write(4, "1 19121 59104 /dev/mapper/sdb1_crypt\n", 37) = 37
write(4, "1 19122 59104 /dev/mapper/sdb1_crypt\n", 37) = 37
…

Well, that looked promising. I could just wait for the counter (19120) to catch up to its destination (59104) and fsck would be done!

write(4, "1 59102 59104 /dev/mapper/sdb1_crypt\n", 37) = 37
write(4, "1 59103 59104 /dev/mapper/sdb1_crypt\n", 37) = 37
write(4, "1 59104 59104 /dev/mapper/sdb1_crypt\n", 37) = 37
write(4, "2 0 158345 /dev/mapper/sdb1_crypt\n", 34) = 34
write(4, "2 1 158345 /dev/mapper/sdb1_crypt\n", 34) = 34
write(4, "2 2 158345 /dev/mapper/sdb1_crypt\n", 34) = 34

Oh, that's right, fsck operates in multiple passes, doesn't it?

I ran apt-get source to download a copy of e2fsprogs (the package that installs /sbin/fsck.ext4, as identified by 'dpkg -S'), and a wild guess that the output above was generated with a printf of some kind led me to a function named e2fsck_update_progress.

$ rg -A1 -tc '%d %.*%.*%s\\n'
e2fsck/unix.c
594: snprintf(buf, sizeof(buf), "%d %lu %lu %s\n",
595-          pass, cur, max, ctx->device_name);

A few lines below is a call to a calc_percent function.

struct percent_tbl {
    int max_pass;
    int table[32];
};
static struct percent_tbl e2fsck_tbl = {
    5, { 0, 70, 90, 92,  95, 100 }
};

static float calc_percent(struct percent_tbl *tbl,
                          int pass, int curr, int max)
{
    float percent;

    if (pass <= 0)
        return 0.0;
    if (pass > tbl->max_pass || max == 0)
        return 100.0;
    percent = ((float) curr) / ((float) max);
    return ((percent * (tbl->table[pass]-tbl->table[pass-1]))
        + tbl->table[pass-1]);
}

We can compute the percentage by substituting the numbers from the last line of strace output above.

write(4, "2 2 158345 /dev/mapper/sdb1_crypt\n", 34) = 34

We're neither at 0 nor a 100%, so the return value is 2/158345*(90-70), which is close to 0, plus 70; which leaves us at a smidgen over 70% at the start of pass 2, approaching 90% as curr approaches max (158345). Depending on how much faith you have in the "pass x takes roughly y%" e2fsck_tbl defined above, this may or may not be comforting.

That left me with one more question: why was "max" changing between passes? The answer is that it represents different values during the different passes, as seen below (results edited slightly for clarity).

e2fsck/e2fsck.c
255:  (ctx->progress)(ctx, 0, 0, 0);

e2fsck/pass1.c
2108: (ctx->progress)(ctx, 1, group+1,
                      ctx->fs->group_desc_count)

e2fsck/pass2.c
1008: (ctx->progress)(ctx, 2, cd->count++, cd->max)

e2fsck/pass3.c
106:  (ctx->progress)(ctx, 3, count++, maxdirs)

e2fsck/pass4.c
191:  (ctx->progress)(ctx, 4, group, maxgroup)

e2fsck/pass5.c
496:  (ctx->progress)(ctx, 5, group,
                      fs->group_desc_count*2)

If only I had thought to look at the fsck manpage at the time, I might have noticed what the -C4 option was meant to do:

-C [fd]
    Display completion/progress bars for those
    filesystem checkers (currently only for ext[234])
    which support them. …

Given enough time—and of time, there was no shortage—I may even have made the mental leap to checking if systemctl status might show me the progress of fsck in some easily comprehensible form.

I wonder if it will snow again this year.