What is fsck up to now?

By Abhijit Menon-Sen <>

We had unexpectedly heavy snowfall the other day and, as always, the mains power supply came back only after a few days of repairing broken power lines in the forest. Meanwhile, the days were so overcast that the solar inverter couldn't charge the batteries enough to keep up with our minimal domestic load.

Which meant that when the sun came out again, I was left staring at something like this for a long time:

root@soot:~# ps -eo pid,cmd|grep '[f]sck'
756 /lib/systemd/systemd-fsck /dev/mapper/sdb1_crypt
757 /sbin/fsck -a -T -l -M -C4 /dev/mapper/sdb1_crypt
758 /lib/systemd/systemd-fsckd
759 fsck.ext4 -a -C4 /dev/mapper/sdb1_crypt

Long enough, in fact, that I began to wonder if I could tell what it was doing. (The volume in question is exported via iSCSI from a Synology NAS and fsck is still running long after the machine has otherwise finished booting up, so I have ordinary shell access.)

I could see (from the parent pid, not shown above) that systemd-fsck had started fsck, which had started fsck.ext4 in turn. I would have guessed that systemd-fsck was started by systemd-fsckd, but both processes were children of init by the time I looked.

Here's an excerpt of what strace had to say about fsck.ext4:

pread64(3, "\0\0\0\0"..., 32768, 2403706052608) = 32768
pread64(3, "\0\0\0\0"..., 32768, 2403706085376) = 32768
pread64(3, "\0\0\0\0"..., 32768, 2403706118144) = 32768
pread64(3, "\0\0\0\0"..., 16384, 2403706150912) = 16384
fadvise64(3, 2403973734400, 868352, POSIX_FADV_WILLNEED) = 0
write(4, "1 17910 59104 /dev/mapper/sdb1_c"..., 37) = 37
pread64(3, "\0\0\0\0"..., 32768, 2403839516672) = 32768
pread64(3, "\0\0\0\0"..., 32768, 2403839549440) = 32768
pread64(3, "\0\0\0\0"..., 32768, 2403839582208) = 32768
pread64(3, "\0\0\0\0"..., 32768, 2403839614976) = 32768

So it was reading from (presumably) the block device in 32KB chunks and telling the kernel's readahead mechanism that it meant to keep going past the 2TB or so it had reached. But what's that write in the middle?

$ strace -etrace=write -s 50 -p 759
write(4, "1 19120 59104 /dev/mapper/sdb1_crypt\n", 37) = 37
write(4, "1 19121 59104 /dev/mapper/sdb1_crypt\n", 37) = 37
write(4, "1 19122 59104 /dev/mapper/sdb1_crypt\n", 37) = 37
…

Well, that looked promising. I could just wait for the counter (19120) to catch up to its destination (59104) and fsck would be done!

write(4, "1 59102 59104 /dev/mapper/sdb1_crypt\n", 37) = 37
write(4, "1 59103 59104 /dev/mapper/sdb1_crypt\n", 37) = 37
write(4, "1 59104 59104 /dev/mapper/sdb1_crypt\n", 37) = 37
write(4, "2 0 158345 /dev/mapper/sdb1_crypt\n", 34) = 34
write(4, "2 1 158345 /dev/mapper/sdb1_crypt\n", 34) = 34
write(4, "2 2 158345 /dev/mapper/sdb1_crypt\n", 34) = 34

Oh, wait, fsck operates in multiple passes, doesn't it?

I ran apt-get source to download a copy of e2fsprogs (the package that installs /sbin/fsck.ext4, as identified by 'dpkg -S'), and a wild guess that the output above was generated with a printf of some kind led me to a function named e2fsck_update_progress.

$ rg -A1 -tc '%d %.*%.*%s\\n'
e2fsck/unix.c
594: snprintf(buf, sizeof(buf), "%d %lu %lu %s\n",
595-          pass, cur, max, ctx->device_name);

A few lines below is a call to a calc_percent function.

struct percent_tbl {
    int max_pass;
    int table[32];
};
static struct percent_tbl e2fsck_tbl = {
    5, { 0, 70, 90, 92,  95, 100 }
};

static float calc_percent(struct percent_tbl *tbl,
                          int pass, int curr, int max)
{
    float percent;

    if (pass <= 0)
        return 0.0;
    if (pass > tbl->max_pass || max == 0)
        return 100.0;
    percent = ((float) curr) / ((float) max);
    return ((percent * (tbl->table[pass]-tbl->table[pass-1]))
        + tbl->table[pass-1]);
}

We can compute the percentage by substituting the numbers from the last line of strace output above.

write(4, "2 2 158345 /dev/mapper/sdb1_crypt\n", 34) = 34

We're neither at 0 nor a 100%, so the return value is 2/158345*(90-70), which is close to 0, plus 70; which leaves us at a smidgen over 70% at the start of pass 2, approaching 90% as curr approaches max (158345). Depending on how much faith you have in the "pass x takes roughly y%" e2fsck_tbl defined above, this may or may not be comforting.

That left me with one more question: why was "max" changing between passes? The answer is that it represents different values during the different passes, as seen below (results edited slightly for clarity).

e2fsck/e2fsck.c
255:  (ctx->progress)(ctx, 0, 0, 0);

e2fsck/pass1.c
2108: (ctx->progress)(ctx, 1, group+1,
                      ctx->fs->group_desc_count)

e2fsck/pass2.c
1008: (ctx->progress)(ctx, 2, cd->count++, cd->max)

e2fsck/pass3.c
106:  (ctx->progress)(ctx, 3, count++, maxdirs)

e2fsck/pass4.c
191:  (ctx->progress)(ctx, 4, group, maxgroup)

e2fsck/pass5.c
496:  (ctx->progress)(ctx, 5, group,
                      fs->group_desc_count*2)

If only I had thought to look at the fsck manpage at the time, I might have noticed what the -C4 option was meant to do:

-C [fd]
    Display completion/progress bars for those
    filesystem checkers (currently only for ext[234])
    which support them. …

Given enough time—and of time, there was no shortage—I may even have made the mental leap to checking if systemctl status might show me the progress of fsck in some easily comprehensible form.

I wonder if it will snow again this year.

Update 2020-03-16: Mid-March brought a surprise spell of battery-draining weather (heavy rain, hail, snow a bit further North, and three overcast days without mains power) and another opportunity to find out what systemd-fsck does with fsck progress information.

I had naïvely hoped that "systemctl status" would show this information, but it did not. Nor did the systemd-fsck(8) man page say anything about it. So I had a quick glance at the systemd-fsck source code and found this in the process_progress function:

p = percent(pass, cur, max);
fprintf(console, "\r%s: fsck %3.1f%% complete...\r%n", …);
fflush(console);

systemd-fsck calls the process_progress function to read progress output from fsck and write a summary to the console if /run/systemd/show-status exists. (The percent function is a copy of e2fsck's calc_percent above.)

arg_show_progress =
    access("/run/systemd/show-status", F_OK) >= 0;
…

console = fopen("/dev/console", "we");
if (console &&
    arg_show_progress &&
    pipe(progress_pipe) < 0)
        return log_error_errno(errno, "pipe(): %m");
…

if (console)
    (void) process_progress(progress_pipe[0], console);

Searching for show-status led me to the systemd.show-status boot parameter and the equivalent ShowStatus setting in systemd.conf:

Takes a boolean argument or the constant auto. If yes, the systemd manager (PID 1) shows terse service status updates on the console during bootup …

Setting this parameter controls the presence of this trigger file:

void manager_set_show_status(Manager *m, ShowStatus mode, …)
{
    …
    bool enabled = IN_SET(mode, SHOW_STATUS_YES, …);
    …

    if (enabled)
        (void) touch("/run/systemd/show-status");
    else
        (void) unlink("/run/systemd/show-status");
}

Nice to know, but not helpful for an already-running fsck: systemd-fsck checks for the file only at startup to set up the progress pipeline and call process_progress.

Still, it's interesting to look at an excerpt from the code to construct the -Cn option to tell fsck where to write progress information.

r = safe_fork("(fsck)", …);
if (r == 0) {
    int progress_socket = -1;
    …

    /* Child */

    /*
     * Try to connect to a progress management daemon,
     * if there is one
     */
    progress_socket = fsck_progress_socket();
    if (progress_socket >= 0) {
            /*
             * If this worked we close the progress pipe
             * early, and just use the socket
             */
            progress_pipe[1] = safe_close(progress_pipe[1]);
            xsprintf(dash_c, "-C%i", progress_socket);
    } else if (progress_pipe[1] >= 0) {
            /*
             * Otherwise if we have the progress pipe to our
             * own local handle, we use it
             */
            xsprintf(dash_c, "-C%i", progress_pipe[1]);
    }

A “progress management daemon”! That sounds promising. The fsck_progress_socket function tries to connect to an AF_UNIX socket named /run/systemd/fsck.progress, but searching for fsck.progress in the systemd git history led me to this commit from 2015:

fsck: remove fsckd again, but keep the door open for external replacement

This introduces /run/systemd/fsck.progress as a simply AF_UNIX/SOCK_STREAM socket. If it exists and is connectable we'll connect fsck's -c switch with it. If external programs want to get progress data they should hence listen on this socket and will get all they need via that socket. To get information about the connecting fsck client they should use SO_PEERCRED.

Unless /run/systemd/fsck.progress is around and connectable this change reverts back to v219 behaviour where we'd forward fsck output to /dev/console on our own.

The linked post explains that systemd-fsckd (apparently an Ubuntu innovation) was so broken that it had no place even in systemd. It was replaced with the fsck.progress socket so that it could continue to exist and work the same as before, just out of sight of systemd itself.

But systemd-fsckd is still running on my systemd, so what exactly does it do?

systemd-fsckd.service is a service responsible for receiving file system check progress, and communicating some consolidated data to console and plymouth (if running). It also handles possible check cancellations.

I didn't have any easy way to go back and read the console logs from this machine. So what's Plymouth, then?

Plymouth is an application that runs very early in the boot process (even before the root filesystem is mounted!) that provides a graphical boot animation while the boot process happens in the background.

This machine doesn't have a monitor connected, nor Plymouth installed, so I'm missing out on the graphical animations, not to mention that the boot process had already finished long ago.

In short, systemd-fsckd and systemd-fsck didn't do anything especially convenient for me with the progress information that it requests from fsck, and I didn't miss out on anything by reaching for strace first.