We had unexpectedly heavy snowfall the other day and, as always, the
mains power supply came back only after a few days of repairing broken
power lines in the forest. Meanwhile, the days were so overcast that the
solar inverter couldn't charge the batteries enough to keep up with our
minimal domestic load.
Which meant that when the sun came out again, I was left staring at
something like this for a long time:
root@soot:~# ps -eo pid,cmd|grep '[f]sck'
756 /lib/systemd/systemd-fsck /dev/mapper/sdb1_crypt
757 /sbin/fsck -a -T -l -M -C4 /dev/mapper/sdb1_crypt
758 /lib/systemd/systemd-fsckd
759 fsck.ext4 -a -C4 /dev/mapper/sdb1_crypt
Long enough, in fact, that I began to wonder if I could tell what it was
doing. (The volume in question is exported via iSCSI from a Synology NAS
and fsck is still running long after the machine has otherwise finished
booting up, so I have ordinary shell access.)
I could see (from the parent pid, not shown above) that systemd-fsck had
started fsck, which had started fsck.ext4 in turn. I would have guessed
that systemd-fsck was started by systemd-fsckd, but both processes were
children of init by the time I looked.
Here's an excerpt of what strace had to say about fsck.ext4:
pread64(3, "\0\0\0\0"..., 32768, 2403706052608) = 32768
pread64(3, "\0\0\0\0"..., 32768, 2403706085376) = 32768
pread64(3, "\0\0\0\0"..., 32768, 2403706118144) = 32768
pread64(3, "\0\0\0\0"..., 16384, 2403706150912) = 16384
fadvise64(3, 2403973734400, 868352, POSIX_FADV_WILLNEED) = 0
write(4, "1 17910 59104 /dev/mapper/sdb1_c"..., 37) = 37
pread64(3, "\0\0\0\0"..., 32768, 2403839516672) = 32768
pread64(3, "\0\0\0\0"..., 32768, 2403839549440) = 32768
pread64(3, "\0\0\0\0"..., 32768, 2403839582208) = 32768
pread64(3, "\0\0\0\0"..., 32768, 2403839614976) = 32768
So it was reading from (presumably) the block device in 32KB chunks and
telling the kernel's readahead mechanism that it meant to keep going
past the 2TB or so it had reached. But what's that write in the middle?
$ strace -etrace=write -s 50 -p 759
write(4, "1 19120 59104 /dev/mapper/sdb1_crypt\n", 37) = 37
write(4, "1 19121 59104 /dev/mapper/sdb1_crypt\n", 37) = 37
write(4, "1 19122 59104 /dev/mapper/sdb1_crypt\n", 37) = 37
…
Well, that looked promising. I could just wait for the counter (19120)
to catch up to its destination (59104) and fsck would be done!
write(4, "1 59102 59104 /dev/mapper/sdb1_crypt\n", 37) = 37
write(4, "1 59103 59104 /dev/mapper/sdb1_crypt\n", 37) = 37
write(4, "1 59104 59104 /dev/mapper/sdb1_crypt\n", 37) = 37
write(4, "2 0 158345 /dev/mapper/sdb1_crypt\n", 34) = 34
write(4, "2 1 158345 /dev/mapper/sdb1_crypt\n", 34) = 34
write(4, "2 2 158345 /dev/mapper/sdb1_crypt\n", 34) = 34
Oh, wait, fsck operates in multiple passes, doesn't it?
I ran apt-get source to download a copy of e2fsprogs (the package that
installs /sbin/fsck.ext4, as identified by 'dpkg -S'), and a wild guess
that the output above was generated with a printf of some kind led me to
a function named e2fsck_update_progress.
$ rg -A1 -tc '%d %.*%.*%s\\n'
e2fsck/unix.c
594: snprintf(buf, sizeof(buf), "%d %lu %lu %s\n",
595- pass, cur, max, ctx->device_name);
A few lines below is a call to a calc_percent function.
struct percent_tbl {
int max_pass;
int table[32];
};
static struct percent_tbl e2fsck_tbl = {
5, { 0, 70, 90, 92, 95, 100 }
};
static float calc_percent(struct percent_tbl *tbl,
int pass, int curr, int max)
{
float percent;
if (pass <= 0)
return 0.0;
if (pass > tbl->max_pass || max == 0)
return 100.0;
percent = ((float) curr) / ((float) max);
return ((percent * (tbl->table[pass]-tbl->table[pass-1]))
+ tbl->table[pass-1]);
}
We can compute the percentage by substituting the numbers from the last
line of strace output above.
write(4, "2 2 158345 /dev/mapper/sdb1_crypt\n", 34) = 34
We're neither at 0 nor a 100%, so the return value is 2/158345*(90-70),
which is close to 0, plus 70; which leaves us at a smidgen over 70% at
the start of pass 2, approaching 90% as curr approaches max (158345).
Depending on how much faith you have in the "pass x takes roughly y%"
e2fsck_tbl defined above, this may or may not be comforting.
That left me with one more question: why was "max" changing between
passes? The answer is that it represents different values during the
different passes, as seen below (results edited slightly for clarity).
e2fsck/e2fsck.c
255: (ctx->progress)(ctx, 0, 0, 0);
e2fsck/pass1.c
2108: (ctx->progress)(ctx, 1, group+1,
ctx->fs->group_desc_count)
e2fsck/pass2.c
1008: (ctx->progress)(ctx, 2, cd->count++, cd->max)
e2fsck/pass3.c
106: (ctx->progress)(ctx, 3, count++, maxdirs)
e2fsck/pass4.c
191: (ctx->progress)(ctx, 4, group, maxgroup)
e2fsck/pass5.c
496: (ctx->progress)(ctx, 5, group,
fs->group_desc_count*2)
If only I had thought to look at the fsck manpage at the time, I might
have noticed what the -C4 option was meant to do:
-C [fd]
Display completion/progress bars for those
filesystem checkers (currently only for ext[234])
which support them. …
Given enough time—and of time, there was no shortage—I may even have
made the mental leap to checking if systemctl status might show me the
progress of fsck in some easily comprehensible form.
I wonder if it will snow again this year.
Update 2020-03-16: Mid-March brought a surprise spell of
battery-draining weather (heavy rain, hail, snow a bit further North,
and three overcast days without mains power) and another opportunity to
find out what systemd-fsck does with fsck progress information.
I had naïvely hoped that "systemctl status" would show this information,
but it did not. Nor did the
systemd-fsck(8) man page
say anything about it. So I had a quick glance at the
systemd-fsck source code
and found this in the process_progress function:
p = percent(pass, cur, max);
fprintf(console, "\r%s: fsck %3.1f%% complete...\r%n", …);
fflush(console);
systemd-fsck calls the process_progress function to read progress output
from fsck and write a summary to the console if /run/systemd/show-status
exists. (The percent function is a copy of e2fsck's calc_percent above.)
arg_show_progress =
access("/run/systemd/show-status", F_OK) >= 0;
…
console = fopen("/dev/console", "we");
if (console &&
arg_show_progress &&
pipe(progress_pipe) < 0)
return log_error_errno(errno, "pipe(): %m");
…
if (console)
(void) process_progress(progress_pipe[0], console);
Searching for show-status led me to the systemd.show-status boot
parameter and the equivalent ShowStatus setting in systemd.conf:
Takes a boolean argument or the constant auto. If yes, the
systemd manager (PID 1) shows terse service status updates
on the console during bootup …
Setting this parameter controls the presence of this trigger
file:
void manager_set_show_status(Manager *m, ShowStatus mode, …)
{
…
bool enabled = IN_SET(mode, SHOW_STATUS_YES, …);
…
if (enabled)
(void) touch("/run/systemd/show-status");
else
(void) unlink("/run/systemd/show-status");
}
Nice to know, but not helpful for an already-running fsck: systemd-fsck
checks for the file only at startup to set up the progress pipeline and
call process_progress.
Still, it's interesting to look at an excerpt from the code to construct
the -Cn option to tell fsck where to write progress information.
r = safe_fork("(fsck)", …);
if (r == 0) {
int progress_socket = -1;
…
/* Child */
/*
* Try to connect to a progress management daemon,
* if there is one
*/
progress_socket = fsck_progress_socket();
if (progress_socket >= 0) {
/*
* If this worked we close the progress pipe
* early, and just use the socket
*/
progress_pipe[1] = safe_close(progress_pipe[1]);
xsprintf(dash_c, "-C%i", progress_socket);
} else if (progress_pipe[1] >= 0) {
/*
* Otherwise if we have the progress pipe to our
* own local handle, we use it
*/
xsprintf(dash_c, "-C%i", progress_pipe[1]);
}
A “progress management daemon”! That sounds promising. The
fsck_progress_socket function tries to connect to an AF_UNIX socket
named /run/systemd/fsck.progress, but searching for fsck.progress in the
systemd git history led me to
this commit
from 2015:
fsck: remove fsckd again, but keep the door open for external
replacement
This introduces /run/systemd/fsck.progress as a simply
AF_UNIX/SOCK_STREAM socket. If it exists and is connectable we'll
connect fsck's -c switch with it. If external programs want to get
progress data they should hence listen on this socket and will get all
they need via that socket. To get information about the connecting fsck
client they should use SO_PEERCRED.
Unless /run/systemd/fsck.progress is around and connectable this change
reverts back to v219 behaviour where we'd forward fsck output to
/dev/console on our own.
The linked post
explains that systemd-fsckd (apparently an Ubuntu innovation) was so
broken that it had no place even in systemd. It was replaced with the
fsck.progress socket so that it could continue to exist and work the
same as before, just out of sight of systemd itself.
But
systemd-fsckd
is still running on my systemd, so what exactly does it do?
systemd-fsckd.service is a service responsible for receiving file system
check progress, and communicating some consolidated data to console and
plymouth (if running). It also handles possible check cancellations.
I didn't have any easy way to go back and read the console logs from
this machine. So what's
Plymouth, then?
Plymouth is an application that runs very early in the boot process
(even before the root filesystem is mounted!) that provides a graphical
boot animation while the boot process happens in the background.
This machine doesn't have a monitor connected, nor Plymouth installed,
so I'm missing out on the graphical animations, not to mention that the
boot process had already finished long ago.
In short, systemd-fsckd and systemd-fsck didn't do anything especially
convenient for me with the progress information that it requests from
fsck, and I didn't miss out on anything by reaching for strace first.