Too many filesystems

May 17, 2010

While trying to explain something about filesystems the other day, I realised that there are too many different (but related) things that can be reasonably described by that term.

First, there's the general idea of a filesystem, discussed in every operating systems textbook, as an organisation of data into a hierarchy of named directories and files for persistent storage on disk. This is what people mean when they say Store data in the filesystem.

Second, there's the specific protocol that defines the UNIX filesystem, with characteristics such as files being just a series of bytes, having case-sensitive names and certain kinds of metadata, using '/' as a path separator, and supporting various operations (open, read, close, …).

Third, there are the many different filesystem implementations, such as UFS, ext3, XFS—all programs that implement UNIX filesystem semantics but have their own features, characteristics, extensions, and on-disk layout of data. This is the level at which one may decide to use, say, a journalling filesystem for a certain purpose.

Next, the layout of data on a disk, as distinct from whatever program is used to read or write that data, is also called a filesystem. This is the sense in which people might say The filesystem on /dev/sda1 is corrupted—the problem is (one hopes) not with the implementation, but with its instantiation on disk.

Finally, on a UNIX system, the filesystem may refer to the hierarchy of directories rooted at / and built up by mounting specific filesystems (in the "data on disk" sense) at various points on the tree. Thus, it is the union of the contents of its constituent filesystems.

These layers are usually taken for granted, but it is necessary to peel them away one by one to explain things properly.