Chapter 10

Managing File Systems

Introduction

What?

In a previous chapter, we examined the overall structure of the Linux file system. This was a fairly abstract view that didn't explain how the data was physically transferred on and off the disk. Nor, in fact, did it really examine the concept of "disks" or even what the file system "physically" existed on. 

In this chapter, we shall look at how Linux interacts with physical devices (not just disks), how in particular Linux uses "devices" with respect to its file system and revisit the Linux file system - just at a lower level. 

Why?

Why are you doing this? Doesn't this all sound a bit too much like Operating Systems? 

Unless you are content to accept that all low level interaction with the operating system occurs by a mystical form of osmosis and that you will never have to deal with: 

         A Disk crash - an unfortunate physical event involving one of the read/write heads of a hard disk coming into contact with the platter (which is spinning at high speed) causing the removal of the metallic oxide (the substance that maintains magnetic polarity, thus storing data).  This is usually a fatal event for the disk (and sometimes its owner).

         Adding a disk, mouse, modem, terminal or a sound card - unlike some unmentionable operating systems, Linux is not "plug-and-pray".  The addition of such a device requires modifications to the system.

         The accidental erasure of certain essential things called "device files"  - while the accidental erasure of any file is a traumatic event, the erasure of a device file calls for special action.

         Installing or upgrading to a kernel or OS release - you may suddenly find that your system doesn't know how to talk to certain things (like your CDROM, your console or maybe your SCSI disk...)  - you will need to find out how to solve these problems.

         Running out of some weird thing called "I-Nodes"  - an event which means you can't create any more files.

... then you will definitely need to read this chapter! 

A scenario

As we progress through this chapter, we will apply the information to help us solve problems associated with a very common System Administrator's task - installing a new hard disk.  Our scenario is this: 

Our current system has a single hard disk and it only has 10% space free (on a good day).  This is causing various problems (which we will discuss during the course of this chapter) - needless to say that it is the user directories (off /home) that are using the most space on the system.  As our IT department is very poor (we work in a university), we have been budgeting for a new hard disk for the past two years - we had bought a new one a year ago but someone drove a forklift over it.  The time has finally arrived - we have a brand new 2.5 gigabyte disk (to complement our existing 500 megabyte one). 

How do we install it?  What issues should we consider when determining its use? 

Devices - Gateways to the kernel

A device is...

A device is just a generic name for any type of physical or logical system component that the operating system has to interact with (or "talk" to). Physical devices include such things as hard disks, serial devices (such as modems, mice etc.), CDROMs, sound cards and tape-backup drives. 

Logical devices include such things as virtual terminals [every user is allocated a terminal when they log in - this is the point at which output to the screen is sent (STDOUT) and keyboard input is taken (STDIN)], memory, the kernel itself and network ports. 

Device files are...

Device files are special types of "files" that allow programs to interact with devices via the OS kernel. These "files" (they are not actually real files in the sense that they do not contain data) act as gateways or entry points into the kernel or kernel related "device drivers". 

Device drivers are...

Device drivers are coded routines used for interacting with devices. They essentially act as the "go between" for the low level hardware and the kernel/user interface. 

Device drivers may be physically compiled into the kernel (most are) or may be dynamically loaded in memory as required. 


/dev

/dev is the location where most device files are kept. A listing of /dev will output the names of hundreds of files. The following is an edited extract from the MAKEDEV (a Linux program for making device files - we will examine it later) man page on some of the types of device file that exist in /dev

         std
Standard devices.  These include mem - access to physical memory; kmem - access to kernel virtual memory; null - null device; port - access to I/O ports.

         Virtual Terminals
These are the devices associated with the console.  This is the virtual terminal tty_, where _ can be from 0 through 63. 

         Serial Devices
Serial ports and corresponding dialout device.  For device  ttyS_,  there is also the device cua_ which is used to dial out with. 

         Pseudo Terminals
(Non-Physical terminals) The  master  pseudo-terminals  are pty[p-s][0-9a-f] and the slaves are tty[p-s][0-9a-f].

         Parallel Ports
Standard parallel ports.  The devices  are  lp0,  lp1,  and  lp2.  These correspond to ports at 0x3bc, 0x378 and 0x278.  Hence, on  some  machines, the first printer port may actually be lp1.

         Bus Mice
The various bus mice  devices.   These include: logimouse (Logitech bus mouse), psmouse  (PS/2-style  mouse),  msmouse   (Microsoft Inport  bus  mouse) and atimouse (ATI XL bus mouse) and jmouse (J-mouse).

         Joystick Devices
Joystick.  Devices js0 and js1.

         Disk Devices
Floppy disk devices.  The device fd_ is the  device which  autodetects  the  format, and the additional devices are fixed format (whose size  is  indicated in  the  name).   The  other  devices  are named as fd___.  The single letter _ identifies the type  of floppy  disk  (d = 5.25" DD, h = 5.25" HD, D = 3.5" DD, H = 3.5" HD, E = 3.5" ED).  The number _ represents  the  capacity of that format in K.  Thus the standard formats are  fd_d360_  fd_h1200_  fd_D720_ fd_H1440_ and fd_E2880_

Devices fd0_ through fd3_ are floppy disks  on  the first controller, and devices fd4_ through fd7_ are floppy disks on the second controller.

Hard disks.  The device hdx provides access to the whole disk, with the partitions being hdx[0-20].  The four primary partitions are hdx1 through hdx4, with the logical partitions being numbered from hdx5 through hdx20.  (A primary partition can be made into an extended partition, which can hold 4 logical partitions).

Drives hda and hdb are the two on  the  first  controller.   If using the new IDE driver (rather than the old HD driver), then hdc and hdd  are  the  two drives  on the secondary controller.  These devices can also be used to access IDE CDROMs if  using  the new IDE driver.

SCSI hard disks.  The partitions are similar to the IDE  disks, but there is a limit of 11 logical partitions (sd_5 through sd_15).   This  is  to  allow there to be 8 SCSI disks.

Loopback  disk  devices.   These allow you to use a regular file as a block device.   This  means  that images  of  file systems can be mounted, and used as normal.   There are  8  devices,  loop0  through loop7.

         Tape Devices
SCSI tapes.  These are the rewinding tape device st_ and the non-rewinding tape device nst_.

QIC-80 tapes.  The devices are rmt8, rmt16, tape-d, and tape-reset.

Floppy driver tapes (QIC-117).  There are 4 methods of access depending on the floppy tape drive.   For each  of  access methods 0, 1, 2 and 3, the devices rft_ (rewinding) and nrft_ (non-rewinding) are created.  

         CDROM Devices
SCSI CD players.  Sony CDU-31A CD player.  Mitsumi CD player.  Sony CDU-535 CD player.  LMS/Philips CD player.

Sound Blaster CD player.  The kernel is capable  of supporting  16 CDROMs, each of which is accessed as sbpcd[0-9a-f].  These are assigned in groups  of  4 to  each controller. 

         Audio
These are the audio devices used  by  the  sound driver.   These  include mixer, sequencer, dsp, and audio.

Devices for the PC Speaker sound driver.  These are pcmixer, pcsp, and pcaudio.

         Miscellaneous
Generic  SCSI devices.  The devices created are sg0 through sg7.  These allow arbitrary commands  to  be sent  to any SCSI device.  This allows for querying information about the device, or  controlling  SCSI devices  that  are  not  one of disk, tape or CDROM (e.g. scanner, writable CDROM).

 

While the /dev directory contains the device files for many types of devices, only those devices that have device drivers present in the kernel can be used.  For example, while your system may have a /dev/sbpcd, it doesn't mean that your kernel can support a Sound Blaster CD.  To enable the support, the kernel will have to be recompiled with the Sound Blaster driver included - a process we will examine in a later chapter. 


Physical characteristics of device files

If you were to examine the output of the ls -al command on a device file, you'd see something like: 

psyche:~/sanotes$ ls -al /dev/console
crw--w--w-   1 jamiesob users      4,   0 Mar 31 09:28 /dev/console

In this case, we are examining the device file for the console. There are two major differences in the file listing of a device file from that of a "normal" file, for example: 

psyche:~/sanotes$ ls -al iodev.html
-rw-r--r--   1 jamiesob users        7938 Mar 31 12:49 iodev.html 

The first difference is the first character of the "file permissions" grouping - this is actually the file type. On directories this is a "d", on "normal" files it is a "-" but on devices it will be "c" or "b". The character is c for character mode or b for block mode. This is the way in which the device interacts - either character by character or in blocks of characters. 

For example, devices like the console output (and input) character by character. However, devices like hard disks read and write in blocks. You can see an example of a block device by the following: 

psyche:~/sanotes$ ls -al /dev/hda
brw-rw----   1 root     disk       3,   0 Apr 28  1995 /dev/hda

(hda is the first hard drive) 
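The file type can also be checked from a script without reading ls output. The following is a minimal sketch using the standard test operators -c, -b, -d and -f; the function name file_kind is made up for this illustration, and /dev/null is used only because it exists on practically every system.

```shell
#!/bin/sh
# file_kind is a hypothetical helper: it reports the kind of file,
# mirroring the first character of an "ls -l" listing.
file_kind() {
    if [ -c "$1" ]; then
        echo "character device"      # "c" in the listing
    elif [ -b "$1" ]; then
        echo "block device"          # "b" in the listing
    elif [ -d "$1" ]; then
        echo "directory"             # "d" in the listing
    elif [ -f "$1" ]; then
        echo "regular file"          # "-" in the listing
    else
        echo "other or missing"
    fi
}

file_kind /dev/null
file_kind /
```

On a system with an IDE disk, file_kind /dev/hda would be expected to report a block device.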

The second difference is the two numbers where the file size field usually is on a normal file. These two numbers (delimited by a comma) are the major and minor device numbers. 

Major and minor device numbers are...

Major and minor device numbers are the way in which the kernel determines which device is being used, therefore what device driver is required. The kernel maintains a list of its available device drivers, given by the major number of a device file. When a device file is used (we will discuss this in the next section), the kernel runs the appropriate device driver, passing it the minor device number. The device driver determines which physical device is being used by the minor device number. For example: 

psyche:~/sanotes$ ls -al /dev/hda
brw-rw----   1 root     disk       3,   0 Apr 28  1995 /dev/hda
psyche:~/sanotes$ ls -al /dev/hdb
brw-rw----   1 root     disk       3,  64 Apr 28  1995 /dev/hdb 

What this listing shows is that a device driver, major number 3, controls both hard drives hda and hdb. When those devices are used, the device driver will know which is which (physically) because hda has a minor device number of 0 and hdb has a minor device number of 64. 
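The two numbers can be pulled out of a device file directly rather than read off a listing. This is a rough sketch that assumes the GNU version of the stat command, whose %t and %T formats print the major and minor numbers in hexadecimal; the helper name dev_numbers is invented for the example.

```shell
#!/bin/sh
# dev_numbers is a hypothetical helper. It assumes GNU stat, whose
# %t and %T formats give the major and minor numbers in hexadecimal.
dev_numbers() {
    nums=$(stat -c '%t %T' "$1") || return 1
    major_hex=${nums% *}     # text before the space
    minor_hex=${nums#* }     # text after the space
    # Convert both hex values to decimal, as ls displays them.
    printf '%d %d\n' "0x$major_hex" "0x$minor_hex"
}

dev_numbers /dev/null
```

On Linux the null device is conventionally major 1, minor 3. Run against /dev/hda and /dev/hdb on the system shown above, the same helper should report the same major number with different minor numbers.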


Why use device files?

It may seem using files is a roundabout method of accessing devices - what are the alternatives? 

Other operating systems provide system calls to interact with each device. This means that each program needs to know the exact system call to talk to a particular device. 

With UNIX and device files, this need is removed. With the standard open, read, write etc. system calls (provided by the kernel), a program may access any device (transparently) while the kernel determines what type of device it is and which device driver to use to process the call.  [You will remember from Operating Systems that system calls are the services provided by the kernel for programs.]  
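A small sketch of this transparency, using only harmless devices: the same command reads from a regular file and from a device file. The helper name read_bytes is hypothetical, and head -c is assumed to be available (it is on GNU and most other modern systems, though it is not required by every standard).

```shell
#!/bin/sh
# read_bytes is a hypothetical helper: it reads the first N bytes of
# any path. The same command works whether the path names a regular
# file or a device file, because both are opened and read the same way.
read_bytes() {
    head -c "$2" "$1"
}

# A regular file...
tmp=$(mktemp)
printf 'hello world\n' > "$tmp"
read_bytes "$tmp" 5; echo

# ...and a device file (/dev/zero supplies an endless stream of zero bytes).
read_bytes /dev/zero 16 | od -An -tx1

# Writing uses ordinary redirection too; /dev/null discards the data.
echo "discarded" > /dev/null
rm -f "$tmp"
```

Nothing in read_bytes knows or cares which kind of file it was given; the kernel selects the device driver behind the scenes.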

Using files also allows the system administrator to set permissions on particular devices and enforce security - we will discuss this in detail later. 

The most obvious advantage of using device files is shown by the way in which as a user, you can interact with them.  For example, instead of writing a special program to play .AU sound files, you can simply: 

psyche:~/sanotes$ cat test.au > /dev/audio 

This command redirects the contents of the test.au file into the audio device.  Two things to note: 1)  This will only work for systems with audio (sound card) support compiled into the kernel (i.e. device drivers exist for the device file) and 2)  this will only work for .AU files - try it with a .WAV and see (actually, listen) what happens.  The reason for this is that .WAV (a Windows audio format) has to be interpreted first before it can be sent to the sound card. 

 

You will probably not need to be the root user to perform the above command as the /dev/audio device has write permissions for all users.  However, don't cat anything to a device unless you know what you are doing - we will discuss why later. 

Creating device files

There are two ways to create device files - the easy way or the hard way! 

The easy way involves using the Linux command MAKEDEV. This is actually a script that can be found in the /dev directory. MAKEDEV accepts a number of parameters (you can check what they are in the man pages). In general, MAKEDEV is run as: 

/dev/MAKEDEV device

where device is the name of a device file. If for example, you accidentally erased or corrupted your console device file (/dev/console) then you'd recreate it by issuing the command: 

/dev/MAKEDEV console

NOTE! This must be done as the root user 

However, what if your /dev directory had been corrupted and you lost the MAKEDEV script? In this case you'd have to manually use the mknod command. 

With the mknod command you must know the major and minor device number as well as the type of device (character or block). To create a device file using mknod, you issue the command: 

mknod device_file_name device_type major_number minor_number

For example, to create the device file for COM1 a.k.a. /dev/ttyS0 (usually where the mouse is connected) you'd issue the command: 

mknod /dev/ttyS0 c 4 64

Ok, so how do you know what type a device file is and what major and minor number it has so you can re-create it? The scouting (or is that the cubs?) solution to every problem in the world, be prepared, comes into play. Being a good system administrator, you'd have a listing of every device file stored in a file kept safely on disk. You'd issue the command: 

ls -al /dev > /mnt/device_file_listing

before you lost your /dev directory in a cataclysmic disaster, so you could read the file and recreate the /dev structure (it might also be smart to copy the MAKEDEV script onto this same disk just to make your life easier :). 
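A saved listing like this can be turned back into mknod commands mechanically. The sketch below is one possible approach, not a standard tool: the helper name listing_to_mknod is invented, it assumes the usual ls layout (type character first, "major," and "minor" in the fifth and sixth fields, name last), and it only prints the commands so they can be checked before being run as root.

```shell
#!/bin/sh
# listing_to_mknod is a hypothetical helper. It reads "ls -al /dev"
# output on standard input and prints (does not run) one mknod command
# per character or block device. It assumes the usual ls layout, with
# "major," and "minor" in fields 5 and 6 and the name in the last field.
listing_to_mknod() {
    awk '
        /^[cb]/ {
            type  = substr($1, 1, 1)   # c = character, b = block
            major = $5
            sub(/,/, "", major)        # strip the trailing comma
            minor = $6
            printf "mknod /dev/%s %s %s %s\n", $NF, type, major, minor
        }'
}

listing_to_mknod <<'EOF'
crw--w--w-   1 jamiesob users      4,   0 Mar 31 09:28 console
brw-rw----   1 root     disk       3,  64 Apr 28  1995 hdb
drwxr-xr-x   2 root     root        1024 Apr 28  1995 .
EOF
```

Directories, links and ordinary files in the listing are skipped. Owners and permissions are not restored by mknod, so those would still have to be reset by hand from the same listing.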

MAKEDEV is only found on Linux systems.  It relies on the fact that the major and minor device numbers for the system are hard-coded into the script - running MAKEDEV on a non-Linux system won't work because: 

The device names are different

The major and minor numbers of similar devices are different

Note however that similar scripts to MAKEDEV can be found on most modern versions of UNIX. 

The use and abuse of device files

Device files are used directly or indirectly in every application on a Linux system. When a user first logs in, they are assigned a particular device file for their terminal interaction. This file can be determined by issuing the command: 

tty

For example: 

psyche:~/sanotes$ tty
/dev/ttyp1

psyche:~/sanotes$ ls -al /dev/ttyp1
crw-------   1 jamiesob tty        4, 193 Apr  2 21:14 /dev/ttyp1  

Notice that as a user, I actually own the device file! This is so I can write to the device file and read from it. When I log out, it will be returned to: 

c---------   1 root     root       4, 193 Apr  2 20:33 /dev/ttyp1      

Try the following: 

read X < /dev/ttyp1 ; echo "I wrote $X"
echo "hello there" > /dev/ttyp1 

You should see something like: 

psyche:~/sanotes$ read X < /dev/ttyp1 ; echo "I wrote $X"
hello
I wrote hello

psyche:~/sanotes$ echo "hello there" > /dev/ttyp1
hello there 

A very important device file is that which is assigned to your hard disk. In my case /dev/hda is my primary hard disk, its device file looks like: 

brw-rw----   1 root     disk       3,   0 Apr 28  1995 /dev/hda  

Note that as a normal user, I can't directly read and write to the hard disk device file - why do you think this is? 

Reading and writing to the hard disk is handled by an intermediary called the file system.  We will examine the role of the file system in later sections, but for the time being, you should be aware that the file system decides how to use the disk, how to find data and where to store information about what is on the disk. 

Bypassing the file system and writing directly to the device file is a very dangerous thing - device drivers have no concept of file systems, files or even the data that is stored in them; device drivers are only interested in reading and writing chunks of data (called blocks) to physical sectors of the disk.  For example, by directly writing a data file to a device file, you are effectively instructing the device driver to start writing blocks of data from the very beginning of the device, straight over whatever is already stored there!  This can potentially wipe out the entire file structure, boot sector and all the data. Not a good idea to try it. NEVER should you issue a command like: 

cat some_file > /dev/hda1 

As a normal user, you can't do this - but you can as root! 

Reading directly from the device file is also a problem.  While not physically damaging the data on the disk, allowing users to directly read blocks makes it possible for them to obtain information about the system that would normally be restricted.  For example, were someone clever enough to obtain a copy of the blocks on the disk where the shadow password file resided (a file normally protected by file permissions so users cannot view it), they could potentially reconstruct the file and run it through a crack program. 

Exercises

10.1     Use the tty command to find out what device file you are currently logged in from.  In your home directory, create a device file called myterm that has the same major and minor device number.  Log into another session and try redirecting output from a command to myterm.  What happens?

10.2     Use the tty command to find out what device file you are currently logged in on. Try using redirection commands to read and write directly to the device. With another user (or yourself in another session) change the permissions on the device file so that the other user can write to it (and you to theirs). Try reading and writing from each other's device files. 

10.3     Log into two terminals as root. Determine the device file used by one of the sessions, take note of its major and minor device number. Delete the device file - what happens to that session? Log out of the session - now what happens? Recreate the device file. 

Devices, Partitions and File systems

Device files and partitions

Apart from general device files for entire disks, individual device files for partitions exist. These are important when trying to understand how individual "parts" of a file hierarchy may be spread over several types of file system, partitions and physical devices. 

Partitions are non-physical (I am deliberately avoiding the use of the word "logical" because this is a type of partition) divisions of a hard disk. IDE hard disks may have 4 primary partitions. If the hard disk is the primary master (modern systems have primary and secondary disk controllers, and the master is the first hard disk on a controller), one of these partitions must be a boot partition: this is the partition from which the BIOS attempts to load a bootstrap program at boot time. 

Each primary partition can be marked as an extended partition which can be further divided into four logical partitions. By default, Linux provides device files for the four primary partitions and 4 logical partitions per primary/extended partition. For example, a listing of the device files for my primary master hard disk reveals: 

brw-rw----   1 root     disk       3,   0 Apr 28  1995 /dev/hda
brw-rw----   1 root     disk       3,   1 Apr 28  1995 /dev/hda1
brw-rw----   1 root     disk       3,  10 Apr 28  1995 /dev/hda10
brw-rw----   1 root     disk       3,  11 Apr 28  1995 /dev/hda11
brw-rw----   1 root     disk       3,  12 Apr 28  1995 /dev/hda12
brw-rw----   1 root     disk       3,  13 Apr 28  1995 /dev/hda13
brw-rw----   1 root     disk       3,  14 Apr 28  1995 /dev/hda14
brw-rw----   1 root     disk       3,  15 Apr 28  1995 /dev/hda15
brw-rw----   1 root     disk       3,  16 Apr 28  1995 /dev/hda16
brw-rw----   1 root     disk       3,   2 Apr 28  1995 /dev/hda2
brw-rw----   1 root     disk       3,   3 Apr 28  1995 /dev/hda3
brw-rw----   1 root     disk       3,   4 Apr 28  1995 /dev/hda4
brw-rw----   1 root     disk       3,   5 Apr 28  1995 /dev/hda5
brw-rw----   1 root     disk       3,   6 Apr 28  1995 /dev/hda6
brw-rw----   1 root     disk       3,   7 Apr 28  1995 /dev/hda7
brw-rw----   1 root     disk       3,   8 Apr 28  1995 /dev/hda8
brw-rw----   1 root     disk       3,   9 Apr 28  1995 /dev/hda9     

Partitions are usually created by using a system utility such as fdisk. Generally fdisk will ONLY be used when a new operating system is installed or a new hard disk is attached to a system. 

Our existing hard disk would be /dev/hda1 (we will assume that we are using an IDE drive, otherwise we'd be using SCSI devices /dev/sd*). 

Our new hard disk (we'll make it a slave to the first) will be /dev/hdb1. 

Partitions and file systems

Every partition on a hard disk has an associated file system (the file system type is actually set when fdisk is run and a partition is created). For example, in DOS machines, it was usual to devote the entire hard disk (therefore the entire disk contained one primary partition) to the FAT (File Allocation Table) based file system. This is generally the case for most modern operating systems including Windows 95, Win NT and OS/2. 

However, there are occasions when you may wish to run multiple operating systems off the one disk; this is when a single disk will contain multiple partitions, each possibly containing a different file system. 

With UNIX systems, it is normal procedure to use multiple partitions in the file system structure. It is quite possible that the file system structure is spread over multiple partitions and devices, each a different "type" of file system. 

What do I mean by "type" of file system? Linux can support (or "understand", access, read and write to) many types of file systems including:  minix, ext, ext2, umsdos, msdos, proc, nfs, iso9660, xenix, Sysv, coherent, hpfs.

(There is also support for the Windows 95 and Win NT file system). A file system is simply a set of rules and algorithms for accessing files. Each system is different; one file system can't read the other.   Like device drivers, file systems are compiled into the kernel - only file systems compiled into the kernel can be accessed by the kernel. 

To discover what file systems your system supports,  you can display the contents of the /proc/filesystems file. 
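As a hedged illustration, the file can be reduced to a plain list of names with awk. The helper names supported_fs and block_fs are made up here; the sketch relies on the file's usual layout, where the file system name is the last field on each line and types that need no block device carry a "nodev" prefix.

```shell
#!/bin/sh
# supported_fs is a hypothetical helper. Each line of /proc/filesystems
# holds a file system name in its last field; lines for file systems
# that need no block device are prefixed with "nodev".
supported_fs() {
    awk '{ print $NF }' "${1:-/proc/filesystems}"
}

# block_fs lists only the types that live on a block device such as a
# disk partition (that is, lines without the "nodev" prefix).
block_fs() {
    awk '$1 != "nodev" { print $1 }' "${1:-/proc/filesystems}"
}

if [ -r /proc/filesystems ]; then
    supported_fs
fi
```

Both helpers accept an alternative file name as their first argument, which makes them easy to try against a saved copy from another machine.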

 

On our new disk, if we were going to use a file system that was not supported by the kernel, we would have to recompile the kernel at this point. 

Partitions and Blocks

The smallest unit of information that can be read from or written to a disk is a block. Blocks can't be split up - two files can't use the same block, therefore even if a file only uses one byte of a block, it is still allocated the entire block. 
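The cost of whole-block allocation is simple arithmetic. The sketch below assumes a block size of 1024 bytes purely for illustration (the real figure depends on how the file system was created), and the function names are invented.

```shell
#!/bin/sh
# blocks_needed SIZE BLOCKSIZE: whole blocks a file of SIZE bytes uses.
blocks_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))     # round up to a whole block
}

# wasted_bytes SIZE BLOCKSIZE: allocated but unused bytes in the last block.
wasted_bytes() {
    echo $(( $(blocks_needed "$1" "$2") * $2 - $1 ))
}

blocks_needed 1 1024       # a 1-byte file still takes 1 block
wasted_bytes 1 1024        # ...leaving 1023 bytes unused
blocks_needed 1025 1024    # one byte over a block boundary takes 2 blocks
```

This is why a partition holding many tiny files can fill up well before the sum of the file sizes reaches the size of the partition.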

When partitions are created, the first block of every partition is reserved as the boot block. However, only one partition may act as a boot partition. BIOS checks the partition table of the first hard disk at boot time to determine which partition is the boot partition. In the boot block of the boot partition there exists a small program called a bootstrap loader - this program is executed at boot time by BIOS and is used to launch the OS. Systems that contain two or more operating systems use the boot block to house small programs that ask the user to choose which OS they wish to boot.  One of these programs is called lilo and is provided with Linux systems. 

The second block on the partition is called the superblock. It contains all the information about the partition including information on: 

         The size of the partition 

         The physical address of the first data block 

         The number and list of free blocks 

         Information of what type of file system uses the partition 

         When the partition was last modified 

The remaining blocks are data blocks. Exactly how they are used and what they contain are up to the file system using the partition. 

Using the partitions

So how does Linux use these partitions and file systems? 

Linux logically attaches (this process is called mounting) different partitions and devices to parts of the directory structure. For example, a system may have: 

/ mounted to /dev/hda1
/usr mounted to /dev/hda2
/home mounted to /dev/hda3
/usr/local mounted to /dev/hda4
/var/spool mounted to /dev/hdb1
/cdrom mounted to /dev/cdrom
/mnt mounted to /dev/fd0

Yet to a user of the system, the physical location of the different parts of the directory structure is transparent! 
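To see which partition actually holds a given file, the rule is that the longest mount point leading the path wins. The following sketch applies that rule to a table of "device mount-point" pairs like the one above. The helper name device_for is hypothetical, and by default it reads /proc/mounts, whose first two fields have this layout.

```shell
#!/bin/sh
# device_for PATH [TABLE] is a hypothetical helper. TABLE has one
# "device mount-point" pair per line, the first two fields of
# /proc/mounts. The longest mount point that is a prefix of PATH wins.
device_for() {
    awk -v path="$1" '
        {
            mp = $2
            # A mount point matches if it is "/", equals the path, or
            # is a leading directory of the path.
            if (mp == "/" || path == mp || index(path, mp "/") == 1) {
                if (length(mp) > best) { best = length(mp); dev = $1 }
            }
        }
        END { if (dev != "") print dev }
    ' "${2:-/proc/mounts}"
}

# Try it against the example layout from the text.
table=$(mktemp)
cat > "$table" <<'EOF'
/dev/hda1 /
/dev/hda2 /usr
/dev/hda3 /home
/dev/hda4 /usr/local
/dev/hdb1 /var/spool
EOF
device_for /usr/local/bin/perl "$table"
device_for /etc/passwd "$table"
rm -f "$table"
```

With the example table, a file under /usr/local resolves to /dev/hda4 rather than /dev/hda2, and anything not covered by a more specific mount point falls back to the root partition. On a live system the df command given a path performs the same lookup.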

How does this work? 


The Virtual File System

The Linux kernel contains a layer called the VFS (or Virtual File System).  The VFS processes all file-oriented IO system calls.  Based on the device that the operation is being performed on, the VFS decides which file system to use to further process the call. 

The sequence of steps that the kernel goes through when a system call is received runs along the lines of: 

         A process makes a system call.

         The VFS decides what file system is associated with the device file that the system call was made on.

         The file system uses a series of calls (called Buffer Cache Functions) to interact with the device drivers for the particular device.

         The device drivers interact with the device controllers (hardware) and the actual required processes are performed on the device.


Figure 10.1 represents this. 

Figure 10.1
The Virtual File System


Dividing up the file hierarchy - why?

Why would you bother partitioning a disk and using different partitions for different directories? 

The reasons are numerous and include:

Separation Issues

Different directory branches should be kept on different physical partitions for reasons including: 

         Certain directories will contain data that will only need to be read, others will need to be both read and written. It is possible (and good practice) to mount these partitions restricting such operations. 

         Directories including /tmp and /var/spool can fill up with files very quickly, especially if a process becomes unstable or the system is purposely flooded with email. This can cause problems.  For example, let us assume that the /tmp directory is on the same partition as the /home directory.  If the /tmp directory causes the partition to be filled, no user will be able to write to their /home directory because there is no space.  If /tmp and /home are on separate partitions the filling of the /tmp partition will not influence the /home directories. 

         The logical division of system software, local software and home directories all lend themselves to separate partitions 
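Keeping an eye on how full each partition is can be scripted. The sketch below filters the output of df -P (the portable output format, one line per file system with the capacity in the fifth column and the mount point in the sixth). The helper name nearly_full and the 90 percent threshold are choices made for this illustration, and mount points containing spaces are not handled.

```shell
#!/bin/sh
# nearly_full LIMIT is a hypothetical helper. It reads "df -P" output on
# standard input and prints every mount point whose Capacity column
# (field 5) is above LIMIT percent.
nearly_full() {
    awk -v limit="$1" '
        NR > 1 {
            used = $5
            sub(/%/, "", used)
            if (used + 0 > limit + 0) print $6, used "%"
        }'
}

df -P | nearly_full 90
```

On the scenario system, with its single disk at about 90% capacity, a check like this would be expected to flag the root partition on most days.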

Backup Issues

These include: 

         Separating directories like /usr/local onto separate partitions makes the process of an OS upgrade easier - the new OS version can be installed over all partitions except the partition that the /usr/local system exists on.  Once installation is complete the /usr/local partition can be re-attached. 

         The actual size of the partition can make it easier to perform backups - it isn't as easy to backup a single 2.1 Gig partition as it is to backup four 500 Meg partitions.  This does depend on the backup medium you are using.  Some media will handle a 2.1 Gig partition quite easily.

Performance Issues

By spreading the file system over several partitions and devices, the IO load is spread around. It is then possible to have multiple seek operations occurring simultaneously - this will improve the speed of the system. 

While splitting the directory hierarchy over multiple partitions does address the above issues, it isn't always that simple.  A classic example of this is a system that contained its Web programs and data  in the /var/spool directory.  Obviously the correct location for this type of program is the /usr branch - probably somewhere off the /usr/local system.  The reason for this strange location? ALL the other partitions on the system were full or nearly full - this was the only place left to install the software!  And the moral of the story is?  When partitions are created for different branches of the file hierarchy, the future needs of the system must be considered - and even then, you won't always be able to adhere to what is "the technically correct" location to place software.

Scenario Update

At this point, we should consider how we are going to partition our new hard disk.  As given by the scenario, our /home directory is using up a lot of space (we would find this out by using the du command). 

We have the option of devoting the entire hard disk to the /home structure but as it is a 2.5 Gig disk we could probably afford to divide it into a couple of partitions.  As the /var/spool directory exists on the same partition as root, we have a potential problem of our root partition filling up - it might be an idea to separate this.  As to the size of the partitions?  As our system has just been connected to the Internet, our users have embraced FTP - our /home structure is consuming 200 Megabytes but we expect this to increase by a factor of 10 over the next 2 years.  Our server is also receiving increased volumes of email, so our spool directory will have to be large.  A split of 2 Gigabytes to 500 Megabytes will probably be reasonable. 
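The reasoning behind the split is a quick calculation, sketched here in shell arithmetic. All figures come from the scenario; treating a gigabyte as 1000 megabytes is an assumption made to keep the numbers round.

```shell
#!/bin/sh
# All figures in megabytes, taken from the scenario; 1 gigabyte is
# treated as 1000 megabytes to keep the numbers round (an assumption).
disk=2500            # the new 2.5 gigabyte disk
home_now=200         # current size of /home
growth=10            # expected growth factor over two years

home_part=$(( home_now * growth ))     # space /home is expected to need
spool_part=$(( disk - home_part ))     # what is left for /var/spool

echo "/home partition:      ${home_part} MB"
echo "/var/spool partition: ${spool_part} MB"
```

Note that this leaves /home no headroom beyond the two-year estimate, which is worth remembering when the estimate is revisited.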

To create our partitions, we will use the fdisk program.  We will create two primary partitions, one of 2 Gigabytes and one of 500 Megabytes - these we will mark as Linux partitions. 

The Linux Native File System - ext2

Overview

Historically, Linux has had several native file systems.  Originally there was Minix which supported file systems of up to 64 megabytes in size and 14 character file names.  With the advent of the virtual file system (VFS) and support for multiple file systems, Linux has seen the development of Ext FS (Extended File System), Xia FS and the current ext2 FS.

ext2 (the second extended file system) has longer file names (255 characters), larger file sizes (2 GB) and bigger file system support (4 TB) than any of the existing Linux file systems.  In this section, we will examine how ext2 works. 

I-Nodes

ext2 uses a complex but extremely efficient method of organising block allocation to files. This system relies on data structures called I-Nodes. Every file on the system is allocated an I-Node - there can never be more files than I-Nodes.

This is something to consider when you format a partition and create the file system - you will be asked how many I-Nodes you wish to create. Generally, ten percent of the file system should be I-Nodes. This figure should be increased if the partition will contain lots of small files or decreased if the partition will contain few but large files. 
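To get a feel for what the ten percent guideline means in numbers, the sketch below works out how many I-Nodes fit in a given share of a partition. The function name is invented, and the I-Node size of 128 bytes is an assumption made for the example; the actual on-disk size depends on the file system version and how it was created.

```shell
#!/bin/sh
# inode_count PARTITION_BYTES PERCENT INODE_BYTES
# Number of I-Nodes that fit if PERCENT of the partition is given to them.
# INODE_BYTES is the on-disk size of one I-Node (128 below is an assumption).
inode_count() {
    echo $(( $1 * $2 / 100 / $3 ))
}

part=$(( 500 * 1024 * 1024 ))      # a 500 megabyte partition, in bytes
inode_count "$part" 10 128         # ten percent guideline
inode_count "$part" 5 128          # fewer I-Nodes for few, large files
```

Under these assumptions the 500 megabyte partition would hold 409600 I-Nodes at ten percent, which is also the upper limit on the number of files it could ever contain.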

Figure 10.2 is a graphical representation of an I-Node. 


 


Figure 10.2
I-Node Structure 

Typically an I-Node will contain: 

         The owner (UID) and group owner (GID) of the file. 

         The type of file - is the file a directory or another type of special file? 

         User access permissions - which users can do what with the file 

         The number of hard links to the file - the same physical file may be accessed under several names; we will examine how later. 

         The size of the file 

         The time the file was last modified 

         The time the I-Node was last changed - if permissions or information on the file change then the I-Node is changed. 

         The addresses of 12 data blocks - data blocks are where the contents of the file are placed. These are the direct blocks. 

         A single indirect pointer - this points to a special type of block called a single indirect block. This is a block that contains the addresses of at least 256 other data blocks; the exact number depends on the file system and implementation. 

         A double indirect pointer - this points to a special type of block called a double indirect block. This block points to a number of single indirect blocks. 

         A triple indirect pointer - this points to a special type of block called a triple indirect block. This block points to a number of double indirect blocks. 

Using this system, ext2 can cater for a file two gigabytes in size! 

However, just because an I-Node can access all those data blocks doesn't mean that they are automatically allocated to the file when it is created - obviously! As the file grows, blocks are allocated, starting with the 12 direct data blocks, then moving on to the single indirect blocks, then to the double, then to the triple. 
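The reach of this pointer scheme can be checked with a little arithmetic. The following is only a sketch: it assumes 1 KB data blocks, 4-byte block addresses (hence 256 addresses per indirect block) and 12 direct pointers, the number used in ext2's on-disk I-Node. Other block sizes give different figures.

```shell
#!/bin/sh
# Assumptions: 1 KB blocks, 4-byte block addresses, 12 direct pointers.
block_size=1024
addrs_per_block=$((block_size / 4))   # 256 addresses fit in one indirect block
direct=12

single=$addrs_per_block                          # blocks reached via the single indirect block
double=$((addrs_per_block * addrs_per_block))    # via the double indirect block
triple=$((double * addrs_per_block))             # via the triple indirect block
total_blocks=$((direct + single + double + triple))

# Divide before multiplying so the intermediate value stays small.
size_mb=$((total_blocks / 1024 * (block_size / 1024)))

echo "direct blocks:        $direct"
echo "via single indirect:  $single"
echo "via double indirect:  $double"
echo "via triple indirect:  $triple"
echo "total data blocks:    $total_blocks"
echo "addressable size:     about $size_mb MB"
```

Under these assumptions the pointers alone could address roughly 16 GB, so the pointer structure itself is not what holds a file to the two gigabyte figure quoted above.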

Note that the actual name of the file is not stored in the I-Node. This is because the names of files are stored in directories, which are themselves files. 

Physical Structure and Features

ext2 uses a decentralised file system management scheme involving a "block group" concept.  What this means is that the file system is divided into a series of block groups.  Each block group contains a copy of critical information about the file system (the super block and other information describing the file system) as well as its own I-Node and data block allocation tables, I-Nodes and data blocks.  Generally, the information about a file (the I-Node) will be stored close to its data blocks.  This redundancy makes the entire system very robust and makes file system recovery less difficult. 

The ext2 file system also has some special features which make it stand out from existing file systems including: 

         Logical block size - the size of data blocks can be defined when the file system is created; this is not dependent on physical data block size.

         File system state checks - the file system keeps track of how many times it was "mounted" (or used) and what state it was left in at the last shutdown.

         The file system reserves 5% of the file system for the root user - this means that if a user program fills a partition, the partition is still useable by root (for recovery) because there is reserve space.

A more comprehensive description of the ext2 file system can be found at .


Creating file systems

mkfs

Before a partition can be mounted (or used), it must first have a file system installed on it - with ext2, this is the process of creating I-Nodes and data blocks. 

This process is the equivalent of formatting the partition (similar to MSDOS's "format" command). Under Linux, the command to create a file system is called mkfs. 

The command is issued in the following way: 

mkfs  [-c] [ -t fstype ]  filesys [ blocks ]
eg.
mkfs -t ext2 /dev/fd0   # Make an ext2 file system on a floppy disk

where: 

         -c forces a check for bad blocks 

         -t fstype specifies the file system type 

         filesys is either the device file associated with the partition or device OR is the directory where the file system is mounted (this is used to erase the old file system and create a new one) 

         blocks specifies the number of blocks on the partition to allocate to the file system 

         Be aware that creating a file system on a device with an existing file system will cause all data on the old file system to be erased. 

Scenario Update

Having partitioned our disk, we must now install a file system on each partition. 

ext2 is the logical choice.  Be aware that this won't always be the case and you should educate yourself on the various file systems available before making a choice. 

 Assuming /dev/hdb1 is the 2GB partition and /dev/hdb2 is the 500 MB partition, we can create ext2 file systems using the commands: 

mkfs -t ext2 -c /dev/hdb1 
mkfs -t ext2 -c /dev/hdb2 

This assumes the default block size and the default number of I-Nodes.  If we wanted to be more specific about the number of I-Nodes and block size, we could specify them.  mkfs actually calls other programs to create the file system - in the ext2 case, mke2fs.  Generally, the defaults are fine - however, if we knew that we were only storing a few large files on a partition, then we'd reduce the I-Node to data block ratio.  If we knew that we were storing lots of small files on a partition, we'd increase the I-Node to data block ratio and probably decrease the size of the data blocks (there is no point using 4K data blocks when the file size average is around 1K). 
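To make the trade-off concrete, mke2fs accepts a bytes-per-I-Node ratio through its -i option and a block size through its -b option. The sketch below only does the arithmetic for the 500 MB partition in the scenario. The three ratios are illustrative assumptions rather than recommendations, and mke2fs rounds the real count, so treat the results as estimates.

```shell
#!/bin/sh
# Estimate how many I-Nodes a 500 MB partition would get for a given
# bytes-per-I-Node ratio (the value that would be passed to mke2fs -i).
part_bytes=$((500 * 1024 * 1024))

estimate_inodes() {
    # $1 = bytes of file system space per I-Node
    echo $((part_bytes / $1))
}

many_small=$(estimate_inodes 2048)    # lots of small files: more I-Nodes
middle=$(estimate_inodes 4096)
few_large=$(estimate_inodes 16384)    # a few large files: fewer I-Nodes

echo "one I-Node per  2 KB: about $many_small I-Nodes"
echo "one I-Node per  4 KB: about $middle I-Nodes"
echo "one I-Node per 16 KB: about $few_large I-Nodes"

# The matching command for the small-file case would look like this.
# It destroys existing data, so it is left as a comment:
#   mke2fs -c -b 1024 -i 2048 /dev/hdb2
```

A smaller ratio buys more I-Nodes at the cost of space that can no longer hold file data, which is why it only pays off when the partition really will hold many small files.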

Exercises

10.4     Create an ext2 file system on a floppy disk using the defaults.  How much disk space can you use to store user information on the disk?  How many I-nodes are on this disk?  What is the smallest number of I-nodes you can have on a disk?  What restriction does this place on your use of the disk?

Mounting and UN-mounting Partitions and Devices

Mount

To attach a partition or device to part of the directory hierarchy you must mount its associated device file. 

To do this, you must first have a mount point - this is simply a directory where the device will be attached. This directory will exist on a previously mounted device (with the exception of the root directory (/) which is a special case) and should normally be empty. If the directory is not empty, then the files in the directory will not be visible while the device is mounted on it, but will reappear after the device has been disconnected (or unmounted). 

To mount a device, you use the mount command: 

mount [switches] device_file mount_point

With some devices, mount will detect what type of file system exists on the device; however, it is more usual to use mount in the form: 

mount [switches] -t file_system_type device_file mount_point

Generally, only the root user can use the mount command - mainly due to the fact that the device files are owned by root. For example, to mount the first partition on the second hard drive off the /usr directory and assuming it contained the ext2 file system you'd enter the command: 

mount -t ext2 /dev/hdb1 /usr

A common device that is mounted is the floppy drive. A floppy disk generally contains the msdos file system (but not always) and is mounted with the command: 

mount -t msdos /dev/fd0 /mnt

Notice that the floppy disk was mounted under the /mnt directory? This is because the /mnt directory is the usual place to temporarily mount devices. 

To see what devices you currently have mounted, simply type the command mount. Typing it on my system reveals: 

/dev/hda3 on / type ext2 (rw)
/dev/hda1 on /dos type msdos (rw)
none on /proc type proc (rw)
/dev/cdrom on /cdrom type iso9660 (ro)
/dev/fd0 on /mnt type msdos (rw)  

Each line tells me what device file is mounted, where it is mounted, what file system type each partition is and how it is mounted (ro = read only, rw = read/write). Note the strange entry on line three - the proc file system? This is a special "virtual" file system used by Linux systems to present information about the kernel, processes and current resource usage. It is actually part of the system's memory - in other words, the kernel sets aside an area of memory in which it stores information about the system - and this same area is mounted onto the file system so user programs can easily obtain this information. 
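Because every line of the mount listing has the same shape, it is easy to pick apart with standard tools. The following sketch works on a copy of the sample listing above rather than on a live system, and the variable names are only illustrative.

```shell
#!/bin/sh
# Sample "mount" output copied from the listing above.
mount_output='/dev/hda3 on / type ext2 (rw)
/dev/hda1 on /dos type msdos (rw)
none on /proc type proc (rw)
/dev/cdrom on /cdrom type iso9660 (ro)
/dev/fd0 on /mnt type msdos (rw)'

# Fields: $1 device file, $3 mount point, $5 file system type, $6 options.
summary=$(printf '%s\n' "$mount_output" |
    awk '{ printf "%-12s %-8s %-8s %s\n", $1, $3, $5, $6 }')
echo "$summary"

# List only the mount points that are mounted read only.
read_only=$(printf '%s\n' "$mount_output" | awk '$6 == "(ro)" { print $3 }')
echo "read only: $read_only"
```

On a real system the same awk commands can be fed from the mount command itself instead of the sample text.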

To release a device and disconnect it from the file system, the umount command is used. It is issued in the form: 

umount device_file
or
umount mount_point

For example, to release the floppy disk, you'd issue the command: 

umount /mnt
or
umount /dev/fd0

Again, you must be the root user or a user with privileges to do this. You can't unmount a device/mount point that is in use by a user (the user's current working directory is within the mount point) or is in use by a process. Nor can you unmount devices/mount points which in turn have devices mounted to them. 

All of this raises the question - how does the system know which devices to mount when the OS boots? 

Mounting with the /etc/fstab file

In true UNIX fashion, there is a file which governs the behaviour of mounting devices at boot time.  In Linux, this file is /etc/fstab. But there is a problem - the fstab file lives in the /etc directory, which will always be on the root partition (/). To read /etc/fstab the kernel must first mount the root partition, yet the instructions for mounting the root partition are in /etc/fstab! The answer to this involves understanding the kernel (a later chapter) - but in short, the system cheats! The kernel is "told" (how it is told doesn't concern us yet) on which partition to find the root file system; the kernel mounts this in read only mode, assuming the Linux native ext2 file system, then reads the fstab file and re-mounts the root partition (and others) according to the instructions in the file. 

So what is in the file? 

Each line in the fstab file uses the following format: 

device_file mount_point file_system_type mount_options [n] [n]

The first three fields are self explanatory; the fourth field, mount_options, defines how the device will be mounted (this includes the access mode ro/rw, execute permissions and other information) - information on this can be found in the mount man pages (note that this field usually contains the word "defaults"). The fifth and sixth fields will usually either not be included or be "1" - these two fields are used by the system utilities dump and fsck respectively - see the man pages for details. 

 As an example, the following is my /etc/fstab file: 

/dev/hda3     /        ext2     defaults     1   1
/dev/hda1     /dos     msdos    defaults     1   1
/dev/hda2     swap     swap
none          /proc    proc     defaults     1   1

As you can see, most of my file system exists on a single partition (this is very bad!) with my DOS partition mounted on the /dos directory (so I can easily transfer files on and off my DOS system). The third line is one which we have not discussed yet - swap partitions. The swap partition is the place where the Linux kernel keeps pages swapped out of virtual memory. Most Linux systems should access a swap partition - you should create a swap partition with a program such as fdisk before the Linux OS is installed. In this case, the entry in the /etc/fstab file tells the system that /dev/hda2 contains the swap partition - the system recognises that there is no device nor any mount point called "swap", but keeps this information within the kernel (this also applies to the fourth line pertaining to the proc file system). 
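The six-field layout lends itself to simple scripted inspection. The sketch below reads entries in the format described above, using sample entries embedded in the script rather than the real /etc/fstab, and reports which of them have the sixth (fsck) field set. The variable names are only illustrative.

```shell
#!/bin/sh
# Walk through fstab-style entries and report each field.
entries=0
fsck_marked=""
while read -r device mount_point fs_type options dump pass; do
    [ -z "$device" ] && continue              # skip blank lines
    entries=$((entries + 1))
    echo "$device -> $mount_point ($fs_type, options: ${options:-none given})"
    # A non-empty, non-zero sixth field asks for the entry to be checked by fsck.
    if [ -n "$pass" ] && [ "$pass" != "0" ]; then
        fsck_marked="${fsck_marked:+$fsck_marked }$device"
    fi
done <<'EOF'
/dev/hda3     /        ext2     defaults     1   1
/dev/hda1     /dos     msdos    defaults     1   1
/dev/hda2     swap     swap
none          /proc    proc     defaults     1   1
EOF

echo "entries read: $entries"
echo "sixth field set for: $fsck_marked"
```

Note how the swap line simply leaves the last three fields empty, which the loop reports as "none given".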

However, do you notice anything missing? What about the CDROM? On my system the CDROM is actually mounted by a script called /etc/rc.d/rc.cdrom - this script is error tolerant and won't cause problems if I don't actually have a CD in the drive at the time. 

Scenario Update

The time has come for us to use our partitions.  The following procedure should be followed: 

Mount each partition (one at a time) off /mnt Eg.

mount -t ext2 -o defaults /dev/hdb1 /mnt

Copy the files from the directory that is going to reside on the partition TO the partition Eg.

cp -a /home/. /mnt

Modify the /etc/fstab file to mount the partition off the correct directory Eg.

/dev/hdb1  /home  ext2  defaults  1  1

Test your changes by rebooting and using the partition

Unmount the partition, remove the old files (or back them up), recreate the empty mount point and mount the partition again.

umount /home
rm -r /home
mkdir /home
mount -t ext2 -o defaults /dev/hdb1 /home

The new hard disk should now be installed and configured correctly!
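When copying a directory tree onto a newly mounted partition, it matters whether the source is named as the directory itself or as its contents, because that decides where the files land. The following is a minimal sketch using throwaway directories in place of /home and /mnt; all of the file and directory names are hypothetical.

```shell
#!/bin/sh
# Throwaway directories stand in for /home and the mounted partition.
work=$(mktemp -d)
mkdir -p "$work/home/alice" "$work/mnt_a" "$work/mnt_b"
echo "hello" > "$work/home/alice/notes.txt"
echo "set -o vi" > "$work/home/.profile"

# Naming the directory itself nests it one level down (mnt_a/home/...).
cp -a "$work/home" "$work/mnt_a"

# Naming "home/." copies the contents, hidden files included.
cp -a "$work/home/." "$work/mnt_b"

nested=$(ls -A "$work/mnt_a" | tr '\n' ' ')
flat=$(ls -A "$work/mnt_b" | tr '\n' ' ')
echo "cp -a home   mnt gives: $nested"
echo "cp -a home/. mnt gives: $flat"

rm -r "$work"
```

The second form is the one wanted when the partition will later be mounted on the original directory, since otherwise the files would end up one directory level too deep.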

Exercises

10.5     Mount a floppy disk under the /mnt directory. 

10.6     Carefully examine your /etc/fstab file - work out what each entry means. 

10.7     Change to the /mnt directory (while the disk is mounted) - now try to unmount the disk - does this work? Why/Why not? 

File Operations 

Creating a file 

When a file is created, the following process is performed: 

         An I-Node is allocated to the file. 

         An entry is added to the current directory - remember, the directory is a file itself. This entry contains the name of the file and a pointer to the I-Node used by the file. The link count on the file's I-Node is set to 1 (any I-Node with a link count of 0 is not in use).

         Any blocks required to store the file contents are allocated. 

Linking files

As we have previously encountered, there are occasions when you will want to access a file from several locations or by several names. The process of doing this is called linking. 

There are two methods of doing this - Hard Linking and Soft Linking.

Hard Links are generated by the following process: 

         An entry is added to the current directory with the name of the link together with a pointer to the I-Node used by the original file. 

         The I-Node of the original file is updated and the number of files linked to it is incremented. 

Soft Links are generated by the following process: 

         An I-Node is allocated to the soft link file - the type of file is set to soft-link. 

         An entry is added to the current directory with the name of the link together with a pointer to the allocated I-Node. 

         A data block is allocated for the link in which is placed the name of the original file. 

Programs accessing a soft link cause the file system to examine the location of the original (linked-to) file and then carry out operations on that file. The following should be noted about links: 

         Hard links may only be performed between files on the same physical partition - the reason for this is that I-Node pointers can only point to I-Nodes of the same partition. 

         Any operation performed on the data in a link is performed on the original file. 

         Any chmod operations performed on a hard link are reflected on both the hard link file and the file it is linked to. chmod operations on soft links are reflected on the original file but not on the soft link - the soft link will always have full file permissions (lrwxrwxrwx). 

So how do you perform these mysterious links? 

ln

The command for both hard and soft link files is ln. It is executed in the following way: 

ln source_file link_file_name      # Hard Links
or
ln -s source_file link_file_name   # Soft Links

For example, look at the following operations on links: 

Create the file and check the ls listing:

psyche:~$ touch base      
psyche:~$ ls -al base
-rw-r--r--   1 jamiesob users   0 Apr  5 17:09 base

  Create a soft link and check the ls listing of it and the original file

psyche:~$ ln -s base softbase
psyche:~$ ls -al softbase
lrwxrwxrwx   1 jamiesob users   4 Apr  5 17:09 softbase -> base
psyche:~$ ls -al base
-rw-r--r--   1 jamiesob users   0 Apr  5 17:09 base

  Create a hard link and check the ls listing of it, the soft link and the original file

psyche:~$ ln base hardbase
psyche:~$ ls -al hardbase
-rw-r--r--   2 jamiesob users   0 Apr  5 17:09 hardbase
psyche:~$ ls -al base
-rw-r--r--   2 jamiesob users   0 Apr  5 17:09 base
psyche:~$ ls -il base
132307 -rw-r--r--   2 jamiesob users   0 Apr  5 17:09 base
psyche:~$ ls -il softbase
132308 lrwxrwxrwx   1 jamiesob users   4 Apr  5 17:09 softbase -> base
psyche:~$ ls -il hardbase
132307 -rw-r--r--   2 jamiesob users   0 Apr  5 17:09 hardbase

Note the last three operations (checking the I-Node number) - see how the hard link shares the I-Node of the original file? Links are removed by simply deleting the link with the rm command (or, on non-Linux systems, unlink). Note that deleting a file that has soft links pointing to it is different to deleting a file with hard links - deleting a soft-linked file causes its I-Node (and thus its data blocks) to be deallocated - no provision is made for the soft link, which is now "pointing" to a file that doesn't exist. 

However, a file with hard links to it has its entry removed from the directory, but neither its I-Node nor data blocks are deallocated - the link count on the I-Node is simply decremented. The I-Node and data blocks will only be deallocated when there are no other files hard linked to it. 
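The difference in behaviour on deletion can be observed directly. The following sketch repeats the base, hardbase and softbase example in a throwaway directory, then removes the original name and looks at what is left.

```shell
#!/bin/sh
# Work in a throwaway directory so nothing real is touched.
work=$(mktemp -d)
cd "$work" || exit 1

echo "some data" > base
ln base hardbase        # hard link: a second name for the same I-Node
ln -s base softbase     # soft link: its own I-Node, holding the name "base"

inode_base=$(ls -i base | awk '{ print $1 }')
inode_hard=$(ls -i hardbase | awk '{ print $1 }')
inode_soft=$(ls -i softbase | awk '{ print $1 }')
links_before=$(ls -l base | awk '{ print $2 }')

rm base                 # removes one directory entry only

links_after=$(ls -l hardbase | awk '{ print $2 }')
hard_content=$(cat hardbase)
# -e follows the soft link, so it fails once the target name is gone.
if [ -e softbase ]; then soft_state="resolves"; else soft_state="dangling"; fi

echo "I-Nodes: base=$inode_base hardbase=$inode_hard softbase=$inode_soft"
echo "link count before rm: $links_before, after rm: $links_after"
echo "hardbase still contains: $hard_content"
echo "softbase is now: $soft_state"

cd / && rm -r "$work"
```

The data survives under the hard link's name because the I-Node's link count only dropped from 2 to 1, while the soft link is left holding a name that no longer exists.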

Exercises

10.8     Locate all files on the system that are soft links (Hint: use find). 

Checking the file system

Why Me?

It is a sad truism that anything that can go wrong will go wrong - especially if you don't have backups! In any event, file system "crashes" or problems are an inevitable fact of life for a System Administrator. 

Crashes of a non-physical nature (i.e. the file system becomes corrupted) are non-fatal events - there are things a system administrator can do before issuing the last rites and restoring from one of their copious backups :) 

You will be informed of the fact that a file system is corrupted by a harmless but feared little message at boot time, something like: 

Can't mount /dev/hda1 

If you are lucky, the system will ignore the file system problems and try to mount the corrupted partition READ ONLY. 

It is at this point that most people enter a hyperactive frenzy of swearing, violent screaming tantrums and self-destructive cranial impact diversions (head butting the wall). 

What to do

It is important to establish that the problem is logical, not physical. There is little you can do if a disk head has crashed (on the therapeutic side, taking the offending hard disk into the car park and beating it with a stick can produce favourable results). A logical crash is something that is caused by the file system becoming confused. Things like: 

         Many files using the one data block. 

         Blocks marked as free but being used and vice versa. 

         Incorrect link counts on I-Nodes. 

         Differences in the "size of file" field in the I-Node and the number of data blocks actually used. 

         Illegal blocks within files. 

         I-Nodes that contain information but are not in any directory entry (these types of files, when recovered, are placed in the lost+found directory). 

         Directory entries that point to illegal or unallocated I-Nodes. 

are the product of file system confusion. These problems will be detected and (usually) fixed by a program called fsck. 

fsck

fsck is actually run at boot time on most Linux systems. Every x number of boots, fsck will do a comprehensive file system check. In most cases, these boot time runs of fsck automatically fix problems - though occasionally you may be prompted to confirm some fsck action. If, however, fsck reports some drastic problem at boot time, you will usually be thrown into the root account and issued a message like: 

**************************************
fsck returned error code - REBOOT NOW!
**************************************   

It is probably a good idea to manually run fsck on the offending device at this point (we will get onto how in a minute). 

At worst, you will get a message saying that the system can't mount the file system at all and you have to reboot. It is at this point you should drag out your rescue disks (which of course contain a copy of fsck) and reboot using them. The reason for booting from an alternate source (with its own file system) is because it is quite possible that the location of the fsck program (/sbin) has become corrupted as has the fsck binary itself! It is also a good idea to run fsck only on unmounted file systems. 

Using fsck

fsck is run by issuing the command: 

fsck file_system

where file_system is a device or the directory on which a device is mounted. 

fsck will do a check on all I-Nodes, blocks and directory entries. If it encounters a problem to be fixed, it will prompt you with a message. If the message asks if fsck can SALVAGE, FIX, CONTINUE, RECONNECT or ADJUST, then it is usually safe to let it. Requests involving REMOVE and CLEAR should be treated with more caution. 
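The "error code" mentioned in the boot-time message is fsck's exit status, which the fsck manual page documents as a sum of bit values. The following sketch decodes such a status; the function name describe_fsck_status is hypothetical, and the meanings should be checked against the man page on your own system.

```shell
#!/bin/sh
# Decode an fsck exit status into the conditions it represents.
# The bit values follow the fsck manual page; the function name is
# made up for this example.
describe_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    result=""
    while IFS=: read -r bit meaning; do
        # Test whether this bit is set in the status value.
        if [ $((status & bit)) -ne 0 ]; then
            result="${result:+$result; }$meaning"
        fi
    done <<'EOF'
1:file system errors corrected
2:system should be rebooted
4:file system errors left uncorrected
8:operational error
16:usage or syntax error
32:fsck cancelled by user request
128:shared library error
EOF
    echo "$result"
}

describe_fsck_status 0
describe_fsck_status 3
describe_fsck_status 4
```

In practice the function would be called with the real status, for example by running fsck on an unmounted device and then passing $? to describe_fsck_status.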

What caused the problem?

Problems with the file system are caused by: 

         People turning off the power on a machine without going through the shutdown process - this is because Linux uses a very smart READ and WRITE disk cache - this cache is only flushed (or written to disk) periodically and on shutdown. fsck will usually fix these problems at the next boot. 

         Program crashes - problems usually occur when a program is using several files and suddenly crashes without closing them. fsck usually easily fixes these problems. 

         Kernel and system crashes - the kernel may become unstable (especially if you are using new, experimental kernels) and crash the system. Depending on the circumstances, the file system will usually be recoverable. 

Exercises

10.9     Mount the disk created in an earlier exercise.  Copy the contents of your home directory to the disk.  Now copy the kernel to it (/vmlinuz) but during the copy eject the disk. Now run fsck on that disk. 

Conclusion

Having read and absorbed this chapter you will be aware that: 

         Linux supports many file systems and that 

         the process of using many file systems, partitions and devices acting in concert to produce a directory structure allows for greater flexibility, performance and system integrity. 

Review questions

10.1

As a System Administrator, you have been asked to set up a new system. The system will contain two hard disks, each 2.5 GB in size. What issues must you consider when installing these disks? What questions should you be asking about the usage of the disks? 

10.2

You have noticed that at boot time, not all the normal messages are appearing on the screen. You have also discovered that X-Windows won't run. Suggest possible reasons for this and the solutions to the problems. 

10.3

A new hard disk has been added to your system to store the print spool in. List all the steps in adding this hard disk to the system. 


10.4

You have just dropped your Linux box while it was running (power was lost during the system's short flight) - the system boots but will not mount the hard disk. Discuss possible reasons for the problem and the solutions. 

10.5

What are links used for? What are the differences between hard and soft links?