Chapter 4

The File Hierarchy

Introduction

Why?

Like all good operating systems, UNIX allows you the privilege of storing information indefinitely (or at least until the next disk crash) in abstract data containers called files. The organisation, placement and usage of these files come under the general umbrella of the file hierarchy. As a system administrator, you will need to be very familiar with the file hierarchy. You will use it on a day-to-day basis as you maintain the system, install software and manage user accounts.

At first glance, the file hierarchy structure of a typical Linux host (we will use Linux as the basis of our discussion) may appear to have been devised by a demented genius who'd been remiss with their medication. Why, for example, does the root directory contain something like:

        bin         etc         lost+found  root        usr
        boot        home        mnt         sbin        var
        dev         lib         proc        tmp

Why was it done like this? 

Historically, the location of certain files and utilities has not always been standard (or fixed). This has led to problems with development and upgrading between different "distributions" of Linux [Linux is distributed from many sources; two major sources are the Slackware and Red Hat package sets]. The Linux directory structure (or file hierarchy) was based on existing flavours of UNIX, but as it evolved, certain inconsistencies developed. These were often small things like the location (or placement) of certain configuration files, but they resulted in difficulties porting software from host to host.

To combat this, a file standard was developed. This is an evolving process, to date resulting in a fairly static model for the Linux file hierarchy. In this chapter, we will examine how the Linux file hierarchy is structured, how each component relates to the overall OS and why certain files are placed in certain locations. 


 

Linux File System Standard

 

The location and purposes of files and directories on a Linux machine are defined by the Linux File Hierarchy Standard.  The Resource Materials section of the 85321 Web site contains a pointer to it.

The important sections

The root of the problem

The top level of the Linux file hierarchy is referred to as the root (or /). The root directory typically contains several other directories, including:

Directory       Contains

bin/            Required boot-time binaries
boot/           Boot configuration files for the OS loader and kernel image
dev/            Device files
etc/            System configuration files and scripts
home/           User/sub-branch directories
lib/            Main OS shared libraries and kernel modules
lost+found/     Storage directory for "recovered" files
mnt/            Temporary mount point for connecting devices
proc/           Pseudo directory structure containing information about the
                kernel, currently running processes and resource allocation
root/           Home directory for the root user (Linux, non-standard); the
                alternative location is the / directory itself
sbin/           System administration binaries and tools
tmp/            Location of temporary files
usr/            Difficult to define - it contains almost everything else,
                including local binaries, libraries, applications and
                packages (including X Windows)
var/            Variable data, usually machine specific; includes spool
                directories for mail and news

Table 4.1 Major Directories

Generally, nothing else should be added to the root: it is considered bad form to create other directories off the root, or to place any other files there.
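As a quick audit, the following minimal Python sketch lists / and reports any entry that does not appear in Table 4.1. The name `unexpected_root_entries` and the expected set are illustrative assumptions. The set is taken from the table above, not from the standard itself. Newer systems legitimately add top-level directories, so treat anything reported as a prompt to investigate rather than a verdict.

```python
import os

# Top-level directories from Table 4.1 (a typical Linux host; assumed list,
# not an exhaustive copy of the file system standard).
EXPECTED_ROOT_DIRS = {
    "bin", "boot", "dev", "etc", "home", "lib", "lost+found",
    "mnt", "proc", "root", "sbin", "tmp", "usr", "var",
}


def unexpected_root_entries(entries=None):
    """Return the entries of / that are not listed in Table 4.1, sorted.

    If `entries` is None, the real root directory is listed.
    """
    if entries is None:
        entries = os.listdir("/")
    return sorted(set(entries) - EXPECTED_ROOT_DIRS)


if __name__ == "__main__":
    for name in unexpected_root_entries():
        print("not in Table 4.1:", name)
```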


Why root?

The name “root” is based on the analogous relationship between the UNIX file system structure and a tree! Quite simply, the file hierarchy is an inverted tree.

I can personally never visualise an upside-down tree. What this phrase really means is that the “top” of the file hierarchy is at one point, like the root of a tree, while the bottom is spread out, like the branches of a tree. This is probably a silly analogy, because if you turn a tree upside down, you have lots of spreading roots, dirt and several thousand very unhappy worms!

Every part of the file system can eventually be traced back to one central point, the root. The concept of a “root” structure has now been (partially) adopted by other operating systems such as Windows NT. However, unlike other operating systems, UNIX doesn't have any concept of “drives”. While this will be explained in detail in a later chapter, it is important to be aware of the following:

§         The file system may be spread over several physical devices; different parts of the file hierarchy may exist on totally separate partitions, hard disks, CD-ROMs, network file system shares, floppy disks and other devices.

§         This separation is transparent to the file system hierarchy, users and applications.

§         Different “parts” of the file system will be “connected” (or mounted) at startup; other parts will be dynamically attached as required.
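Because the separation is transparent, a path by itself does not tell you which device holds it. The minimal Python sketch below walks upward from a path using `os.path.ismount` until it reaches the directory on which that part of the hierarchy is mounted. The helper name `mount_point_of` is hypothetical.

```python
import os


def mount_point_of(path):
    """Walk up from `path` until reaching the directory it is mounted on."""
    path = os.path.realpath(path)
    # os.path.ismount("/") is always True, so this loop always terminates.
    while not os.path.ismount(path):
        path = os.path.dirname(path)
    return path


if __name__ == "__main__":
    for p in ("/", "/home", "/usr", "/tmp"):
        if os.path.exists(p):
            print(p, "is on the file system mounted at", mount_point_of(p))
```

On a host where /home is a separate partition, /home/jamiesob would report /home. On a single-partition host, it would report /.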

The remainder of this chapter examines some of the more important directory structures in the Linux file hierarchy.

Homes for users

Every user needs a home...

The /home directory structure contains the home directories for most login-enabled users (some notable exceptions being the root user and, on some systems, the www/web user). While most small systems will contain user directories directly off the /home directory (for example, /home/jamiesob), on larger systems it is common to subdivide the home structure based on classes (or groups) of users, for example:

        /home/admin             # Administrators 
        /home/finance           # Finance users 
        /home/humanres          # Human Resource users 
        /home/mgr               # Managers 
        /home/staff             # Other people 


Other homes?

/root is the home directory for the root user. If, for some strange reason, the /root directory doesn't exist, then the root user will be placed in the / directory at login - this is actually the traditional location for root users.

There is some debate about allowing the root user to have a special directory as their login point. This idea encourages the root user to set up their .profile, use "user" programs like elm, tin and netscape (programs which require a home directory in which to place certain configuration files) and generally use the root account as a beefed-up user account. A system administrator should never use the root account for day-to-day user-type interaction; the root account should be used for system administration purposes only.

 

Be aware that you must be extremely careful when allowing a user to have a home directory in a location other than the /home branch. The problem occurs when you, as a system administrator, have to back up the system - it is easy to miss a home directory if it isn't grouped with others in a common branch (like /home).
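Before a backup, it is worth finding accounts whose home directory lies outside /home. The sixth colon-separated field of each /etc/passwd line holds the home directory. The minimal Python sketch below relies on that field layout. The function name `homes_outside` is a hypothetical helper.

```python
import os

# Each /etc/passwd line has seven colon-separated fields:
# name:password:UID:GID:GECOS:home:shell
def homes_outside(passwd_text, branch="/home"):
    """Return (user, home) pairs whose home directory is not under `branch`."""
    prefix = branch.rstrip("/") + "/"
    outside = []
    for line in passwd_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 7:
            continue  # skip malformed lines
        user, home = fields[0], fields[5]
        if not home.startswith(prefix):
            outside.append((user, home))
    return outside


if __name__ == "__main__":
    if os.path.exists("/etc/passwd"):
        with open("/etc/passwd") as f:
            for user, home in homes_outside(f.read()):
                print(f"{user}: {home}")
```

System accounts such as root will naturally appear in the output. Filtering on the UID field is one way to focus on ordinary users.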

/usr and /var

And the difference is...

It is often slightly confusing to see that /usr and /var both contain similar directories:

        /usr

        X11R6             games             libexec           src
        bin               i486-linux-libc5  local             tmp
        dict              include           man
        doc               info              sbin
        etc               lib               share

        /var

        catman    local     log       preserve  spool
        lib       lock      nis       run       tmp


It becomes even more confusing when you start examining the maze of links which intermingle the two major branches.

Links are a way of referencing a file or directory by many names and from many locations within the file hierarchy. They are effectively like "pointers" to files - think of them as a post-it note saying "see this file". Links will be explained in greater detail in the next chapter.
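To trace the maze of links by hand, you can follow a symbolic link one "post-it note" at a time. The minimal Python sketch below uses `os.readlink` to record every hop. It covers symbolic links only. The name `link_chain` is hypothetical, and the example paths in the main section may not exist on your system.

```python
import os


def link_chain(path, max_hops=40):
    """Follow symbolic links one hop at a time, returning every path visited."""
    chain = [path]
    while os.path.islink(path):
        if len(chain) > max_hops:
            raise RuntimeError("too many levels of symbolic links")
        target = os.readlink(path)
        # A relative target is interpreted relative to the link's own directory.
        # normpath collapses ".." textually, a simplification that can differ
        # from the kernel's answer if a directory along the way is itself a link.
        path = os.path.normpath(os.path.join(os.path.dirname(path), target))
        chain.append(path)
    return chain


if __name__ == "__main__":
    for p in ("/usr/bin/X11", "/usr/tmp"):
        if os.path.lexists(p):
            print(" -> ".join(link_chain(p)))
```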


To put it simply, /var is for VARiable data/files. /usr is for USeR accessible data, programs and libraries. Unfortunately, history has confused things: files which should have been placed in the /usr branch have been located in the /var branch, and vice versa. To "correct" things, a series of links has been put in place. Why the separation at all? Does it matter? The answer is: yes, but no :)

Yes, in the sense that the file standard dictates that the /usr branch must be mountable (another way of saying "attached" to the file hierarchy - this will be covered in the next chapter) READ ONLY, so it cannot contain variable data. The reasons for this are historical and came about because of something called NFS exporting.

NFS exporting is the process of one machine (a server) "exporting" its copy of the /usr structure (and others) to the network for other systems to use. 

If several systems were "sharing" the same /usr structure, it would not be a good idea for them all to be writing logs and variable data to the same area! It is also used because minimal installations of Linux can use the /usr branch directly from the CDROM (a read-only device). 

However, it is "No" in the sense that: 

§         /usr is usually mounted READ-WRITE-EXECUTE on Linux systems anyway 

§         In the author's experience, exporting /usr READ-ONLY via NFS isn't entirely successful without making some very non-standard modifications to the file hierarchy! 
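You can check whether /usr on a particular host really is mounted read-only. The minimal Python sketch below tests the `ST_RDONLY` bit that `os.statvfs` reports for the file system holding a path. The helper name `is_read_only` is hypothetical.

```python
import os


def is_read_only(path):
    """Return True if the file system holding `path` is mounted read-only."""
    return bool(os.statvfs(path).f_flag & os.ST_RDONLY)


if __name__ == "__main__":
    for p in ("/", "/usr"):
        state = "read-only" if is_read_only(p) else "read-write"
        print(f"{p} is mounted {state}")
```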

The following are a few highlights of the /var and /usr directory branches: 

/usr/local

All software that is installed on a system after the operating system package itself should be placed in the /usr/local directory. Binary files should be located in /usr/local/bin (generally, /usr/local/bin should be included in a user's PATH setting). Placing all installed software in this branch makes backups and upgrades of the system far easier: the system administrator can back up and restore the entire /usr/local branch with more ease than backing up and restoring software packages from multiple branches (e.g. /usr/src, /usr/bin).
An example of a /usr/local directory is listed below: 

        bin        games      lib          rsynth     cern
        man        sbin       volume-1.11  info       mpeg
        speak      www        etc          java       netscape
        src

As you can see, there are a few standard directories (bin, lib and src) as well as some that contain installed programs.
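Because everything locally installed lives under one branch, a single archive can capture it. The minimal Python sketch below writes a branch into a gzip-compressed tar file with the standard `tarfile` module. The function name `archive_branch` and any archive location you choose are assumptions, not part of the standard.

```python
import os
import tarfile


def archive_branch(branch, archive_path):
    """Write `branch` (e.g. /usr/local) into a gzip-compressed tar archive.

    Returns the number of members (files and directories) written.
    """
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname stores paths relative to the branch name, e.g.
        # "local/bin/..." rather than the absolute "/usr/local/bin/...".
        tar.add(branch, arcname=os.path.basename(branch.rstrip("/")))
    with tarfile.open(archive_path, "r:gz") as tar:
        return len(tar.getmembers())
```

For example, `archive_branch("/usr/local", "/backup/usr-local.tar.gz")` would back up the whole branch, assuming a hypothetical /backup directory exists. Restoring is the reverse operation, done with `tarfile`'s extraction methods.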


lib, include and src

Linux is a very popular platform for C/C++, Java and Perl program development. As we will discuss in later chapters, Linux also allows the system administrator to actually modify and recompile the kernel. Because of this, compilers, libraries and source directories are treated as "core" elements of the file hierarchy structure. 

The /usr structure plays host to three important directories: 

/usr/include holds most of the standard C/C++ header files - this directory will be referred to as the primary include directory in most Makefiles. 

Makefiles are special script-like files that are processed by the make program for the purposes of compiling, linking and building programs. 

/usr/lib holds most static libraries as well as hosting subdirectories containing libraries for other (non-C/C++) languages, including Perl and TCL. It also plays host to configuration information for ldconfig.

/usr/src holds the source files for most packages installed on the system. This is traditionally the location for the Linux source directory (/usr/src/linux), for example: 

  linux         linux-2.0.31  redhat

Unlike DOS/Windows-based systems, Linux programs often come as source and are compiled and installed locally.

/var/spool

This directory has the potential for causing a system administrator a bit of trouble as it is used to store (possibly) large volumes of temporary files associated with printing, mail and news. /var/spool may contain something like:

        at          lp          lpd         mqueue      samba       uucppublic
        cron        mail        rwho        uucp

In this case, there is a printer spool directory called lp (used for storing print requests for the printer lp) and a /var/spool/mail directory that contains files for each user’s incoming mail.

Keep an eye on the space consumed by the files and directories found in /var/spool.  If a device (like the printer) isn't working or a large volume of e-mail has been sent to the system, then much of the hard drive space can be quickly consumed by files stored in this location. 
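One way to keep an eye on this is to total the files under each spool subdirectory. The minimal Python sketch below walks each subdirectory of a given top directory and sums the apparent sizes (`st_size`) of its files, so sparse files may overstate real disk usage. The name `usage_by_subdir` is a hypothetical helper.

```python
import os


def usage_by_subdir(top):
    """Return {subdirectory name: total apparent bytes of files beneath it}."""
    totals = {}
    with os.scandir(top) as entries:
        for entry in entries:
            if not entry.is_dir(follow_symlinks=False):
                continue
            total = 0
            # os.walk silently skips directories it cannot read.
            for dirpath, _dirnames, filenames in os.walk(entry.path):
                for name in filenames:
                    try:
                        total += os.lstat(os.path.join(dirpath, name)).st_size
                    except OSError:
                        continue  # file vanished (spool files come and go)
            totals[entry.name] = total
    return totals


if __name__ == "__main__":
    if os.path.isdir("/var/spool"):
        usage = usage_by_subdir("/var/spool")
        for name, size in sorted(usage.items(), key=lambda kv: kv[1], reverse=True):
            print(f"{size:>12}  /var/spool/{name}")
```

Running this periodically, for example from cron, would let you notice a stuck printer queue or a mail flood before the disk fills.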


X Windows

X Windows provides UNIX with a very flexible graphical user interface. Tracing the X Windows file hierarchy can be very tedious, especially when you are trying to locate a particular configuration file or trying to remove a stale lock file.

A lock file is used to stop more than one instance of a program executing at once. A stale lock is a lock file that was not removed when a program terminated, thus stopping the same program from starting again.
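A common convention, assumed here, is for a lock file to contain the process ID of the program that created it. Under that assumption, a lock is stale when no process with that ID is still running. The minimal Python sketch below checks this with signal 0, which tests for a process's existence without disturbing it. The name `is_stale_lock` is hypothetical. Check how your particular program writes its lock before relying on this.

```python
import os


def is_stale_lock(lock_path):
    """Return True if `lock_path` exists but its owner is no longer running.

    Assumes the lock file holds the owner's process ID as text.
    """
    try:
        with open(lock_path) as f:
            pid = int(f.read().strip())
    except FileNotFoundError:
        return False          # no lock at all
    except ValueError:
        return True           # unreadable contents: treat as stale
    if pid <= 0:
        return True           # not a valid process ID
    try:
        os.kill(pid, 0)       # signal 0: existence check, nothing is sent
    except ProcessLookupError:
        return True           # no such process, so the lock is stale
    except PermissionError:
        return False          # process exists but belongs to another user
    return False
```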

Most of X Windows is located in the /usr structure, with some references made to it in the /var structure. 

Typically, most of the action is in the /usr/X11R6 directory (this is usually an alias or link to another directory, depending on the release of X11; X11R6 is Release 6 of version 11 of the X Window System). This will contain:

        bin      doc     include  lib      man

The main X Windows binaries are located in /usr/X11R6/bin. These may also be accessed via the alias /usr/bin/X11.

Configuration files for X Windows are located in /usr/X11R6/lib. To really confuse things, the X Windows configuration utility, xf86config, is located in /usr/X11R6/bin, while the configuration file it produces, XF86Config, is located in /etc/X11!

Because of this, it is often very difficult to get an "overall picture" of how X Windows is working - my best advice is to read up on it before you start modifying (or developing with) it.

Bins

Which bin?

A very common mistake amongst first time UNIX users is to incorrectly assume that all "bin" directories contain temporary files or files marked for deletion. This misunderstanding comes about because: 

§         People associate the word "bin" with rubbish 

§         Some unfortunate GUI based operating systems use little icons of "trash cans" for the purposes of storing deleted/temporary files. 

However, bin is short for binary - binary or executable files. There are four major bin directories (none of which should be used for storing junk files :) 

§         /bin 

§         /sbin 

§         /usr/bin 

§         /usr/local/bin 

Why so many? 

The bin directories serve similar but distinct roles. Dividing binary files this way eases backups and administration and provides a logical separation. Note that while most binaries on Linux systems are found in one of these four directories, not all are.
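With several bin directories, the same command name can exist in more than one of them, and the shell runs whichever it finds first along PATH. The standard library's `shutil.which` returns only that first match. The minimal Python sketch below instead returns every match, in search order, so you can see which copy wins and which are shadowed. The name `find_in_path` is a hypothetical helper.

```python
import os


def find_in_path(command, path=None):
    """Return every executable named `command` found along PATH, in search order."""
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        # An empty PATH component conventionally means the current directory.
        candidate = os.path.join(directory or ".", command)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    for cmd in ("ls", "ifconfig", "gcc"):
        print(cmd, "->", find_in_path(cmd) or "not on PATH")
```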

/bin

This directory must be present for the OS to boot. It contains utilities used during the startup; a typical listing would look something like: 

        Mail           df             gzip           mount          stty
        arch           dialog         head           mt             su
        ash            dircolors      hostname       mt-GNU         sync
        bash           dmesg          ipmask         mv             tar
        cat            dnsdomainname  kill           netstat        tcsh
        chgrp          domainname     killall        ping           telnet
        chmod          domainname-yp  ln             ps             touch
        chown          du             login          pwd            true
        compress       echo           ls             red            ttysnoops
        cp             ed             mail           rm             umount
        cpio           false          mailx          rmdir          umssync
        csh            free           mkdir          setserial      uname
        cut            ftp            mkfifo         setterm        zcat
        date           getoptprog     mknod          sh             zsh
        dd             gunzip         more           sln

Note that this directory contains the shells and some basic file and text utilities (ls, pwd, cut, head, tail, ed etc). Ideally, the /bin directory will contain as few files as possible as this makes it easier to take a direct copy for recovery boot/root disks. 

/sbin

/sbin stands, literally, for "System Binaries". This directory contains files that should generally only be used by the root user, though the Linux file standard dictates that no access restrictions should be placed on normal users to these files. It should be noted that the PATH setting for the root user includes /sbin, while it is (by default) not included in the PATH of normal users.
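To check whether an account's PATH follows this arrangement, compare its directories against the ones you expect. The minimal Python sketch below reports which expected directories are absent from a PATH string. The directory lists are assumptions drawn from this chapter: the user bins, plus /sbin and its relatives for root. The name `missing_from_path` is hypothetical.

```python
import os

# Assumed expectations, drawn from this chapter's discussion of bin directories.
USER_DIRS = ["/bin", "/usr/bin", "/usr/local/bin"]
ADMIN_DIRS = ["/sbin", "/usr/sbin", "/usr/local/sbin"]


def missing_from_path(wanted, path=None):
    """Return the directories in `wanted` that do not appear in PATH."""
    if path is None:
        path = os.environ.get("PATH", "")
    present = {os.path.normpath(p) for p in path.split(os.pathsep) if p}
    return [d for d in wanted if os.path.normpath(d) not in present]


if __name__ == "__main__":
    # Only root is expected to have the sbin directories on its PATH.
    wanted = USER_DIRS + (ADMIN_DIRS if os.geteuid() == 0 else [])
    print("missing from PATH:", missing_from_path(wanted) or "nothing")
```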

The /sbin directory should contain essential system administration scripts and programs, including those concerned with user management, disk administration, system event control (restart and shutdown programs) and certain networking programs. 

As a general rule, if users need to run a program, then it should not be located in /sbin. A typical directory listing of /sbin looks like: 

        adduser           ifconfig          mkfs.minix        rmmod
        agetty            init              mklost+found      rmt
        arp               insmod            mkswap            rootflags
        badblocks         installpkg        mkxfs             route
        bdflush           kbdrate           modprobe          runlevel
        chattr            killall5          mount             setup
        clock             ksyms             netconfig         setup.tty
        debugfs           ldconfig          netconfig.color   shutdown
        depmod            lilo              netconfig.tty     swapdev
        dosfsck           liloconfig        pidof             swapoff
        dumpe2fs          liloconfig-color  pkgtool           swapon
        e2fsck            lsattr            pkgtool.tty       telinit
        explodepkg        lsmod             plipconfig        tune2fs
        fdisk             makebootdisk      ramsize           umount
        fsck              makepkg           rarp              update
        fsck.minix        mkdosfs           rdev              vidmode
        genksyms          mke2fs            reboot            xfsck
        halt              mkfs              removepkg

The very important ldconfig program is also located in /sbin. While not commonly used from the shell prompt, ldconfig is an essential program for the management of dynamic libraries (it is usually executed at boot time). It will often have to be manually run after library (and system) upgrades. 

You should also be aware of:

§         /usr/sbin - used for non-essential admin tools.

§         /usr/local/sbin - locally installed admin tools.

/usr/bin

This directory contains most of the user binaries - in other words, programs that users will run. It includes standard user applications including editors and email clients as well as compilers, games and various network applications. 

A listing of this directory will contain some 400 odd files.  Users should definitely have /usr/bin in their PATH setting. 

/usr/local/bin

To this point, we have examined directories that contain programs that are (in general) part of the actual operating system package. Programs that are installed by the system administrator after that point should be placed in /usr/local/bin. The main reason for doing this is to make it easier to back up installed programs during a system upgrade, or in the worst case, to restore a system after a crash. 

The /usr/local/bin directory should only contain binaries and scripts - it should not contain subdirectories or configuration files. 

Configuration files, logs and other bits!

etc etc etc.

/etc is one place where the root user will spend a lot of time. It is not only home to the all-important passwd file, but contains just about every configuration file for a system (including those for networking, X Windows and the file system).

The /etc branch also contains the skel, X11 and rc.d directories. 

/etc/skel contains the skeleton user files that are placed in a user's directory when their account is created. 

/etc/X11 contains configuration files for X Windows. 

/etc/rc.d contains the rc directories. Each directory is named rcN.d (where N is the run level) and may contain multiple files that will be executed at that particular run level. A sample listing of an /etc/rc.d directory looks something like:

        init.d      rc.local    rc0.d       rc2.d       rc4.d       rc6.d
        rc          rc.sysinit  rc1.d       rc3.d       rc5.d

Logs

Linux maintains a particular area in which to place logs (files which contain records of events). This directory is /var/log.

This directory usually contains: 

cron         lastlog      maillog.2    samba-log.   secure.2     uucp
cron.1       log.nmb      messages     samba.1      sendmail.st  wtmp
cron.2       log.smb      messages.1   samba.2      spooler      xferlog
dmesg        maillog      messages.2   secure       spooler.1    xferlog.1
httpd        maillog.1    samba        secure.1     spooler.2    xferlog.2

/proc

The /proc directory hierarchy contains files associated with the executing kernel.  The files contained in this structure contain information about the state of the system's resource usage (how much memory, swap space and CPU is being used), information about each process and various other useful pieces of information.  We will examine this directory structure in more depth in later chapters. 

The /proc file system is the main source of information for a program called top.  This is a very useful administration tool as it displays a "live" readout of the CPU and memory resources being used by each process on the system. 
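Files under /proc can be read like ordinary text files, which is what makes tools like top possible. top itself is a compiled program, so the Python sketch below only illustrates the idea. It parses /proc/meminfo, where each line is assumed to have the form `Name:   value [unit]`. The name `parse_meminfo` is hypothetical.

```python
import os


def parse_meminfo(text):
    """Turn /proc/meminfo text into {field: first numeric value}.

    Most fields are in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if not parts:
            continue
        try:
            info[key.strip()] = int(parts[0])
        except ValueError:
            continue  # skip header-style lines from older kernel formats
    return info


if __name__ == "__main__":
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            mem = parse_meminfo(f.read())
        # A rough figure: "used" here includes buffers and cache.
        used = mem["MemTotal"] - mem["MemFree"]
        print(f"{used} kB of {mem['MemTotal']} kB in use (MemTotal - MemFree)")
```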

/dev

We will be discussing /dev in detail in the next chapter; for the time being, you should be aware that this directory is the primary location for special files called device files.

Conclusion

Future standards

Because Linux is a dynamic OS, there will no doubt be changes to its file system as well. Two current issues that face Linux are: 

§         Porting Linux onto many architectures, which requires a common location for hardware-independent data files and scripts. The current location is /usr/share, but this may change.

§         The location of third-party commercial software on Linux systems - as Linux's popularity increases, more software developers will produce commercial software to install on Linux systems. For this to happen, a location in which this can be installed must be provided and enforced within the file system standard. Currently, /opt is the likely option. 

Because of this, it is advisable to obtain and read the latest copy of the file system standard so as to be aware of the current issues. Other information sources are easily obtainable by searching the web. 

You should also be aware that while, in general, the UNIX file hierarchy looks similar from version to version, it contains differences based on the requirements and development history of each operating system implementation.

Review Questions

4.1

You have just discovered that the previous system administrator of the system you now manage installed netscape in /sbin. Is this an appropriate location? Why/why not?

4.2

Where are man pages kept? Explain the format of the man page directories. (Hint: I didn't explain this anywhere in this chapter - you may have to do some looking) 

4.3

As a system administrator, you are going to install the following programs. In each case, state the likely location of each package:

§         Java compiler and libraries 

§         DOOM (a loud, violent but extremely entertaining game) 

§         A network sniffer (for use by the sys admin only) 

§         A new kernel source 

§         An X Windows manager binary specially optimised for your new monitor