Chapter 12

Startup and Shutdown

Introduction

Being a multi-tasking, multi-user operating system means that UNIX is a great deal more complex than an operating system like MS-DOS. Before the UNIX operating system can perform correctly, there are a number of steps that must be followed, and procedures executed. The failure of any one of these can mean that the system will not start, or if it does it will not work correctly. It is important for the Systems Administrator to be aware of what happens during system startup so that any problems that occur can be remedied.

It is also important for the Systems Administrator to understand what the correct mechanism is to shut a UNIX machine down. A UNIX machine should (almost) never be just turned off. There are a number of steps to carry out to ensure that the operating system and many of its support functions remain in a consistent state.

By the end of this chapter you should be familiar with the startup and shutdown procedures for a UNIX machine and all the related concepts.

A booting overview

The process by which a computer is turned on and the UNIX operating system starts functioning, known as booting, consists of the following steps

§         finding the kernel,
The first step is to find the kernel of the operating system.  How this is achieved is usually particular to the type of hardware used by the computer.

§         starting the kernel,
In this step the kernel starts operation and in particular goes looking for all the hardware devices that are connected to the machine.

§         starting the processes.
All the work performed by a UNIX computer is done by processes.  In this stage, most of the system processes and daemons are started.  This step also includes a number of steps which configure various services necessary for the system to work.


Finding the Kernel

For a UNIX computer to be functional it must have a kernel.  The kernel provides a number of essential services which are required by the rest of the system in order for it to be functional.  This means that the first step in the booting process of a UNIX computer is finding out where the kernel is.  Once found, it can be started, but that's the next section.

ROM

Most machines have a section of read only memory (ROM) that contains a program the machine executes when the power first comes on. What is programmed into ROM will depend on the hardware platform.

For example, on an IBM PC, the ROM program typically does some hardware probing and then looks in a number of predefined locations (the first floppy drive and the primary hard drive partition) for a bootstrap program.

On hardware designed specifically for the UNIX operating system (machines from DEC, SUN etc), the ROM program will be a little more complex. Many will present some form of prompt. Generally this prompt will accept a number of commands that allow the Systems Administrator to specify

§         where to boot the machine from, and
Sometimes the standard root partition will be corrupt and the system will have to be booted from another device. Examples include another hard drive, a CD-ROM, floppy disk or even a tape drive.

§         whether to come up in single user or multi-user mode.

As a bare minimum, the ROM program must be smart enough to work out where the bootstrap program is stored and how to start executing it.

The ROM program generally doesn't know enough to know where the kernel is or what to do with it.

The bootstrap program

At some stage the ROM program will execute the code stored in the boot block of a device (typically a hard disk drive). The code stored in the boot block is referred to as a bootstrap program.  Typically the boot block isn't big enough to hold the kernel of an operating system so this intermediate stage is necessary.

The bootstrap program is responsible for locating and loading (starting) the kernel of the UNIX operating system into memory. The kernel of a UNIX operating system is usually stored in the root directory of the root file system under some system-defined filename. Newer versions of Linux, including RedHat 5.0, put the kernel into a directory called /boot.

The most common bootstrap program in the Linux world is a program called LILO.

 

Reading

LILO is such an important program to the Linux operating system that it has its own HOW-TO.  The HOW-TO provides a great deal of information about the boot process of a Linux computer. 

Booting on a PC

The BIOS on a PC generally looks for a bootstrap program in one of two places (usually in this order)

§         the first (A:) floppy drive, or

§         the first (C:) hard drive.

By playing with your BIOS settings you can change this order or even prevent the BIOS from checking one or the other.

The BIOS reads the program stored in the first sector of the chosen drive and loads it into memory. This bootstrap program then takes over.

On the floppy

On a bootable floppy disk the bootstrap program simply knows to load the first blocks on the floppy that contain the kernel into a specific location in memory.

A normal Linux boot floppy contains no file system. It simply contains the kernel copied into the first sectors of the disk. The first sector on the disk contains the first part of the kernel which knows how to load the remainder of the kernel into RAM.

Making a boot disk

The simplest method for creating a floppy disk which will enable you to boot a Linux computer is

§         insert a floppy disk into a computer already running Linux

§         login as root

§         change into the /boot directory

§         copy the current kernel onto the floppy
dd if=vmlinuz of=/dev/fd0
The name of the kernel, vmlinuz, may change from system to system.  For example, on some RedHat 5.0 machines it may be vmlinuz-2.0.31.

§         tell the boot disk where to find the root disk
rdev /dev/fd0 /dev/hda1
Where /dev/fd0 is the device for the floppy drive you are using and /dev/hda1 is the device file for your root disk.  You need to make sure you replace /dev/fd0 and /dev/hda1 with the appropriate values for your system.


Exercises

12.1      Using the above steps create a boot floppy for your machine and test it out.

Using a boot loader

Having a boot floppy for your system is a good idea.  It can come in handy if you do something to your system which prevents the normal boot procedure from working.  One example of this is when you are compiling a new kernel.  It is not unheard of for people to create a kernel which will not boot their system.  If you don't have an alternative boot method in this situation then you will have some troubles.

However, you can't use this process to boot from a hard drive.  Instead a boot loader or bootstrap program, such as LILO, is used.  A boot loader generally examines the partition table of the hard drive, identifies the active partition, and then reads and starts the code in the boot sector for that partition. This is a simplification. In reality the boot loader must identify, somehow, the sectors in which the kernel resides.

Other features a boot loader (under Linux) offers include

§         using a key press to bring up a prompt to modify the boot procedure, and

§         the passing of parameters to the kernel to modify its operation

Exercises

12.2      If you have the time and haven't done so already, read the LILO documentation and install LILO onto your system, unless you know the installation is destined to fail.
There are some situations where you SHOULD NOT install LILO.  These are outlined in the documentation.  Make sure you take notice of these situations.

Starting the kernel

Okay, the bootstrap program or the ROM program has found your system's kernel.  What happens during the startup process?  The kernel will go through the following process

§         initialise its internal data structures,
Things like ready queues, process control blocks and other data structures need to be readied.

§         check for the hardware connected to your system,
It is important that you are aware that the kernel will only look for hardware that it contains code for.  If your system has a SCSI disk drive interface your kernel must have the SCSI interface code before it will be able to use it.

§         verify the integrity of the root file system and then mount it, and

§         create the process 0 (swapper) and process 1 (init).

The swapper process is actually part of the kernel and is not a "real" process. The init process is the ultimate parent of all processes that will execute on a UNIX system.

Once the kernel has initialised itself, init will perform the remainder of the startup procedure.

Kernel boot messages

When a UNIX kernel is booting, it will display messages on the main console about what it is doing. Under Linux, these messages are also sent to syslog and are by default appended onto the file /var/log/messages. The following is a copy of the boot messages on my machine with some additional comments to explain what is going on.

Examine the messages that your kernel displays during bootup and compare them with mine.

start kernel logging
Feb  2 15:30:40 beldin kernel: klogd 1.3-3, log source = /proc/kmsg started.
Loaded 4189 symbols from /boot/System.map.
Symbols match kernel version 2.0.31.
Loaded 2 symbols from 3 modules.
Configure the console
Console: 16 point font, 400 scans
Console: colour VGA+ 80x25, 1 virtual console (max 63)
Start PCI software
pcibios_init : BIOS32 Service Directory structure at 0x000f9320
pcibios_init : BIOS32 Service Directory entry at 0xf0000
pcibios_init : PCI BIOS revision 2.00 entry at 0xf0100
Probing PCI hardware.
Calibrating delay loop.. ok - 24.01 BogoMIPS
check the memory
Memory: 30844k/32768k available (736k kernel code, 384k reserved, 804k data)
start networking
Swansea University Computer Society NET3.035 for Linux 2.0
NET3: Unix domain sockets 0.13 for Linux NET3.035.
Swansea University Computer Society TCP/IP for NET3.034
IP Protocols: IGMP, ICMP, UDP, TCP
VFS: Diskquotas version dquot_5.6.0 initialized
check the CPU and find that it suffers from the Pentium bug
Checking 386/387 coupling... Hmm, FDIV bug i586 system
Checking 'hlt' instruction... Ok.
Linux version 2.0.31 (root@porky.redhat.com) (gcc version 2.7.2.3) #1 Sun Nov 9
21:45:23 EST 1997
start swap
Starting kswapd v 1.4.2.2
start the serial drivers
tty00 at 0x03f8 (irq = 4) is a 16550A
tty01 at 0x02f8 (irq = 3) is a 16550A
start drivers for the clock, drives
Real Time Clock Driver v1.07
Ramdisk driver initialized : 16 ramdisks of 4096K size
hda: FUJITSU M1636TAU, 1226MB w/128kB Cache, CHS=622/64/63
hdb: SAMSUNG PLS-30854A, 810MB w/256kB Cache, CHS=823/32/63
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
md driver 0.35 MAX_MD_DEV=4, MAX_REAL=8
scsi : 0 hosts.
scsi : detected total.
Partition check:
 hda: hda1 hda2 < hda5 >
 hdb: hdb1
mount the root file system and start swap
VFS: Mounted root (ext2 filesystem) readonly.
Adding Swap: 34236k swap-space (priority -1)
EXT2-fs warning: mounting unchecked fs, running e2fsck is recommended
sysctl: ip forwarding off
Swansea University Computer Society IPX 0.34 for NET3.035
IPX Portions Copyright (c) 1995 Caldera, Inc.
Appletalk 0.17 for Linux NET3.035
eth0: 3c509 at 0x300 tag 1, 10baseT port, address  00 20 af 33 b5 be, IRQ 10.
3c509.c:1.12 6/4/97 becker@cesdis.gsfc.nasa.gov
eth0: Setting Rx mode to 1 addresses.
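If you want to look back at these messages after the machine has booted, you can pull the kernel's lines out of the log file. The following is a minimal sketch, assuming your syslog writes kernel messages to /var/log/messages with a "kernel:" tag as in the first line of the listing above. The function name kernel_messages is made up for this example.

```shell
# Print only the kernel's messages from a syslog-style log file.
# Assumption: kernel lines carry the tag " kernel: " as in the listing above.
# $1 is an optional log file path; it defaults to /var/log/messages.
kernel_messages() {
    grep ' kernel: ' "${1:-/var/log/messages}"
}
```

You would normally run it as root, for example kernel_messages | less. The dmesg command prints the kernel's own message buffer and is a common alternative.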

Starting the processes

So at this stage the kernel has been loaded, it has initialised its data structures and found all the hardware devices.  At this stage your system can't do anything.  The operating system kernel only supplies services which are used by processes.  The question is how are these other processes created and executed.

On a UNIX system the only way in which a process can be created is by an existing process performing a fork operation. A fork creates a brand new process that contains copies of the code and data structures of the original process. In most cases the new process will then perform an exec that replaces the old code and data structures with that of a new program.

But who starts the first process?

init is the process that is the ultimate ancestor of all user processes on a UNIX system. It always has a Process ID (PID) of 1. init is started by the operating system kernel so it is the only process that doesn't have a process as a parent. init is responsible for starting all other services provided by the UNIX system.  The services it starts are specified by init's configuration file, /etc/inittab.

Run levels

init is also responsible for placing the computer into one of a number of run levels.  The run level a computer is in controls what services are started (or stopped) by init.  Table 12.1 summarises the different run levels used by RedHat Linux 5.0.  At any one time, the system must be in one of these run levels.

When a Linux system boots, init examines the /etc/inittab file for an entry of type initdefault. This entry will determine the initial run level of the system.
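As a rough illustration of that lookup, the sketch below finds the initdefault entry in an inittab file and prints its run level field. It is only a sketch of the idea, not how init itself is implemented. The function name default_runlevel is made up for this example, and the colon-separated layout it relies on is the one used in the example /etc/inittab in this chapter.

```shell
# Print the default run level recorded in an inittab file.
# $1 is an optional path; it defaults to /etc/inittab.
# Comment lines (starting with #) are skipped; the first entry whose
# third field is "initdefault" supplies the run level in its second field.
default_runlevel() {
    awk -F: '$0 !~ /^#/ && $3 == "initdefault" { print $2; exit }' \
        "${1:-/etc/inittab}"
}
```

On a machine whose inittab contains the line id:3:initdefault: the command default_runlevel prints 3.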

 


 

Run level    Description
0            Halt the machine
1            Single user mode. All file systems mounted, only a small set of kernel processes running.  Only root can login.
2            Multi-user mode, without remote file sharing
3            Multi-user mode with remote file sharing, processes, and daemons
4            User definable system state
5            Used to start X11 on boot
6            Shutdown and reboot
a b c        ondemand run levels
s or S       Same as single-user mode, only really used by scripts

Table 12.1
Run levels

Under Linux, the telinit command is used to change the current run level. telinit is actually a soft link to init. telinit accepts a single character argument from the following

§         0 1 2 3 4 5 6
The run level is switched to this level.

§         Q q
Tells init that there has been a change to /etc/inittab (its configuration file) and that it should re-examine it.

§         S s
Tells init to switch to single user mode.

/etc/inittab

/etc/inittab is the configuration file for init. It is a colon-delimited file in which # characters can be used to indicate comments. Each line corresponds to a single entry and is broken into four fields

§         the identifier
One or two characters to uniquely identify the entry.

§         the run level
Indicates the run level at which the process should be executed

§         the action
Tells init how to execute the process

§         the process
The full path of the program or shell script to execute.
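To make the four fields concrete, here is a small sketch that splits one inittab entry on its colons and labels each part. The function name describe_entry is made up for this example. It relies on the shell's read command placing everything after the third colon into the last variable, so a process field that itself contains colons stays in one piece.

```shell
# Split a single inittab entry into its four fields and label them.
# $1 is the entry, e.g. 'l3:3:wait:/etc/rc.d/rc 3'.
describe_entry() {
    # IFS=: applies only to this read; the last variable takes the remainder.
    IFS=: read -r id levels action process <<EOF
$1
EOF
    printf 'id=%s runlevels=%s action=%s process=%s\n' \
        "$id" "$levels" "$action" "$process"
}
```

For example, describe_entry 'l3:3:wait:/etc/rc.d/rc 3' prints id=l3 runlevels=3 action=wait process=/etc/rc.d/rc 3.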


What happens

When init is first started it determines the initial run level (from the entry in /etc/inittab with the action initdefault) and then proceeds to execute the commands of all the entries that match that run level.

The following is an example /etc/inittab taken from a RedHat machine with some comments added.

# Specify the default run level
id:3:initdefault:

# System initialisation.
si::sysinit:/etc/rc.d/rc.sysinit

# When first entering various runlevels run the related startup scripts
# before going any further

l0:0:wait:/etc/rc.d/rc 0
l1:1:wait:/etc/rc.d/rc 1
l2:2:wait:/etc/rc.d/rc 2
l3:3:wait:/etc/rc.d/rc 3
l4:4:wait:/etc/rc.d/rc 4
l5:5:wait:/etc/rc.d/rc 5
l6:6:wait:/etc/rc.d/rc 6

# Things to run in every runlevel.
ud::once:/sbin/update

# Call the shutdown command to reboot the system when the user does the
# three fingered salute

ca::ctrlaltdel:/sbin/shutdown -t3 -r now

# A powerfail signal will arrive if you have an uninterruptible power supply (UPS).
# If this happens, shut the machine down safely.

pf::powerfail:/sbin/shutdown -f -h +2 "Power Failure; System Shutting Down"

# If power was restored before the shutdown kicked in, cancel it.
pr:12345:powerokwait:/sbin/shutdown -c "Power Restored; Shutdown Cancelled"


# Start the login process for the virtual consoles
1:12345:respawn:/sbin/mingetty tty1
2:2345:respawn:/sbin/mingetty tty2
3:2345:respawn:/sbin/mingetty tty3
4:2345:respawn:/sbin/mingetty tty4
5:2345:respawn:/sbin/mingetty tty5
6:2345:respawn:/sbin/mingetty tty6

# If the machine goes into runlevel 5, start X
x:5:respawn:/usr/bin/X11/xdm -nodaemon

The identifier

The identifier, the first field, is a unique identifier of one or two characters. For inittab entries that correspond to terminals the identifier will be the suffix for the terminal's device file.

For each terminal on the system a getty process must be started by the init process. Each terminal will generally have a device file with a name like /dev/tty??, where the ?? will be replaced by a suffix. It is this suffix that must be the identifier in the /etc/inittab file.

Run levels

The run levels describe at which run levels the specified action will be performed. The run level field of /etc/inittab can contain multiple entries, e.g. 123, which means the action will be performed at each of those run levels.
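The sketch below shows one way to see which entries name a given run level in their run level field. It is a simplification for illustration only. The function name entries_for_level is made up, the initdefault entry is skipped because it names a run level without running anything, and entries with an empty run level field (such as the sysinit and ctrlaltdel entries in the example above) are deliberately left out.

```shell
# List the identifiers of inittab entries whose run level field
# contains the given run level.
# $1 is the run level (e.g. 3); $2 is an optional inittab path.
# Simplification: comment lines, the initdefault entry and entries with
# an empty run level field are not listed.
entries_for_level() {
    awk -F: -v lvl="$1" '
        $0 !~ /^#/ && NF >= 4 && $3 != "initdefault" && index($2, lvl) > 0 {
            print $1
        }' "${2:-/etc/inittab}"
}
```

With the example /etc/inittab from this chapter, entries_for_level 5 would list l5, pr, the six mingetty entries 1 to 6, and x.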

Actions

The action field describes how the process will be executed. There are a number of pre-defined actions that must be used. Table 12.2 lists and explains them.

 

Action         Purpose
respawn        restart the process if it finishes
wait           init will start the process once and wait until it has finished before going on to the next entry
once           start the process once, when the runlevel is entered
boot           perform the process during system boot (will ignore the runlevel field)
bootwait       a combination of boot and wait
off            do nothing
initdefault    specify the default run level
sysinit        execute process during boot and before any boot or bootwait entries
powerwait      executed when init receives the SIGPWR signal which indicates a problem with the power, init will wait until the process is completed
ondemand       execute whenever the ondemand runlevels are called (a b c).  When these runlevels are called there is NO change in runlevel.
powerfail      same as powerwait but don't wait (refer to the man page for the action powerokwait)
ctrlaltdel     executed when init receives the SIGINT signal (usually when someone does CTRL-ALT-DEL)

Table 12.2
inittab actions

The process

The process is simply the name of the command or shell script that should be executed by init.

Daemons and Configuration Files

init is an example of a daemon.  Like most daemons it normally reads its configuration file, /etc/inittab, only when it starts execution.  Unless you tell init otherwise, any changes you make to /etc/inittab will not influence the execution of init until the next time it starts, i.e. the next time your computer boots.

There are ways in which you can tell a daemon to re-read its configuration files.  One generic method, which works most of the time, is to send the daemon the HUP signal.  For most daemons the first step in doing this is to find out what the process id (PID) is of the daemon.  This isn't a problem for init. Why?

It's not a problem for init because init always has a PID of 1.

The more accepted method for telling init to re-read its configuration file is to use the telinit command.  telinit q will tell init to re-read its configuration file.

Exercises

12.3      Add an entry to the /etc/inittab file so that it displays a message HELLO onto your current terminal (HINT: you can find out your current terminal using the tty command).

12.4      Modify the inittab entry from the previous question so that the message is displayed again and again and....

12.5      Take your system into single user mode.

12.6      Take your system into runlevel 5.  What happens?  (only do this if you have X Windows configured for your system).  Change your system so that it enters this run level when it boots.  Reboot your system and see what happens.

12.7      The wall command is used to display a message onto the terminals of all users. Modify the /etc/inittab file so that whenever someone does the three finger salute (CTRL-ALT-DEL) it displays a message on the consoles of all users and doesn't log out.

12.8      Examine your inittab file for an entry with the identifier 1. This is the entry for the first console, the screen you are on when you first start your system.
Change the entry for 1 so that the action field contains once instead of respawn. Force init to re-read the inittab file and then log in and log out on that console.
What happens?

System Configuration

There are a number of tasks which must be completed once during system startup.  These tasks are usually related to configuring your system so that it will operate.  Most of these tasks are performed by the /etc/rc.d/rc.sysinit script.

It is this script which performs the following operations

§         sets up a search path that will be used by the other scripts

§         obtains network configuration data

§         activates the swap partitions of your system

§         sets the hostname of your system
Every UNIX computer has a hostname.  You can use the UNIX command hostname to set and also display your machine's hostname.

§         sets the machine's NIS domain (if you are using one)

§         performs a check on the file systems of your system

§         turns on disk quotas (if being used)

§         sets up plug'n'play support

§         deletes old lock and tmp files

§         sets the system clock

§         loads any kernel modules.

Terminal logins

In a later chapter we will examine the login procedure in more detail. This is a brief summary to explain how the login procedure relates to the boot procedure.

For a user to login there must be a getty process (RedHat Linux uses a program called mingetty, slightly different name but same task) running for the terminal they wish to use. It is one of init's responsibilities to start the getty processes for all terminals that are physically connected to the main machine, and you will find entries in the /etc/inittab file for this.

Please note this does not include connections over a network. They are handled with a different method. This method is used for the virtual consoles on your Linux machine and any other dumb terminals you might have connected via serial cables.  You should be able to see the entries for the virtual consoles in the example /etc/inittab file from above.

Exercises

12.9      When you are in single user mode there is only one way to login to a Linux machine, from the first virtual console.  How is this done?

Startup scripts

Most of the services which init starts are started when init executes the system startup scripts.  The system startup scripts are shell scripts written using the Bourne shell (this is one of the reasons you need to know the Bourne shell syntax).  You can see where these scripts are executed by looking at the inittab file.

l0:0:wait:/etc/rc.d/rc 0
l1:1:wait:/etc/rc.d/rc 1
l2:2:wait:/etc/rc.d/rc 2
l3:3:wait:/etc/rc.d/rc 3
l4:4:wait:/etc/rc.d/rc 4
l5:5:wait:/etc/rc.d/rc 5
l6:6:wait:/etc/rc.d/rc 6

These scripts start a number of services and also perform a number of configuration checks including

§         checking the integrity of the machine's file systems using fsck,

§         mounting the file systems,

§         designating paging and swap areas,

§         checking disk quotas,

§         clearing out temporary files in /tmp and other locations,

§         starting up system daemons for printing, mail, accounting, system logging, networking and cron.

In the UNIX world there are two styles for startup files: BSD and System V.  RedHat Linux 5.0 uses the System V style and the following section concentrates on this format.  Table 12.3 summarises the files and directories which are associated with the RedHat 5.0 startup scripts.  All the files and directories in Table 12.3 are stored in the /etc/rc.d directory.

Filename                                     Purpose
rc0.d rc1.d rc2.d rc3.d rc4.d rc5.d rc6.d    directories which contain links to scripts which are executed when a particular runlevel is entered
rc                                           a shell script which is passed the run level. It then executes the scripts in the appropriate directory.
init.d                                       contains the actual scripts which are executed.  These scripts take either start or stop as a parameter
rc.sysinit                                   run once at boot time to perform specific system initialisation steps
rc.local                                     the last script run, used to do any tasks specific to your local setup that aren't done in the normal SysV setup
rc.serial                                    not always present, used to perform special configuration on any serial ports

Table 12.3
Linux startup scripts

The Linux Process

When init first enters a run level it will execute the script /etc/rc.d/rc (as shown in the example /etc/inittab above).  This script then proceeds to

§         determine the current and previous run levels

§         kill any services which must be killed

§         start all the services for the new run level.

The /etc/rc.d/rc script knows how to kill and start the services for a particular run level because of the filenames in the directory for each runlevel.  The following are the filenames from the /etc/rc.d/rc3.d directory on my system.

[david@beldin rc.d]$ ls rc3.d
K10pnserver    K55routed      S40atd         S60lpd         S85postgresql
K20rusersd     S01kerneld     S40crond       S60nfs         S85sound
K20rwhod       S10network     S40portmap     S75keytable    S91smb
K25innd        S15nfsfs       S40snmpd       S80sendmail    S99local
K25news        S20random      S45pcmcia      S85gpm
K30ypbind      S30syslog      S50inet        S85httpd

You will notice that all the filenames in this, and all the other rcX.d directories, use the same format.

[SK]numberService

Where number is some integer and Service is the name of a service.

All the files with names starting with S are used to start a service.  Those starting with K are used to kill a service.  From the rc3.d directory above you can see scripts which start services for the Internet (S50inet), PCMCIA cards (S45pcmcia), a Web server (S85httpd) and a database (S85postgresql).

The numbers in the filenames are used to indicate the order in which these services should be started and killed.  You'll notice that the script to start the Internet services comes before the script to start the Web server; obviously the Web server depends on the Internet services.
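The naming format can be taken apart mechanically. The following sketch splits a name such as S50inet into its letter, number and service name. The function name parse_rc_name is made up for this example. Names which don't follow the format produce no output.

```shell
# Split an rcX.d filename of the form [SK]numberService into its parts.
# $1 is the filename, e.g. S50inet.
# Prints "letter number service"; prints nothing if the name doesn't match.
parse_rc_name() {
    printf '%s\n' "$1" |
        sed -n 's/^\([SK]\)\([0-9][0-9]*\)\(.*\)$/\1 \2 \3/p'
}
```

For example, parse_rc_name S50inet prints S 50 inet and parse_rc_name K10pnserver prints K 10 pnserver.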

/etc/rc.d/init.d

If we look closer we can see that the files in the rcX.d directories aren't really files.

[david@beldin rc.d]$ ls -l rc3.d/S50inet
lrwxrwxrwx   1 root   root  14 Dec 19 23:57 rc3.d/S50inet -> ../init.d/inet

The files in the rcX.d directories are actually soft links to scripts in the /etc/rc.d/init.d directory.  It is these scripts which perform all the work.

Starting and stopping

The scripts in the /etc/rc.d/init.d directory are not only useful during the system startup process,  they can also be useful when you are performing maintenance on your system.  You can use these scripts to start and stop services while you are working on them.

For example, let's assume you are changing the configuration of your Web server.  Once you've finished editing the configuration files (in /etc/httpd/conf on a RedHat 5.0 machine) you will need to restart the Web server for it to see the changes.  One way you could do this would be to follow this example

[root@beldin rc.d]# /etc/rc.d/init.d/httpd stop
Shutting down http:
[root@beldin rc.d]# /etc/rc.d/init.d/httpd start
Starting httpd: httpd

This example also shows you how the scripts are used to start or stop a service.  If you examine the code for /etc/rc.d/rc (remember this is the script which runs all the scripts in /etc/rc.d/rcX.d) you will see two lines.  One with $i start and the other with $i stop.  These are the actual lines which execute the scripts.
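The heart of that logic can be sketched in a few lines. What follows is a simplified illustration of the idea, not the contents of the real /etc/rc.d/rc script, which does more work (such as working out the current and previous run levels). The function name run_level_scripts is made up for this example. It relies on the shell expanding K* and S* in sorted order, which is what makes the numbers in the filenames control the order.

```shell
# Run the kill scripts and then the start scripts found in one rcX.d directory.
# $1 is the directory, e.g. /etc/rc.d/rc3.d.
# Simplified sketch: the shell sorts the K* and S* names, so the numbers
# in the filenames decide the order in which the scripts run.
run_level_scripts() {
    dir=$1
    for i in "$dir"/K*; do
        if [ -x "$i" ]; then
            "$i" stop
        fi
    done
    for i in "$dir"/S*; do
        if [ -x "$i" ]; then
            "$i" start
        fi
    done
}
```

Don't point this at a real rcX.d directory on a working system unless you actually want to stop and start those services. Try it on a scratch directory containing a few small test scripts first.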

Lock files

All of the scripts which start services during system startup create lock files.  These lock files, if they exist, indicate that a particular service is operating.  Their main use is to prevent startup files starting a service which is already running.

When you stop a service one of the things which has to occur is that the lock file must be deleted.
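The following sketch shows the general shape of the lock file idea. It is an illustration only, not a copy of any real init.d script. The function name service_ctl and the messages it prints are made up, and the directory /var/lock/subsys is an assumption about where your system keeps these lock files, so check the scripts in /etc/rc.d/init.d on your own machine. A real script would also start or stop the daemon itself where the comments indicate.

```shell
# Directory holding the lock files.
# Assumption: /var/lock/subsys; set LOCKDIR to use a different directory.
LOCKDIR=${LOCKDIR:-/var/lock/subsys}

# Illustrative start/stop handling built around a lock file.
# $1 is the service name, $2 is start or stop.
service_ctl() {
    name=$1
    case "$2" in
        start)
            # The lock file marks the service as already running.
            if [ -f "$LOCKDIR/$name" ]; then
                echo "$name is already running"
                return 0
            fi
            echo "Starting $name"
            # (a real script would start the daemon here)
            touch "$LOCKDIR/$name"
            ;;
        stop)
            echo "Shutting down $name"
            # (a real script would stop the daemon here)
            rm -f "$LOCKDIR/$name"
            ;;
        *)
            echo "Usage: service_ctl name {start|stop}"
            return 1
            ;;
    esac
}
```

Calling service_ctl httpd start twice in a row prints Starting httpd the first time and httpd is already running the second time, because the lock file created by the first call is still there.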

Exercises

12.10   What would happen if you tried to stop a service when you were logged in as a normal user (i.e. not root)?  Try it.

Why won't it boot?

There will be times when you have to reboot your machine in a nasty manner. One rule of thumb used by Systems Administrators to solve some problems is "When in doubt, turn the power off, count to ten slowly, and turn the power back on". There will be times when the system won't come back to you. DON'T PANIC!

Possible reasons why the system won't reboot include

§         hardware problems,
Caused by both hardware failure and problems caused by human error (e.g. the power cord isn't plugged in, the drive cable is the wrong way around)

§         defective boot floppies, drives or tapes,

§         damaged file systems,

§         improperly configured kernels,
A kernel configured to use SCSI drives won't boot on a system that uses an IDE drive controller.

§         errors in the rc scripts or the /etc/inittab file.

Solutions

The following is a Systems Administration maxim

Always keep a separate working method for booting the machine into at least single user mode.

This method might be a boot floppy, CD-ROM or tape. The format doesn't matter. What does matter is that at any time you can bring the system up in at least single user mode so you can perform some repairs.

A separate mechanism to bring the system up in single user mode will enable you to solve most problems involved with damaged file systems, improperly configured kernels and errors in the rc scripts.

Boot and root disks

The concepts of boot and root disks are important to understanding how the booting process works and also to creating an alternative boot method for your system.  The definitions used are

§         boot disk
This is the disk which contains the kernel of your system. 

§         root disk
The root disk contains the root file system with all the necessary programs and files required for init to start and setup a minimum of services.  This includes such things as init, /etc/inittab and associated files, /etc/passwd and other information required to allow people to login plus a whole lot more.

To have a complete alternative boot method you must have both alternative boot and root disks.  The alternative boot disk is useful if you have problems with your kernel.  The alternative root disk is required when you have problems such as a wrongly configured inittab or a missing /etc/passwd file.

It is possible for a single disk to provide both boot and root disk services.

Making a boot and root disk

It is important that you have alternative boot and root disks for your system.  There are (at least) two methods you can use to obtain them

§         use the installation disks which come with your distribution of Linux,
In order to install Linux you basically have to have a functioning Linux computer.  Therefore the installation disk(s) that you used to install Linux provide an alternative boot and root disk.

§         use a rescue disk (set).
A number of people have created rescue disks.  These are boot and root disk sets which have been configured to provide you with the tools you will need to rescue your system from problems.

The resource materials section for week 7 on the 85321 Web site/CD-ROM contains pointers to two rescue disk sets.

Exercises

12.11   Create a boot and root disk set for your system using the resources on the 85321 Web site/CD-ROM.


Using boot and root

What do you think would happen if you did the following?

rm /etc/inittab

The next time you booted your system you would see something like this on the screen.

INIT: version 2.71 booting
INIT: No inittab file found

Enter runlevel: 1
INIT: Entering runlevel: 1
INIT: no more processes left in this runlevel

What's happening here is that init can't find the inittab file and so it can't do anything.  To solve this you need to boot the system and replace the missing inittab file.  This is where the alternative root and boot disk(s) come in handy.

To solve this problem you would do the following

§         boot the system with the alternative boot/root disk set

§         log in as root

§         perform the following

bash:/> mount -t ext2 /dev/hda2 /mnt
mount: mount point /mnt does not exist
bash:/> mkdir /mnt
bash:/> mount -t ext2 /dev/hda2 /mnt
EXT2-fs warning: mounting unchecked fs, running e2fsck is recommended
bash:/> cp /etc/inittab /mnt/etc/inittab
bash:/> umount /mnt

A description of the above goes like this

§         Try to mount the usual root file system, the one with the missing inittab file.  This fails because the mount point /mnt does not exist on the alternative root disk.

§         Create the missing /mnt directory.

§         Now mount the usual root file system.

§         Copy the inittab file from the alternative root disk onto the usual root disk.  Normally you would have a backup tape which contains a copy of the old inittab file.

§         Unmount the usual root file system and reboot the system.

The aim of this example is to show you how you can use alternative root and boot disks to solve problems which may prevent your system from booting.
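The repair above is much easier if a known-good copy of the file is already at hand.  The following is a minimal sketch, not a replacement for proper backups, of a helper that saves a copy of a configuration file before you edit it.  The name keep_copy and the .bak suffix are assumptions made for this illustration.

```shell
# Sketch only: keep_copy is a hypothetical helper name, not a standard command.
# It saves FILE.bak next to FILE, preserving mode and timestamps, so that a
# known-good version is on hand if an edit goes wrong.
keep_copy() {
    file=$1
    if [ ! -f "$file" ]; then
        echo "keep_copy: $file does not exist" >&2
        return 1
    fi
    cp -p "$file" "$file.bak"
}

# Typical use, as root, before editing:
#   keep_copy /etc/inittab && vi /etc/inittab
```

With such a copy on the usual root disk, the rescue step becomes a matter of copying /mnt/etc/inittab.bak over /mnt/etc/inittab once the disk is mounted.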

Exercises

12.12   Removing the /etc/inittab file from your Linux system will not only cause problems when you reboot the machine.  It also causes problems when you try to shut the machine down.  What problems?  Why?

12.13   What happens if you forget the root password?  Without it you can't perform any management tasks at all.  How would you fix this problem?

12.14   Boot your system in the normal manner and comment out all the entries in your /etc/inittab file that contain the word mingetty. What do you think is going to happen? Reboot your system. Now fix the problem using the installation floppy disks.

Solutions to hardware problems

Some guidelines to solving hardware problems

§         check the power supply and its connections,
Don't laugh, there are many cases I know of in which the whole problem was caused by the equipment not being plugged in properly or not at all.

§         check the cables and plugs on the devices,

§         check any fault lights on the hardware,

§         power cycle the equipment (power off, power on),
Again I'll mention that old Systems Administration maxim. If something doesn't work turn it off, count to 10 very slowly and turn it back on again (usually with the fingers crossed). Not only can it solve problems but it is also a good way of relaxing.

§         try rebooting the system without selected pieces of hardware,
It may be only one faulty device that is causing the problem. Try isolating the problem device.

§         use any diagnostic programs that are available, or as a last resort

§         call a technician or a vendor.

Damaged file systems

In the next two chapters we'll examine file systems in detail and show how you can fix damaged file systems. The two methods we'll examine are

§         the fsck command, and

§         always maintaining good backups.

Improperly configured kernels

The kernel contains most of the code that allows the software to talk to your hardware. If the code it contains is wrong then your software won't be able to talk to your hardware. In a later chapter on the kernel we'll explain in more detail why you might want to change the kernel and why it might not work.

Suffice to say you must always maintain a working kernel that you can boot your system with.


Shutting down

You should not simply turn a UNIX computer off or reboot it. Doing so will usually cause some sort of damage to the system, especially to the file system. Most of the time the operating system will be able to recover from such a situation (but NOT always).

There are a number of tasks that have to be performed for a UNIX system to be shut down cleanly

§         tell the users the system is going down,
Telling them 5 seconds before pulling the plug is not a good way of promoting good feeling amongst your users. Wherever possible the users should know at least a couple of days in advance that the system is going down (there is always one user who never knows about it and complains).

§         signal the currently executing processes that it is time for them to die,
UNIX is a multi-tasking operating system. Just because no-one is logged in does not mean that there is nothing going on. You must signal all the currently running processes that it is time to die gracefully.

§         place the system into single user mode, and

§         perform sync to flush the file system buffers so that the physical state of the file system matches the logical state.

Most UNIX systems provide commands that perform these steps for you.
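To make the list above concrete, here is a rough sketch of what a manual equivalent might look like.  The function only prints the commands, it does not run them.  The function name and the wording of the warning are made up for this illustration, and sending init a TERM signal may not work or be safe on every machine.

```shell
# Sketch only: print (do not run) one possible manual equivalent of the
# steps listed above.  manual_shutdown_steps is a hypothetical name.
manual_shutdown_steps() {
    cat <<'EOF'
echo "System going down in 10 minutes" | wall   # tell the users
kill -15 1                                      # ask init to stop processes and enter single user mode
sync                                            # flush the file system buffers to disk
EOF
}
```

Calling manual_shutdown_steps simply lists the three commands in order.  In practice you would let the system's own shutdown command do this work for you.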

Reasons for shutting down

In general, you should try to limit the number of times you turn a computer on or off as doing so involves some wear and tear. It is often better to simply leave the computer on 24 hours a day. In the case of a UNIX system being used for a mission critical application by some business it may have to be up 24 hours a day.

Some of the reasons why you may wish to shut a UNIX system down include

§         general housekeeping,
Every time you reboot a UNIX computer it will perform some important housekeeping tasks, including deleting files from the temporary directories and performing checks on the machine's file systems. Rebooting will also get rid of any zombie processes.

§         general failures, and
Occasionally problems will arise for which there is only one resort, shutdown. These problems can include hanging logins, unsuccessful mount requests, dazed devices, runaway processes filling up disk space or CPU time and preventing any useful work being done.

§         system maintenance and additions.
There are some operations that only work if the system is rebooted or if the system is in single user mode, for example adding a new device.


Being nice to the users

Knowing of the existence of the appropriate command is the first step in bringing your UNIX computer down. The other step is outlined in the heading for this section. The following command is an example of what not to do.

shutdown -h now

Under Linux this results in a message somewhat like this appearing on every user's terminal

THE SYSTEM IS BEING SHUT DOWN NOW ! ! !
Log off now or risk your files being damaged.

and the user will almost immediately be logged out.

This is not a method inclined to win friends and influence people. The following is a list of guidelines of how and when to perform system shutdowns

§         shutdowns should be scheduled,
If users know the system is coming down at specified times they can organise their computer time around those times.

§         perform a regular shutdown once a week, and
A guideline, so that the housekeeping tasks discussed above can be performed. If it's regular the users get to know when the system will be going down.

§         use /etc/motd .
/etc/motd is a text file that contains the message the users see when they first log onto a system. You can use it to inform users of the next scheduled shutdown.
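As a small illustration of the last guideline, the following sketch appends a shutdown notice to the message of the day.  The helper name announce_shutdown, the MOTD_FILE variable and the wording of the notice are all assumptions made for this example.  Writing to the real /etc/motd requires root.

```shell
# Sketch only: announce_shutdown is a hypothetical helper.  MOTD_FILE
# defaults to /etc/motd; point it at a scratch file to experiment.
announce_shutdown() {
    when=$1
    reason=$2
    motd=${MOTD_FILE:-/etc/motd}
    {
        echo "*** Scheduled shutdown: $when"
        echo "*** Reason: $reason"
        echo "*** Please save your work and log out beforehand."
    } >> "$motd"
}

# Typical use, as root:
#   announce_shutdown "Friday 18:00" "disk upgrade"
```

Because the notice is appended, remember to remove it from the file once the shutdown has taken place.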

Commands to shut down

There are a number of different methods for shutting down and rebooting a system including

§         the shutdown command
The most used method for shutting the system down. The command can display messages at preset intervals warning the users that the system is coming down.

§         the halt command
Logs the shutdown, kills the system processes, executes sync and halts the processor.

§         the reboot command
Similar to halt but causes the machine to reboot rather than halting.

§         sending init a TERM signal,
init will usually interpret a TERM signal (signal number 15) as a command to go into single user mode. It will kill off user processes and daemons. The command is kill -15 1 (init is always process number 1). It may not work or be safe on all machines.

§         the fasthalt or fastboot commands
These commands create a file /fastboot before calling halt or reboot. When the system reboots and the startup scripts find a file /fastboot they will not perform a fsck on the file systems.
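The /fastboot flag file mentioned in the last item can be illustrated with a short sketch of the test a startup script might make.  This is not taken from any particular distribution's scripts.  The function name fsck_wanted and the ROOT variable are hypothetical, and ROOT exists only so the logic can be tried against a scratch directory.

```shell
# Sketch only: how a startup script might honour the /fastboot flag file.
# ROOT is normally empty, so the file tested is /fastboot.
fsck_wanted() {
    root=${ROOT:-}
    if [ -f "$root/fastboot" ]; then
        return 1    # flag present: skip the file system check
    fi
    return 0        # no flag: check the file systems as usual
}

# A startup script could then contain something like:
#   if fsck_wanted; then fsck -A; fi
```

A real startup script would normally also remove the flag file later in the boot, so that the check is skipped only once.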

The most used method will normally be the shutdown command. It provides users with warnings and is the safest method to use.

shutdown

The format of the command is

shutdown [ -h | -r ] [ -fqs ] [ now | hh:mm | +mins ]

The parameters are

§         -h
Halt the system and don't reboot.

§         -r
Reboot the system.

§         -f
Do a fast boot.

§         -q
Use a default broadcast message.

§         -s
Reboot into single user mode by creating a /etc/singleboot file.

The time at which a shutdown should occur is specified by the now, hh:mm and +mins options.

§         now
Shut down immediately.

§         hh:mm
Shut down at time hh:mm.

§         +mins
Shut down mins minutes in the future.

The default wait time before shutting down is two minutes.
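The comments in the following sketch show a few typical invocations.  The helper below them, mins_until, is a hypothetical function written for this illustration.  It shows how a clock time (hours and minutes) relates to the +mins form.  It assumes 24-hour times and treats a time earlier than now as meaning tomorrow, so check the man page for your own shutdown command before relying on that behaviour.

```shell
# Example invocations (not run here; they need root and really do shut
# the machine down):
#   shutdown -h now       halt immediately
#   shutdown -r +10       reboot in ten minutes
#   shutdown -h 18:30     halt at half past six this evening

# Sketch only: convert "time now" and "target time" into a delay in minutes.
mins_until() {
    now=$1
    target=$2
    now_h=${now%%:*};       now_m=${now##*:}
    target_h=${target%%:*}; target_m=${target##*:}
    # Strip one leading zero so that "08" is not read as an octal number.
    now_h=${now_h#0};       now_m=${now_m#0}
    target_h=${target_h#0}; target_m=${target_m#0}
    diff=$(( (${target_h:-0} * 60 + ${target_m:-0}) - (${now_h:-0} * 60 + ${now_m:-0}) ))
    # A target earlier than now is taken to mean tomorrow.
    [ "$diff" -lt 0 ] && diff=$(( diff + 1440 ))
    echo "$diff"
}

mins_until 17:45 18:30    # prints 45
```

In other words, at 17:45 the commands shutdown -h 18:30 and shutdown -h +45 ask for the same thing.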

What happens

The procedure for shutdown is as follows

§         five minutes before shutdown or straight away if shutdown is in less than five minutes
The file /etc/nologin is created. This prevents any users (except root) from logging in. A message is also broadcast to all logged in users notifying them of the imminent shutdown.

§         at shutdown time.
All users are notified. init is told not to spawn any more getty processes. Shutdown time is written into the file /var/log/wtmp. All other processes are killed. A sync is performed. All file systems are unmounted. Another sync is performed and the system is rebooted.
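The effect of the /etc/nologin file can be sketched as follows.  This is only an approximation of what a login program does, written as a shell function.  The name may_log_in and the NOLOGIN_FILE variable are assumptions made so the logic can be tried without touching the real file.

```shell
# Sketch only: roughly what a login program does with the nologin file.
# NOLOGIN_FILE is a hypothetical override; the real file is /etc/nologin.
may_log_in() {
    user=$1
    nologin=${NOLOGIN_FILE:-/etc/nologin}
    if [ -f "$nologin" ] && [ "$user" != "root" ]; then
        cat "$nologin"      # show the administrator's message
        return 1            # and refuse the login
    fi
    return 0
}

# Example: may_log_in alice || echo "login refused"
```

While the file exists only root gets through, which is why a machine that crashed part way through a shutdown may need /etc/nologin removed by hand.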


The other commands

The other related commands, including reboot, fastboot, halt and fasthalt, all use a similar format to the shutdown command. Refer to the man pages for more information.

Conclusions

Booting and shutting down a UNIX computer is significantly more complex than performing the same tasks with an MS-DOS computer. A UNIX computer should never just be shut off.

The UNIX boot process can be summarised into a number of steps

§         the hardware ROM or BIOS performs a number of tasks including loading the bootstrap program,

§         the bootstrap program loads the kernel,

§         the kernel starts operation, configures the system and runs the init process, and

§         init consults the /etc/inittab file and performs a number of necessary actions.

One of the responsibilities of the init process is to execute the startup scripts that, under Linux, reside in the /etc/rc.d directory.

It is important that you have at least one alternative method for booting your UNIX computer.

There are a number of methods for shutting down a UNIX computer.  The most used is the shutdown command.

Review Questions

12.1

What would happen if the file /etc/inittab did not exist? Find out.

12.2

How would you fix the following problems?

§         The kernel for your Linux computer has been accidentally deleted.

§         The /etc/fstab file for your system has been moved to /usr/local/etc/fstab.

12.3

Explain each of the following inittab entries

§         s1:45:respawn:/sbin/agetty 19200 ttyS0 vt100

§         id:5:initdefault:

§         si:S:sysinit:/etc/rc.d/rc.S