Chapter 13

Kernel

The bit of the nut that you eat?

Well, not exactly. The kernel is the core of the operating system; it is the program that provides the basic services used by user programs, and it is this suite of basic services, in the form of system calls, that makes an operating system "UNIX".

The kernel is also responsible for:

§         CPU resource scheduling (with the associated duties of process management)

§         Memory management (including the important implementation of protection)

§         Device control (including providing the device-file/device-driver interface)

§         Security (at a device, process and user level)

§         Accounting services (including CPU usage and disk quotas)

§         Inter Process Communication (shared memory, semaphores and message passing)

The Linux Kernel FAQ sums it up nicely with:

The Unix kernel acts as a mediator for your programs. First, it does  the memory management for all of the running programs (processes), and makes sure that they all get a fair (or unfair, if you please) share of the processor's cycles. In addition, it provides a nice, fairly portable interface for programs to talk to your hardware.

Obviously, there is more to the kernel's operation than this, but the basic functions above are the most important to know.

Why?

Why study the kernel? Isn't that an operating-system-type-thing? What does a Systems Administrator have to do with the internal mechanics of the OS?

Lots.

UNIX is usually provided with the source for the kernel (there are exceptions to this in the commercial UNIX world). The reason is that this allows Systems Administrators to directly customise the kernel for their particular system. A Systems Administrator might do this because:

§         They have modified the system hardware (adding devices, memory, processors etc.).

§         They wish to optimise the memory usage (called reducing the kernel footprint).

§         The speed and performance of the system may need improvement (eg. modify the quantum per task to suit CPU intensive vs IO intensive systems). This process (along with optimising memory) is called tweaking.

§         Improvements to the kernel can be provided in the form of source code which then allows the Systems Administrator to easily upgrade the system with a kernel recompile.

Recompiling the kernel is the process whereby the kernel is reconfigured, the source code is regenerated/recompiled and a linked object is produced. Throughout this chapter the concept of recompiling the kernel will mean both the kernel source code compilation and linkage. 

How?

In this chapter, we will be going through the step-by-step process of compiling a kernel, a process that includes:

Finding out about your current kernel (what version it is and where it is located?)

Obtaining the kernel (where do you get the kernel source, how do you unpack it and where do you put it?)

Obtaining and reading documentation (where can I find out about my new kernel source?)

Configuring your kernel (how is this done, what is this doing?)

Compiling your kernel (how do we do this?)

Testing the kernel (why do we do this and how?)

Installing the kernel (how do we do this?)

But to begin with, we really need to look at exactly what the kernel physically is and how it is generated.

To do this, we will examine the Linux kernel, specifically on the x86 architecture.

The lifeless image

The kernel is physically a file that is usually located in the /boot directory. Under Linux, this file is called vmlinuz. On my system, an ls listing of the kernel produced:

bash# ls -al  /boot/vml*
lrwxrwxrwx   1 root     root           14 Jan  2 23:44 /boot/vmlinuz -> vmlinuz-2.0.31
-rw-r--r--   1 root     root       444595 Nov 10 02:59 /boot/vmlinuz-2.0.31

You can see in this instance that the “kernel file” is actually a link to another file containing the kernel image. The actual kernel size will vary from machine to machine. The reason for this is that the size of the kernel is dependent on what features you have compiled into it, what modifications you've made to the kernel data structures and what (if any) additions you have made to the kernel code.

vmlinuz is referred to as the kernel image. At a physical level, this file consists of a small section of machine code followed by a compressed block. At boot time, the program at the start of the kernel is loaded into memory at which point it uncompresses the rest of the kernel.

This is an ingenious way of making the physical kernel image on disk as small as possible; uncompressed the kernel image could be around one megabyte.

So what makes up this kernel?

Kernel gizzards

An uncompressed kernel is really a giant object file; the product of C and assembler linking - the kernel is not an "executable" file (i.e. you can't just type vmlinuz at the prompt to run the kernel). The actual source of the kernel is stored in the /usr/src/linux directory; a typical listing may produce:

[jamiesob@pug jamiesob]$ ls -al /usr/src
total 4
drwxr-xr-x   4 root     root         1024 Jan  2 23:53 .
drwxr-xr-x  18 root     root         1024 Jan  2 23:45 ..
lrwxrwxrwx   1 root     root           12 Jan  2 23:44 linux -> linux-2.0.31
drwxr-xr-x   3 root     root         1024 Jan  2 23:44 linux-2.0.31
drwxr-xr-x   7 root     root         1024 Jan  2 23:53 redhat

/usr/src/linux is a soft link to /usr/src/<whatever linux version> - this means you can store several kernel source trees. However, you MUST change the soft link /usr/src/linux to point to the version of the kernel you will be compiling, as there are several components of the kernel source that rely on this.
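For example, if you had both the 2.0.30 and 2.0.31 source trees unpacked and wanted to compile 2.0.31, switching the link would look something like this (the version numbers here are only illustrative):

bash# cd /usr/src
bash# rm linux                      # remove the old link (the source tree itself is untouched)
bash# ln -s linux-2.0.31 linux      # point the link at the tree to be compiled
bash# ls -ld linux                  # check where the link now points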

SPECIAL NOTE: If your system doesn't have a /usr/src/linux or a /usr/src/linux* directory (where * is the version of the Linux source) then you don't have the source code installed on your machine. We will be discussing in a later section exactly how you can obtain the kernel source. To obtain and install the source from the Red Hat CD-ROM, you must complete the following steps:

Mount RedHat CD 1 under /mnt.

Execute (as root) the following commands:

rpm -ivh /mnt/RedHat/RPMS/kernel-headers-2.0.31-7.i386.rpm
rpm -ivh /mnt/RedHat/RPMS/kernel-source-2.0.31-7.i386.rpm

The source has now been installed.  For further information on installing RedHat components, see Chapter 8 of the RedHat Installation Guide.

A typical listing of /usr/src/linux produces:

-rw-r--r--   1 root     root            2 May 12  1996 .version
-rw-r--r--   1 root     root         6282 Aug  9  1994 CHANGES
-rw-r--r--   1 root     root        18458 Dec  1  1993 COPYING
-rw-r--r--   1 root     root        21861 Aug 17  1995 CREDITS
-rw-r--r--   1 root     root         3221 Dec 30  1994 Configure
-rw-r--r--   1 root     root         2869 Jan 10  1995 MAGIC
-rw-r--r--   1 root     root         7042 Aug 17  1995 Makefile
-rw-r--r--   1 root     root         9817 Aug 17  1995 README
-rw-r--r--   1 root     root         3114 Aug 17  1995 README.modules
-rw-r--r--   1 root     root        89712 May 12  1996 System.map
drwxr-xr-x   6 root     root         1024 May 10  1996 arch/
drwxr-xr-x   7 root     root         1024 May 10  1996 drivers/
drwxr-xr-x  13 root     root         1024 May 12  1996 fs/
drwxr-xr-x   9 root     root         1024 May 12  1996 include/
drwxr-xr-x   2 root     root         1024 May 12  1996 init/
drwxr-xr-x   2 root     root         1024 May 12  1996 ipc/
drwxr-xr-x   2 root     root         1024 May 12  1996 kernel/
drwxr-xr-x   2 root     root         1024 May 12  1996 lib/
drwxr-xr-x   2 root     root         1024 May 12  1996 mm/
drwxr-xr-x   2 root     root         1024 Jan 23  1995 modules/
drwxr-xr-x   4 root     root         1024 May 12  1996 net/
-rw-r--r--   1 root     root          862 Aug 17  1995 versions.mk
-rwxr-xr-x   1 root     root       995060 May 12  1996 vmlinux

Take note of the vmlinux (if you have one) file - this is the uncompressed kernel! Notice the size? [vmlinuz is the .z (or compressed) version of vmlinux plus the decompression code]

Within this directory hierarchy are in excess of 1300 files and directories. On my system this consists of around 400 C source code files, 370 C header files, 40 Assembler source files and 46 Makefiles. These, when compiled, produce around 300 object files and libraries. At a rough estimate, this consumes around 16 megabytes of space (this figure will vary).

While this may seem like quite a bit of code, much of it actually isn't used in the kernel. Quite a large portion of this is driver code; only drivers that are needed on the system are compiled into the kernel, and then only those that are required at run time (the rest can be placed separately in things called modules; we will examine this later).

The various directories form logical divisions of the code, especially between the architecture dependent code (linux/arch), drivers (linux/drivers) and architecture independent code. By using grep and find, it is possible to trace the structure of the kernel program, look at the boot process and find out how various parts of it work.
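For example, a couple of commands along these lines, run from the top of the source tree, will show you which files mention the start_kernel function discussed later in this chapter; the exact matches will vary between kernel versions:

bash# cd /usr/src/linux
bash# grep -n start_kernel init/main.c                        # where the function is defined
bash# find . -name '*.[chS]' | xargs grep -l start_kernel     # every source file that refers to it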

The first incision

An obvious place to start with any large C program is the void main(void) function. If you grep every source file in the Linux source hierarchy for this function name, you will be sadly disappointed.

As I pointed out earlier, the kernel is a giant object file - a series of compiled functions. It is NOT executable. The purpose of void main(void) in C is to give the start-up code inserted by the linker a known entry point to call once the operating system has loaded the program. This wouldn't be of any use for a kernel - it is the operating system!

This poses a difficulty - how does an operating system run itself?

Making the heart beat...

In the case of Linux, the following steps are performed to boot the kernel:

§         The boot loader program (e.g. lilo) starts by loading the vmlinuz from disk into memory, then starts the code executing.

§         After the kernel image is decompressed, the actual kernel is started. This part of the code was produced from assembler source; it is totally machine specific. The code for this is located in the /usr/src/linux/arch/i386/kernel/head.S file. Technically at this point the kernel is running. This is the first process (0) and is called swapper. Swapper does some low level checks on the processor, memory and FPU availability, then places the system into protected mode. Paging is enabled.

§         Interrupts are disabled (every one) though the interrupt table is set up for later use. The entire kernel is realigned in memory (post paging) and some of the basic memory management structures are created.

§         At this point, a function called start_kernel is called. start_kernel is physically located in /usr/src/linux/init/main.c and is really the core kernel function - the equivalent of void main(void). main.c itself is virtually the root file for all other source and header files.

§         Tests are run (the FPU bug in the Pentium chip is identified, amongst other checks including examinations of the DMA chip and bus architecture) and the BogoMips setting is established.

§         start_kernel sets up the memory, interrupts and scheduling. In effect, the kernel now has multi-tasking enabled. Several messages will already have been displayed on the console.

§         The kernel command line options are parsed (those passed in by the boot loader) and all embedded device driver modules are initialised.

§         Further memory initialisations occur, socket/networking is started and further bug checks are performed.

§         The final action performed by swapper is the first process creation with fork whereby the init program is launched. Swapper now enters an infinite idle loop.

It is interesting to note that as a linear program, the kernel has finished running! The timer interrupts are now set so that the scheduler can step in and pre-empt the running process. However, sections of the kernel will be periodically executed by other processes.

This is really a huge oversimplification of the kernel's structure, but it does give you the general idea of what it is, what it is made up of and how it loads.


Modules

A recent innovation in kernel design is the concept of modules. A module is a dynamically loadable object file containing functions for interfacing with a particular device or performing particular tasks. The concept behind modules is simple: to make the kernel smaller (in memory), keep only the bare basics compiled into the kernel. When the kernel needs to use devices, let it load modules into memory. If it isn't using the modules, let them be unloaded from memory.

This concept has also revolutionised the way in which kernels are compiled. No longer do you need to compile every device driver into the kernel; you can simply mark some as modules. This also allows for separate module compilation - if a new device driver is released then it is a simple case of recompiling the module instead of the entire kernel.

Modules work by the kernel communicating with a program called kerneld. kerneld is run at boot time just like a normal daemon process. When the kernel notices that a request has come in for the use of a module, it checks whether the module is loaded in memory. If it is, then the routine is run; if not, the kernel gets kerneld to load the module into memory. kerneld also removes a module from memory if it hasn't been used in a certain (configurable) period of time.
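You can also drive the module machinery by hand, which is handy when experimenting. A minimal sketch follows; the floppy module is just an example, and the module must already have been compiled and installed for this to work:

bash# /sbin/lsmod              # list the modules currently loaded
bash# /sbin/insmod floppy      # load a module manually
bash# /sbin/rmmod floppy       # unload it again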

 The concept of modules is a good one, but there are some things you should be aware of:

§         Frequently used devices and devices required in the boot process (like the hard disk) should not be used as modules; these must be compiled into the kernel.

§         While the concept of modules is great for systems with limited memory, should you use them? Memory is cheap - compiling an object into the kernel rather than leaving it as a module may use more memory, but is that worse than a system that uses its CPU and IO resources to constantly load and unload modules? There are trade-offs between a smaller kernel and the CPU/IO cost of loadable modules.

§         It is probably a good idea to modularise devices like the floppy disk, CD-ROM and parallel port - these are not used very often, and when they are, only for a short time.

§         It is NOT a good idea to modularise frequently used drivers, like those which control networking.

There is quite a bit more to kernel modules.

Reading

 

The Resource Materials section, on the 85321 Website/CD-ROM, for week 7 contains pointers to a number of documents with information about Linux kernel modules.


The /proc file system

Part of the kernel's function is to provide a file-based method of interaction with its internal data structures; it does this via the /proc virtual file system.

The /proc file system technically isn't a file system at all; it is in fact a window on the kernel's internal memory structures. Whenever you access the /proc file system, you are really accessing kernel memory.

So what does it do?

Effectively the /proc file system is providing an instant snapshot of the status of the system. This includes memory, CPU resources, network statistics and device information. This data can be used by programs to gather information about a system, an example of which is the top program. top scans through the /proc structures and is able to present the current memory, CPU and swap information, as given below:

  7:12pm  up  9:40,  1 user,  load average: 0.00, 0.00, 0.10
  34 processes: 33 sleeping, 1 running, 0 zombie, 0 stopped
  CPU states:  0.5% user,  0.9% system,  0.0% nice, 98.6% idle
  Mem:  14940K av, 13736K used,  1204K free,  5172K shrd,  1920K buff
  Swap: 18140K av,  2304K used, 15836K free

  PID USER     PRI  NI SIZE  RES SHRD STAT %CPU %MEM  TIME COMMAND
  789 jamiesob  19   0  102  480  484 R     1.1  3.2  0:01 top
   98 root      14   0 1723 2616  660 S     0.3 17.5 32:30 X :0
    1 root       1   0   56   56  212 S     0.0  0.3  0:00 init [5]
   84 jamiesob   1   0  125  316  436 S     0.0  2.1  0:00 -bash
   96 jamiesob   1   0   81  172  312 S     0.0  1.1  0:00 sh /usr/X11/bin/star
   45 root       1   0   45  232  328 S     0.0  1.5  0:00 /usr/sbin/crond -l10
    6 root       1   0   27   72  256 S     0.0  0.4  0:00 (update)
    7 root       1   0   27  112  284 S     0.0  0.7  0:00 update (bdflush)
   59 root       1   0   53  176  272 S     0.0  1.1  0:00 /usr/sbin/syslogd
   61 root       1   0   40  144  264 S     0.0  0.9  0:00 /usr/sbin/klogd
   63 bin        1   0   60    0  188 SW    0.0  0.0  0:00 (rpc.portmap)
   65 root       1   0   58    0  180 SW    0.0  0.0  0:00 (inetd)
   67 root       1   0   31    0  180 SW    0.0  0.0  0:00 (lpd)
   73 root       1   0   84    0  208 SW    0.0  0.0  0:00 (rpc.nfsd)
   77 root       1   0  107  220  296 S     0.0  1.4  0:00 sendmail:accepting

The actual contents of the /proc file system on my system look like:

psyche:~$ ls /proc
1/           339/         7/           87/          dma          modules
100/         45/          71/          88/          filesystems  net/
105/         451/         73/          89/          interrupts   pci
108/         59/          77/          90/          ioports      self/
109/         6/           793/         96/          kcore        stat
116/         61/          80/          97/          kmsg         uptime
117/         63/          84/          98/          ksyms        version
124/         65/          85/          cpuinfo      loadavg
338/         67/          86/          devices      meminfo

Each of the numbered directories stores state information for the process with that PID. The self/ directory contains information for the process that is viewing the /proc filesystem, i.e. - YOU. The information stored in this directory looks like:

cmdline                 (Current command line)
cwd - [0303]:132247    (Link to the current working directory)
environ                 (All environment variables)
exe - [0303]:109739    (Currently executing code)
fd/                     (Directory containing virtual links to 
                         file handles)
maps|                   (Memory map structure)
root - [0303]:2        (Link to root directory)
stat                    (Current process statistics)
statm                   (Current memory statistics)

Most of these files can be cat'ed to the screen. The /proc/filesystems file, when cat'ed, lists the supported file systems. The /proc/cpuinfo file gives information about the hardware of the system:

psyche:~$ cat /proc/cpuinfo
cpu             : 586
model           : Pentium 90/100
mask            : E
vid             : GenuineIntel
fdiv_bug        : no
math            : yes
hlt             : yes
wp              : yes
Integrated NPU  : yes
Enhanced VM86   : yes
IO Breakpoints  : yes
4MB Pages       : yes
TS Counters     : yes
Pentium MSR     : yes
Mach. Ch. Exep. : yes
CMPXCHGB8B      : yes
BogoMips        : 39.94
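Several of the other entries shown in the earlier listing are worth a quick cat as well; all of them are plain text, though the exact fields vary between kernel versions:

psyche:~$ cat /proc/version      # the version of the running kernel and when it was built
psyche:~$ cat /proc/loadavg      # system load averages
psyche:~$ cat /proc/meminfo      # detailed memory and swap usage
psyche:~$ cat /proc/interrupts   # which IRQs are in use and by what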

Be aware that upgrading the kernel may mean changes to the structure of the /proc file system. This may require software upgrades.  Information about this should be provided in the kernel README files.

Exercises

13.1      Find out where kerneld is launched from.

13.2      What is the purpose of /sbin/lsmod? Try it.

13.3      Find out where your kernel image is located and how large it is.

13.4      Examine the /proc file system on your computer. What do you think the /proc/kcore file is? Hint: Have a look at the size of the file.

Really, why bother?

The most common reason to recompile the kernel is because you've added some hardware and you want the kernel to recognise and (if you're lucky) use it. A very good time to recompile your kernel is after you've installed Linux. The reason for this is that the original Linux kernel provided has extra drivers compiled into it which consume memory. Funnily enough, while the kernel includes a driver for communicating in EBCDIC via a 300 baud modem to a coke machine sitting in the South Hungarian embassy in Cairo [Makefile Question:

Do you want to include support for coke machines located in Cairo? [Y],N,M? 
Do you want to support South Hungarian Embassy Models [Y],N,M? 
Support for 300 baud serial link [Y],N,M? 
Support EBCDIC communication[Y],N,M? 

(I might be making this up... :)]

 ...the kernel, by default, doesn't have support for some very common sound cards and network devices! To be fair, there are good reasons for this (IRQ conflicts etc.) but this does mean a kernel recompile is required.

Another good reason to modify the kernel is to customise some of its data structures for your system. Possible modifications include increasing the number of processes the kernel can support (this is a fixed array and can't be changed at run time) or modifying the size of certain buffers.

One of the great benefits of having the source code for the operating system is that you can play OS-Engineer; it is possible for you to change the scheduling algorithm, memory management scheme or the IPC functionality.

While it might be nice to go and do these things, it would be unadvisable to modify the API if you want your programs to still run under Linux. However, there is nothing to stop you adding to the API. You may, for example, wish to add a system call to print "Hello World" to the screen (this would obviously be of great benefit to the rest of the Linux community ;) - this is possible for you to do. 

Strangely enough, to modify the kernel, you need kernel source code. The actual source can be obtained from a variety of locations. For users who installed Linux from CD ROM, the source can be found within the distribution. Typically you will actually go back into the installation menu and install only the section that contains the source.

However, more often than not, you are actually seeking to upgrade the kernel, so you need the latest kernel source. Because the development of the Linux kernel is an on-going process, new versions of development kernels are constantly being released. It is not unusual for development kernels to be released as often as once per day!

The Kernel HOWTO describes some ways to obtain kernels:

You can obtain the source via anonymous ftp from ftp.funet.fi in /pub/OS/Linux/PEOPLE/Linus, a mirror, or other sites. It is typically labeled linux-x.y.z.tar.gz, where x.y.z is the version number. Newer (better?) versions and the patches are typically in subdirectories such as `v1.1' and `v1.2'. The highest number is the latest version, and is usually a `test release', meaning that if you feel uneasy about beta or alpha releases, you should stay with a major release.

I strongly suggest that you use a mirror ftp site instead of ftp.funet.fi. Here is a short list of mirrors and other sites:

  USA:            tsx-11.mit.edu:/pub/linux/sources/system
  USA:            sunsite.unc.edu:/pub/Linux/kernel
  UK:             unix.hensa.ac.uk:/pub/linux/kernel
  Austria:        fvkma.tu-graz.ac.at:/pub/linux/linus
  Germany:        ftp.Germany.EU.net:/pub/os/Linux/Local.EUnet/Kernel/Linus
  Germany:        ftp.dfv.rwth-aachen.de:/pub/linux/kernel
  France:         ftp.ibp.fr:/pub/linux/sources/system/patches     
  Australia:      kirk.bond.edu.au:/pub/OS/Linux/kernel

  If you do not have ftp access, a list of BBS systems which carry Linux is posted periodically to comp.os.linux.announce; try to obtain this.

Any Sunsite mirror will contain the latest versions of the Linux kernel. ftp://sunsite.anu.edu.au/linux is a good Australian site to obtain kernel sources.

Generally you will only want to obtain a "stable" kernel version; the n.n.0 releases are usually safe. You can find out what the current stable kernel release is by reading the README* or LATEST* files in the download directory.

If you have an extremely new type of hardware then you are often forced into using developmental kernels. There is nothing wrong with using these kernels, but beware that you may encounter system crashes and potential losses of data. During a one year period, the author obtained around twenty developmental kernels, installed them and had very few problems. For critical systems, it is better to stick to known stable kernels. 

So, you've obtained the kernel source - it will be in one large, compressed file. The following extract from the Kernel HOWTO pretty much sums up the process:

Log in as or su to root, and cd to /usr/src.  If you installed kernel source when you first installed Linux (as most do), there will already be a directory called linux there, which contains the entire old source tree.  If you have the disk space and you want to play it safe, preserve that directory. A good idea is to figure out what version your system runs now and rename the directory accordingly. The command 

        uname -r 
  
prints the current kernel version.  Therefore, if

        uname -r 

said 1.1.47, you would rename (with mv) linux to linux-1.1.47.  If you feel mildly reckless, just wipe out the entire directory. In any case, make certain there is no linux directory in /usr/src before unpacking the full source code.


Now, in /usr/src, unpack the source with 

        tar zxvf linux-x.y.z.tar.gz
  
 (if you've just got a .tar file with no .gz at the end, tar xvf linux-x.y.z.tar works).  The contents of the source will fly by. When finished, there will be a new linux directory in /usr/src. cd to linux and look over the README file.  There will be a section with the label INSTALLING the kernel.

A couple of points to note.

§         Some sources install to directories given by the kernel version, not to the linux directory. It may be worth checking on this before you unpack the source by issuing the following command.  It will list all the files and directories that are contained in the source_filename, the kernel archive.
tar -tzvf source_filename
This will display a list of files and where they are to be installed. If they are to be installed into a directory other than linux then you must make a symbolic link, called linux, in the /usr/src directory to the directory that contains the new source.

§         NEVER just delete your old source - you may need it to recompile your old kernel version if you find the new version isn't working out, though we will discuss other ways round this problem in later sections.

If you are upgrading your kernel regularly, an alternative to constantly obtaining the complete kernel source is to patch your kernel.

Patches are basically text files that contain a list of differences between two files. A kernel patch is a file that contains the differences between all files in one version of the kernel and the next.

Why would you use them? The only real reason is to reduce download time and space. A compressed kernel source can be extremely large whereas patches are relatively small.

Patches are produced as the output from the diff command. For example, given two files:

file1

"vi is a highly exciting program with a wide range of great features – I am sure that we will adopt it as part of our PlayPen suite"
        - Anonymous Multimillionaire Software Farmer

file2

"vi is a mildly useless program with a wide range of missing features – I am sure that we will write a much better product; we'll call it `Sentence'"
        - Anonymous Multimillionaire Software Farmer

After executing the command:

diff file1 file2 > file3

file3 would contain:

1,2c1,2
< "vi is a highly exciting program with a wide range of great features - I
< am sure that we will adopt it as part of our PlayPen suite"
---
> "vi is a mildly useless program with a wide range of missing features - I
> am sure that we will write a much better product; we'll call it `Sentence'"

To apply a patch, you use the patch command. patch expects a file as a parameter to apply the patch to, with the actual patch file as standard input. Following the previous example, to patch file1 with file3 to obtain file2, we'd use the following command:

patch file1 < file3

This command applies the file3 patch to file1. After the command, file1 is the same as file2 and a file called file1.orig has been created as a backup of the original file1.

The Kernel HOWTO further explains applying a kernel patch:

  Incremental upgrades of the kernel are distributed as patches. For
  example, if you have version 1.1.45, and you notice that there's a
  patch46.gz out there for it, it means you can upgrade to version
  1.1.46 through application of the patch. You might want to make a
  backup of the source tree first (tar zcvf old-tree.tar.gz linux will 
  make a compressed tar archive for you).


  So, continuing with the example above, let's suppose that you have
  patch46.gz in /usr/src. cd to /usr/src  and do:

        zcat patch46.gz | patch -p0 

  (or patch -p0 < patch46 if the patch isn't compressed).

  You'll see things whizz by (or flutter by, if your system is that
  slow) telling you that it is trying to apply hunks, and whether it
  succeeds or not. Usually, this action goes by too quickly for you to
  read, and you're not too sure whether it worked or not, so you might
  want to use the -s flag to patch, which tells patch to only report
  error messages (you don't get as much of the `hey, my computer is
  actually doing something for a change!' feeling, but you may prefer
  this..). To look for parts which might not have gone smoothly, cd to
  /usr/src/linux  and look for files with a .rej extension. Some
  versions of patch (older versions which may have been compiled on
  an inferior file system) leave the rejects with a # extension. You can
  use find to look for you:

        find .  -name '*.rej' -print


  prints all files who live in the current directory or any subdirectories with a .rej extension to the standard output.
 

Patches can be obtained from the same sites as the complete kernel sources.

A couple of notes about patches:

§         For every new version of the kernel, there is a patch. To upgrade from a kernel version that is five versions behind the version you want, you have to obtain and apply five patches (e.g. upgrading kernel n.n.1 to n.n.6 requires patches patch2, patch3, patch4, patch5 and patch6; a small loop like the one sketched after this list takes some of the pain out of it). This gets tedious and it is often easier and quicker to simply obtain the entire kernel source again.

§         Patches are forever - when you patch your kernel source, you modify it for good.
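If you do decide to apply a run of patches rather than fetch the whole source again, a small shell loop saves some typing. This is only a sketch; it assumes the patch files (named as in the HOWTO extract above) have been downloaded into /usr/src and that you check for rejects afterwards:

bash# cd /usr/src
bash# for i in 2 3 4 5 6
> do
>     zcat patch$i.gz | patch -p0 -s
> done
bash# find linux -name '*.rej' -print    # any output here means part of a patch was rejected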

Every version of the kernel source comes with documentation. There are several "main" files you should read about your current source version including:

§         /usr/src/linux/README
Instructions on how to compile the kernel.

§         /usr/src/linux/MAINTAINERS
A list of people who maintain the code.

§         /usr/src/linux/Documentation/*
Documentation for parts of the kernel.

ALWAYS read the documentation after obtaining the source code for a new kernel, and especially if you are going to be compiling in a new kind of device. The Linux Kernel-HOWTO is essential reading for anything relating to compiling or modifying the kernel.

Linux is the collaborative product of many people. This is something you quickly discover when examining the source code. The code (in general) is neat but sparsely commented; those comments that do exist can be absolutely riotous...well, at least strange :)

These are just a selection of the quotes found in the /usr/src/linux/kernel directory:

(fork.c)

        Fork is rather simple, once you get the hang of it, but the memory
        management can be a bitch.

(exit.c)

        "I ask you, have you ever known what it is to be an orphan?"       

(module.c)

        ... This feature will give you ample opportunities to get to know
        the taste of your foot when you stuff it into your mouth!!!

(sched.c)

        The "confuse_gcc" goto is used only to get better assembly code..
        Dijkstra probably hates me.       

        To understand this, you have to know who Dijkstra was - remember OS?

        ... disregard lost ticks for now.. We don't care enough.

(sys.c)

        OK, we have probably got enough memory - let it rip.   

        This needs some heavy checking ...
        I just haven't the stomach for it. I also don't fully
        understand. Let somebody who does explain it.

(time.c)

        This is ugly, but preferable to the alternatives.  Bad, bad....     

        ...This is revolting.

Apart from providing light entertainment, the kernel source comments are an important guide into the (often obscure) workings of the kernel.

The main reason for recompiling the kernel is to include support for new devices - to do this you simply have to go through the compile process and answer "Yes" to a few questions relating to the hardware you want. However, in some cases you may actually want to modify the way in which the kernel works, or, more likely, one of the data structures the kernel uses. This might sound a bit daunting, but with Linux this is a relatively simple process.

For example, the kernel maintains a statically-allocated array for holding a list of structures associated with each process running on the system. When all of these structures are used, the system is unable to start any new processes. This limit is defined within the tasks.h file located in /usr/src/linux/include/linux/ in the form of:

/*
* This is the maximum nr of tasks - change it if you need to
*/
#define NR_TASKS        512
#define MAX_TASKS_PER_USER (NR_TASKS/2)
#define MIN_TASKS_LEFT_FOR_ROOT 4

While 512 tasks may seem a lot, on a multiuser system this limit is quickly exhausted. Remember that even without a single user logged on, a Linux system is running between 30 and 50 tasks. For each user login, you can (at peak periods) easily exceed 5 processes per user. Adding this to web server activity (some servers can be running in excess of one hundred processes devoted to processing incoming http requests), mail server, telnet, ftp and other network services, the 512 process limit is quickly reached. 
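You can get a feel for how close a system is to the limit by counting the entries in the process table; either of the following will do as a rough check:

psyche:~$ ps ax | wc -l                  # one line per process, plus a header line
psyche:~$ ls -d /proc/[0-9]* | wc -l     # one numbered /proc directory per process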

Increasing NR_TASKS and recompiling the kernel will allow more processes to be run on the system - the downside to this is that more memory will be allocated to the kernel data area in the form of the increased number of task structures (leaving less memory for user programs).

Other areas you may wish to modify include buffer sizes, numbers of virtual terminals and memory structures. Most of these should be modifiable from the .h files found in the kernel source "include" directories.

There are, of course, those masochists (like myself) who can't help tinkering with the kernel code and "changing" things (a euphemism for wrecking a nice stable kernel). This isn't a bad thing (there is an entire team of kernel developers world-wide who spend quite a bit of time doing this) but you've got to be aware of the consequences - total system annihilation is one. However, if you feel confident in modifying kernel code, perhaps you should take a quick look at: /usr/src/linux/kernel/sched.c or /usr/src/linux/mm/memory.c 

(actually, look at the code anyway). These are two of the most important files in the kernel source: the first, sched.c, is responsible for task scheduling; the second, memory.c, is responsible for memory allocation. Perhaps someone would like to modify memory.c so that when the kernel runs out of memory the system doesn't simply "hang" (just one of my personal gripes there... ;)

As we will discuss in the next section, ALL changes to the kernel should be compiled and tested on DISK before the "new" kernel is installed on the system. The following section will explain how this is done.

§         Obtain the source of the version before the latest kernel. Install the source in the appropriate directory.

§         Obtain the patch for the latest kernel source and apply it to the source files you previously retrieved.

If you don't have Internet access, do the same thing but using the CD-ROM. Pick a version of the kernel source, install it, then patch it with the patch for the next version.

Find out how to generate a patch file based on the differences between more than one file - what is the command that would recursively generate a patch file from two directories? (These puns are getting very sad)

As you are aware (because you've read all the previous chapters and have been paying intense attention), make is a program used to compile source files, generate object files and link them. make actually lets the compilers do the work; however, it co-ordinates things and takes care of dependencies. Important tip: Dependencies are conditions that exist due to the fact that some actions have to be done after other actions - this is confusing, but wait, it gets worse. Dependencies also relate to the object of the action; in the case of make this relates to whether the object (an object can be an object file or a source file) has been modified. For example, using our Humpty scenario:

humpty (the program) is made up of legs, arms and torso (humpty, being an egg, lacked a neck, thus his torso and head are one) - these could be equated to object files. Humpty's legs are made up of feet, shins and thighs - again, object files. Humpty's feet are made up of toes and other bits (how do you describe an egg's foot???) - these could be equated to source files. To construct humpty, you'd start at the simplest bits, like toes, and combine them with other bits to form the feet, then the legs, then finally, humpty.

You could not, however, fully assemble the leg without assembling the foot. And if you modified Humpty's toes, it doesn't mean you'd have to recompile his fingers - you'd have to reconstruct the foot object and relink it into a new leg object, which you'd link with the (pre-compiled and unmodified) arms and torso objects - thus forming Humpty.

make, while not specifically designed to handle broken egg reconstruction, does the same thing with source files - based entirely on rules which the user defines within a file called a Makefile. However, make is also clever enough to compile and link only the bits of a program that have been modified since the last compile.

In the case of the kernel, a series of Makefiles are responsible for the kernel construction. Apart from calling compilers and linkers, make can be used for running programs, and in the case of the kernel, one of the programs it calls is an initialisation script.

The steps to compile the kernel all make use of the make program. To compile the kernel, you must be in the /usr/src/linux directory and issue (in the following order and as the root user) these commands:

make config or make menuconfig or make xconfig
make dep
make clean
make zImage or make zdisk
make zlilo (if the previous was make zImage)

If you are going to be using modules with your kernel, you will require the following two steps:

make modules
make modules_install

The following is an explanation of each step.

make config is the first phase of kernel recompilation. Essentially make config causes a series of questions to be issued to the user. These questions relate to what components should be compiled into the kernel. The following is a brief dialog from the first few questions prompted by make config:

psyche:~/usr/src/linux$ make config

rm -f include/asm
( cd include ; ln -sf asm-i386 asm)
/bin/sh scripts/Configure arch/i386/config.in
#
# Using defaults found in .config
#
*
* Code maturity level options
*
Prompt for development and/or incomplete code/drivers (CONFIG_EXPERIMENTAL)[N/y?] n
*
* Loadable module support       
*
Enable loadable module support (CONFIG_MODULES) [Y/n/?] Y
Set version information on all symbols for modules (CONFIG_MODVERSIONS)[N/y/?]
Kernel daemon support (e.g. autoload of modules) (CONFIG_KERNELD) [N/y/?] y
*
* General setup
*
Kernel math emulation (CONFIG_MATH_EMULATION) [Y/n/?]

A couple of points to note:

Each of these questions has an automatic default (capitalised). This default will be changed if you choose another option; i.e. if the default is "N" and you answer "Y", then on the next compile the default will be "Y". This means that you can simply press "enter" through most of the options after your first compile.

These first few questions relate to the basic kernel setup: note the questions regarding modules. These are important to answer correctly; if you wish to include loadable module support, you must enable it at this point.

As you progress further through the questions, you will be prompted for choosing support for specific devices, for example:

*
* Additional Block Devices
*
Loopback device support (CONFIG_BLK_DEV_LOOP) [N/y/m/?]
Multiple devices driver support (CONFIG_BLK_DEV_MD) [N/y/?]
RAM disk support (CONFIG_BLK_DEV_RAM) [Y/m/n/?]
Initial RAM disk (initrd) support (CONFIG_BLK_DEV_INITRD) [N/y/?]
XT harddisk support (CONFIG_BLK_DEV_XD) [N/y/m/?]

In this case, note the "m" option. This specifies that support for a device should be compiled as a module - in other words, not compiled into the kernel but into a separate module.

Be aware that there are quite a few questions to answer in make config. If at any point you break from the program, you must start over again. Some "sections" of make config, like the sound card section, save the results of the first make config in a configuration file; you will be prompted to either reconfigure the sound card options or use the existing configurations file.

There are two other methods of configuring the kernel, make menuconfig and make xconfig.

The first time you run either of these configuration programs, they will actually be compiled before your very eyes (exciting eh?). menuconfig is just a text-based menu where you select the parts of the kernel you want; xconfig is the same thing, just for X Windows. Either of these utilities will probably be useful for someone who has never compiled the kernel before; however, for a comprehensive step-by-step selection of kernel components, make config is, in my view, better. You may be wondering what the result of make config/menuconfig/xconfig actually is. What is happening is that small configuration files are being generated to be used in the next step of the process, make dep.
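If you are curious about what the configuration step leaves behind, have a look at the .config file at the top of the source tree. For example, to check how you answered the module question (CONFIG_MODULES is just one of many such symbols):

bash# cd /usr/src/linux
bash# grep CONFIG_MODULES .config     # should show something like CONFIG_MODULES=y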

make dep takes the results from make config and "sets up" which parts of the kernel have to be compiled and which don't. Basically this step involves extensive use of sed and awk for string substitution on files. This process may take a few minutes; there is no user interaction at this point.

After running make dep, make clean must be run. Again, this process requires no user interaction. make clean actually goes through the source tree and removes all the old object and temporary files. This process can not be skipped.

At this point, we are ready to start the compile process.

You have two options at this point; you may either install the kernel on the hard drive of the system and hope it works, or, install the kernel on a floppy disk and test it for a while, then (if it is working) install it on the hard drive.

ALWAYS test your kernel on a floppy disk before installing it as your boot kernel on the hard drive. Why? Simply because if you install your new kernel directly over the one on the hard drive and it doesn't work properly (i.e. crashes or hangs your system) then you will have difficulty booting your system (being a well prepared Systems Administrator, you'd have a boot disk of course ... ;).

To compile your new kernel to disk, you must issue the command:

make zdisk

This will install a bootable kernel on the disk in A:.  To boot the system, you simply insert the disk containing the kernel in A:, shut down the system, and let it reboot. The kernel on disk will load into memory, mount your root partition and the system will boot as normal. It is a good idea to run this kernel on disk for at least a few days, if not longer. If something goes wrong and you find your system has become unstable, it is merely a process of removing the disk, rebooting and the system will start up with your old kernel.
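Once the system has rebooted from the floppy, it is worth confirming that you really are running the new kernel rather than the old one from the hard disk:

psyche:~$ uname -a              # version and build date of the running kernel
psyche:~$ cat /proc/version     # the same information, straight from the kernel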

If you are going to install the kernel directly to the hard disk, then you should issue the commands:

make zImage
make zlilo

The first command, make zImage, actually compiles the kernel, the second, make zlilo installs the kernel on whatever root partition you have configured with lilo.

Most systems use lilo as the kernel boot loader. A common misconception is that lilo is only used to boot kernels off hard disks. This is actually incorrect; if lilo is configured (usually done when you installed your system, see "man lilo" for more information on configuring it) to boot the kernel from floppy disk, then running make zlilo will cause a copy of the kernel (and lilo) to be copied onto a disk. However, lilo is usually used to load a kernel from hard disk. The way it works is simple; lilo finds the absolute block/sector address of the kernel image on the disk. It then creates a small program (containing this and other information) and inserts it in the boot sector of the primary hard disk. At boot time, lilo is run, prompting (optionally) the user for the desired operating system to boot. When the choice is made, lilo goes directly to the block/sector of the kernel boot image (or other operating system boot file) and loads it into memory and executes it. 
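A related tip: lilo's configuration file can carry entries for more than one kernel, so keeping your old kernel image around (under another name) gives you a fallback at the boot prompt. The following is only a sketch of /etc/lilo.conf - the device names and file names are illustrative and will differ on your system - and remember that /sbin/lilo must be re-run after any change so that the boot sector is rewritten:

boot = /dev/hda                # where the boot loader is installed
root = /dev/hda1               # the partition to mount as /
read-only

image = /boot/vmlinuz          # the new kernel
  label = linux

image = /boot/vmlinuz.old      # the previous, known-good kernel
  label = old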

The actual compile process (using either make zImage or make zdisk) is a lengthy one. A Pentium 100 with 16 megabytes of RAM takes around 15 to 25 minutes to compile the kernel (depending on what has been included). Compiling DEC UNIX on a DEC-Alpha takes around three to four minutes. Have pity for those in the not-so-distant era of the 386 who waited all day for a kernel to recompile.

It is quite OK to be recompiling the kernel while other users are logged onto the system; be aware that this will slow the process down and make the system appear VERY slow to the users (unless you have a "really nice" machine). 

If you have decided to use dynamically loadable modules, there are two more commands you must issue:

make modules
make modules_install

Note this is done post kernel compile - the useful thing about this is that if you upgrade your modules, you can simply recompile them without the need for a full kernel recompile!

After the make zImage/zlilo/zdisk commands and compiling the modules, your kernel is ready to be tested. As previously stated, it is important to test your kernel before using it as your system boot kernel.

If you find that the kernel is working normally from disk and it hasn't crashed the system (too much), then you can install the kernel to the hard disk.  The easiest way to do this is to go back to the /usr/src/linux directory and type: make zlilo

This will install the copy of the kernel that was previously compiled to disk (a copy is also kept in the kernel source directory) to the hard drive, or whatever boot device lilo is configured to use.

Did you read the documentation? "If all else fails, read the documentation" - this quote is especially true of kernel recompiles. A few common problems that you may be confronted with are:

§         make can not find the Makefile but it is there!:
This is because make is broken. This was a big problem under the 1.2.n kernels when an updated libc.so.x library was released. make would not work under 1.3.n kernels that had been recompiled under the 1.2.n versions with the new library; consequently, you couldn't recompile the kernel under the 1.3.n kernels because make was not working! This has since been fixed, though at the time the solution was to go and get a new version of make. This is a classic example of what can happen when you start upgrading kernels without upgrading all the libraries, compilers and utilities. Always read the README file before recompiling the kernel and make sure you have all the right versions of libraries, compilers and utilities (a quick way to check these is sketched after this list).

§         make config/dep/clean dies:
This is bad news. It means one of several things: either the config scripts can't find /bin/bash or /bin/sh, some of the source tree is missing, you are not running the program as root or there is something wrong with your system file permissions/links. It is very rare for this to happen with kernels "unpacked straight from the box". If it does happen, check for the previous reasons; if all else fails, go and get another kernel source.

§         make zImage/zdisk fails:
This is one of those sinking feeling moments when you start getting messages during the compile saying "Error: Something didn't compile/link". The two primary reasons for this are not running make clean after make dep and not having the correct libraries installed.

§         The kernel compiles and boots but it is unstable:
If you are using developmental kernels, this comes with the territory: developmental kernels can be unstable. If, however, you are using a known "stable" kernel, then the reason is most likely a hardware conflict. Typical culprits are sound cards and network cards. Remove these from the kernel and recompile. You should then examine the documentation on the offending devices to see what the conflict is. Other reasons for kernel instability include compiling in support for devices you don't have (this is rare but can happen) or the fact that you've just discovered a "real" bug in the kernel - in which case the README documentation will assist you in locating the right person to talk to.
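As mentioned in the first point above, many compile problems come down to mismatched tool and library versions. A quick way to compare what you have against the versions listed in the kernel README (these commands only print version information, so they are safe to run):

bash# gcc -v           # compiler version
bash# make -v          # GNU make version
bash# ld -v            # linker (binutils) version
bash# uname -r         # the kernel you are currently running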

If you are still encountering problems, you should examine the newsgroup archives concerned with Linux. There are also several useful mailing lists and web sites that can assist you with kernel problems.


Exercises

13.5      Modify the kernel so that the maximum number of tasks it can run is 50. Compile this kernel to a floppy disk. See how long it takes to use all these processes up.

13.6      Modify your kernel so that the kernel version message (seen on boot time) contains your name. Hint: /usr/src/linux/init contains a file called version.c - modify a data structure in this.

13.7      Recompile your own kernel, including only the components you need. For those components that you need but don't use very often, compile them in as modules. Initially boot the kernel from disk, then install it on your hard disk.

Conclusions

In this chapter we have examined:

§         What is a kernel?

§         Why would a Systems Administrator recompile a kernel?

§         What makes up a modern kernel?

§         How would you obtain a kernel?

§         Why and how would you modify the kernel source?

§         How is a kernel configured and recompiled?

§         Why should a kernel be tested?

§         How is a kernel installed?

§         Issues associated with the modern Linux kernel

Further information of the Linux kernel can be obtained from the Linux Kernel HOWTO.

Review Questions

Describe the functions of the kernel; explain the difference between a kernel that uses modules and one that doesn't.

You have added a D-Link ethernet card to your laptop (a D-Link  ethernet card  runs via the parallel port). Describe the steps you'd perform to allow the system to recognise it. Would you compile support for this module directly into the kernel or make it a module? Why/Why not?

You wish to upgrade the kernel on an older system (ver 1.2.n) to the latest kernel. What issues should you consider? What problems could occur with such an upgrade; how would you deal with these?