Chapter 14

Observation, automation and logging

Introduction

The last chapter introduced you to the "why" of automation and system monitoring. This chapter introduces you to how you perform these tasks on the UNIX operating system.

The chapter starts by showing you how to use the cron system to automatically schedule tasks at set times without the intervention of a human. Parts of the cron system you'll be introduced to include crond the daemon, crontab files and the crontab command.

The chapter then looks at how you can find out what is going on with your system. Current disk usage is examined briefly, including the commands df and du. Next, process monitoring is looked at, with the ps, top, uptime, free, uname, kill and nice commands introduced.

Finally we look at how you can find out what has happened with your system. In this section we examine the syslog system which provides a central system for logging system events. We then take a look at both process and login accounting. This last section will also include a look at what you should do with the files generated by logging and accounting.

Automation and cron

A number of the responsibilities of a System Administrator are automated tasks that must be carried out at regular times every hour, day or week. Examples include freeing up disk space early every morning by deleting entries in the /tmp directory, performing backups every night, and compressing and archiving log files.

Most of these responsibilities require no human interaction other than to start the command. Rather than have the Administrator start these jobs manually, UNIX provides a mechanism that will automatically carry out certain tasks at set times.  This mechanism relies on the cron system.

Components of cron

The cron system consists of the following three components

§         crontab (the cron configuration) files
These are the files which tell the cron system which tasks to perform and when.

§         the crontab command
This is the command used to modify the crontab files.  Even though the crontab files are text files they should not be edited using a text editor.

§         the daemon, crond
The cron daemon is responsible for reading the crontab file and then performing the required tasks at the specified times.  The cron daemon is started by a system startup file. 

crontab format

crontab files are text files with each line consisting of six fields separated by spaces. The first five fields specify when to carry out the command and the sixth field specifies the command. Table 14.1 outlines the purpose of each of the fields.

Field      Purpose
minute     minute of the hour, 0 to 59
hour       hour of the day, 0 to 23 (24 hour clock)
day        day of the month, 1 to 31
month      month of the year, 1 to 12 (Linux also accepts three letter abbreviations, jan, feb, ...)
weekday    day of the week, 0 to 7 where 0 and 7 are both Sunday (Linux also accepts three letter abbreviations, sun, mon, tue, ...)
command    the actual command to execute

Table 14.1
crontab fields

Comments can be used and are indicated using the # symbol just as with shell programs. Anything that appears after a # symbol until the end of that line is considered a comment and is ignored by crond.

The five time fields can also use any one of the following formats

§         an asterisk that matches all possible values,

§         a single integer that matches that exact value,

§         a list of integers separated by commas (no spaces) used to match any one of the values

§         two integers separated by a dash (a range) used to match any value within the range.

For example

Some example crontab entries include (all but the first two examples are taken from the Linux man page for crontab)

0 * * * * echo Cuckoo Cuckoo > /dev/console 2>&1

Every hour (when minutes=0) display Cuckoo Cuckoo on the system console.

30 9-17 * 1 sun,wed,sat echo `date` >> /date.file 2>&1

At half past the hour, between 9 and 5, for every day of January which is a Sunday, Wednesday or Saturday, append the date to the file date.file

0 */2 * * * date

Every two hours at the top of the hour run the date command

0 23-7/2,8 * * * date

Every two hours from 11p.m. to 7a.m., and at 8a.m.

0 11 4 * mon-wed date

At 11:00 a.m. on the 4th and on every mon, tue, wed

0 4 1 jan * date

4:00 a.m. on january 1st

0 4 1 jan * date >> /var/log/messages 2>&1

At 4:00 a.m. on January 1st, with all output appended to a log file
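
The four time field formats listed earlier can be sketched as a small shell function. This is only an illustration of the matching rules, not how crond itself is implemented. The function name cron_field_matches is hypothetical, and step values such as */2 and names such as mon are deliberately left out to keep the sketch short.

```shell
# cron_field_matches FIELD VALUE
# Succeeds (exit status 0) when VALUE is matched by the crontab time FIELD.
# Illustration only: handles *, single integers, lists and ranges.
cron_field_matches() {
  field=$1
  value=$2

  # An asterisk matches all possible values.
  [ "$field" = "*" ] && return 0

  # Work through the comma separated items one at a time.
  rest=$field
  while [ -n "$rest" ]; do
    item=${rest%%,*}
    case $rest in
      *,*) rest=${rest#*,} ;;
      *)   rest= ;;
    esac

    case $item in
      *-*)
        # A range: two integers separated by a dash.
        low=${item%-*}
        high=${item#*-}
        if [ "$value" -ge "$low" ] && [ "$value" -le "$high" ]; then
          return 0
        fi
        ;;
      *)
        # A single integer must match exactly.
        if [ "$value" -eq "$item" ]; then
          return 0
        fi
        ;;
    esac
  done
  return 1
}

# Would a job with an hour field of 9-17 run at 12 o'clock?
if cron_field_matches "9-17" 12; then
  echo "hour 12 matches 9-17"
fi
```

An entry is only run when all of its time fields match the current time, so a full check would call the function once for each of the five fields.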

Output

When commands are executed by the crond daemon there is no terminal associated with the process.  This means that standard output and standard error, which are usually set to the terminal, must be sent somewhere else.  By default the output is emailed to the user in whose crontab file the command appears.  It is also possible to use I/O redirection to send the output of the commands to files.  Some of the examples above use output redirection to send the output of the commands to a log file.
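
The effect of this kind of redirection can be tried from an ordinary shell before it is trusted in a crontab entry. The sketch below is not a crontab file. The function job is a hypothetical stand-in for a command run by crond, and the log file is a temporary file created just for the demonstration.

```shell
# A temporary file standing in for a real log file.
log=$(mktemp)

# Hypothetical stand-in for a command started by crond: it writes one line
# to standard output and one line to standard error.
job() {
  echo "normal output"
  echo "error output" >&2
}

# Append standard output to the log, then send standard error to the same
# place. Without the 2>&1 the error line would not reach the log.
job >> "$log" 2>&1

cat "$log"
```

The order matters: the 2>&1 must come after the file redirection, as it does in the crontab examples.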

Exercises

14.1      Write crontab entries for the following.
- run the program date every minute of every day and send the output to a file called date.log
- remove all the contents of the directory /tmp at 5:00am every morning
- execute a shell script /root/weekly.job every Wednesday
- run the program /root/summary at 3, 6 and 9 pm for the first five days of a month

Creating crontab files

crontab files should not be modified directly using an editor. Instead they should be created and modified using the crontab command. Refer to the manual page for crontab for more information; the following are the two basic methods for using the command.

1. crontab [file]

2. crontab [-e | -r | -l ] [username]

Version 1 is used to replace an existing crontab file with the contents of standard input or the specified file.

Version 2 makes use of one of the following command line options

§         -e
Allows the user to edit the crontab file using an editor (the command will perform some additional actions to make it safe to do so)

§         -r
Remove the user's crontab file

§         -l
Display the user's crontab file onto standard output

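By default all actions are carried out on the user's own crontab file. Only the root user can specify another username and modify that user's crontab file. On Linux systems that use Vixie cron the other user is named with the -u option, for example crontab -u username -l.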
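
A minimal sketch of the first form of the command: the entries are written to an ordinary file, which is then handed to crontab. The script name /usr/local/bin/nightly-report and its log file are hypothetical, and the lines that would actually change your crontab are left commented out so the sketch can be run safely.

```shell
# Build the new crontab in a temporary file rather than editing the
# live crontab file directly.
new_crontab=$(mktemp)

cat > "$new_crontab" <<'EOF'
# minute hour day month weekday command
# Run a (hypothetical) report script at 2:30 a.m. every day.
30 2 * * * /usr/local/bin/nightly-report >> /tmp/nightly-report.log 2>&1
EOF

# Show what would be installed.
cat "$new_crontab"

# Uncomment to replace your existing crontab with the file, then list it.
# crontab "$new_crontab"
# crontab -l
```

Keeping the entries in a file of your own also gives you a copy to restore from if the crontab is ever removed with -r.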

Exercise

14.2      Use the crontab command to add the following to your crontab file and observe what happens.
run the program date every minute of every day and send the output to a file called date.log

What's going on

A part of the day to day operation of a system is keeping an eye on the system's current state. This section introduces a number of commands and tools that can be used to examine the current state of the system.

The tools are divided into two sections based on what they observe. The sections are

§         disk and file system observation, and
The commands du and df

§         process observation and manipulation.
The commands ps, kill, nice and top.


df

df summarises the amount of free disk space. By default df will display the following information for all mounted file systems

§         total number of disk blocks,

§         number of disk blocks used,

§         number available

§         percentage of disk blocks used, and

§         where the file system is mounted.

df also has an option, -i, to display inode usage rather than disk block usage. What an inode is will be explained in a later chapter. Put simply, every file that is created must have an inode. If all the inodes are used you can't create any more files, even if you have disk space available.

The -T option will cause df to display each file system's type.
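
A check on disk usage is easy to automate. The sketch below reads df output and reports any file system that is more than a given percentage full. The function name check_usage and the 90 percent threshold are assumptions made for the example. The -P option asks df for the POSIX output format, which keeps each file system on a single line.

```shell
# check_usage LIMIT
# Reads "df -P" output on standard input and prints a line for every file
# system whose percentage used is greater than LIMIT.
check_usage() {
  awk -v limit="$1" '
    NR > 1 {                  # skip the heading line
      pct = $5                # fifth column is the percentage used, e.g. 95%
      sub(/%/, "", pct)
      if (pct + 0 > limit + 0)
        print $6 " is " $5 " full"
    }'
}

# Report file systems that are more than 90% full.
df -P | check_usage 90
```

The sketch assumes mount points that contain no spaces, since it relies on the mount point being the sixth whitespace separated column.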


Exercise

14.3      Use the df command to answer the following questions
- how many partitions do you have mounted
- how much disk space do you have left on your Linux partition
- how many more files can you create on your Linux partition

du

The du command is used to discover the amount of disk space used by a file or directory. By default du reports file size as a number of 1 kilobyte blocks. There are options to modify the command so it reports size in bytes (-b) or kilobytes (-k).

If you use du on a directory it will report back the size of each file and directory within it and recursively descend down any sub-directories. The -s switch is used to produce the total amount of disk used by the contents of a directory.

There are other options that allow you to modify the operation of du with respect to partitions and links.
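
As a sketch of how du is typically combined with other commands, the function below lists the largest entries directly inside a directory. The name largest_entries is hypothetical. The -s and -k options are the ones described above, and sort and head do the ranking.

```shell
# largest_entries DIR COUNT
# Prints the COUNT largest files and directories directly inside DIR,
# biggest first, with sizes in kilobytes.
largest_entries() {
  du -sk "$1"/* 2>/dev/null | sort -rn | head -n "$2"
}

# The five largest entries in /etc.
largest_entries /etc 5
```

Entries whose names start with a dot are not matched by the * pattern, and errors for directories you are not allowed to read are discarded, so the totals can be lower than the real usage when run as an ordinary user.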

Exercise

14.4      Use the du command to answer the following questions
- how many blocks does the /etc/passwd file use,
- how large (in bytes) is the /etc/passwd file,
- how much disk space is used by the /etc directory and by the /usr directory

System Status

Table 14.2 summarises some of the commands that can be used to examine the current state of your machine. Some of the information they display includes

§         amount of free and used memory,

§         the amount of time the system has been up,

§         the load average of the system,
Load average is the number of processes ready to be run and is used to give some idea of how busy your system is.

§         the number of processes and amount of resources they are consuming.

Some of the commands are explained below. For those that aren't use your system's manual pages to discover more.


 

Command  Purpose
free     display the amount of free and used memory
uptime   how long has the system been running and what is the current load average
ps       one-off snapshot of the current processes
top      continual listing of current processes
uname    display system information including the hostname, operating system name and version, and hardware type

Table 14.2
System status commands

ps

The ps command displays a list of information about the processes that were running at the time the ps command was executed.

ps has a number of options that modify what information it displays. Table 14.3 lists some of the more useful or interesting options that the Linux version of ps supports.

Table 14.4 explains the headings used by ps for the columns it produces.

For more information on the ps command you should refer to the manual page.

Option  Purpose
l       long format
u       displays username (rather than uid) and the start time of the process
m       display process memory info
a       display processes owned by other users (by default ps only shows your processes)
x       shows processes that aren't controlled by a terminal
f       use a tree format to show parent/child relationships between processes
w       don't truncate lines to fit on screen

Table 14.3
ps options

Field  Purpose
NI     the nice value
SIZE   memory size of the process's code, data and stack
RSS    kilobytes of the program in memory (the resident set size)
STAT   the status of the process (R-runnable, S-sleeping, D-uninterruptible sleep, T-stopped, Z-zombie)
TTY    the controlling terminal

Table 14.4
ps fields
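
The STAT column lends itself to quick summaries. The sketch below counts how many processes are in each state. The function name count_states is hypothetical, and only the first letter of each STAT value is used because Linux appends extra flag characters to it. The input here is a made-up sample rather than real ps output.

```shell
# count_states
# Reads one STAT value per line on standard input and prints how many
# processes are in each state, keyed on the first letter (R, S, D, T, Z).
count_states() {
  cut -c1 | sort | uniq -c
}

# Two sleeping processes and one runnable process, as ps might report them.
printf 'Ss\nR+\nS\n' | count_states
```

On a live Linux system the input would come from ps itself, for example with: ps ax -o stat= | count_states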

Exercise

14.5      Use the ps command to answer the following questions
- how many processes do you currently own
- how many processes are running on your system
- how much RAM does the ps command use
- what's the current running process

top

ps provides a one-off snap shot of the processes on your system. For an on-going look at the processes Linux generally comes with the top command. It also displays a collection of other information about the state of your system including

§         uptime, the amount of time the system has been up

§         the load average,

§         the total number of processes,

§         percentage of CPU time in user and system mode,

§         memory usage statistics

§         statistics on swap memory usage

Refer to the man page for top for more information.

top is not a standard UNIX command however it is generally portable and available for most platforms.

top displays the processes on your system ranked in order from the most CPU intensive down and updates that display at regular intervals. It also provides an interface by which you can change the nice value of processes and send them signals.

The nice value

The nice value specifies how "nice" your process is being to the other users of the system. It provides the system with some indication of how important the process is. The lower the nice value the higher the priority. Under Linux the nice value ranges from -20 to 19.

By default a new process inherits the nice value of its parent. The owner of the process can increase the nice value but cannot lower it (give it a higher priority). The root account has complete freedom in setting the nice value.

nice

The nice command is used to set the nice value of a process when it first starts.


renice

The renice command is used to change the nice value of a process once it has started.
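
A short sketch of both commands in use. It assumes the GNU version of nice, which prints the current nice value when run without a command, and the Linux version of renice, which accepts the new nice value followed by -p and a process identifier. The sleep process is only a harmless stand-in for a long running job.

```shell
# Nice value of the current shell (normally 0).
current=$(nice)
echo "shell nice value: $current"

# Start a command with its nice value raised by 5. Here the command is
# nice itself, so it simply reports the value it was started with.
raised=$(nice -n 5 nice)
echo "started with nice value: $raised"

# Start a stand-in job in the background, then make it "nicer" once it
# is already running.
sleep 30 &
job_pid=$!
renice 10 -p "$job_pid"

# Tidy up the stand-in job.
kill "$job_pid"
```

Both steps raise the nice value, which any user may do for their own processes. Lowering it again would need the root account.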

Signals

When you hit the CTRL-C combination to stop the execution of a process a signal (the INT signal) is sent to the process. By default many processes will terminate when they receive this signal.

The UNIX operating system generates a number of different signals. Each signal has an associated unique identifying number and a symbolic name. Table 14.5 lists some of the more useful signals used by the Linux operating system. There are 32 in total and they are listed in the file /usr/include/linux/signal.h

SIGHUP

The SIGHUP signal is often used when reconfiguring a daemon. Most daemons will only read the configuration file when they startup. If you modify the configuration file for the daemon you have to force it to re-read the file. One method is to send the daemon the SIGHUP signal.

SIGKILL

This is the big "don't argue" signal. A process cannot catch or ignore this signal, so almost all processes will terminate when they receive it. The exceptions are processes that have got themselves into serious problems, for example stuck in an uninterruptible sleep waiting on a device. The only way to get rid of these processes is to reboot the system.

Symbolic Name  Numeric identifier  Purpose
SIGHUP         1                   hangup
SIGKILL        9                   the kill signal
SIGTERM        15                  software termination

Table 14.5
Linux signals

kill

The kill command is used to send signals to processes. The format of the kill command is

kill [-signal] pid

This will send the signal specified by the number signal to the process identified with process identifier pid. The kill command will handle a list of process identifiers and signals specified using either their symbolic or numeric formats.

By default kill sends signal number 15 (the TERM signal).
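
The SIGHUP convention described earlier can be tried with a small stand-in for a daemon. The sketch below is an illustration under stated assumptions, not a real daemon: a background subshell uses the shell's trap command to react to HUP by recording that it would re-read its configuration, and to TERM by exiting cleanly. The variable names and the message text are invented for the example.

```shell
# File in which the stand-in daemon records what it does.
record=$(mktemp)

# A stand-in daemon: a background subshell that loops until told to stop.
(
  trap 'echo "HUP received: re-reading configuration" >> "$record"' HUP
  trap 'exit 0' TERM
  while :; do
    sleep 1
  done
) &
daemon_pid=$!

sleep 1                     # give the subshell time to install its traps
kill -HUP "$daemon_pid"     # ask it to re-read its configuration
sleep 2                     # the trap runs once the current sleep finishes
kill "$daemon_pid"          # no signal named, so TERM (15) is sent
wait "$daemon_pid"

cat "$record"
```

Had the second kill been given -KILL instead, the subshell would have been stopped without any chance to run its TERM handler.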


What's happened?

There will be times when you want to reconstruct what happened in the lead up to a problem. Situations where this might be desirable include

§         you believe someone has broken into your system,

§         one of the users performed an illegal action while online, and

§         the machine crashed mysteriously at some odd time.

Logging and accounting

This is where

§         logging, and
The recording of certain events, errors, emergencies.

§         accounting.
Recording who did what and when.

become useful.

This section examines the methods under Linux by which logging and accounting are performed. In particular it will examine

§         the syslog system,

§         process accounting, and

§         login accounting.

Managing log and accounting files

Both logging and accounting tend to generate a great deal of information especially on a busy system. One of the decisions the Systems Administrator must make is what to do with these files. Options include

§         don't create them in the first place,
The head in the sand approach. Not a good idea.

§         keep them for a few days, then delete them, and
If a problem hasn't been identified within a few days then assume there is no reason to keep the log files. Therefore delete the existing ones and start from scratch.

§         keep them for a set time and then archive them.
Archiving these files might include compressing them and storing them online or copying them to tape.
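
The third option can be sketched with find and gzip. The function name archive_old_logs, the *.log naming pattern and the seven day cut-off are assumptions made for the example. A real system would match the names its own log files use.

```shell
# archive_old_logs DIR DAYS
# Compresses every *.log file under DIR that was last modified more than
# DAYS days ago. gzip replaces each file with a .gz copy.
archive_old_logs() {
  find "$1" -type f -name '*.log' -mtime +"$2" -exec gzip {} \;
}

# Demonstration in a scratch directory so no real logs are touched.
scratch=$(mktemp -d)
touch -t 200001010000 "$scratch/old.log"   # last modified in January 2000
touch "$scratch/new.log"                   # modified just now

archive_old_logs "$scratch" 7
ls "$scratch"
```

A function like this is a natural candidate for a crontab entry, since it needs no human interaction once it is written.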

Centralise

If you are managing multiple computers it is advisable to centralise the logging and accounting files so that they all appear on the one machine. This makes maintaining and observing the files easier.


Logging

The ability to log error messages or the actions carried out by a program or script is fairly standard. On earlier versions of UNIX each individual program would have its own configuration file that controlled where and what to log. This led to multiple configuration and log files that were difficult for the Systems Administrator to control, and each program had to implement its own logging.

syslog

The syslog system was devised to provide a central logging facility that could be used by all programs. This was useful because Systems Administrators could control where and what should be logged by modifying a single configuration file and because it provided a standard mechanism by which programs could log information.

Components of syslog

The syslog system can be divided into a number of components

§         default log file,
On many systems messages are logged by default into the file /var/log/messages

§         the syslog message format,

§         the application programmer's interface,
The API programs use to log information.

§         the daemon, and
The program that directs logging information to the correct location based on the configuration file.

§         the configuration file.
Controls what information is logged and where it is logged.

Exercise

14.6      Examine the contents of the file /var/log/messages. You will probably have to be the root user to do so. One useful piece of information you should find in that file is a copy of the text that appears as Linux boots.

syslog message format

syslog uses a standard message format for all information that is logged. This format includes

§         a facility,
The facility is used to describe the part of the system that is generating the message. Table 14.6 lists some of the common facilities.

§         a level,
The level indicates the severity of the message. In lowest to highest order the levels are debug, info, notice, warning, err, crit, alert and emerg.

§         and a string of characters containing a message.

 

Facility  Source
kern      the kernel
mail      the mail system
lpr       the print system
daemon    a variety of system daemons
auth      the login authentication system

Table 14.6
Common syslog facilities

syslog's API

In order for syslog to be useful application programs must be able to pass messages to the syslog daemon so it can log the messages according to the configuration file.  There are at least two methods which application programs can use to send messages to syslog.  These are:

§         logger ,
logger is a UNIX command.  It is designed to be used by shell programs which wish to use the syslog facility.

§         the syslog API.
The API (application program interface) consists of a set of functions (openlog, syslog and closelog) which are used by programs written in compiled languages such as C and C++.  This API is defined in the syslog.h file.  You will find this file in the system include directory /usr/include.

Exercises

14.7      Examine the manual page for logger. Use logger from the command line to send a message to syslog

14.8      Examine the manual page for openlog and write a C program to send a message to syslog

syslogd

syslogd is the syslog daemon. It is started when the system boots by one of the startup scripts. syslogd reads its configuration file when it starts up or when it receives the HUP signal. The standard configuration file is /etc/syslog.conf.

syslogd receives logging messages and carries out actions as specified in the configuration file. Standard actions include

§         appending the message to a specific file,

§         forwarding the message to the syslogd on a different machine, or

§         displaying the message on the consoles of all or some of the logged in users.

/etc/syslog.conf

By default syslogd uses the file /etc/syslog.conf as its configuration file. It is possible using a command line parameter of syslogd to use another configuration file.

A syslog configuration file is a text file. Each line is divided into two fields separated by one or more spaces or tab characters

§         a selector, and
Used to match log messages.

§         an action.
Specifies what to do with a message if it is matched by the selector

The selector

The selector format is facility.level where facility and level match the terms introduced in the syslog message format section above.

A selector field can include

§         multiple selectors separated by ; characters

§         multiple facilities, separated by a , character, for a single level

§         an * character to match all facilities or levels

The level can be specified with or without a =. If the = is used only messages at exactly that level will be matched. Without the = all messages at or above the specified level will be matched.
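
The matching rule for levels can be sketched as follows. The function names level_rank and level_matches are hypothetical, and the rank numbers simply follow the lowest to highest order given earlier. They are not the numeric codes syslog uses internally, and the sketch does not handle the special level none.

```shell
# level_rank LEVEL
# Prints the position of LEVEL in lowest to highest order.
level_rank() {
  case $1 in
    debug)   echo 0 ;;
    info)    echo 1 ;;
    notice)  echo 2 ;;
    warning) echo 3 ;;
    err)     echo 4 ;;
    crit)    echo 5 ;;
    alert)   echo 6 ;;
    emerg)   echo 7 ;;
    *)       return 1 ;;
  esac
}

# level_matches SELECTOR_LEVEL MESSAGE_LEVEL
# Succeeds when a message at MESSAGE_LEVEL is matched by SELECTOR_LEVEL.
level_matches() {
  wanted=$1
  actual=$2
  case $wanted in
    '*') return 0 ;;                              # matches every level
    =*)  [ "${wanted#=}" = "$actual" ] ;;         # exactly that level
    *)   [ "$(level_rank "$actual")" -ge "$(level_rank "$wanted")" ] ;;
  esac
}

level_matches info warning  && echo "info matches a warning message"
level_matches =info warning || echo "=info does not match a warning message"
```

Both messages are printed: a selector of info picks up everything from info upwards, while =info is limited to that one level.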

syslog.conf actions

The actions in the syslog configuration file can take one of four formats

§         a pathname starting with /
Messages are appended onto the end of the file.

§         a hostname starting with a @
Messages are forwarded to the syslogd on that machine.

§         a list of users separated by commas
Messages appear on the screens of those users if they are logged in.

§         an asterisk
Messages are displayed on the screens of all logged in users.

For example

The following is an example syslog configuration file taken from the Linux manual page for syslog.conf

# Log all kernel messages to the console.
# Logging much else clutters up the screen.
#kern.*                         /dev/console

# Log anything (except mail) of level info or higher.
# Don't log private authentication messages!
*.info;mail.none;authpriv.none              /var/log/messages

# The authpriv file has restricted access.
authpriv.*                      /var/log/secure

# Log all the mail messages in one place.
mail.*                          /var/log/maillog

# Everybody gets emergency messages, plus log them on another
# machine.
*.emerg                         *

# Save mail and news errors of level err and higher in a
# special file.
uucp,news.crit                      /var/log/spooler

Exercise

14.9      A common problem on many systems is users who consume too much disk space.  One method to deal with this is to have a script which regularly checks on disk usage by users and reports those users who are consuming too much.  The following is one example of a script to do this.

#!/bin/bash

# global constants
# DISKHOGFILE holds the location of the file defining each user's
# maximum disk space
DISKHOGFILE="disk.hog"
# OFFENDERFILE specifies where to write information about offending
# users
OFFENDERFILE="offender"

space_used()
  # accept a username as 1st parameter
  # return amount of disk space used by the user's home directory
  # in a variable usage
{
  # home directory is the sixth field in /etc/passwd
  the_home=`grep "^$1:" /etc/passwd | cut -d: -f6`
  # du uses a tab character to separate its fields
  # we're only interested in the first one
  usage=`du -s "$the_home" | cut -f1`
}

#
# Main Program
#

while read username max_space
do
  space_used "$username"
  if [ "$usage" -gt "$max_space" ]
  then
    echo "$username has a limit of $max_space and has used $usage" >> "$OFFENDERFILE"
  fi
done < "$DISKHOGFILE"

Modify this script so that it uses the syslog system rather than writing its output to the offender file.

14.10   Configure syslog so the messages from the script in the previous question are appended to the logfile /var/log/disk.hog.messages and also to the main system console.

Accounting

Accounting was developed when computers were expensive resources and people were charged per command or for CPU time. In today's era of cheap, powerful computers it's rarely used for these purposes. One thing accounting is still used for is as a source of records about the use of the system. These are particularly useful if someone is trying to break, or has broken, into your system.

In this section we will examine

§         login accounting.

§         process accounting

Login accounting

The file /var/log/wtmp is used to store the username, terminal port, login and logout times of every connection to a Linux machine. Every time you login or logout the wtmp file is updated.  This task is performed by init.

last

The last command is used to view the contents of the wtmp file. There are options to limit interest to a particular user or terminal port.

Exercise

14.11   Use the last command to
- count how many logins there have been since the current wtmp file was created,
- how many times has the root user logged in

ac

The last command provides a rather rudimentary summary of the information in the wtmp file.  As a Systems Administrator it is possible that you may require more detailed summaries of this information.  For example, you may want to know the total number of hours each user has been logged in, how long per day, and various other information.

The command that provides this information is the ac command.


Installing ac

It is possible that you will not have the ac command installed.  On a RedHat Linux 5.0 machine it should be located in /usr/bin/ac.  The ac command is part of the psacct package.  If you don't have ac installed you will have to use rpm or glint to install the package.

Exercise

14.12   Use the ac command to
- find the total number of hours you were logged in as the root user
- find the average number of hours per login for all users
- find the total and average hours of login for the root user for the last 7 days

Process accounting

Also known as CPU accounting, process accounting records the elapsed CPU time, average memory use, I/O summary, the name of the user who ran the process, the command name and the time each process finished.

Turning process accounting on

Process accounting does not occur until it is turned on using the accton command.

accton /var/log/acct

Where /var/log/acct is the file in which the process accounting information will be stored. The file must already exist before it will work.  You can use any filename you wish but many of the accounting utilities rely on you using this file.

lastcomm

lastcomm is used to display the list of commands executed either for everyone, for particular users, from particular terminals or just information about a particular command. Refer to the lastcomm manual page for more information.

[root@beldin /proc]# lastcomm david
netscape               david    tty1       0.02 secs Sun Jan 25 16:26
[root@beldin /proc]# lastcomm ttyp2
lastcomm               root     ttyp2      0.55 secs Sun Jan 25 16:21
ls                     root     ttyp2      0.03 secs Sun Jan 25 16:21
ls                     root     ttyp2      0.02 secs Sun Jan 25 16:21
accton                 root     ttyp2      0.01 secs Sun Jan 25 16:21


The sa command

The sa command is used to provide more detailed summaries of the information stored by process accounting and also to summarise the information into other files.

[root@beldin /proc]# /usr/sbin/sa -a
      66       0.19re       0.25cp
       6       0.01re       0.16cp   cat
       8       0.00re       0.04cp   lastcomm
      17       0.00re       0.01cp   ls
       6       0.01re       0.01cp   man
       1       0.00re       0.01cp   troff
       5       0.01re       0.01cp   less
       1       0.15re       0.01cp   in.ftpd
       6       0.01re       0.01cp   sh
       5       0.00re       0.00cp   gunzip
       1       0.00re       0.00cp   grotty
       2       0.00re       0.00cp   sa
       1       0.00re       0.00cp   groff
       1       0.00re       0.00cp   gtbl
       1       0.00re       0.00cp   gzip
       1       0.00re       0.00cp   sh*
       1       0.00re       0.00cp   netscape*
       1       0.00re       0.00cp   accton
       2       0.00re       0.00cp   bash*

Refer to the manual pages for the sa command for more information.

So what?

This section has given a very brief overview of process and login accounting and the associated commands and files.  What use do these systems fulfil for a Systems Administrator?  The main one is that they allow you to track what is occurring on your system and who is doing it.  This can be useful for a number of reasons

§         tracking which users are abusing the system

§         figuring out what is normal for a user
If you know (via process accounting) that most of your users never use commands like sendmail and the C compilers, and then all of a sudden they start using them, this might be an indication of a break in.

§         justifying to management the need for a larger system
Generally management won't buy you a bigger computer just because you want one.  In most situations you will have to put together a case to justify why the additional expenditure is necessary.  Process and login accounting could provide some of the necessary information.

Conclusions

The cron system is used to automatically perform tasks at set times. Components of the cron system include

§         the daemon, crond,
Which actually performs the specified tasks.

§         crontab files, and
That specify the when and what.

§         the crontab command.
Used to manipulate the crontab files.

Useful commands for examining the current status of your systems file system include df and du. Commands for examining and manipulating processes include ps, kill, renice, nice and top. Other "status" commands include free, uptime and uname.

syslog is a centralised system for logging information about system events. Its components include

§         an API and a program (logger) by which information can be logged,

§         the syslogd daemon that actually performs the logging, and

§         the /etc/syslog.conf file that specifies what information should be logged and where.

Login accounting is used to track when, where and for how long users connect to your system. Process accounting is used to track when and what commands were executed. By default Linux does not provide full support for either form of accounting (it does offer some standard login accounting but not the extra command sac). However there are freely available software distributions that provide Linux this functionality.

Login accounting is performed in the /var/log/wtmp file that is used to store the details of every login and logout from the system. The last command can be used to view the contents of the binary /var/log/wtmp file. The ac command, part of the psacct package, can be used to summarise this information into a number of useful formats.

Process accounting must be turned on using the accton command and the results can be viewed using the lastcomm command.

Both logging and accounting can produce files that grow to some considerable size in a short amount of time. The Systems Administrator must implement strategies to deal with these files, either by deleting them after a short time or by archiving them, for example to tape.

Review Questions

14.1

Explain the relationship between each of the following

§         crond, crontab files and the crontab command,

§         syslogd, logger and /etc/syslog.conf

§         /var/log/wtmp, last and ac

14.2

You have just modified the /etc/syslog.conf file. Will your changes take effect immediately? If not what command would you use to make the modifications take effect? How could you check that the modifications are working?

14.3

Write crontab entries to achieve the following

§         run the script /usr/local/adm/bin/archiveIt every Monday at 6 am

§         run a script /usr/local/adm/bin/diskhog on Monday, Wednesday and Friday at 6am, 12pm, 4pm