The last chapter introduced you to the "why" of automation and system monitoring. This chapter introduces you to how you perform these tasks on the UNIX operating system.
The chapter starts by showing you how to use the cron system to automatically schedule tasks at set times without the intervention of a human. Parts of the cron system you'll be introduced to include crond the daemon, crontab files and the crontab command.
The chapter then looks at how you can find out what is going on with your system. Current disk usage is examined briefly, including the commands df and du. Next, process monitoring is looked at, with the ps, top, uptime, free, uname, kill and nice commands introduced.
Finally we look at how you can find out what has happened with your system. In this section we examine the syslog system which provides a central system for logging system events. We then take a look at both process and login accounting. This last section will also include a look at what you should do with the files generated by logging and accounting.
A number of the responsibilities of a System Administrator are automated tasks that must be carried out at regular times every hour, day or week. Examples include freeing up disk space early every morning by deleting entries in the /tmp directory, performing backups every night, or compressing and archiving log files.
Most of these responsibilities require no human interaction other than to start the command. Rather than have the Administrator start these jobs manually, UNIX provides a mechanism that will automatically carry out certain tasks at set times. This mechanism relies on the cron system.
The cron system consists of the following three components
§ crontab (the cron configuration) files
These are the files which tell the cron system which tasks to perform and when.
§ the crontab command
This is the command used to modify the crontab files. Even though the crontab files are text files they should not be edited using a text editor.
§ the daemon, crond
The cron daemon is responsible for reading the crontab file and then performing the required tasks at the specified times. The cron daemon is started by a system startup file.
crontab files are text files with each line consisting of 6 fields separated by spaces. The first five fields specify when to carry out the command and the sixth field specifies the command. Table 14.1 outlines the purpose of each of the fields.
| Field | Purpose |
| minute | minute of the hour, 00 to 59 |
| hour | hour of the day, 00 to 23 (24-hour time) |
| day | day of the month, 1 to 31 |
| month | month of the year, 1 to 12 |
| weekday | day of the week, 0 to 7 (both 0 and 7 mean Sunday); Linux also accepts three letter abbreviations, sun, mon, tue, ... |
| command | the actual command to execute |
Table 14.1 crontab fields
Comments can be used and are indicated using the # symbol just as with shell programs. Anything that appears after a # symbol until the end of that line is considered a comment and is ignored by crond.
The five time fields can also use any one of the following formats
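§ an asterisk that matches all possible values,
§ a single integer that matches that exact value,
§ a list of integers separated by commas (no spaces) used to match any one of the values,
§ two integers separated by a dash (a range) used to match any value within the range,
§ a range or an asterisk followed by /n (a step) used to match every nth value, for example */2 in the hour field matches every second hour.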
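To make the matching rules concrete, the following is a minimal bash sketch of how a single time field could be tested against a value. The function name field_matches is hypothetical, and this is not how crond itself is implemented. It handles only the asterisk, single integer, list and range forms.

```bash
#!/bin/bash
# field_matches SPEC VALUE
# Succeeds (exit status 0) if VALUE is matched by the crontab time field SPEC.
# Illustrative only: supports *, N, N,M,... and N-M (no names, no steps).
field_matches() {
    local spec=$1 value=$2 part
    local -a parts
    [ "$spec" = "*" ] && return 0          # an asterisk matches everything
    IFS=, read -r -a parts <<< "$spec"     # split a list on commas
    for part in "${parts[@]}"; do
        case $part in
            *-*)                           # a range such as 9-17
                local low=${part%-*} high=${part#*-}
                if [ "$value" -ge "$low" ] && [ "$value" -le "$high" ]; then
                    return 0
                fi
                ;;
            *)                             # a single integer
                [ "$value" -eq "$part" ] && return 0
                ;;
        esac
    done
    return 1
}

# Does the hour field "9-17" match 3 p.m. (hour 15)?
if field_matches "9-17" 15; then
    echo "hour 15 matches 9-17"
fi
```

An entry is run when every one of its time fields matches the current time in this way, with the special handling of the day and weekday fields described in the crontab manual page.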
Some example crontab entries include (all but the first two examples are taken from the Linux man page for crontab)
0 * * * * echo Cuckoo Cuckoo > /dev/console 2>&1
Every hour (when minutes=0) display Cuckoo Cuckoo on the system console.
30 9-17 * 1 sun,wed,sat echo `date` >> /date.file 2>&1
At half past the hour, between 9 and 5, for every day of January which is a Sunday, Wednesday or Saturday, append the date to the file date.file
0 */2 * * * date
Every two hours at the top of the hour run the date command
0 23-7/2,8 * * * date
Every two hours from 11p.m. to 7a.m., and at 8a.m.
0 11 4 * mon-wed date
At 11:00 a.m. on the 4th and on every mon, tue, wed
0 4 1 jan * date
4:00 a.m. on January 1st
0 4 1 jan * date >> /var/log/messages 2>&1
4:00 a.m. on January 1st, with all output appended to a log file
When commands are executed by the crond daemon there is no terminal associated with the process. This means that standard output and standard error, which are usually connected to the terminal, must be sent somewhere else. By default the output is emailed to the user whose crontab file contains the command. It is also possible to use I/O redirection to send the output of the commands to files. Some of the examples above use output redirection to send the output of the commands to a log file.
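The effect of the ">> file 2>&1" idiom used in the crontab examples can be tried from an ordinary shell. The sketch below uses a hypothetical function, noisy_job, as a stand-in for a command run from cron, and a temporary file in place of a real log file.

```bash
#!/bin/bash
# noisy_job is a hypothetical stand-in for a command run from cron.
noisy_job() {
    echo "normal output"            # goes to standard output
    echo "error output" >&2         # goes to standard error
}

logfile=$(mktemp)                   # temporary file standing in for a log file

# Append standard output to the log, then send standard error to the same place.
# The order matters: "2>&1" must come after ">> file".
noisy_job >> "$logfile" 2>&1

cat "$logfile"
```

Both lines end up in the file, so nothing is left over for cron to email.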
14.1 Write crontab entries for the following.
- run the program date every minute of every day and send the output to a file called date.log
- remove all the contents of the directory /tmp at 5:00am every morning
- execute a shell script /root/weekly.job every Wednesday
- run the program /root/summary at 3, 6 and 9 pm for the first five days of a month
crontab files should not be modified using an editor. Instead they should be created and modified using the crontab command. Refer to the manual page for crontab for more information, but the following are two of the basic methods for using the command.
1. crontab [file]
2. crontab [-e | -r | -l ] [username]
Version 1 is used to replace an existing crontab file with the contents of standard input or the specified file.
Version 2 makes use of one of the following command line options
§ -e
Allows the user to edit the crontab file using an editor (the command will perform some additional actions to make it safe to do so)
§ -r
Remove the user's crontab file
§ -l
Display the user's crontab file onto standard output
By default all actions are carried out on the user's own crontab file. Only the root user can specify another username and modify that user's crontab file.
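One way to combine the two forms of the command from a script is to list the current crontab into a file, add a line, and install the result. The sketch below shows only the "add a line" step on a scratch file. The function name add_cron_entry and the script path in the entry are hypothetical, and the crontab commands themselves are left as comments so that running the sketch does not touch a real crontab.

```bash
#!/bin/bash
# add_cron_entry FILE ENTRY
# Append ENTRY to the crontab text in FILE unless an identical line is already there.
add_cron_entry() {
    local file=$1 entry=$2
    # -x: match whole lines only, -F: treat the entry as a fixed string
    grep -qxF -e "$entry" "$file" || printf '%s\n' "$entry" >> "$file"
}

work=$(mktemp)
# On a real system, start from the current crontab:
#   crontab -l > "$work"
add_cron_entry "$work" '15 2 * * * /usr/local/adm/bin/nightly'   # hypothetical script
add_cron_entry "$work" '15 2 * * * /usr/local/adm/bin/nightly'   # second call changes nothing
cat "$work"
# Then install the result as the new crontab (form 1 of the command):
#   crontab "$work"
```

Checking for an existing line first means the script can be run repeatedly without filling the crontab with duplicates.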
14.2 Use the crontab command to add the following to your crontab file and observe what happens.
- run the program date every minute of every day and send the output to a file called date.log
A part of the day to day operation of a system is keeping an eye on the system's current state. This section introduces a number of commands and tools that can be used to examine the current state of the system.
The tools are divided into two sections based on what they observe. The sections are
§ disk and file system observation, and
The commands du and df
§ process observation and manipulation.
The commands ps, kill, nice and top.
df summarises the amount of free disk space. By default df will display the following information for all mounted file systems
§ total number of disk blocks,
§ number of disk blocks used,
§ number available
§ percentage of disk blocks used, and
§ where the file system is mounted.
df also has an option, -i, to display Inode usage rather than disk block usage. What an Inode is will be explained in a later chapter. Put simply, every file that is created must have an Inode. If all the Inodes are used you can't create any more files, even if you have disk space available.
The -T option will cause df to display each file system's type.
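Because df produces plain text, its output is easy to check from a script. The following is a minimal sketch, assuming the POSIX output format selected by df -P (one line per file system, with the percentage used in the fifth column and the mount point in the sixth). The function name report_full and the 90% limit are arbitrary choices for illustration.

```bash
#!/bin/bash
# report_full LIMIT
# Reads "df -P" style output on standard input and prints every file system
# whose used percentage is above LIMIT.
report_full() {
    local limit=$1
    awk -v limit="$limit" '
        NR > 1 {                       # skip the header line
            used = $5
            sub(/%/, "", used)         # "83%" -> "83"
            if (used + 0 > limit + 0)
                print $6 " is " $5 " full"
        }'
}

# Typical use: check all mounted file systems against a 90% limit.
df -P | report_full 90
```

A script like this is a natural candidate for a crontab entry. Note that the sketch assumes mount points do not contain spaces.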
14.3 Use the df command to answer the following questions
- how many partitions do you have mounted
- how much disk space do you have left on your Linux partition
- how many more files can you create on your Linux partition
The du command is used to discover the amount of disk space used by a file or directory. By default du reports file size as a number of 1 kilobyte blocks. There are options to modify the command so it reports size in bytes (-b) or kilobytes (-k).
If you use du on a directory it will report back the size of each file and directory within it, recursively descending into any sub-directories. The -s switch is used to produce only the total amount of disk space used by the contents of a directory.
There are other options that allow you to modify the operation of du with respect to partitions and links.
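As a small illustration of the -s and -k switches, the sketch below wraps du in a hypothetical function, dir_usage, that prints just the total for a directory. The directories in the loop are only examples.

```bash
#!/bin/bash
# dir_usage DIR
# Print the total disk space used by DIR in kilobytes.
dir_usage() {
    # du separates the size from the name with a tab, which is cut's default delimiter
    du -sk "$1" | cut -f1
}

for dir in /etc /tmp; do
    echo "$dir uses $(dir_usage "$dir") kilobytes"
done
```

If you run this as an ordinary user, du may complain on standard error about sub-directories it is not allowed to read. The total it prints then covers only what it could examine.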
14.4 Use the du command to answer the following questions
- how many blocks does the /etc/passwd file use,
- how large (in bytes) is the /etc/passwd file,
- how much disk space is used by the /etc directory and by the /usr directory
Table 14.2 summarises some of the commands that can be used to examine the current state of your machine. Some of the information they display includes
§ amount of free and used memory,
§ the amount of time the system has been up,
§ the load average of the system,
Load average is the number of processes ready to be run and is used to give some idea of how busy your system is.
§ the number of processes and amount of resources they are consuming.
Some of the commands are explained below. For those that aren't, use your system's manual pages to discover more.
| Command | Purpose |
| free | display the amount of free and used memory |
| uptime | how long the system has been running and the current load average |
| ps | one-off snapshot of the current processes |
| top | continual listing of current processes |
| uname | display system information including the hostname, operating system name and version, and hardware type |
Table 14.2 System status commands
The ps command displays a list of information about the processes that were running at the time the ps command was executed.
ps has a number of options that modify what information it displays. Table 14.3 lists some of the more useful or interesting options that the Linux version of ps supports.
Table 14.4 explains the headings used by ps for the columns it produces.
For more information on the ps command you should refer to the manual page.
| Option | Purpose |
| l | long format |
| u | displays username (rather than uid) and the start time of the process |
| m | display process memory info |
| a | display processes owned by other users (by default ps only shows your processes) |
| x | shows processes that aren't controlled by a terminal |
| f | use a tree format to show parent/child relationships between processes |
| w | don't truncate lines to fit on screen |
Table 14.3 ps options
| Field | Purpose |
| NI | the nice value |
| SIZE | memory size of the process's code, data and stack |
| RSS | kilobytes of the program in memory (the resident set size) |
| STAT | the status of the process (R-runnable, S-sleeping, D-uninterruptible sleep, T-stopped, Z-zombie) |
| TTY | the controlling terminal |
Table 14.4 ps fields
14.5 Use the ps command to answer the following questions
- how many processes do you currently own
- how many processes are running on your system
- how much RAM does the ps command use
- what's the current running process
ps provides a one-off snap shot of the processes on your system. For an on-going look at the processes Linux generally comes with the top command. It also displays a collection of other information about the state of your system including
§ uptime, the amount of time the system has been up
§ the load average,
§ the total number of processes,
§ percentage of CPU time in user and system mode,
§ memory usage statistics
§ statistics on swap memory usage
Refer to the man page for top for more information.
top is not a standard UNIX command. However, it is generally portable and available for most platforms.
top displays the processes on your system ranked in order from the most CPU intensive down and updates that display at regular intervals. It also provides an interface by which you can manipulate the nice value and send signals to processes.
The nice value specifies how "nice" your process is being to the other users of the system. It provides the system with some indication of how important the process is. The lower the nice value the higher the priority. Under Linux the nice value ranges from -20 to 19.
By default a new process inherits the nice value of its parent. The owner of the process can increase the nice value but cannot lower it (give it a higher priority). The root account has complete freedom in setting the nice value.
The nice command is used to set the nice value of a process when it first starts.
The renice command is used to change the nice value of a process once it has started.
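The following short sketch shows both commands in use. It relies on the fact that nice, run with no arguments, prints the nice value it is running with. The sleep process is just a harmless demonstration target.

```bash
#!/bin/bash
# With no arguments, nice prints the nice value it is running with.
current=$(nice)

# Start a command with its nice value raised by 5 (a lower priority).
# Here the command being started is nice itself, so it reports the new value.
raised=$(nice -n 5 nice)

echo "this shell: $current, child started with nice -n 5: $raised"

# Change the nice value of a process that is already running.
# For a process currently at nice value 0 this results in a nice value of 10.
sleep 30 &
renice -n 10 -p $!      # only root may use a negative value here
kill $!                 # tidy up the demonstration process
```

Running a long, CPU intensive job such as a large compile under nice is a simple way of being polite to the other users of a shared machine.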
When you hit the CTRL-C combination to stop the execution of a process a signal (the INT signal, number 2) is sent to the process. By default many processes will terminate when they receive this signal.
The UNIX operating system generates a number of different signals. Each signal has an associated unique identifying number and a symbolic name. Table 14.5 lists some of the more useful signals used by the Linux operating system. There are 32 in total and they are listed in the file /usr/include/linux/signal.h
The SIGHUP signal is often used when reconfiguring a daemon. Most daemons will only read the configuration file when they start up. If you modify the configuration file for the daemon you have to force it to re-read the file. One method is to send the daemon the SIGHUP signal.
The SIGKILL signal is the big "don't argue" signal. A process cannot catch or ignore this signal, so almost all processes will terminate when they receive it. The exceptions are processes that have got themselves into serious problems, for example ones stuck in an uninterruptible sleep (the D state shown by ps). The only way to get rid of these processes is to reboot the system.
| Symbolic Name | Numeric identifier | Purpose |
| SIGHUP | 1 | hangup |
| SIGKILL | 9 | the kill signal |
| SIGTERM | 15 | software termination |
Table 14.5 Linux signals
The kill command is used to send signals to processes. The format of the kill command is
kill [-signal] pid
This will send the signal specified by the number signal to the process identified with process identifier pid. The kill command will handle a list of process identifiers and signals specified using either their symbolic or numeric formats.
By default kill sends signal number 15 (the TERM signal).
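The sketch below shows kill and signal handling working together. A background sub-shell plays the part of a hypothetical daemon: it uses the shell's trap command to react to HUP by "re-reading its configuration" and to TERM by shutting down cleanly. The messages and the temporary log file are made up for the demonstration.

```bash
#!/bin/bash
log=$(mktemp)

# A pretend daemon running in the background.
(
    trap 'echo "HUP: re-reading configuration" >> "$log"' HUP
    trap 'echo "TERM: shutting down" >> "$log"; exit 0' TERM
    echo "ready" >> "$log"
    while :; do sleep 1; done
) &
pid=$!

# Wait until the daemon has installed its signal handlers.
until grep -q ready "$log"; do sleep 1; done

kill -HUP "$pid"        # same as: kill -1 "$pid"
sleep 2                 # give the daemon time to react
kill "$pid"             # no signal given, so TERM (15) is sent
wait "$pid"

cat "$log"
```

Had the last signal been KILL (kill -9) rather than TERM, the daemon would have been given no chance to run its shutdown handler.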
There will be times when you want to reconstruct what happened in the lead up to a problem. Situations where this might be desirable include
§ you believe someone has broken into your system,
§ one of the users performed an illegal action while online, and
§ the machine crashed mysteriously at some odd time.
This is where the following become useful
§ logging, and
The recording of certain events, errors, emergencies.
§ accounting.
Recording who did what and when.
This section examines the methods under Linux by which logging and accounting are performed. In particular it will examine
§ the syslog system,
§ process accounting, and
§ login accounting.
Both logging and accounting tend to generate a great deal of information especially on a busy system. One of the decisions the Systems Administrator must make is what to do with these files. Options include
§ don't create them in the first place,
The head in the sand approach. Not a good idea.
§ keep them for a few days, then delete them, and
If a problem hasn't been identified within a few days then assume there is no reason to keep the log files. Therefore delete the existing ones and start from scratch.
§ keep them for a set time and then archive them.
Archiving these files might include compressing them and storing them online or copying them to tape.
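The "keep, then archive" option is straightforward to automate. The following is a minimal sketch, assuming log files are named *.log and that compressing them in place with gzip is an acceptable form of archiving. The function name archive_old_logs and the 7 day limit are illustrative, and the demonstration runs on a scratch directory rather than on real log files.

```bash
#!/bin/bash
# archive_old_logs DIR DAYS
# Compress every file in DIR named *.log that was last modified more than
# DAYS days ago. gzip replaces each file with a .gz version.
archive_old_logs() {
    find "$1" -type f -name '*.log' -mtime +"$2" -exec gzip {} \;
}

# Demonstration on a scratch directory.
logdir=$(mktemp -d)
echo "old entry" > "$logdir/old.log"
echo "new entry" > "$logdir/new.log"
touch -t 200001010000 "$logdir/old.log"     # pretend this file dates from 2000

archive_old_logs "$logdir" 7
ls "$logdir"
```

Run from cron, a script along these lines keeps recent logs readable while stopping older ones from consuming too much disk space.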
If you are managing multiple computers it is advisable to centralise the logging and accounting files so that they all appear on the one machine. This makes maintaining and observing the files easier.
The ability to log error messages or the actions carried out by a program or script is fairly standard. On earlier versions of UNIX each individual program would have its own configuration file that controlled where and what to log. This led to multiple configuration and log files, which were difficult for the Systems Administrator to control, and meant that each program had to know how to log.
The syslog system was devised to provide a central logging facility that could be used by all programs. This was useful because Systems Administrators could control where and what should be logged by modifying a single configuration file and because it provided a standard mechanism by which programs could log information.
The syslog system can be divided into a number of components
§ default log file,
On many systems messages are logged by default into the file /var/log/messages
§ the syslog message format,
§ the application programmer's interface,
The API programs use to log information.
§ the daemon, and
The program that directs logging information to the correct location based on the configuration file.
§ the configuration file.
Controls what information is logged and where it is logged.
14.6 Examine the contents of the file /var/log/messages. You will probably have to be the root user to do so. One useful piece of information you should find in that file is a copy of the text that appears as Linux boots.
syslog uses a standard message format for all information that is logged. This format includes
§ a facility,
The facility is used to describe the part of the system that is generating the message. Table 14.6 lists some of the common facilities.
§ a level,
The level indicates the severity of the message. In lowest to highest order the levels are debug, info, notice, warning, err, crit, alert and emerg.
§ and a string of characters containing a message.
| Facility | Source |
| kern | the kernel |
| mail | the mail system |
| lpr | the print system |
| daemon | a variety of system daemons |
| auth | the login authentication system |
Table 14.6 Common syslog facilities
In order for syslog to be useful application programs must be able to pass messages to the syslog daemon so it can log the messages according to the configuration file. There are at least two methods which application programs can use to send messages to syslog. These are:
§ logger,
logger is a UNIX command. It is designed to be used by shell programs which wish to use the syslog facility.
§ the syslog API.
The API (application program interface) consists of a set of functions (openlog, syslog and closelog) which are used by programs written in compiled languages such as C and C++. This API is defined in the syslog.h file. You will find this file in the system include directory /usr/include.
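In a shell script it can be convenient to hide the details of logger behind a small function. The sketch below is one possible arrangement, not a standard one. The function name log_event, the tag myscript and the LOGGER variable are all made up for illustration; the -t (tag) and -p (facility.level) options are the ones described in the logger manual page.

```bash
#!/bin/bash
# LOGGER names the command used to submit messages. It defaults to logger
# but can be overridden, for example with echo while testing a script.
LOGGER=${LOGGER:-logger}

# log_event LEVEL MESSAGE
# Send MESSAGE to syslog with facility "user", the given level and the
# tag "myscript" (a made-up name identifying the sender).
log_event() {
    "$LOGGER" -t myscript -p "user.$1" "$2"
}

# Example call, left commented out so nothing is logged by accident:
#   log_event notice "nightly backup finished"
```

Where the message ends up depends entirely on the syslog configuration file, which is the point of the whole system: the script only states the facility and level.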
14.7 Examine the manual page for logger. Use logger from the command line to send a message to syslog
14.8 Examine the manual page for openlog and write a C program to send a message to syslog
syslogd is the syslog daemon. It is started when the system boots by one of the startup scripts. syslogd reads its configuration file when it starts up or when it receives the HUP signal. The standard configuration file is /etc/syslog.conf.
syslogd receives logging messages and carries out actions as specified in the configuration file. Standard actions include
§ appending the message to a specific file,
§ forwarding the message to the syslogd on a different machine, or
§ displaying the message on the consoles of all or some of the logged in users.
By default syslogd uses the file /etc/syslog.conf as its configuration file. A command line parameter of syslogd can be used to specify a different configuration file.
A syslog configuration file is a text file. Each line is divided into two fields separated by one or more spaces or tab characters
§ a selector, and
Used to match log messages.
§ an action.
Specifies what to do with a message if it is matched by the selector.
The selector format is facility.level where facility and level match those terms introduced in the syslog message format section above.
A selector field can include
§ multiple selectors separated by ; characters
§ multiple facilities, separated by a , character, for a single level
§ an * character to match all facilities or levels
The level can be specified with or without a =. If the = is used only messages at exactly that level will be matched. Without the = all messages at or above the specified level will be matched.
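The difference between the two forms can be expressed as a short bash sketch. It is an illustration of the rule only, not of how syslogd is implemented; the function names level_rank and selector_matches are hypothetical, and the * and none forms are not handled.

```bash
#!/bin/bash
# Levels from lowest to highest severity.
LEVELS="debug info notice warning err crit alert emerg"

# level_rank NAME: print the position of NAME in LEVELS (debug is 0).
level_rank() {
    local i=0 name
    for name in $LEVELS; do
        [ "$name" = "$1" ] && { echo "$i"; return 0; }
        i=$((i + 1))
    done
    return 1
}

# selector_matches SELECTOR_LEVEL MESSAGE_LEVEL
# SELECTOR_LEVEL may start with = to demand an exact match.
selector_matches() {
    local want=$1 got=$2
    case $want in
        =*) [ "${want#=}" = "$got" ] ;;
        *)  [ "$(level_rank "$got")" -ge "$(level_rank "$want")" ] ;;
    esac
}

selector_matches err crit  && echo "err matches a crit message"
selector_matches =err crit || echo "=err does not match a crit message"
```

So a selector such as mail.err collects everything from err upwards, while mail.=err picks out the err messages alone.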
The actions in the syslog configuration file can take one of four formats
§ a pathname starting with /
Messages are appended onto the end of the file.
§ a hostname starting with a @
Messages are forwarded to the syslogd on that machine.
§ a list of users separated by commas
Messages appear on the screens of those users if they are logged in.
§ an asterisk
Messages are displayed on the screens of all logged in users.
The following is an example syslog configuration file taken from the Linux manual page for syslog.conf
# Log all kernel messages to the console.
# Logging much else clutters up the screen.
#kern.*                                 /dev/console
# Log anything (except mail) of level info or higher.
# Don't log private authentication messages!
*.info;mail.none;authpriv.none          /var/log/messages
# The authpriv file has restricted access.
authpriv.*                              /var/log/secure
# Log all the mail messages in one place.
mail.*                                  /var/log/maillog
# Everybody gets emergency messages, plus log them on another
# machine.
*.emerg                                 *
# Save uucp and news messages of level crit and higher in a
# special file.
uucp,news.crit                          /var/log/spooler
14.9 A common problem on many systems is users who consume too much disk space. One method to deal with this is to have a script which regularly checks on disk usage by users and reports those users who are consuming too much. The following is one example of a script to do this.
#!/bin/bash
# global constants
# DISKHOGFILE holds the location of the file defining each user's
# maximum disk space (one "username max_space" pair per line)
DISKHOGFILE="disk.hog"
# OFFENDERFILE specifies where to write information about offending
# users
OFFENDERFILE="offender"

# space_used
# accept a username as 1st parameter
# return amount of disk space used by the user's home directory
# in a variable usage
space_used()
{
    # home directory is the sixth field in /etc/passwd
    the_home=$(grep "^$1:" /etc/passwd | cut -d: -f6)
    # du uses a tab character to separate out its fields
    # we're only interested in the first one
    usage=$(du -s "$the_home" | cut -f1)
}

#
# Main Program
#
while read username max_space
do
    space_used "$username"
    if [ "$usage" -gt "$max_space" ]
    then
        # append a report on this user to the offender file
        echo "$username has a limit of $max_space and has used $usage" >> "$OFFENDERFILE"
    fi
done < "$DISKHOGFILE"
Modify this script so that it uses the syslog system rather than writing its output to a file of its own.
14.10 Configure syslog so the messages from the script in the previous question are appended to the logfile /var/log/disk.hog.messages and also to the main system console.
Accounting was developed when computers were expensive resources and people were charged per command or for CPU time. In today's era of cheap, powerful computers it's rarely used for these purposes. One thing accounting is still used for is as a source of records about the use of the system. This is particularly useful if someone is trying to break, or has broken, into your system.
In this section we will examine
§ login accounting, and
§ process accounting.
The file /var/log/wtmp is used to store the username, terminal port, login and logout times of every connection to a Linux machine. Every time you login or logout the wtmp file is updated. This task is performed by init.
The last command is used to view the contents of the wtmp file. There are options to limit interest to a particular user or terminal port.
14.11 Use the last command to
- count how many logins there have been since the current wtmp file was created,
- find how many times the root user has logged in
The last command provides a rather rudimentary summary of the information in the wtmp file. As a Systems Administrator it is possible that you may require more detailed summaries of this information. For example, you may want to know the total number of hours each user has been logged in, how long per day and various other information.
The command that provides this information is the ac command.
It is possible that you will not have the ac command installed. On a RedHat Linux 5.0 machine it should be located in /usr/bin/ac. The ac command is part of the psacct package. If you don't have ac installed you will have to use rpm or glint to install the package.
14.12 Use the ac command to
- find the total number of hours you were logged in as the root user
- find the average number of hours per login for all users
- find the total and average hours of login for the root user for the last 7 days
Also known as CPU accounting, process accounting records the elapsed CPU time, average memory use, I/O summary, the name of the user who ran the process, the command name and the time each process finished.
Process accounting does not occur until it is turned on using the accton command.
accton /var/log/acct
Where /var/log/acct is the file in which the process accounting information will be stored. The file must already exist before it will work. You can use any filename you wish but many of the accounting utilities rely on you using this file.
lastcomm is used to display the list of commands executed either for everyone, for particular users, from particular terminals or just information about a particular command. Refer to the lastcomm manual page for more information.
[root@beldin /proc]# lastcomm david
netscape   david   tty1    0.02 secs Sun Jan 25 16:26
[root@beldin /proc]# lastcomm ttyp2
lastcomm   root    ttyp2   0.55 secs Sun Jan 25 16:21
ls         root    ttyp2   0.03 secs Sun Jan 25 16:21
ls         root    ttyp2   0.02 secs Sun Jan 25 16:21
accton     root    ttyp2   0.01 secs Sun Jan 25 16:21
The sa command is used to provide more detailed summaries of the information stored by process accounting and also to summarise the information into other files.
[root@beldin /proc]# /usr/sbin/sa -a
     66   0.19re   0.25cp
      6   0.01re   0.16cp   cat
      8   0.00re   0.04cp   lastcomm
     17   0.00re   0.01cp   ls
      6   0.01re   0.01cp   man
      1   0.00re   0.01cp   troff
      5   0.01re   0.01cp   less
      1   0.15re   0.01cp   in.ftpd
      6   0.01re   0.01cp   sh
      5   0.00re   0.00cp   gunzip
      1   0.00re   0.00cp   grotty
      2   0.00re   0.00cp   sa
      1   0.00re   0.00cp   groff
      1   0.00re   0.00cp   gtbl
      1   0.00re   0.00cp   gzip
      1   0.00re   0.00cp   sh*
      1   0.00re   0.00cp   netscape*
      1   0.00re   0.00cp   accton
      2   0.00re   0.00cp   bash*
Refer to the manual pages for the sa command for more information.
This section has given a very brief overview of process and login accounting and the associated commands and files. What use do these systems fulfil for a Systems Administrator? The main one is that they allow you to track what is occurring on your system and who is doing it. This can be useful for a number of reasons
§ tracking which users are abusing the system
§ figuring out what is normal for a user
If you know (via process accounting) that most of your users never use commands like sendmail and the C compilers and then all of a sudden they start using them, this might be an indication of a break-in.
§ justifying to management the need for a larger system
Generally management won't buy you a bigger computer just because you want one. In most situations you will have to put together a case to justify why the additional expenditure is necessary. Process and login accounting could provide some of the necessary information.
The cron system is used to automatically perform tasks at set times. Components of the cron system include
§ the daemon, crond,
Which actually performs the specified tasks.
§ crontab files, and
That specify the when and what.
§ the crontab command.
Used to manipulate the crontab files.
Useful commands for examining the current status of your system's file systems include df and du. Commands for examining and manipulating processes include ps, kill, renice, nice and top. Other "status" commands include free, uptime and uname.
syslog is a centralised system for logging information about system events. Its components include
§ an API and a program (logger) by which information can be logged,
§ the syslogd daemon that actually performs the logging, and
§ the /etc/syslog.conf file that specifies what information should be logged and where.
Login accounting is used to track when, where and for how long users connect to your system. Process accounting is used to track when and what commands were executed. By default Linux does not provide full support for either form of accounting (it does offer some standard login accounting but not the extra command ac). However there are freely available software distributions that provide Linux with this functionality.
Login accounting is performed in the /var/log/wtmp file that is used to store the details of every login and logout from the system. The last command can be used to view the contents of the binary /var/log/wtmp file. The non-standard command ac can be used to summarise this information into a number of useful formats.
Process accounting must be turned on using the accton command and the results can be viewed using the lastcomm command.
Both logging and accounting can produce files that grow to some considerable size in a short amount of time. The Systems Administrator must implement strategies to deal with these log files, either by ignoring and deleting them or by saving them to tape.
Explain the relationship between each of the following
§ crond, crontab files and the crontab command,
§ syslogd, logger and /etc/syslog.conf
§ /var/log/wtmp, last and ac
You have just modified the /etc/syslog.conf file. Will your changes take effect immediately? If not what command would you use to make the modifications take effect? How could you check that the modifications are working?
Write crontab entries to achieve the following
§ run the script /usr/local/adm/bin/archiveIt every Monday at 6 am
§ run a script /usr/local/adm/bin/diskhog on Monday, Wednesday and Friday at 6am, 12pm, 4pm