Chapter 5
Processes and Files

Introduction

This chapter introduces the important and related UNIX concepts of processes and files.

A process is basically an executing program. All the work performed by a UNIX system is carried out by processes. The UNIX operating system stores a great deal of information about processes and provides a number of mechanisms by which you can manipulate both the processes and the information about them.

As on most computers today, all the long term information stored on a UNIX system is kept in files, which are organised into a hierarchical directory structure. Each file on a UNIX system has a number of attributes that serve different purposes. As with processes, there is a collection of commands which allow users and Systems Administrators to modify these attributes.

Among the most important attributes of files and processes examined in this chapter are those associated with user identification and access control. Since UNIX is a multiuser operating system it must provide mechanisms which restrict what users (and their processes) can do and where they can go. An understanding of how this is achieved is essential for a Systems Administrator.

Multiple users

UNIX is a multi-user operating system. This means that at any one time there are multiple people all sharing the computer and its resources. The operating system must have some way of identifying the users and protecting one user's resources from the other users.

Identifying users

Before you can use a UNIX computer you must first log in. The login process requires that you have a username and a password. By entering your username you identify yourself to the operating system.


Users and groups

In addition to a unique username UNIX also places every user into at least one group. Groups are used to provide or restrict access to a collection of users and are specified by the /etc/group file.

To find out what groups you are a member of use the groups command. It is possible to be a member of more than one group.

Names and numbers

As you've seen each user and group has a unique name. However the operating system does not use these names internally. The names are used for the benefit of the human users.

For its own purposes the operating system actually uses numbers to represent each user and group (numbers are more efficient to store). This is achieved by each username having an equivalent user identifier (UID) and every group name having an equivalent group identifier (GID).

The association between username and UID is stored in the /etc/passwd file. The association between group name and GID is stored in the /etc/group file.

To find out your UID and initial GID try the following command

grep username /etc/passwd

Where username is your username. This command will display your entry in the /etc/passwd file. The third field is your UID and the fourth is your initial GID. On my system my UID is 500 and my GID is 100.

bash$  grep david /etc/passwd
david:*:500:100:David Jones:/home/david:/bin/bash           

id

The id command can be used to discover username, UID, group name and GID of any user.

dinbig:~$ id
uid=500(david) gid=100(users) groups=100(users)
dinbig:~$ id root
uid=0(root) gid=0(root) groups=0(root),1(bin),
2(daemon),3(sys),4(adm),6(disk),10(wheel),11(floppy) 

In the above you will see that the user root is a member of more than one group. The entry in the /etc/passwd file stores the GID of the user's initial group (mine is 100, root's is 0). If a user belongs to any other groups they are specified in the /etc/group file.
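The lookup described above can be sketched in a few lines of shell. This is only an illustration: the function name uid_gid_of is made up for this example, and it reads passwd-format lines from standard input so that the sample entry shown earlier can be used instead of a real /etc/passwd file.

```shell
# Hypothetical helper: print the UID and initial GID for one username.
# It reads passwd-format lines from standard input. Fields are separated
# by colons; field 3 is the UID and field 4 is the initial GID.
uid_gid_of() {
  awk -F: -v user="$1" '$1 == user { print "uid=" $3 " gid=" $4 }'
}

# The sample entry shown earlier in this section
echo 'david:*:500:100:David Jones:/home/david:/bin/bash' | uid_gid_of david
```

On a system that keeps its accounts in the local file you could try the same helper with uid_gid_of yourname < /etc/passwd, where yourname is your own username, and compare the result with the output of id.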


Commands and processes

Whenever you run a program, whether by typing its name at the command line or by starting it from X-Windows, a process is created. It is the process (a program in execution: a collection of executable code, data and operating system data structures) which performs the work of the program.

The UNIX command line that you use to enter commands is actually another program/command called the shell. The shell is responsible for asking you for a command and then attempting to execute the command. (The shell also performs a number of other tasks which are discussed in the next chapter.)

Where are the commands?

For you to execute a command, for example ls, that command must be in one of the directories in your search path. The search path is a list of directories maintained by the shell.

When you ask the shell to execute a command it will look in each of the directories in your search path for a file with the same name as the command. When it finds the executable program it will run it. If it doesn't find the executable program it will report command_name: not found.

which

Linux and most UNIX operating systems supply a command called which. The purpose of this command is to search through your search path for a particular command and tell you where it is.

For example, the command which ls on my machine aldur returns /usr/bin/ls. This means that the program for ls is in the directory /usr/bin.
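The search that the shell and which perform can be approximated with a short shell function. Treat this as a sketch rather than the way any particular shell really does it: the name find_in_path and its optional second argument (a list of directories to use instead of $PATH) are inventions for this example.

```shell
# Hypothetical helper: look for a command in a colon-separated list of
# directories (by default $PATH), checking each directory in order.
find_in_path() {
  cmd=$1
  dirs=${2:-$PATH}
  old_ifs=$IFS
  IFS=:
  for dir in $dirs; do
    # An empty entry in the search path means the current directory
    [ -n "$dir" ] || dir=.
    if [ -f "$dir/$cmd" ] && [ -x "$dir/$cmd" ]; then
      IFS=$old_ifs
      echo "$dir/$cmd"
      return 0
    fi
  done
  IFS=$old_ifs
  echo "$cmd: not found" >&2
  return 1
}

find_in_path ls
```

The first directory that contains an executable file with the right name wins, which is why the order of the directories in your search path matters.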

 

Exercises

5.1         Use the which command to find the locations of the following commands
ls
echo
set

When is a command not a command?

In the previous exercise you will have discovered that which could not find the set command. How can this be possible? Enter the set command. Does it work? Why can't which find it?

This is because set is a built-in shell command. This means there isn't an executable program that contains the code for the set command. Instead the code for set is actually built into the shell.
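One way to see the difference for yourself, assuming a POSIX-style shell such as bash, is the command -v built-in. For a program found in the search path it prints the full path, while for a built-in command it prints only the name.

```shell
# command -v prints a path for a program found in the search path,
# but only the bare name for a command built into the shell.
command -v ls     # a path, for example /bin/ls or /usr/bin/ls
command -v set    # just the name: set
```

The type command gives a more descriptive answer, although its exact wording differs from shell to shell.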

Controlling processes

The resource materials section for Week 2 (on the 85321 Web site and CD-ROM) has a reading on controlling processes.

Exercises

5.2         Under the VMS operating system it is common to use the key combination CTRL-Z to exit a program. A new user on your UNIX system has been using VMS a lot. What happens when he uses CTRL-Z while editing a document with vi?

Process attributes

For every process that is created the UNIX operating system stores information including

§         its real UID, GID and its effective UID and GID

§         the code and variables used by the process (its address map)

§         the status of the process

§         its priority

§         its parent process

Parent processes

Every process is created by another process (its parent). The creation of a child process is usually a combination of two operations

§         forking
A new process is created that is almost identical to the parent process. It will be using the same code.

§         exec
This changes the code being used by the process to that of another program.

When you enter a command it is the shell that performs these tasks. It will fork off a new process (which is running the shell's program). The child process then performs an exec to change to the code for the command you wish executed.

While your command is executing the shell will block until its child has completed. When the child dies the shell will present you with another prompt and wait for a new command.
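The fork, exec and wait sequence described above can be imitated from within the shell itself. The following is a sketch under the assumption of a POSIX-style shell; run_like_the_shell is a made-up name, and a real shell does this work with system calls rather than with shell syntax.

```shell
# Hypothetical helper that mirrors what the shell does for each command.
run_like_the_shell() {
  # fork: the parentheses and & start a child process that is a copy of this shell
  # exec: inside the child, the shell's code is replaced by the requested program
  ( exec "$@" ) &
  child=$!
  # the parent blocks here until the child has finished
  wait "$child"
  status=$?
  echo "child $child finished with exit status $status"
  return "$status"
}

run_like_the_shell ls /
```

The process ID printed will be different every time, because each command is run by a brand new child process.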


Process UID and GID

In order for the operating system to know what a process is allowed to do it must store information about who owns the process (UID and GID). The UNIX operating system stores two types of UID and two types of GID.

Real UID and GID

A process' real UID and GID will be the same as the UID and GID of the user who ran the process. Therefore any process you execute will have your UID and GID.

The real UID and GID are used for accounting purposes.

Effective UID and GID

The effective UID and GID are used to determine what operations a process can perform. In most cases the effective UID and GID will be the same as the real UID and GID.

However using special file permissions it is possible to change the effective UID and GID. How and why you would want to do this is examined later in this chapter.

Exercises

5.3         Create a text file called i_am.c that contains the following C program. Compile the program by using the following command
cc i_am.c -o i_am
This will produce an executable program called i_am.
Run the program. (Rather than type the code, you should be able to cut and paste it from the online versions of this chapter that are on the CD-ROM and Web site.)
#include <stdio.h>
#include <unistd.h>

int main(void)
{
  int real_uid, effective_uid;
  int real_gid, effective_gid;

  /* get the real and effective user id and group id */
  real_uid = getuid();
  effective_uid = geteuid();
  real_gid = getgid();
  effective_gid = getegid();

  /* display what I found */
  printf("The real uid is %d\n", real_uid);
  printf("The effective uid is %d\n", effective_uid);
  printf("The real gid is %d\n", real_gid);
  printf("The effective gid is %d\n", effective_gid);

  return 0;
}


Files

All the information stored by UNIX onto disk is stored in files. Under UNIX even directories are just special types of files. A previous reading has already introduced you to the basic UNIX directory hierarchy. The purpose of this section is to fill in some of the detail.

File types

UNIX supports a small number of different file types. The following table summarises these different file types. What the different file types are and what their purpose is will be explained as we progress. File types are signified by a single character.

 

File type   Meaning
-           a normal file
d           a directory
l           a symbolic link
b           a block device file
c           a character device file
p           a fifo or named pipe

Table 5.1
UNIX file types

For current purposes you can think of these file types as falling into three categories

§         “normal” files,
Files under UNIX are just a collection of bytes of information.  These bytes might form a text file or a binary file.

§         directories or directory files,
Remember, for UNIX a directory is just another file which happens to contain the names of files and their inode numbers. An inode is an operating system data structure which is used to store information about the file (explained later).

§         special or device files.
These special files provide access to devices which are connected to the computer. Why they exist and what they are used for is explained in more detail later in the text.

Types of normal files

Quite obviously it is possible to have different types of normal files based on the data they contain.  You can have text files, executable files, sound files and images.  If you’re unsure what type of normal file you have the UNIX file command might help.

[david@beldin david]$ file demo_1.au /etc/passwd /usr/bin/file
demo_1.au:     Sun/NeXT audio data: 8-bit ISDN u-law, mono, 8000 Hz
/etc/passwd:   ASCII text
/usr/bin/file: ELF 32-bit LSB executable, Intel 80386, version 1, dynamically linked, stripped

In this example the file command has been used to discover what type of file three files are.  The three files here are audio, text and executable files respectively.

How does this work? 

The file command looks for a magic number inside a data file. If the file contains a certain magic number then it must be a certain type of file. The magic numbers and the corresponding file descriptions are contained in a text data file. On a Red Hat system you should find this information in the file /usr/lib/magic.
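To get a feel for what a magic number looks like you can display the first few bytes of a file yourself. The sketch below is not how the file command is implemented (it checks many more things than the first four bytes); first_bytes is a name made up for this example.

```shell
# Hypothetical helper: print the first four bytes of a file in hexadecimal.
first_bytes() {
  od -An -tx1 -N4 "$1" | tr -d ' \n'
  echo
}

first_bytes /bin/sh
```

ELF executables, like /usr/bin/file in the example above, start with the byte 7f followed by the letters E, L and F, so on a Linux system this will typically print 7f454c46. Other types of file, and executables on other operating systems, start with different bytes.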

Exercises

5.4         Examine the contents of the /usr/lib/magic file.  Experiment with the file command on a number of different files.

File attributes

UNIX stores a variety of information about each file including

§         where the file's data is stored on the disk

§         what the file's name is

§         who owns the file

§         who is allowed to do what with the file

§         how big the file is

§         when the file was last modified

§         how many links there are to the file

UNIX uses a data structure called an inode to store all of this information (except for the filename). Every file on a UNIX system must have an associated inode. You can find out which inode a file has by using the ls -i command.

dinbig:~$ ls -i README
  45210 README   

In the above example the file README is using inode 45210.

As mentioned previously, the name of a file is actually stored in the directory in which it appears. Throughout this text you will find the term file used to mean both files and directories.
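A small experiment shows that the name really does live in the directory and not in the inode. The sketch below works in a scratch directory and gives one file a second name with the ln command (links are covered in the reading mentioned later in this chapter); inode_of is a made-up helper name.

```shell
# Work in a scratch directory so none of your own files are touched
scratch=$(mktemp -d)

echo "some data" > "$scratch/README"
ln "$scratch/README" "$scratch/COPY"    # a second name for the same file

# Both names are listed with the same inode number
ls -i "$scratch/README" "$scratch/COPY"

# Hypothetical helper: print only the inode number of a file
inode_of() {
  ls -i "$1" | awk '{ print $1 }'
}
```

Two directory entries, one inode: the file's data and attributes are stored once, and each name is simply a pointer to them.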

Viewing file attributes

To examine the various attributes associated with a file you can use the -l switch of the ls command.


 


Figure 5.1
File Attributes

Filenames

Most UNIX file systems (including the Linux file system) will allow filenames to be 255 characters long and use almost any characters. However there are some characters that can cause problems if used including * $ ? ' " / \ - and others. Why is explained in the next chapter.  This doesn’t mean you can’t create filenames that contain these characters, just that you can have some problems if you do.

Size

The size of a file is specified in bytes. So the above file is 227 bytes long. The standard Linux file system will allow files to be up to 4TB (terabytes) in size.

Date

The date specified here is the date the file was last modified.

Permissions

The permission attributes of a file specify what operations can be done with a file and who can perform those operations. Permissions are explained in more detail in the following section.


Exercises

5.5         Execute the following command ls -ld / /dev (it produces a long listing of the directories / and /dev). Why is the /dev directory bigger than the / directory?

5.6         Execute the following commands (double the number of times the letter 'a' appears in the filename for the touch command)
  ls -ld /tmp
  for name in 1 2 3 4 5 6 7 8 9 10 11 12 13 14
  do
    touch /tmp/aaaaaaaaaaaaaaaaaaaaaaaaaaaa$name
  done
  ls -ld /tmp
These commands create a number of empty files inside the /tmp directory. (The touch command is used to create an empty file if the file doesn't exist, or updates the date last modified if it does.)
Why does the output of the ls -ld /tmp command change?

File protection

Given that there can be many people sharing a UNIX computer it is important that the operating system provide some method of restricting access to files. I don't want you to be able to look at my personal files.

UNIX achieves this by

§         restricting users to three valid operations,
Under UNIX there are only three things you can do to a file (or directory): read, write or execute it.

§         allowing the file owner to specify who can do these operations on a file.
The file owner can use the user and group concepts of UNIX to restrict which users (actually it restricts which processes that are owned by particular users) can perform these tasks.

File operations

UNIX provides three basic operations that can be performed on a file or a directory. The following table summarises those operations.

It is important to recognise that the operations are slightly different depending whether they are being applied to a file or a directory.

Operation   Effect on a file                             Effect on a directory
read        read the contents of the file                find out what files are in the directory, e.g. ls
write       add to or change the contents of the file    be able to create or remove a file from the directory
execute     be able to run a file/program                be able to access a file within a directory

Table 5.2
UNIX file operations

Users, groups and others

Processes wishing to access a file on a UNIX computer are placed into one of three categories

§         user
The individual user who owns the file (by default the user that created the file but this can be changed). In figure 5.1 the owner is the user david.

§         group
The collection of people that belong to the group that owns the file (by default the group to which the file's creator belongs). In figure 5.1 the group is staff.

§         other
Anybody that doesn't fall into the first two categories.

File permissions

Each user category (user, group and other) has its own set of file permissions. These control which file operations each particular user category can perform.

File permissions are the first field of file attributes to appear in the output of ls -l. File permissions actually consist of four fields

§         file type,

§         user permissions,

§         group permissions,

§         and other permissions.

 


Figure 5.2

File Permissions

Three sets of file permissions

As the diagram shows the file permissions for a file are divided into three different sets one for the user, one for a group which owns the file and one for everyone else.

A letter indicates that the particular category of user has permission to perform that operation on the file. A - indicates that they can't.

In the above diagram the owner can read, write and execute the file (rwx). The group can read and write the file (rw-), while other cannot do anything with the file (---).

Symbolic and numeric permissions

rwxr-x-w- is referred to as symbolic permissions. The permissions are represented using a variety of symbols.

There is another method for representing file permissions called numeric or absolute permissions where the file permissions are represented using numbers.

Symbols

The following table summarises the symbols that can be used in representing file permissions using the symbolic method.

Symbol   Purpose
r        read
w        write
x        execute
s        setuid or setgid (depending on location)
t        sticky bit

Table 5.3
Symbolic file permissions

Special permissions

Table 5.3 introduced three new types of permission: setuid, setgid and the sticky bit.

Sticky bit on a file

In the past having the sticky bit set on a file meant that when the file was executed the code for the program would "stick" in RAM. Normally once a program has finished, its code is taken out of RAM and that area is used for something else.

The sticky bit was used on programs that were executed regularly. If the code for a program is already in RAM the program will start much quicker because the code doesn't have to be loaded from disk.

However today with the advent of shared libraries and cheap RAM most modern Unices ignore the sticky bit when it is set on a file.


Sticky bit on a directory

The /tmp directory on UNIX is used by a number of programs to store temporary files regardless of the user. For example when you use elm (a UNIX mail program) to send a mail message, while you are editing the message it will be stored as a file in the /tmp directory.

Modern UNIX operating systems (including Linux) use the sticky bit on a directory to make /tmp directories more secure. Try the command ls -ld /tmp. What do you notice about the file permissions of /tmp?

If the sticky bit is set on a directory you can only delete or rename a file in that directory if you are

§         the owner of the directory,

§         the owner of the file, or

§         the super user
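You can see how the sticky bit shows up in a listing without touching the real /tmp. The sketch below builds a directory with the same permissions in a scratch area; perms_of is a made-up helper, and 1777 is the numeric form of rwxrwxrwt (numeric permissions are explained later in this chapter).

```shell
scratch=$(mktemp -d)
mkdir "$scratch/shared"
chmod 1777 "$scratch/shared"   # rwx for everyone, plus the sticky bit

# Hypothetical helper: print just the file type and permission characters
perms_of() {
  ls -ld "$1" | cut -c1-10
}

perms_of "$scratch/shared"     # drwxrwxrwt
perms_of /tmp                  # usually the same on a real /tmp directory
```

The t in the last position takes the place of the x for other, and tells you that the sticky bit is set.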

Changing passwords

When you use the passwd command to change your password the command will actually change the contents of either the /etc/passwd or /etc/shadow files. These are the files where your password is stored. Older Linux systems use /etc/passwd by default, while most current distributions use /etc/shadow; the discussion below uses /etc/passwd.

As has been mentioned previously the UNIX operating system uses the effective UID and GID of a process to decide whether or not that process can modify a file. Also the effective UID and GID are normally the UID and GID of the user who executes the process.

This means that if I use the passwd command to modify the contents of the /etc/passwd file (I write to the file) then I must have write permission on the /etc/passwd file. Let's find out.

What are the file permissions on the /etc/passwd file?

dinbig:~$ ls -l /etc/passwd
-rw-r--r--   1 root     root          697 Feb  1 21:21 /etc/passwd  

On the basis of these permissions should I be able to write to the /etc/passwd file?

No. Only the user who owns the file, root, has write permission. Then how does the passwd command change my password?

setuid and setgid

This is where the setuid and setgid file permissions enter the picture. Let's have a look at the permissions for the passwd command (first we find out where it is).

dinbig:~$ which passwd
/usr/bin/passwd
dinbig:~$ ls -l /usr/bin/passwd
-rws--x--x   1 root     bin          7192 Oct 16 06:10 /usr/bin/passwd
 

Notice the s symbol in the file permissions of the passwd command. This specifies that the command is setuid.

The setuid and setgid permissions are used to change the effective UID and GID of a process. When I execute the passwd command a new process is created. The real UID and GID of this process will match my UID and GID. However the effective UID and GID (the values used to check file permissions) will be set to that of the command.

In the case of the passwd command the effective UID will be that of root because the setuid permission is set, while the effective GID will be my group's because the setgid bit is not set.
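If you would rather test for the setuid permission in a script than read it off a listing, the shell's -u file test can be used. The following is a minimal sketch: is_setuid is a made-up name, and the file it is tried on is an empty scratch file rather than a real command.

```shell
# Hypothetical helper: succeed if the setuid permission is set on a file.
# The -u test is true when the file exists and its setuid bit is on.
is_setuid() {
  [ -u "$1" ]
}

scratch=$(mktemp -d)
touch "$scratch/prog"
chmod u+x "$scratch/prog"
is_setuid "$scratch/prog" || echo "prog: setuid is not set"

chmod u+s "$scratch/prog"
is_setuid "$scratch/prog" && echo "prog: setuid is set"
```

On most systems is_setuid /usr/bin/passwd should succeed as well, although the location of the passwd command can differ from the one shown above.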

Exercises

5.7         Log in as the root user, go to the directory that contains the file i_am you created in exercise 5.3. Execute the following commands
  cp i_am i_am_root
  cp i_am i_am_root_group
  chown root.root i_am_root*
  chmod a+rx i_am*
  chmod u+s i_am_root
  chmod +s i_am_root_group
  ls -l i_am*      
These commands make copies of the i_am program called i_am_root with setuid set, and i_am_root_group with setuid and setgid set. Log back in as your normal user and execute all three of the i_am programs. What do you notice? What are the UID and GID of root?

Numeric permissions

Up until now we have been using symbols like r w x s t to represent file permissions. However the operating system itself doesn't use symbols, instead it uses numbers. When you use symbolic permissions, the commands translate between the symbolic permission and the numeric permission.

With numeric or absolute permissions the file permissions are represented using octal (base 8) numbers rather than symbols. The following table summarises the relationship between the symbols used in symbolic permissions and the numbers used in numeric permissions.

To obtain the numeric permissions for a file you add the numbers for all the permissions that are allowed together.


 

Symbol   Number
s        4000 setuid, 2000 setgid
t        1000
r        400 user, 40 group, 4 other
w        200 user, 20 group, 2 other
x        100 user, 10 group, 1 other

Table 5.4
Numeric file permissions
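As a worked example of adding the numbers from Table 5.4, take the symbolic permissions rwxr-x---. The shell can do the sum for you, provided each number is written with a leading 0 so that it is treated as octal. This is only a sketch of the arithmetic; the variable name total is arbitrary.

```shell
# rwxr-x--- : read, write and execute for the user (400 + 200 + 100),
# read and execute for the group (40 + 10), nothing for other.
total=$(( 0400 + 0200 + 0100 + 040 + 010 ))
printf '%o\n' "$total"                  # 750

# The same permissions with setuid (4000) added
printf '%o\n' $(( 04000 + $total ))     # 4750
```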

Symbolic to numeric

Here's an example of converting from symbolic to numeric using a different method. This method relies on using binary numbers to calculate the numeric permissions.

The process goes something like this

§         write down the symbolic permissions,

§         under each permission that is on, write a one

§         under each permission that is off, write a zero

§         for each category of user (user, group and other), convert the three binary digits into a single octal digit, e.g. rwx -> 111 -> 7

§         combine the three numbers (one each for user, group and other) into a single octal number


 

Figure 5.3

Symbolic to Numeric permissions
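The binary method can also be written down as a short shell function. This is a sketch only: symbolic_to_numeric is a made-up name, and it understands just the r, w, x and - symbols, so permissions that contain s or t are rejected.

```shell
# Hypothetical helper: convert nine permission characters, e.g. rwxr-x-w-,
# into a three digit octal number. The s and t symbols are not handled.
symbolic_to_numeric() {
  # write a one for each permission that is on and a zero for each that is off
  bits=$(printf '%s' "$1" | tr 'rwx-' '1110')
  result=""
  # take the binary digits three at a time: user, group, other
  for start in 1 4 7; do
    triple=$(printf '%s' "$bits" | cut -c"$start"-"$(($start + 2))")
    case $triple in
      000) digit=0 ;; 001) digit=1 ;; 010) digit=2 ;; 011) digit=3 ;;
      100) digit=4 ;; 101) digit=5 ;; 110) digit=6 ;; 111) digit=7 ;;
      *) echo "unsupported permissions: $1" >&2; return 1 ;;
    esac
    result=$result$digit
  done
  echo "$result"
}

symbolic_to_numeric rwxr-x-w-    # 752
```

Here rwxr-x-w- becomes 111 101 010 in binary, and each group of three digits gives one octal digit: 7, 5 and 2.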

Exercises

5.8         Convert the following symbolic permissions to numeric
rwxrwxrwx
---------
---r--r--
r-sr-x---
rwsrwsrwt


5.9         Convert the following numeric permissions to symbolic
710
4755
5755
6750
7000

Changing file permissions

The UNIX operating system provides a number of commands for users to change the permissions associated with a file. The following table provides a summary.

Command   Purpose
chmod     change the file permissions for a file
umask     set the default file permissions for any files to be created. Usually run as the user logs in.
chgrp     change the group owner of a file
chown     change the user owner of a file

Table 5.5
Commands to change file ownership and permissions

Changing permissions

The chmod command is used to change a file's permissions. Only the user who owns the file can change the permissions of a file (the root user can also do it).

Format

chmod [-R] operation files

The optional (the [ ] are used to indicate optional) switch -R causes chmod to recursively descend any directories changing file permissions as it goes.

files is the list of files and directories to change the permissions of.

operation indicates how to change the permissions of the files. operation can be specified using either symbolic or absolute permissions.

Numeric permissions

When using numeric permissions, operation is the numeric permissions that the files' permissions are to be changed to. For example

chmod 770 my.file
     

will change the file permissions of the file my.file to the numeric permissions 770.


Symbolic permissions

When using symbolic permissions operation has three parts who op symbolic_permission where

§         who specifies the category of user to change the permissions for
It can be any combination of u for user, g for group, o for others and a for all categories.

§         op specifies how to change the permissions
+ add permission, - remove permission, = set permission

§         symbolic_permission specifies the symbolic permissions
r for read, w for write, x execute, s set uid/gid, t set sticky bit.

Examples

§         chmod u+rwx temp.dat
add rwx permission for the owner of the file, these permissions are added to the existing permissions

§         chmod go-rwx temp.dat
remove all permissions for the group and other categories

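§         chmod -R a-rwx /etc
turn off all permissions, for all users, for all files in the /etc directory (shown only to illustrate the syntax; actually running this as root would cripple the system)

§         chmod -R a= /
turn off all permissions for everyone for all files (again an illustration of the syntax, not something to run)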

§         chmod 770 temp.dat
allow the user and group read, write and execute and others no access
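A safe way to experiment with these operations is on a scratch file. The sketch below creates one called temp.dat in a temporary directory and shows the permissions after each chmod; perms_of is a made-up helper that prints only the file type and permission characters from ls -l.

```shell
scratch=$(mktemp -d)
touch "$scratch/temp.dat"

# Hypothetical helper: print just the file type and permission characters
perms_of() {
  ls -l "$1" | cut -c1-10
}

chmod 770 "$scratch/temp.dat";     perms_of "$scratch/temp.dat"   # -rwxrwx---
chmod go-rwx "$scratch/temp.dat";  perms_of "$scratch/temp.dat"   # -rwx------
chmod a=r "$scratch/temp.dat";     perms_of "$scratch/temp.dat"   # -r--r--r--
chmod u+w "$scratch/temp.dat";     perms_of "$scratch/temp.dat"   # -rw-r--r--
```

Notice the difference between the operators: + and - change only the permissions that are named, while = replaces everything for the categories given.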

Changing owners

The UNIX operating system provides the chown command so that the owner of a file can be changed. However in most Unices only the root user can use the command.

Two reasons why this is so are

§         in a file system with quotas (quotas place an upper limit on how many files and how much disk space a user can use) a person could avoid the quota system by giving away the ownership to another person

§         if anyone could give ownership of a file to root, they could create a program that is setuid to the owner of the file and then change the owner of the file to root, ending up with a program that runs with root's privileges

Changing groups

UNIX also supplies the command chgrp to change the group owner of a file. Any user can use the chgrp command to change any file they are the owner of. However you can only change the group owner of a file to a group to which you belong.


For example

dinbig$ whoami
david
dinbig$ groups
users
dinbig$ ls -l tmp
-rwxr-xr-x 2 david users 1024 Feb 1 21:49 tmp
dinbig$ ls -l /etc/passwd
-rw-r--r-- 1 root root 697 Feb 1 21:21 /etc/passwd
dinbig$ chgrp users /etc/passwd
chgrp: /etc/passwd: Operation not permitted
dinbig$ chgrp man tmp
chgrp: you are not a member of group `man': Operation not permitted

In this example I've tried to change the group owner of /etc/passwd. This failed because I am not the owner of that file.

I've also tried to change the group owner of the file tmp, of which I am the owner, to the group man. However I am not a member of the group man so it has also failed.

The commands

The commands chown and chgrp are used to change the owner and group owner of a file.

Format

 chown [-R] owner files
 chgrp [-R] group files

The optional switch -R works in the same way as the -R switch for chmod. It modifies the command so that it descends any directories and performs the command on those sub-directories and files in those sub-directories.

owner is either a numeric user identifier or a username.

group is either a numeric group identifier or a group name.

files is a list of files of which you wish to change the ownership.

Some systems (Linux included) allow owner in the chown command to take the format owner.group. This allows you to change the owner and the group owner of a file with one command.

Examples

§         chown david /home/david
Change the owner of the directory /home/david to david. This demonstrates one of the primary uses of the chown command. When a new account is created the root user creates a number of directories and files. Since root created them they are owned by root. In real life these files and directories should be owned by the new username.

§         chown -R root /
Change the owner of all files to root.

§         chown david.users /home/david
Change the ownership of the directory /home/david so that it is owned by the user david and the group users.

§         chgrp users /home/david
Change the group owner of the directory /home/david to the group users.

Default permissions

When you create a new file it automatically receives a set of file permissions.

dinbig:~$ touch testing
dinbig:~$ ls -l testing
-rw-r--r--   1 david    users           0 Feb 10 17:36 testing
 

In this example the file testing has been given the default permissions rw-r--r--. Any file I create will receive the same default permissions.

umask

The built-in shell command umask is used to specify and view what the default file permissions are. Executing the umask command without any arguments will cause it to display what the current default permissions are.

dinbig:~$ umask
022          

By default the umask command uses the numeric format for permissions. It returns a number which specifies which permissions are turned off when a file is created.

In the above example

§         the user has the value 0
This means that by default no permissions are turned off for the user.

§         the group and other have the value 2
This means that by default the write permission is turned off.

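You will notice that even though the execute permission is not turned off, my default file doesn't have the execute permission turned on. This is because the umask can only turn permissions off. Programs that create ordinary files, such as touch, normally ask for read and write permission only (numeric 666), so there is no execute permission for the umask to leave on. Directories are normally created with a request for 777, so with a umask of 022 a new directory receives rwxr-xr-x.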
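The effect of the umask can be worked out with a little arithmetic, assuming (as is typical) that a program creating a file asks for 666 and one creating a directory asks for 777. The sketch below does the calculation and then checks it against a real file and directory in a scratch area.

```shell
# The bits set in the umask are removed from the permissions requested.
printf 'file:      %o\n' $(( 0666 & ~0022 ))   # file:      644
printf 'directory: %o\n' $(( 0777 & ~0022 ))   # directory: 755

# The same thing observed on a real file and directory. The parentheses
# run the commands in a subshell, so your own umask is left unchanged.
scratch=$(mktemp -d)
(
  umask 022
  touch "$scratch/testing"
  mkdir "$scratch/testdir"
)
ls -ld "$scratch/testing" "$scratch/testdir"
```

The listing should show rw-r--r-- for the file and rwxr-xr-x for the directory, which are the symbolic forms of 644 and 755.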

umask versions

Since umask is a built-in shell command the operation of the umask command will depend on the shell you are using. This also means that you'll have to look at the man page for your shell to find information about the umask command.


umask for bash

The standard shell for Linux is bash. The version of umask for this shell supports symbolic permissions as well as numeric permissions. This allows you to perform the following.

dinbig:~$ umask -S
u=rwx,g=r,o=r
dinbig:~$ umask u=rw,g=rw,o=
dinbig:~$ umask -S
u=rw,g=rw,o=  

Exercises

5.10      Use the umask command so that the default permissions for new files are set to
rw-------
772

File permissions and directories

As shown in table 5.2 file permissions have a slightly different effect on directories than they do on files.

The following example is designed to reinforce your understanding of the effect of file permissions on directories.

For example

Assume that

§         I have an account on the same UNIX machine as you

§         we belong to different groups

§         I want to allow you to access the text for assignment one

§         I want you to copy your finished assignments into my directory

§         But I don't want you to see anything else in my directories


The following diagram represents part of my directory hierarchy including the file permissions for each directory.

Figure 5.4
Permissions and Directories


What happens if?

What happens if you try the following commands

§         ls -l david
To perform an ls you must have read permission on the directory. In this case you don't. Only I, as the owner of the directory, have read permission, so only I can obtain a listing of the files in my directory.

§         cat david/phone.book
You’re trying to have a look at my phone book but you can't. You have permission to do things to files in my directory because you have execute permission on the directory david. However the permissions on the phone.book file mean that only I can read it. The same thing occurs if you try the command cp david/phone.book ~/phone.book. To the file system you are trying to do the same thing, read the file phone.book.

§         ls david/85321
The permissions are set up so you can get a listing of the files in the david/85321 directory. Notice you have read permission on the 85321 directory.

§         cat david/85321/assign.txt
Here you're trying to have a look at the assignment text. This will work. You have read permission on the file so you can read it. You have execute permission on the directories 85321 and david which means you can gain access to files and directories within those directories (if the permissions on the files let you).

§         cat david/85321/solutions/assign1.sol
Trying to steal a look at the solutions? Well you have the permissions on the file to do this. But you don't have the permissions on the directory solutions so this will fail.

What would happen if I executed this command

§          chmod o+r david/85321/solutions
This would add read permission for you to the directory solutions. Can you read the assign1.sol file now? No you can't. To read the file or do anything with a file you must have execute permission on the directory it is in.

§         cp my.assign david/85321/assign.txt
What's this? Trying to replace my assignment with one of your own? Will this work? No because you don't have write permission for the file assign.txt.
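The rules used in the answers above can be condensed into a small helper. The sketch below is only an illustration: the function name other_access and the mode strings passed to it are hypothetical (they are chosen to match the descriptions in the text, not copied from Figure 5.4). It looks only at the last three permission characters, the ones that apply to users who are neither the owner nor in the owning group.

```shell
# other_access MODE
# MODE is the ten-character string printed by ls -ld for a directory,
# for example drwx--x--x. Only the "other" triplet is examined.
other_access() {
    others=$(printf '%s' "$1" | cut -c8-10)
    case $others in
        r??) list=yes ;;    # read: may list the names in the directory
        *)   list=no ;;
    esac
    case $others in
        ??x) enter=yes ;;   # execute: may reach files inside the directory
        *)   enter=no ;;
    esac
    case $others in
        ?wx) modify=yes ;;  # write and execute: may create or remove entries
        *)   modify=no ;;
    esac
    echo "list=$list enter=$enter modify=$modify"
}

other_access drwx--x--x    # execute only, as described for the directory david
other_access drwxr-xr-x    # read and execute, as described for 85321
other_access drwx---r--    # read only, like solutions after chmod o+r
```

The third call shows why adding read permission alone does not help: the names can be listed but the files inside still cannot be reached.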

Links

Hard and soft links

 

A reading describing links, both hard and soft, is included on the 85321 Web site/CD-ROM under the resource materials section for week 2.
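As a quick taste before you tackle the reading, here is a minimal sketch that creates one hard link and one soft link in a scratch directory and compares inode numbers. The names demo_dir, original, hard and soft are made up for this example.

```shell
demo_dir=$(mktemp -d)

echo "some text" > "$demo_dir/original"
ln "$demo_dir/original" "$demo_dir/hard"   # hard link: a second name for the same inode
ln -s original "$demo_dir/soft"            # soft link: a separate file holding the name original

# The first column of ls -i is the inode number.
ls -li "$demo_dir"

# Pull the inode numbers out so they can be compared.
inode_original=$(ls -i "$demo_dir/original" | awk '{print $1}')
inode_hard=$(ls -i "$demo_dir/hard" | awk '{print $1}')
inode_soft=$(ls -i "$demo_dir/soft" | awk '{print $1}')

echo "original=$inode_original hard=$inode_hard soft=$inode_soft"

rm -r "$demo_dir"
```

The two hard-linked names should report the same inode number, while the soft link has an inode of its own.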


Searching the file hierarchy

A common task for a Systems Administrator is searching the UNIX file hierarchy for files which match certain criteria.  Some common examples of what and why a Systems Administrator may wish to do this include

§         searching for very large files

§         finding where on the disk a particular file is

§         deleting all the files owned by a particular user

§         displaying the names of all files modified in the last two days.

Given the size of the UNIX file hierarchy and the number of files it contains this isn’t a task that can be done by hand.  This is where the find command becomes useful.

The find command

The find command is used to search through the directories of a file system looking for files that match specific criteria.  Once a file matching the criteria is found the find command can be told to perform a number of different tasks including running any UNIX command on the file.

find command format

The format for the find command is

find [path-list] [expression] 

path-list is a list of directories in which the find command will search for files. The command will recursively descend through all sub-directories under these directories.  The expression component is explained in the next section. 

Both the path and the expression are optional. If you run the find command without any parameters it uses a default path, the current directory, and a default expression, print the name of the file. The following is an example of what happens

dinbig:~$ find
.
./iAm
./iAm.c
./parameters
./numbers
./pass
./func
./func2
./func3
./pattern
./Adirectory
./Adirectory/oneFile
     

The default path is the current directory. In this example the find command has recursively searched through all the directories within the current directory.

The default expression is -print. This is a find action that tells the find command to display the names of all the files it finds.

Since there was no test specified the find command matched all files.
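The defaults can be checked for yourself. The following sketch assumes the GNU version of find supplied with Linux (some other versions insist on a path) and uses a scratch directory whose file names are borrowed from the example above. The variable names are made up.

```shell
demo_dir=$(mktemp -d)
mkdir "$demo_dir/Adirectory"
touch "$demo_dir/iAm" "$demo_dir/Adirectory/oneFile"

# Run both forms from inside the scratch directory. The cd happens in a
# subshell, so the current directory of your own shell is unchanged.
implicit=$(cd "$demo_dir" && find)
explicit=$(cd "$demo_dir" && find . -print)

echo "$implicit"
if [ "$implicit" = "$explicit" ]; then
    echo "find and find . -print gave the same output"
fi

rm -r "$demo_dir"
```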

find expressions

A find expression can contain the following components

§         options,
These modify the way in which the find command operates.

§         tests,
These decide whether or not the current file is the one you are looking for.

§         actions,
Specify what to do once a file has been selected by the tests.

§         and operators.
Used to group expressions together.

find options

Options are normally placed at the start of an expression. Table 5.6 summarises some of the find command's options.

Option              Effect

-daystart           for tests using time measure time from the beginning of today
-depth              process the contents of a directory before the directory
-maxdepth number    number is a positive integer that specifies the maximum number of directories to descend
-mindepth number    number is a positive integer that specifies at which level to start applying tests
-mount              don't cross over to other partitions
-xdev               don't cross over to other partitions

Table 5.6
find options

For example

The following are two examples of using find's options. Since I don't specify a path in which to start searching the default value, the current directory, is used.

dinbig:~$ find -mindepth 2
./Adirectory/oneFile

In this example the mindepth option tells find to only find files or directories which are at least two directories below the starting point.

dinbig:~$ find -maxdepth 1
.
./iAm
./iAm.c
./parameters
./numbers
./pass
./func
./func2
./func3
./pattern
./Adirectory

This option restricts find to those files which are in the current directory.
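If you would like to try these two options without relying on the contents of your home directory, the following sketch builds a tiny hierarchy first. The directory and variable names are made up, and the output of the second command is sorted only so that it is predictable.

```shell
demo_dir=$(mktemp -d)
mkdir "$demo_dir/Adirectory"
touch "$demo_dir/iAm" "$demo_dir/Adirectory/oneFile"

# Only names at least two levels below the starting point.
deep=$(cd "$demo_dir" && find . -mindepth 2)

# The starting point itself plus the names directly inside it.
shallow=$(cd "$demo_dir" && find . -maxdepth 1 | LC_ALL=C sort)

echo "mindepth 2:"
echo "$deep"
echo "maxdepth 1:"
echo "$shallow"

rm -r "$demo_dir"
```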

find tests

Tests are used to find particular files based on

§         when the file was last accessed

§         when the file's status was last changed

§         when the file was last modified

§         the size of the file

§         the file's type

§         the owner or group owner of the file

§         the file's name

§         the file's inode number

§         the number and type of links the file has to it

§         the file's permissions

Table 5.7 summarises find's tests. A number of the tests take numeric values.  For example, the number of days since a file was modified.  For these situations the numeric value can be specified using one of the following formats (in the following n is a number)

§         +n
greater than n

§         -n
less than n

§         n
equal to n

For example

Some examples of using tests are shown below. Note that in all these examples no action is specified. Therefore the find command uses the default action, which is to print the names of the files.

§         find . -user david
Find all the files under the current directory owned by the user david

§         find / -name \*.html
Find all the files on the entire file system that end in .html. Notice that the * must be quoted so that the shell doesn't interpret it (explained in more detail below).  Instead we want the shell to pass the *.html to the find command and have it match filenames.

§         find /home -size +2500k -mtime -7
Find all the files under the /home directory that are greater than 2500 kilobytes in size and have been modified in the last seven days.

The last example shows it is possible to combine multiple tests. It is also an example of using numeric values. The +2500 will match any value greater than 2500. The -7 will match any value less than 7.
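The combination of tests can be tried safely on a scaled-down version of the last example. In this sketch the sizes are much smaller than 2500 kilobytes so that it runs quickly, and the file names big.html and small.html are made up.

```shell
demo_dir=$(mktemp -d)

# One file of 20 kilobytes and one empty file, both created just now.
dd if=/dev/zero of="$demo_dir/big.html" bs=1024 count=20 2>/dev/null
touch "$demo_dir/small.html"

# Both tests must be true: larger than 10k AND modified less than 7 days ago.
matches=$(cd "$demo_dir" && find . -name \*.html -size +10k -mtime -7)

# +7 asks for files modified more than 7 days ago, so nothing new matches.
old=$(cd "$demo_dir" && find . -name \*.html -mtime +7)

echo "recent and large: $matches"
echo "older than a week: $old"

rm -r "$demo_dir"
```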


 

Shell special characters

The shell is the program which implements the UNIX command line interface at which you use these commands.  Before executing commands the shell looks for special characters.  If it finds any it performs some special operations.  In some cases, like the previous command, you don't want the shell to do this.  So you quote the special characters.  This process is explained in more detail in the following chapter.
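The effect of quoting is easy to see with echo, which simply prints whatever arguments the shell hands to it. This is a small sketch in a scratch directory. The file names a.html and b.html are made up.

```shell
demo_dir=$(mktemp -d)
touch "$demo_dir/a.html" "$demo_dir/b.html"

# Unquoted: the shell replaces *.html with the matching file names before
# the command (here echo) ever runs.
unquoted=$(cd "$demo_dir" && echo *.html)

# Quoted with a backslash: the command receives *.html exactly as typed.
quoted=$(cd "$demo_dir" && echo \*.html)

echo "unquoted: $unquoted"
echo "quoted:   $quoted"

rm -r "$demo_dir"
```

find needs to receive the pattern itself, as in the quoted case, so that it can compare the pattern against every file name it visits.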

 

Test                Effect

-amin n             file was last accessed n minutes ago
-anewer file        the current file was accessed more recently than file
-atime n            file was last accessed n days ago
-cmin n             file's status was changed n minutes ago
-cnewer file        the current file's status was changed more recently than file's
-ctime n            file's status was last changed n days ago
-mmin n             file's data was last modified n minutes ago
-mtime n            the current file's data was modified n days ago
-name pattern       the name of the file matches pattern; -iname is a case insensitive version of -name; -regex allows the use of regular expressions (REs) to match the filename
-nouser, -nogroup   the file's UID or GID does not match a valid user or group
-perm mode          the file's permissions match mode (either symbolic or numeric)
-size n[bck]        the file uses n units of space, b is blocks (512 bytes), c is bytes, k is kilobytes
-type c             the file is of type c, where c can be b (block device file), c (character device file), d (directory), p (named pipe), f (regular file), l (symbolic link) or s (socket)
-uid n, -gid n      the file's UID or GID matches n
-user uname         the file is owned by the user with name uname

Table 5.7
find tests

find actions

Once you've found the files you were looking for you want to do something with them. The find command provides a number of actions most of which allow you to either

§         execute a command on the file, or

§         display the name and other information about the file in a variety of formats

For the various find actions that display information about the file you are urged to examine the manual page for find


Executing a command

find has two actions that will execute a command on the files found. They are -exec and -ok.

The format to use them is as follows

-exec command ;
-ok command ;

command is any UNIX command.

The main difference between exec and ok is that ok will ask the user before executing the command. exec just does it.

For example

Some examples of using the exec and ok actions include

§         find . -exec grep hello \{\} \;
Search all the files under the current directory for the word hello.

§         find / -name \*.bak -ok rm \{\} \;
Find all files ending with .bak and ask the user if they wish to delete those files.

{} and ;

The exec and ok actions of the find command make special use of {} and ; characters. Since both {} and ; have special meaning to the shell they must be quoted when used with the find command.

{} is used to refer to the file that find has just tested. So in the last example rm \{\} will delete each file that the find tests match.

The ; is used to indicate the end of the command to be executed by exec or ok.
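The following sketch shows {} and ; at work on a scratch directory, so that nothing of value is removed. The file names are made up, and -exec is used instead of -ok only so that the example runs without asking questions.

```shell
demo_dir=$(mktemp -d)
touch "$demo_dir/keep.txt" "$demo_dir/one.bak" "$demo_dir/two.bak"

# rm is run once per matching file. {} is replaced by that file's name
# and the quoted ; marks where the rm command ends.
(cd "$demo_dir" && find . -name \*.bak -exec rm \{\} \;)

# List the regular files that are left.
remaining=$(cd "$demo_dir" && find . -type f)
echo "$remaining"

rm -r "$demo_dir"
```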

Exercises

5.11      As was mentioned above the {} and ; used in the exec and ok actions of the find command must be quoted.
As a group decide why the following command doesn't work.
find . -name \*.bak -ok rm '{} ;'

5.12      Use find to

§         print the names of every file on your file system that has nothing in it

§         find where the file XF86Config is


Performing commands on many files

Every UNIX command you execute requires a new process to be created.  Creating a new process is a fairly heavyweight procedure for the operating system and can take quite some time.  When you are performing a task it can save time if you minimise the number of new processes which are created.

It is common for a Systems Administrator to want to perform some task which requires a large number of processes.  Some uses of the find command offer a good example. 

For example

Take the requirement to find all the HTML files on a Web site which contain the word expired. There are at least three different ways we can do this

§         using the find command and the -exec switch,

§         using the find command and back quotes ``,

§         using the find command and the xargs command.

In the following we'll look at each of these.

More than one way to do something

One of the characteristics of the UNIX operating system is that there is always more than one way to perform some task.

find and -exec

We'll assume the files we are talking about in each of these examples are contained in the directory /usr/local/www

find /usr/local/www -name \*.html -exec grep -l expired \{\} \;

The -l switch of grep causes it to display the filename of any file in which it finds a match. So this command will list the names of all the files containing expired.

While this works there is a slight problem: it is inefficient. The command works as follows

§         find searches through the directory structure,

§         every time it finds a file that matches the test (in this example that it has the extension html) it will run the appropriate command

§         the operating system creates a new process for the command,

§         once the command has executed for that file it dies and the operating system must clean up,

§         now we restart at the top with find looking for the appropriate file

On any decent Web site it is possible that there will be tens and even hundreds of thousands of HTML files.  This implies that this command will result in hundreds of thousands of processes being created.  This can take quite some time.
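You can watch this happen on a small scale. In the sketch below the command run by -exec is a tiny shell that prints its own process ID, so each line of output comes from a different process. The directory, the five page files and the variable names are all made up for the illustration.

```shell
demo_dir=$(mktemp -d)
for n in 1 2 3 4 5; do
    touch "$demo_dir/page$n.html"
done

# $$ inside the single quotes is expanded by the sh that -exec starts,
# so every line printed is the process ID of a separate process.
pids=$(cd "$demo_dir" && find . -name \*.html -exec sh -c 'echo $$' \;)
runs=$(printf '%s\n' "$pids" | wc -l | tr -d ' ')

echo "$pids"
echo "commands run: $runs"

rm -r "$demo_dir"
```

Five matching files produce five separate runs of the command. A site with a hundred thousand pages would produce a hundred thousand.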

find and back quotes

A solution to this is to find all the matching files first, and then pass them to a single grep command.

grep -l expired `find /usr/local/www -name \*.html`

In this example there are only two processes created.  One for the find command and one for the grep.
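A scaled-down version of this command can be tried in a scratch directory. In this sketch the two HTML files and their contents are made up. Only one of them contains the word expired.

```shell
demo_dir=$(mktemp -d)
echo "this offer has expired" > "$demo_dir/old.html"
echo "still current" > "$demo_dir/new.html"

# find runs first and its output (the file names) is placed on the grep
# command line, so grep is started only once for all the files.
found=$(cd "$demo_dir" && grep -l expired `find . -name \*.html`)
echo "$found"

rm -r "$demo_dir"
```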

Back quotes

Back quotes `` are an example of the shell special characters mentioned previously.  When the shell sees `` characters it knows it must execute the command enclosed by the `` and then replace the command with the output of the command.

In the above example the shell will execute the find command which is enclosed by the `` characters.  It will then replace the `find /usr/local/www -name \*.html` with the output of the command.  Now the shell executes the grep command.

Back quotes are explained in more detail in the next chapter.

To show the difference that this makes you can use the time command. time is used to record how long it takes for a command to finish (and a few other stats). The following is an example from which you can see the significant difference in time and resources used by reducing the number of processes.

beldin:~$ time grep -l expired `find 85321/* -name index.html`
0.04user 0.22system 0:02.86elapsed 9%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+0minor)pagefaults 0swaps
beldin:~$ time find 85321/* -name index.html -exec grep -l expired \{\} \;
1.33user 1.90system 0:03.55elapsed 90%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+0minor)pagefaults 0swaps

 

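The time command can also report a great deal more information about a process and its interaction with the operating system, especially if you use the verbose option (time -v some_command). In bash, time is also a shell keyword that does not accept -v, so you may need to run the external program by its full path, for example /usr/bin/time -v some_command.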

find and xargs

While in many cases the combination of find and back quotes will work perfectly, this method has one serious drawback as demonstrated in the following example.

beldin:~$ grep -l expired `find 85321/* -name \*`
bash: /usr/bin/grep: Arg list too long

The problem here is that a command line can only be so long. In the above  example the find command found so many files that the names of these files exceeded the limit.
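The limit belongs to the operating system rather than to grep. On systems that provide the POSIX getconf utility the following sketch prints it. The variable name arg_max is made up, and the value reported differs from system to system.

```shell
# ARG_MAX is the maximum combined size, in bytes, of the arguments and
# environment that can be handed to a new program.
arg_max=$(getconf ARG_MAX)
echo "argument list limit: $arg_max bytes"
```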

This is where the xargs command enters the picture.

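Rather than pass the list of filenames as a parameter to the command, xargs reads the list of filenames from its standard input (standard input is explained in more detail in a following chapter). It then runs the command for you, as many times as necessary, each time with as many filenames as will fit on one command line. This means we side-step the problem of exceeding the limit on the length of the argument list.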

Have a look at the man page for xargs for more information. Here is the example rewritten to use xargs

find /usr/local/www -name \* | xargs grep -l expired

There are now at least three processes created: find, xargs and one or more grep processes.  However it does avoid the problem of the argument list being too long.
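The same small experiment used for the other approaches works here too. This is a sketch only: the scratch directory and file names are made up, and it assumes file names without spaces or newlines, which xargs would otherwise split in the wrong places.

```shell
demo_dir=$(mktemp -d)
echo "this offer has expired" > "$demo_dir/old.html"
echo "still current" > "$demo_dir/new.html"

# find writes one name per line. xargs reads the names from its standard
# input and builds the grep command line (or lines) from them.
found=$(cd "$demo_dir" && find . -name \*.html | xargs grep -l expired)
echo "$found"

rm -r "$demo_dir"
```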

Conclusion

UNIX is a multi-user operating system and as such must provide mechanisms to uniquely identify users and protect the resources of one user from other users. Under UNIX users are uniquely identified by a username and a user identifier (UID). The relationship between username and UID is specified in the /etc/passwd file.

UNIX also provides the ability to collect users into groups. A user belongs to at least one group specified in the /etc/passwd file but can also belong to other groups specified in the /etc/group file. Each group is identified by both a group name and a group identifier (GID). The relationship between group name and GID is specified in the /etc/group file.

All work performed on a UNIX computer is performed by processes. Each process has a real UID/GID pair and an effective UID/GID pair. The real UID/GID match the UID/GID of the user who started the process and are used for accounting purposes. The effective UID/GID are used for deciding the permissions of the process. While the effective UID/GID are normally the same as the real UID/GID it is possible using the setuid/setgid file permissions to change the effective UID/GID so that it matches the UID and GID of the file containing the process' code.

The UNIX file system uses a data structure called an inode to store information about a file including file type, file permissions, UID, GID, number of links, file size, date last modified and where the file's data is stored on disk. A file's name is stored in the directory which contains it.

A file's permissions can be represented using either symbolic or numeric modes. Valid operations on a file include read, write and execute. Users wishing to perform an operation on a file belong to one of three categories: the user who owns the file, the group that owns the file and anyone (other) not in the first two categories.

A file's permissions can only be changed by the user who owns the file and are changed using the chmod command. The owner of a file can only be changed by the root user using the chown command. The group owner of a file can be changed by the root user or by the owner of the file using the chgrp command. The file's owner can only change the group to another group she belongs to.

Links both hard and soft are mechanisms by which more than one filename can be used to refer to the same file.

Review Questions

5.1 For each of the following commands indicate whether they are built-in shell commands, "normal" UNIX commands or not valid commands. If they are "normal" UNIX commands indicate where the command's executable program is located.

§         alias

§         history

§         rename

§         last

 

5.2 How would you find out your UID, GID and the groups you currently belong to?

 

5.3 Assume that you are logged in with the username david and that your current directory contains the following files

bash# ls -il

total 2
103807 -rw-r--r-- 2 david users    0 Aug 25 13:24 agenda.doc
103808 -rwsr--r-- 1 root  users    0 Aug 25 14:11 meeting
103806 -rw-r--r-- 1 david users 2032 Aug 22 11:42 minutes.txt
103807 -rw-r--r-- 2 david users    0 Aug 25 13:24 old_agenda
 

For each of the following commands indicate

§         whether or not it will work,

§         if it works specify how the above directory listing will change,

§         if it doesn't work why?

 

 

chmod 777 minutes.txt

chmod u+w agenda.doc

chmod o-x meeting

chmod u+s minutes.txt

ln -s meeting new_meeting

chown root old_agenda

 

5.4   Assume that the following files exist in the current directory.

bash$ ls -li
total 1
32845 -rw-r--r--  2 jonesd   users  0 Apr   6 15:38 cq_uni_doc
32845 -rw-r--r--  2 jonesd   users  0 Apr   6 15:38 cqu_union
32847 lrwxr-xr-x  1 jonesd   users  10 Apr  6 15:38 osborne -> cq_uni_doc
 

For each of the following commands explain how the output of the command ls -li will change AFTER the command has been executed. Assume that each command starts with the above information.

For example, after the command mv cq_uni_doc CQ.DOC the only change would be that entry for the file cq_uni_doc would change to

32845 -rw-r--r--  2 jonesd   users  0 Apr   6 15:38 CQ.DOC     

chmod a-x osborne

chmod 770 cqu_union

rm cqu_union

rm cqu_uni_doc

The files cq_uni_doc and cqu_union both point to the same file using a hard link. Above I have stated that if you execute the command mv cq_uni_doc CQ.DOC the only thing that changes is the name of the file cq_uni_doc. Why doesn't the name of the file cqu_union change also?