grep is not only one of the most useful commands, but mastery of grep also opens the gates to mastery of other tools such as awk, sed and perl.
grep basically searches. More precisely,
grep foo file returns all the lines that contain
a string matching the expression "foo" in the file "file".
For now, we will just think of an expression as a string. So grep
returns all matching lines that contain foo as a substring.
Another way of using grep is to have it accept data through
STDIN instead of having it search a file. For example,
ls | grep blah lists all files in the current
directory whose names contain the string "blah".
This tutorial is based on the GNU version of grep. It is recommended that you use this version. To use it, firstly, it needs to be installed on your system. Secondly, your PATH needs to be set so that GNU grep is used in preference to the standard version.
The "." character matches any single character. For example:
>cat file
big
bad bug
bag
bigger
boogy
>grep b.g file
big
bad bug
bag
bigger
Notice that boogy didn't match, since the "." matches exactly
one character. To match arbitrary strings, we use the star, which works
in the following way:
the expression consisting of a character followed by a star matches any number (possibly zero) of repetitions of that character. In particular, .* matches any string, and hence acts as a "wildcard". To illustrate, we show some examples:
[Table: The File for These Examples; Wildcards #1; Wildcards #2; Wildcards #3]
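The wildcard behaviour described above can be tried with a small stand-in example. This is a sketch assuming a POSIX shell with GNU grep and mktemp available; the file contents are made up for illustration.

```shell
# Build a throwaway sample file (hypothetical contents).
f=$(mktemp)
printf '%s\n' 'Fred Smith' 'Frederic Smith' 'Smith, Fred' 'FSmith' > "$f"

# ".*" matches any string, including the empty string, so this
# selects every line with an "F" somewhere before a "Smith".
wild=$(grep 'F.*Smith' "$f")
echo "$wild"

rm -f "$f"
```

This should print Fred Smith, Frederic Smith and FSmith. The last line matches because .* is allowed to match nothing, while Smith, Fred is skipped because its F comes after Smith.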
Suppose we want to match either Frederic Smith
or Fred Smith. In other words, the letters eric
are "optional".
First, we introduce the concept of an "escaped" character.
An escaped character is a character preceded by a backslash. The preceding backslash does one of the following:
(a) it removes an implied special meaning from a character, or
(b) it adds special meaning to a "non-special" character.
For example, to search for lines containing hello.gif, the correct
command is
grep 'hello\.gif' file
since grep 'hello.gif' file will also match lines containing
hello-gif, hello1gif, helloagif, etc.
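The difference between the escaped and unescaped dot can be checked by counting matching lines with grep -c. This sketch assumes GNU grep and mktemp; the file names are hypothetical.

```shell
f=$(mktemp)
printf '%s\n' hello.gif hello-gif hello1gif helloagif > "$f"

# Unescaped ".": any character may sit between "hello" and "gif".
loose=$(grep -c 'hello.gif' "$f")
# Escaped "\.": only a literal dot is accepted.
strict=$(grep -c 'hello\.gif' "$f")
echo "unescaped: $loose lines, escaped: $strict line"

rm -f "$f"
```

The unescaped pattern matches all four lines; the escaped one matches only hello.gif.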
Now we move on to grouping expressions, in order to find a way
of making an expression that matches either Fred or Frederic.
An expression consisting of a character followed by an escaped question mark matches one or zero instances of that character.
For example, bugg\?y matches bugy and buggy, but not bugggy. We move on to "grouping" expressions.
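Before grouping, here is the escaped question mark on its own. This is a sketch assuming GNU grep, where \? inside a basic expression is a GNU extension; the sample words come from the text above.

```shell
f=$(mktemp)
printf '%s\n' bugy buggy bugggy > "$f"

# "g\?" makes the second g optional: zero or one g, never two.
opt=$(grep 'bugg\?y' "$f")
echo "$opt"

rm -f "$f"
```

Only bugy and buggy are printed.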
In our example, we want to make the string "eric" following "Fred" optional;
we don't just want one optional character.
An expression surrounded by "escaped" parentheses is treated as a single character.
Fred\(eric\)\? Smith matches Fred Smith or Frederic Smith.
\(abc\)* matches abc, abcabcabc, etc. (i.e., any number of repetitions of the string abc,
including the empty string). Note that we have to be careful when our expressions
contain white space. When this happens, we need to enclose them in quotes
so that the shell does not misinterpret the command. (Quoting also stops the shell
from removing the backslashes.) So to use our example above, we would need to type
grep "Fred\(eric\)\? Smith" file
We now mention several other useful operators.
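Going back briefly to the grouping examples above, both can be run directly. This sketch assumes GNU grep; the third name and the abc lines are hypothetical. It uses grep -x (match the whole line) for the \(abc\)* case, because without it the pattern's ability to match the empty string would make every line match.

```shell
f=$(mktemp)
printf '%s\n' 'Fred Smith' 'Frederic Smith' 'Fredric Smith' > "$f"

# The escaped parentheses group "eric"; "\?" then makes the whole group optional.
grp=$(grep 'Fred\(eric\)\? Smith' "$f")
echo "$grp"

# Count lines made up only of repetitions of abc (the empty line counts too).
reps=$(printf '%s\n' abc abcabcabc abcab '' | grep -cx '\(abc\)*')
echo "$reps"

rm -f "$f"
```

Fredric Smith is rejected because the group must be the whole string eric or nothing, and the count is 3 (abc, abcabcabc and the empty line).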
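Square brackets match any one of the characters they enclose: [Hh]ello matches lines containing hello or Hello.
Ranges of characters are also permitted:
[0-3] is the same as [0123]
[a-k] is the same as [abcdefghijk]
[A-C] is the same as [ABC]
[A-Ca-k] is the same as [ABCabcdefghijk]
There are also some alternate forms:
[[:alpha:]] is the same as [a-zA-Z]
[[:upper:]] is the same as [A-Z]
[[:lower:]] is the same as [a-z]
[[:digit:]] is the same as [0-9]
[[:alnum:]] is the same as [0-9a-zA-Z]
[[:space:]] matches any white space, including tabs
These alternate forms such as [[:digit:]]
are preferable to the direct method [0-9], because which characters a range covers can depend on the locale, while a named class always means "letter", "digit", and so on in the current locale. (The equivalences listed above hold in the C locale.)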
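A quick way to see brackets and named classes side by side is to count matching lines. This sketch assumes GNU grep and plain ASCII sample lines (made up for illustration), so the result does not depend on the locale.

```shell
f=$(mktemp)
printf '%s\n' hello Hello HELLO 'room 101' 'no digits' > "$f"

bracket=$(grep -c '[Hh]ello' "$f")                  # hello, Hello
digits=$(grep -c '[[:digit:]]' "$f")                # room 101
mixed=$(grep -c '[[:upper:]][[:lower:]]' "$f")      # Hello
echo "$bracket $digits $mixed"

rm -f "$f"
```

This prints 2 1 1. HELLO fails the last pattern because no uppercase letter in it is followed by a lowercase one.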
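A caret placed first inside square brackets negates the list, so [^()] matches any single character other than a parenthesis. For example,
grep "([^()]*)a" file returns any line
containing a pair of parentheses that are innermost and are followed by
the letter "a". So it matches these lines:
(hello)a
(aksjdhaksj d ka)a
but not this one:
x=(y+2(x+1))a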
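The three lines above can be fed to grep to confirm which ones match. This sketch assumes GNU grep; note that in basic syntax unescaped ( and ) are literal characters.

```shell
f=$(mktemp)
printf '%s\n' '(hello)a' '(aksjdhaksj d ka)a' 'x=(y+2(x+1))a' > "$f"

# [^()]* forbids any parenthesis between "(" and ")", so only an
# innermost pair can match, and it must be followed directly by "a".
inner=$(grep '([^()]*)a' "$f")
echo "$inner"

rm -f "$f"
```

In x=(y+2(x+1))a, the only innermost pair, (x+1), is followed by ) rather than a, so that line is not printed.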
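grep "[[:digit:]]\{3\}[ -]\?[[:digit:]]\{4\}" file
This matches phone numbers, possibly containing a dash or a space in
the middle. Here \{3\} and \{4\} mean exactly three and exactly four repetitions of the preceding item. Note that the class needs double brackets: a bare [:digit:] is not a character class, and recent versions of GNU grep reject it with an error.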
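Here is the phone-number pattern run on a few sample lines. This is a sketch assuming GNU grep (the \? and interval forms in basic syntax); the numbers are made up.

```shell
f=$(mktemp)
printf '%s\n' 555-1234 '555 1234' 5551234 55-1234 > "$f"

# Three digits, an optional space or dash, then four digits.
phones=$(grep '[[:digit:]]\{3\}[ -]\?[[:digit:]]\{4\}' "$f")
echo "$phones"

rm -f "$f"
```

The first three lines match and 55-1234 does not. Because grep looks for a matching substring, a longer run of digits would match too; the line anchors ^ and $ are one way to tighten this.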
Suppose we want to find the lines that contain the word hello and nothing else. A first attempt:
>cat file
hello
hello world
hhello
>grep hello file
hello
hello world
hhello
This is not what we wanted. So what went wrong? The problem is that grep searches for lines containing the string "hello", and all the lines specified contain this. To get around this problem, we introduce the end and beginning of line characters.
The $ character matches the end of the line. The ^ character matches the beginning of the line.
grep "^[[:space:]]*hello[[:space:]]*$" file
does what we want (it returns only one line). Another example: grep "^From.*mscharmi" /var/spool/mail/elflord
searches my inbox for headers from a particular person. This kind of regular
expression is extremely useful, and mail filters such as procmail use it
all the time.
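The anchored pattern can be compared with the plain one on the hello file, plus one extra hypothetical line padded with spaces to show what the [[:space:]]* parts allow. This sketch assumes GNU grep.

```shell
f=$(mktemp)
printf '%s\n' hello 'hello world' hhello '  hello  ' > "$f"

# Unanchored: every line contains the substring "hello".
all=$(grep -c 'hello' "$f")
# Anchored: the line may hold only optional white space around "hello".
only=$(grep '^[[:space:]]*hello[[:space:]]*$' "$f")
echo "$all"
echo "$only"

rm -f "$f"
```

All four lines contain hello, but only hello and the space-padded copy satisfy the anchored pattern.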
The expression consisting of
two expressions separated by the or operator \| matches lines
containing either of those two expressions.
Note that you MUST enclose this inside single or double quotes; otherwise the shell removes the backslash before grep sees it.
grep "cat\|dog" file matches
lines containing the string "cat" or the string "dog" (so "category" matches too). grep "I am a \(cat\|dog\)" file matches
lines containing the string "I am a cat" or the string "I am a dog".
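Both alternation examples can be run on a few made-up lines. This sketch assumes GNU grep, where \| in basic syntax is a GNU extension.

```shell
f=$(mktemp)
printf '%s\n' 'I am a cat' 'I am a dog' 'I am a bird' category > "$f"

# Plain alternation matches substrings, so "category" counts as well.
either=$(grep -c 'cat\|dog' "$f")
# Grouping restricts the alternation to the word after "I am a ".
phrase=$(grep 'I am a \(cat\|dog\)' "$f")
echo "$either"
echo "$phrase"

rm -f "$f"
```

The first count is 3, and the grouped pattern prints only the cat and dog sentences.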
Suppose we want to search for lines containing a string of the form <H1>some string</H1>. This
is easy enough to do. But suppose I wanted to do the same but allow
H2, H3, H4, H5 or H6 in place of H1. The expression
<H[1-6]>.*</H[1-6]> is not good enough, since it matches
<H1>Hello world</H3>, but we want the opening tag
to match the closing one. To do this, we use a backreference.
The expression \n, where n is a digit from 1 to 9, matches the same text that was matched by the n'th set of escaped parentheses in the expression. Whoa, this really needs an example!
<H\([1-6]\)>.*</H\1>
matches what we were trying to match before. "Mr \(dog\|cat\) came home to Mrs
\1 and they went to visit Mr \(dog\|cat\) and Mrs \2 to discuss the meaning
of life" matches ... well, I'm sure you can work it out. The idea
is that the cats and dogs should match up in such a way that it makes sense: \1 repeats whatever the first group matched, and \2 whatever the second group matched.
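The heading example can be run with and without the backreference. This sketch assumes GNU grep; the HTML lines are hypothetical.

```shell
f=$(mktemp)
printf '%s\n' '<H1>Hello world</H1>' '<H1>Hello world</H3>' '<H2>Title</H2>' > "$f"

# Without a backreference the two digits are independent.
loose=$(grep -c '<H[1-6]>.*</H[1-6]>' "$f")
# \1 must repeat exactly the digit captured by \([1-6]\).
paired=$(grep '<H\([1-6]\)>.*</H\1>' "$f")
echo "$loose"
echo "$paired"

rm -f "$f"
```

The loose pattern accepts all three lines, while the backreference drops the mismatched <H1>...</H3> line.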
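Note that a $ sign loses its special meaning if characters follow it, and the caret ^ loses its meaning if other characters precede it.
Characters that need this kind of care include ? \ . [ ] ^ $
Inside square brackets, most special characters lose their meaning. A closing square bracket loses its special meaning if it is placed first in the list, so
[]12]
matches ], 1, or 2. In an interactive shell with history expansion (such as bash or csh), grep "!" file
will often produce an error (since the shell thinks that "!" is referring
to the shell command history), while grep '!' file will not.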
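The bracket rule and the single-quoted "!" can both be checked on sample lines. This sketch assumes GNU grep; inside a script history expansion is normally off anyway, so the "!" line only illustrates the safe quoting habit.

```shell
f=$(mktemp)
printf '%s\n' 'a]b' x1 y2 z3 'wow!' > "$f"

# "]" listed first is an ordinary member of the list, as are 1 and 2.
brackets=$(grep -c '[]12]' "$f")
# Single quotes keep "!" away from interactive history expansion.
bang=$(grep -c '!' "$f")
echo "$brackets $bang"

rm -f "$f"
```

This prints 3 1: a]b, x1 and y2 match the bracket list, and only wow! contains an exclamation mark.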
When should you use single quotes? The answer is this: if you want to use shell variables, you need double quotes; otherwise single quotes are safer, because the shell leaves everything inside them alone. For example,
grep "$HOME" file
searches file for the path of your home directory, while
grep '$HOME' file
searches for the literal string $HOME.
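The same contrast can be shown with any shell variable. This sketch uses a hypothetical variable named word instead of $HOME so the result does not depend on your environment, and assumes GNU grep, where a $ that is not at the end of a basic pattern is an ordinary character.

```shell
# "word" is a hypothetical shell variable used only for this example.
word=grep
f=$(mktemp)
printf '%s\n' 'learning grep' 'the string $word' > "$f"

# Double quotes: the shell substitutes the value, so grep sees "grep".
dq=$(grep "$word" "$f")
# Single quotes: grep receives the literal text $word.
sq=$(grep '$word' "$f")
echo "$dq"
echo "$sq"

rm -f "$f"
```

The double-quoted search finds learning grep, and the single-quoted one finds the line that literally contains $word.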
We now discuss egrep syntax as
opposed to grep syntax. Ironically, despite the origin of the name (extended),
egrep actually has less functionality, as it is designed for compatibility
with the traditional egrep. A better way to do an extended "grep" is to use
grep -E, which uses
extended regular expression syntax without loss of functionality. (In current GNU grep releases, egrep is simply equivalent to grep -E and is obsolescent in its favor.)
grep                       grep -E                   Available for egrep?
a\+                        a+                        yes
a\?                        a?                        yes
expression1\|expression2   expression1|expression2   yes
\(expression\)             (expression)              yes
\{m,n\}                    {m,n}                     no
\{,n\}                     {,n}                      no
\{m,\}                     {m,}                      no
\{m\}                      {m}                       no
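To see the two columns of the table side by side, the same search can be written once in basic syntax and once with grep -E. This sketch assumes GNU grep; the sample lines reuse words from earlier in the tutorial.

```shell
f=$(mktemp)
printf '%s\n' bugy buggy bugggy 'Fred Smith' 'Frederic Smith' > "$f"

# Basic syntax: the operators are written with backslashes.
basic=$(grep 'bug\{2,3\}y\|Fred\(eric\)\? Smith' "$f")
# Extended syntax (grep -E): the same operators without backslashes.
extended=$(grep -E 'bug{2,3}y|Fred(eric)? Smith' "$f")

if [ "$basic" = "$extended" ]; then
  echo "same result:"
  echo "$basic"
fi

rm -f "$f"
```

Both forms select buggy, bugggy, Fred Smith and Frederic Smith; bugy fails because {2,3} demands at least two g's.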