Using patterns to specify files in the shell is all well and good - it'll save you mountains of keystrokes - but it's also a bit limited. Often, rather than the names of files on the filesystem, you need to work with text produced by other commands, or with the contents of large text files.
It's also often desirable to write patterns that can match more complicated text.
This is where tools like grep come in. Its task is to look at some text (in a file or in the standard output from another command) and print the lines that match a pattern called a regular expression.
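Both ways of feeding text to grep can be sketched as follows. The sample file and its contents are invented for illustration; any text would do.

```shell
# Create a small throwaway file to search (hypothetical sample data).
sample=$(mktemp)
printf 'apple\nbanana\ncherry\npineapple\n' > "$sample"

# 1. Search a file named on the command line.
#    Prints the lines containing "apple": apple and pineapple.
grep 'apple' "$sample"

# 2. Search the standard output of another command through a pipe.
#    Only "banana" contains the letters "an" next to each other.
printf 'apple\nbanana\ncherry\npineapple\n' | grep 'an'
```

Either way, grep prints whole lines, not just the part of each line that matched.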
Regular expressions (also frequently written regex or regexp) are kind of like wildcards on steroids. Instead of a handful of general-purpose magic characters, they offer a rich vocabulary for expressing what characters and how many of them a pattern should match.
Regexen are actually a deep, complicated topic, and they come in dozens of different flavors. It can take years to master their use, and they frequently confuse even very experienced programmers. Despite all that, it's easy enough to learn the basics, and the basics are enough to get quite a bit done.
As a basic example, let's find some dictionary words. grep takes a pattern and (optionally) a file to search.
pi@raspberrypi ~ $ grep '^magi.*$' /usr/share/dict/words
magic
magic's
magical
magically
magician
magician's
magicians
magisterial
magisterially
magistrate
magistrate's
magistrates
The pattern here, ^magi.*$, is passed to grep inside of single quotes so that the shell won't expand characters like *. It demonstrates a bunch of the basics:
^ | start of line |
magi | the literal characters "magi" |
. | any character |
* | 0 or more of the preceding token |
$ | end of line |
Notice how * works subtly differently in a regular expression than in a shell pattern. Rather than standing in for any run of characters on its own, it imposes a quantity on the previous token.
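The difference is easier to see side by side. This is only a sketch: the scratch directory and file names are made up for the purpose.

```shell
# Scratch directory with a few empty files (names invented for illustration).
scratch=$(mktemp -d)
touch "$scratch/ab.txt" "$scratch/b.txt" "$scratch/notes.md"

# Shell pattern: * by itself stands in for any run of characters,
# so *.txt expands to every file name ending in .txt.
(cd "$scratch" && echo *.txt)

# Regular expression: * repeats the token before it zero or more times,
# so 'a*b' means "any number of a's (possibly none), then a b".
# This prints b, ab, and aab, but not ac.
printf 'b\nab\naab\nac\n' | grep 'a*b'

# The closest regex counterpart to the shell's * is .* (any character,
# repeated). This prints the lines that start with a: ab, aab, and ac.
printf 'b\nab\naab\nac\n' | grep '^a.*'
```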
The ^ and $ serve to anchor the pattern to the beginning and end of the line, respectively. These aren't always necessary, but it can be very useful to say that a string occurs at one end or the other.
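To see what the anchors change, here is a small sketch that runs the same literal text with and without them. The four sample lines are invented; they stand in for any input.

```shell
# Four sample lines (hypothetical data), each containing "magi" somewhere.
words='magic
imagine
the magi
magi'

# Unanchored: "magi" may appear anywhere in the line, so all four match.
printf '%s\n' "$words" | grep 'magi'

# ^ pins the match to the start of the line: magic, magi.
printf '%s\n' "$words" | grep '^magi'

# $ pins the match to the end of the line: the magi, magi.
printf '%s\n' "$words" | grep 'magi$'

# Both anchors: the line must be exactly "magi" and nothing else.
printf '%s\n' "$words" | grep '^magi$'
```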
By default, the grep command doesn't treat most characters as magical. To get the full range of its abilities, you'll want to invoke it as grep -E for Extended grep (or as egrep, though recent GNU grep releases flag that name as obsolescent). With that in place, here are more of the basics:
[123] | one of 1, 2, or 3 |
[a-z] | one of the characters a-z |
\w | a "word" character |
[0-9] | one of 0 through 9 |
+ | one or more of the previous thing |
? | zero or one of the previous thing |
(foo){1,3} | one to three occurrences of foo |
(foo|bar) | either foo or bar |
(foo|bar|baz)* | zero or more occurrences of foo, bar, or baz |
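A few of these can be tried against a handful of invented sample lines. This is a sketch, not an exhaustive tour; the first two commands also show why the -E matters.

```shell
# Sample input (hypothetical lines chosen to exercise each pattern).
sample='a+b
aaab
color
colour
room 101
foofoo
baz'

# Without -E, + is an ordinary character: this finds the literal text "a+b".
printf '%s\n' "$sample" | grep 'a+b'

# With -E, + means "one or more of the previous thing": this finds "aaab".
printf '%s\n' "$sample" | grep -E 'a+b'

# ? makes the u optional: color and colour both match.
printf '%s\n' "$sample" | grep -E 'colou?r'

# A character class plus a quantifier: lines containing a run of digits.
printf '%s\n' "$sample" | grep -E '[0-9]+'

# A group repeated one to three times, anchored to the whole line: foofoo.
printf '%s\n' "$sample" | grep -E '^(foo){1,3}$'

# Alternation repeated zero or more times, whole line: foofoo and baz.
printf '%s\n' "$sample" | grep -E '^(foo|bar|baz)*$'
```

Anchoring the last two patterns matters. Because zero occurrences is a valid match for *, an unanchored (foo|bar|baz)* would match every line.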
In addition to -E, grep takes a bunch of other options. It's worth reading the man page, but the handful that seem to come up most often in shell pipelines are as follows:
| look for |
| invert the search - find lines that don't contain |
| print a count of lines matching |
| list the text files that match |
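Here is a brief sketch of the inverting, counting, and file-listing behaviors, assuming they correspond to grep's standard -v, -c, and -l options. Those option letters come from the grep manual, and the file names and contents are invented for illustration.

```shell
# Two small sample files in a scratch directory (hypothetical data).
scratch=$(mktemp -d)
printf 'red apple\ngreen pear\nred cherry\n' > "$scratch/fruit.txt"
printf 'blue sky\ngrey cloud\n' > "$scratch/sky.txt"

# Plain search: print the lines containing "red".
grep 'red' "$scratch/fruit.txt"

# -v inverts the search: print the lines that do NOT contain "red".
grep -v 'red' "$scratch/fruit.txt"

# -c prints a count of matching lines instead of the lines themselves.
grep -c 'red' "$scratch/fruit.txt"

# -l lists only the names of files that contain at least one match.
(cd "$scratch" && grep -l 'red' *.txt)
```

The -c and -l forms are handy in pipelines because their output is easy for the next command to consume: a single number, or one file name per line.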
- Mastering Regular Expressions, by Jeffrey Friedl, is one of the more comprehensive available texts.
- The GNU Grep manual
Page last edited February 20, 2015