Using patterns to specify files in the shell is all well and good - it'll save you mountains of keystrokes - but it's also a bit limited. Often, rather than the names of files on the filesystem, you need to work with text produced by other commands, or with the contents of large text files.

It's also often desirable to write patterns that can match more complicated text.

This is where tools like grep come in. Its task is to look at some text (in a file or in the standard output from another command) and print the lines that match a pattern called a regular expression.
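As a small illustration of both ways of feeding text to grep, the sketch below searches the output of another command through a pipe, then searches a file. The sample text and the temporary file are made up for this example, and the pattern is just the literal characters "an".

```shell
# Search the standard output of another command (printf here) through a pipe.
from_pipe=$(printf 'apple\nbanana\ncherry\n' | grep 'an')
echo "$from_pipe"    # prints: banana

# Search a file instead. The file is a throwaway created only for this sketch.
tmpfile=$(mktemp)
printf 'apple\nbanana\ncherry\n' > "$tmpfile"
from_file=$(grep 'an' "$tmpfile")
echo "$from_file"    # prints: banana
rm -f "$tmpfile"
```

Either way, grep prints only the lines that contain a match.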

Regular expressions (also frequently written regex or regexp) are kind of like wildcards on steroids. Instead of a handful of general-purpose magic characters, they offer a rich vocabulary for expressing what characters and how many of them a pattern should match.

Regexen are actually a deep, complicated topic, and they come in dozens of different flavors. It can take years to master their use, and they frequently confuse even very experienced programmers. Despite all that, it's easy enough to learn the basics, and the basics are enough to get quite a bit done.

As a basic example, let's find some dictionary words. grep takes a pattern and (optionally) a file to search.

$ grep '^magi.*$' /usr/share/dict/words
magic
magic's
magical
magically
magician
magician's
magicians
magisterial
magisterially
magistrate
magistrate's
magistrates

The pattern here, ^magi.*$, is passed to grep inside of single quotes so that the shell won't expand characters like *. It demonstrates a bunch of the basics:

^

start of line

magi

the literal characters "magi"

.

any character

*

zero or more of the preceding token (here, the .) - what's known as a quantifier

$

end of line

Notice how * works subtly differently in a regular expression than it does in a shell pattern. Rather than standing in for any run of characters on its own, it imposes a quantity on the previous token.
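To make the contrast concrete, here is a minimal sketch that puts the two meanings of * side by side. The sample strings and variable names are invented for illustration, and the shell side uses a case statement only because it applies the same pattern rules as filename wildcards.

```shell
# In a regular expression, * means "zero or more of the previous token".
# The previous token here is b, so the pattern matches ac, abc, and abbc,
# but not adc.
star_matches=$(printf 'ac\nabc\nabbc\nadc\n' | grep '^ab*c$')
echo "$star_matches"

# In a shell pattern, * stands in for any run of characters by itself,
# so a*c does match adc.
case adc in
  a*c) shell_result="match" ;;
  *)   shell_result="no match" ;;
esac
echo "$shell_result"    # prints: match
```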

The ^ and $ serve to anchor the pattern to the beginning and end of the line, respectively. These aren't always necessary, but it can be very useful to say that a string occurs at one end or the other.
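A quick way to see what the anchors do is to run the same literal with and without them. This is only a sketch, and the three sample words are chosen for the example rather than taken from the dictionary file.

```shell
# Three sample lines: "magi" appears at the start, in the middle, and as the
# whole line.
words=$(printf 'magic\nimaginary\nmagi\n')

# No anchors: the pattern may match anywhere in the line.
unanchored=$(printf '%s\n' "$words" | grep 'magi')
echo "$unanchored"        # prints all three lines

# ^ requires the match to begin at the start of the line.
anchored_start=$(printf '%s\n' "$words" | grep '^magi')
echo "$anchored_start"    # prints: magic, magi

# $ requires the match to finish at the end of the line.
anchored_end=$(printf '%s\n' "$words" | grep 'magi$')
echo "$anchored_end"      # prints: magi
```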

By default, the grep command doesn't treat most characters as magical. To get the full range of its abilities, you'll want to invoke it as grep -E (or the older egrep) for Extended grep. With that in place, here are more of the basics:

[123]

one of 1, 2, or 3

[a-z]

one of the characters a-z

\w

a "word" character: a letter, digit, or underscore

[0-9]

one of 0 through 9

+

one or more of the previous thing

?

zero or one of the previous thing

(foo){1,3}

one to three occurrences of foo

(foo|bar)

foo or bar

(foo|bar|baz)*

zero or more occurrences of foo, bar, or baz
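Here is a short sketch that combines a few of these pieces with grep -E. The sample lines and variable names are invented for the example, and each pattern is anchored so that it has to account for the whole line.

```shell
sample=$(printf 'foo1\nbar22\nbaz3\nfoo\nfoofoo\ncolor\ncolour\n')

# (foo|bar) followed by one or more digits.
alt_digits=$(printf '%s\n' "$sample" | grep -E '^(foo|bar)[0-9]+$')
echo "$alt_digits"    # prints: foo1, bar22

# ? makes the u optional, so both spellings match.
optional=$(printf '%s\n' "$sample" | grep -E '^colou?r$')
echo "$optional"      # prints: color, colour

# {1,3} allows one to three repetitions of the group.
repeated=$(printf '%s\n' "$sample" | grep -E '^(foo){1,3}$')
echo "$repeated"      # prints: foo, foofoo
```

Without -E, characters such as +, ?, |, and the parentheses would be treated as literal text in these patterns.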

In addition to -E, grep takes a bunch of other options. It's worth reading the man page, but the handful that seem to come up most often in shell pipelines are as follows:

grep -i foo

look for foo without paying attention to case - will find FOO, foo, fOO, etc.

grep -v foo

invert the search - find lines that don't contain foo

grep -c foo

print a count of lines matching foo

grep -l foo *.txt

list the names of the .txt files that contain a match for foo
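The sketch below tries each of these options against two small files. The directory, file names, and contents are throwaway values created for this example.

```shell
# Set up a scratch directory with two small files.
workdir=$(mktemp -d)
printf 'foo\nFOO\nbar\n' > "$workdir/a.txt"
printf 'bar\nbaz\n' > "$workdir/b.txt"

# -i ignores case, so both foo and FOO match.
ignore_case=$(grep -i foo "$workdir/a.txt")
echo "$ignore_case"    # prints: foo, FOO

# -v inverts the search. The match is still case-sensitive, so FOO is kept.
inverted=$(grep -v foo "$workdir/a.txt")
echo "$inverted"       # prints: FOO, bar

# -c prints the number of matching lines instead of the lines themselves.
count=$(grep -c foo "$workdir/a.txt")
echo "$count"          # prints: 1

# -l prints only the names of the files that contain a match.
listed=$(cd "$workdir" && grep -l foo *.txt)
echo "$listed"         # prints: a.txt

rm -rf "$workdir"
```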

This guide was first published on Feb 24, 2015. It was last updated on Feb 24, 2015.

This page (Find Text with grep(1)) was last updated on Feb 20, 2015.
