Linux FAQ's & Manuals

4.1. Regular expressions

4.1.1. What are regular expressions?

A regular expression is a pattern that describes a set of strings. Regular expressions are constructed analogously to arithmetic expressions, by using various operators to combine smaller expressions.

The fundamental building blocks are the regular expressions that match a single character. Most characters, including all letters and digits, are regular expressions that match themselves. Any metacharacter with special meaning may be quoted by preceding it with a backslash.
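The following sketch illustrates these rules with Python's "re" module. This is an assumption for illustration only: Python uses a Perl-style syntax rather than the POSIX syntax used by grep, but for literal characters and backslash-quoted metacharacters it behaves the same way. The sample strings are invented.

```python
import re

# Letters and digits are regular expressions that match themselves.
assert re.search(r"linux", "I run linux at home")

# An unquoted "." is a metacharacter: it matches any single character,
# so "3.14" also matches "3x14".
assert re.search(r"3.14", "version 3x14")

# Preceding the "." with a backslash quotes it, so only a literal dot matches.
assert re.search(r"3\.14", "pi is about 3.14")
assert not re.search(r"3\.14", "version 3x14")

# re.escape() inserts the backslashes for every metacharacter in a string.
pattern = re.escape("a+b")          # gives the pattern a\+b
assert re.search(pattern, "x = a+b")
assert not re.search(pattern, "aab")
```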

4.1.2. Regular expression metacharacters

A regular expression may be followed by one of several repetition operators. These, together with the other metacharacters that have special meaning, are listed below:

Table 4-1. Regular expression operators

Operator  Effect
.         Matches any single character.
?         The preceding item is optional and is matched at most once.
*         The preceding item is matched zero or more times.
+         The preceding item is matched one or more times.
{n}       The preceding item is matched exactly n times.
{n,}      The preceding item is matched n or more times.
{n,m}     The preceding item is matched at least n times, but not more than m times.
-         Inside a bracket list, denotes a range (as in [a-z]) unless it is the first or last character of the list or the ending point of a range.
^         Matches the empty string at the beginning of a line; as the first character of a bracket list, matches any character not in the list.
$         Matches the empty string at the end of a line.
\b        Matches the empty string at the edge of a word.
\B        Matches the empty string provided it is not at the edge of a word.
\<        Matches the empty string at the beginning of a word.
\>        Matches the empty string at the end of a word.
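A sketch exercising the operators in Table 4-1, again using Python's "re" module as a stand-in for a command-line tool. Python has no "\<" and "\>" word anchors, so only "\b" and "\B" are shown; "re.MULTILINE" is passed so that "^" and "$" match at every line boundary, as they do when grep scans input line by line. All sample strings are made up.

```python
import re

# "." matches any single character.
assert re.fullmatch(r"b.g", "bug")

# Repetition operators, checked against the whole string with re.fullmatch().
assert re.fullmatch(r"colou?r", "color")        # ?     : zero or one "u"
assert re.fullmatch(r"colou?r", "colour")
assert re.fullmatch(r"ab*c", "ac")              # *     : zero or more "b"
assert not re.fullmatch(r"ab+c", "ac")          # +     : at least one "b"
assert re.fullmatch(r"ab+c", "abbbc")
assert re.fullmatch(r"[0-9]{3}", "042")         # {n}   : exactly 3 digits
assert not re.fullmatch(r"[0-9]{3}", "0420")
assert re.fullmatch(r"x{2,}", "xxxx")           # {n,}  : 2 or more
assert not re.fullmatch(r"x{2,4}", "xxxxx")     # {n,m} : 2 to 4 only

# "-" inside a bracket list: a range in the middle, a literal at the end.
assert re.fullmatch(r"[a-c]", "b")
assert re.fullmatch(r"[ac-]", "-")
assert not re.fullmatch(r"[ac-]", "b")

# "^" as the first character of a bracket list negates the list.
assert re.fullmatch(r"[^0-9]", "x")
assert not re.fullmatch(r"[^0-9]", "7")

# Anchors: with re.MULTILINE, ^ and $ match at the start and end of each line.
text = "root:x:0\nuser:x:1000"
assert re.findall(r"^\w+", text, re.MULTILINE) == ["root", "user"]
assert re.findall(r"\d+$", text, re.MULTILINE) == ["0", "1000"]

# \b matches at the edge of a word, \B only away from one.
assert re.search(r"\bcat\b", "the cat sat")
assert not re.search(r"\bcat\b", "concatenate")
assert re.search(r"\Bcat\B", "concatenate")
```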

Two regular expressions may be concatenated; the resulting regular expression matches any string formed by concatenating two substrings that respectively match the concatenated subexpressions.

Two regular expressions may be joined by the infix operator "|"; the resulting regular expression matches any string matching either subexpression.

Repetition takes precedence over concatenation, which in turn takes precedence over alternation. A whole subexpression may be enclosed in parentheses to override these precedence rules.
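The precedence rules can be checked in the same way with Python's "re" module; for simple patterns such as these, its concatenation, "|", and grouping behave like those of extended regular expressions. The patterns and strings are illustrative only.

```python
import re

# Concatenation: "ab" matches an "a" immediately followed by a "b".
assert re.fullmatch(r"ab", "ab")

# Alternation binds loosest, so "gray|grey" means "gray" OR "grey".
assert re.fullmatch(r"gray|grey", "grey")

# Because concatenation binds tighter than "|", "gra|ey" means
# "gra" OR "ey", not "gr(a|e)y".
assert re.fullmatch(r"gra|ey", "ey")
assert not re.fullmatch(r"gra|ey", "grey")

# Parentheses override the default: now only "a|e" is the alternation.
assert re.fullmatch(r"gr(a|e)y", "grey")

# Repetition binds tighter than concatenation: "ab*" repeats only the "b" ...
assert re.fullmatch(r"ab*", "abbb")
assert not re.fullmatch(r"ab*", "abab")
# ... while "(ab)*" repeats the whole group.
assert re.fullmatch(r"(ab)*", "abab")
```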

4.1.3. Basic versus extended regular expressions

In basic regular expressions, the metacharacters "?", "+", "{", "|", "(", and ")" lose their special meaning; use the backslashed versions "\?", "\+", "\{", "\|", "\(", and "\)" instead.
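To make the swap concrete, here is a minimal sketch of a hypothetical helper, "bre_to_python", that rewrites a basic regular expression into Python "re" syntax by exchanging the escaped and unescaped forms of "?", "+", "{", "}", "|", "(", and ")". The function name and approach are assumptions for illustration, not part of any standard tool, and it deliberately ignores bracket expressions and other corner cases.

```python
import re

# Characters that are operators in extended syntax but literal in basic
# syntax unless backslashed (hypothetical helper, simplified).
_SWAPPED = set("?+{}|()")

def bre_to_python(bre: str) -> str:
    # Sketch only: bracket expressions, \< \> and other corner cases
    # are not handled.
    out = []
    i = 0
    while i < len(bre):
        ch = bre[i]
        if ch == "\\" and i + 1 < len(bre):
            nxt = bre[i + 1]
            # "\+" in a basic expression is the operator, written "+" in Python;
            # any other escape (such as "\.") is copied unchanged.
            out.append(nxt if nxt in _SWAPPED else ch + nxt)
            i += 2
        elif ch in _SWAPPED:
            # A bare "+" in a basic expression is literal, written "\+" in Python.
            out.append("\\" + ch)
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

For example, the basic expression "ab\+c" becomes "ab+c", while "a+b", in which the plus sign is literal, becomes "a\+b".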

Check your system documentation to find out whether a given command supports extended regular expressions (GNU grep, for example, enables them with the -E option).