Linux + C – Regular Expressions (REGEX)

via Fort Collins Program

Many valuable Linux tools, including grep, sed, and awk, rely on the concept of the regular expression (REGEX). It’s important for us to understand the basics of this “language” so that we can make better use of the tools we’re provided.

REGEX

A “regular expression” is a standardized format for describing patterns of text using character literals (regular characters) and special characters (metacharacters). The key feature of this format is pattern matching: a single expression can describe, and be used to search for, many different strings.

The first thing to understand is the character literal. If you create a string out of simple alphanumeric characters, the language won’t mess with it. The regular expression “Mary” refers to that exact sequence of letters – M a r y – with no substitutions.

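Next we should look at the wildcards. We have already discussed these, but note that in a regular expression they do not work like shell filename wildcards: *, + and ? each apply to the character (or group) immediately before them. To recap:

* matches zero or more of the preceding character (panda* matches “pand”, “panda”, or “pandaaa”; since a search can match part of a line, it also finds “panda” inside “pandamonium” or “panda_party”)

+ matches one or more of the preceding character (Mary+ matches “Mary” and “Maryyy”, but not “Mar”)

? matches zero or one of the preceding character (colou?r matches both “color” and “colour”)

. matches exactly one character of any kind (p.nda matches “panda” and “ponda”, but not “pnda”)

[abc-gq-t] matches exactly one of the characters a, b, c, d, e, f, g, q, r, s, or t

[^a-s] matches exactly one character that is NOT in the range a-s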

We also have positional characters, called anchors. The character ^ matches the start of the line, while $ matches the end of the line. Thus, we can look for the word panda at the start or the end of the line (respectively) using the following:

^panda

panda$

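We can also use the alternation (OR) operator ( | ) to choose between alternatives. To find panda at the beginning OR end of the line:

^panda|panda$

Spaces are literal characters in a regular expression, so don’t pad the | with them: ^panda | panda$ would look for “panda ” (with a trailing space) at the start or “ panda” (with a leading space) at the end. The | operator belongs to extended regular expression syntax, so with grep you would write grep -E '^panda|panda$'.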

Advanced expressions

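We can use curly braces {} (an interval) to specify how many times the preceding item must repeat in a row. For example, to match runs of equals signs:

={2} – exactly two in a row (==); without anchors this also finds the first two signs of ===, so use ^={2}$ to match a line containing only ==

={0,15} – anywhere from zero up to 15 in a row (GNU grep also accepts the shorthand ={,15}, but that form is a GNU extension)

={25,75} – anywhere from 25 to 75 in a row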

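We can use * and + to specify an open-ended number of occurrences. * means ZERO OR MORE instances of the preceding item, while + means ONE OR MORE instances (in interval terms, * is {0,} and + is {1,}).

=* – any number of equals signs, including none (so it matches every line, even one with no equals signs at all)

=+ – at least one equals sign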

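Finally, we can use parentheses () to group part of an expression into an “atom”. A group is treated as a single unit, so a quantifier after it applies to the whole group rather than just the last character. To match one or more repetitions of “char” (such as “char” or “charchar”), we could write:

(char)+

Without the parentheses, char+ would mean “cha” followed by one or more r’s.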

Again, I point you to gnosis.cx for their excellent tutorial on REGEX.
