Matching against regexes
Regexes describe patterns of text. They give us a language in which we can express the structure of a text.
Consider an example. A phone number is a sequence of digits. The phrase "sequence of digits" can be written down as \d+. If we take into account that phone numbers may be written with spaces and dashes, then we have to say that a phone number is a sequence of digits delimited with spaces or dashes. This is already a more complex regex, and it can be written in different ways depending on how strict we are: for instance, whether we allow two spaces in a row, whether a dash can be followed by a space, or whether a group of digits can consist of a single digit.
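To make the question of strictness concrete, here is a minimal sketch in Python's re module. It is not the notation used in this text, and the names DIGITS, STRICT, is_digits and is_strict_phone are hypothetical. The stricter pattern is only one possible reading: groups of one or more digits separated by exactly one space or one dash.

```python
import re

# A bare sequence of digits, as described in the text.
DIGITS = re.compile(r"\d+")

# One possible strict reading (an assumption, not the text's definition):
# groups of one or more digits, separated by exactly one space or one dash.
STRICT = re.compile(r"\d+(?:[ -]\d+)*")


def is_digits(text: str) -> bool:
    """Return True if the whole string is a sequence of digits."""
    return DIGITS.fullmatch(text) is not None


def is_strict_phone(text: str) -> bool:
    """Return True if the whole string fits the stricter reading."""
    return STRICT.fullmatch(text) is not None
```

Under this reading, "555-12 34" is accepted, while "555--1234" and "555- 1234" are rejected because two delimiters appear in a row.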
Let's be as lenient as possible and formalize it as (\d || \s || \-)+, that is, one or more digits (\d), spaces (\s), or dashes (\-), in any order. The double vertical bar stands for "or" here, and the + means "one or more". Finally, an international phone number can be prefixed with a plus character, which is optional...
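The lenient pattern can be tried out with a short sketch in Python's re module, which is an assumption here since its syntax differs from the notation above: alternation is a single | rather than ||, and spaces inside a pattern are ignored only when the re.VERBOSE flag is set. The names LENIENT, WITH_PLUS, is_lenient_phone and is_international_phone are hypothetical, and writing the optional plus as \+? is one plausible way to express it, not necessarily the form this text goes on to use.

```python
import re

# The least strict pattern from the text, rewritten with a single "|".
# re.VERBOSE makes the spaces inside the pattern insignificant.
LENIENT = re.compile(r"(\d | \s | \-)+", re.VERBOSE)

# Assumed extension: an optional leading plus, written as \+? here.
WITH_PLUS = re.compile(r"\+? (\d | \s | \-)+", re.VERBOSE)


def is_lenient_phone(text: str) -> bool:
    """Return True if the whole string consists of digits, spaces, dashes."""
    return LENIENT.fullmatch(text) is not None


def is_international_phone(text: str) -> bool:
    """Return True if the string also fits with an optional leading plus."""
    return WITH_PLUS.fullmatch(text) is not None
```

Because the pattern is so permissive, is_lenient_phone accepts not only "555-12 34" but also strings such as "--", which contain no digits at all. That is the price of being least strict.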