Literal Characters
The most basic regular expression consists of a single literal character, such as a. It matches the first occurrence of that character in the string.
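As a minimal sketch of the "first occurrence" behavior, the snippet below uses Python's standard `re` module. The sample string is an assumption chosen for illustration only.

```python
import re

# Illustrative sample text (not from the tutorial); it contains "a" several times.
text = "Jack is a boy"

# The regex is the single literal character "a".
match = re.search("a", text)

# re.search reports only the leftmost match, which is the "a" in "Jack".
first_index = match.start()
matched_text = match.group()
print(first_index, matched_text)  # 1 a
```

The later occurrences of "a" in the string are not reported unless the search is resumed after the first match.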
Special Characters
Because we want to do more than simply search for literal pieces of text, we need to reserve certain characters for special use.
There are 12 characters with special meanings:
- the backslash \
- the caret ^
- the dollar sign $
- the period or dot .
- the vertical bar or pipe symbol |
- the question mark ?
- the asterisk or star *
- the plus sign +
- the opening parenthesis (
- the closing parenthesis )
- the opening square bracket [
- the opening curly brace {
These special characters are often called "metacharacters".
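One way to see that these characters are reserved is to check that each one matches itself once a backslash is placed in front of it, while some of them are not even valid patterns on their own. The sketch below assumes Python's `re` module; the names `METACHARACTERS` and `matches_itself_when_escaped` are hypothetical helpers introduced for this example.

```python
import re

# The 12 metacharacters from the list above ("\\" is one backslash in Python source).
METACHARACTERS = ["\\", "^", "$", ".", "|", "?", "*", "+", "(", ")", "[", "{"]


def matches_itself_when_escaped(ch):
    """Return True if backslash + ch is a regex that matches exactly ch."""
    pattern = "\\" + ch
    return re.fullmatch(pattern, ch) is not None


escaped_ok = all(matches_itself_when_escaped(ch) for ch in METACHARACTERS)

# An unescaped opening parenthesis starts a group, so on its own it is
# an incomplete pattern and Python refuses to compile it.
try:
    re.compile("(")
    lone_paren_compiles = True
except re.error:
    lone_paren_compiles = False

print(len(METACHARACTERS), escaped_ok, lone_paren_compiles)  # 12 True False
```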
The quote characters ' and " are not special characters.
- Most regular expression flavors treat the brace { as a literal character, unless it is part of a repetition operator like a{1,3}. So you generally do not need to escape it with a backslash, though you can do so if you want. An exception to this rule is Java, which requires all literal braces to be escaped.
- All other characters should not be escaped with a backslash. That is because the backslash is also a special character. The backslash in combination with a literal character can create a regex token with a special meaning. For example, \d is a shorthand that matches a single digit from 0 to 9.
- In your source code, you have to keep in mind which characters get special treatment inside strings by your programming language. That is because those characters are processed by the compiler before the regex library sees the string. So the regex 1\+1=2 must be written as "1\\+1=2" in C++ code.
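The points above can be sketched in a few lines. The text's example is C++; this sketch uses Python instead, on the assumption that the reader has its standard `re` module available. Ordinary Python string literals process the backslash the same way for this case, so the same doubled backslash is needed. The raw-string form is a Python-specific convenience, and all sample strings are illustrative assumptions.

```python
import re

# The regex 1\+1=2 written as an ordinary string literal: the backslash is doubled
# because the language consumes one level of escaping before the regex library runs.
pattern_plain = "1\\+1=2"

# Python raw strings skip that first level of processing, so one backslash suffices.
pattern_raw = r"1\+1=2"
same_pattern = pattern_plain == pattern_raw

# With the plus sign escaped, the regex matches the literal text.
escaped_match = re.search(pattern_plain, "1+1=2")

# Without the backslash, + is a repetition operator and the literal text is not matched.
unescaped_match = re.search("1+1=2", "1+1=2")

# \d is a token with a special meaning: it matches a single digit.
first_digit = re.search(r"\d", "room 42").group()

# The brace as part of a repetition operator: one to three consecutive "a" characters.
repeated = re.search(r"a{1,3}", "aaaa").group()

# An escaped brace matches a literal brace.
literal_brace = re.search(r"\{", "a{b").group()

print(same_pattern, escaped_match.group(), unescaped_match)  # True 1+1=2 None
print(first_digit, repeated, literal_brace)  # 4 aaa {
```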