Learn everything you need to know about JavaScript Regular Expressions in this comprehensive guide. Discover the most important concepts and explore practical examples to master this powerful tool.
Introduction to Regular Expressions
A regular expression, also known as a regex, is a pattern that describes a set of strings and lets you work with text concisely. Regular expressions allow you to search text, replace substrings, and extract information from strings. They are supported by almost every programming language, including JavaScript.
Regular expressions have been around since the 1950s and were initially used as a conceptual search pattern for string processing algorithms. Over time, they grew in popularity and were implemented in various tools and programming languages.
Hard but Useful
Regular expressions can be challenging to understand and maintain, especially for beginners. However, they are an invaluable tool for certain string manipulations that are otherwise difficult to achieve using simple string methods.
While simple regular expressions are easy to read and write, complex regular expressions can quickly become confusing. It is essential to grasp the basic concepts before diving into more advanced patterns and features.
What Does a Regular Expression Look Like
In JavaScript, a regular expression is represented by an object. There are two ways to define a regular expression:
- Using the RegExp object constructor:
const re1 = new RegExp('hey');
- Using the regular expression literal form:
const re1 = /hey/;
The pattern, such as “hey” in the example above, is the main component of a regular expression. In the literal form, it is delimited by forward slashes, while in the object constructor form, it is not. This is one of the key differences between these two forms.
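To make the difference between the two forms concrete, here is a small illustrative sketch. It shows that a literal and a constructor call can describe the same pattern, that backslashes must be doubled when the pattern is written as a string, and that the constructor can build a pattern from a variable (the variable names are only for illustration).

```javascript
// A literal and a constructor call that describe the same pattern.
const literal = /\d+/;
// Inside a string, the backslash must itself be escaped: '\\d' becomes \d.
const constructed = new RegExp('\\d+');

console.log(literal.source);                        // prints \d+
console.log(literal.source === constructed.source); // true

// The constructor accepts any string, so the pattern can be built at runtime.
const word = 'hey';
const dynamic = new RegExp(word);
console.log(dynamic.test('oh hey there'));          // true
```

In practice the literal form is usually preferred for fixed patterns, while the constructor is useful when the pattern is only known at runtime.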
How Does It Work
Let’s start with a simple regular expression that searches for the word “hey” in a string:
const re1 = /hey/;
You can test this regular expression using the test method, which returns a boolean indicating whether the pattern is found in the given string:
re1.test('hey'); // true
re1.test('blablabla hey blablabla'); // true
re1.test('he'); // false
re1.test('blablabla'); // false
The regular expression /hey/ matches the string “hey” wherever it appears: test returns true even if “hey” is part of a larger string.
Anchoring
To match strings that start or end with a specific pattern, use the anchors ^ and $. The ^ anchor matches the start of a string, while the $ anchor matches the end of a string.
For example, to match strings that start with “hey”, use the pattern /^hey/:
/^hey/.test('hey'); // true
/^hey/.test('bla hey'); // false
To match strings that end with “hey”, use the pattern /hey$/:
/hey$/.test('hey'); // true
/hey$/.test('bla hey'); // true
/hey$/.test('hey you'); // false
By combining ^ and $, you can require the entire string to match the pattern:
/^hey$/.test('hey'); // true
/^hey$/.test('hey you'); // false
Match Items in Ranges
Instead of matching a specific string, you can match any single character within a range by using a character class in square brackets:
- [a-z]: matches lowercase letters from “a” to “z”
- [A-Z]: matches uppercase letters from “A” to “Z”
- [a-c]: matches the lowercase letters “a”, “b”, or “c”
- [0-9]: matches any digit from 0 to 9
For example, the pattern /[a-z]/ matches strings that contain at least one lowercase letter:
/[a-z]/.test('a'); // true
/[a-z]/.test('1'); // false
/[a-z]/.test('A'); // false
Ranges can also be combined in a single class. For example, the pattern /[A-Za-z0-9]/ matches strings that contain at least one alphanumeric character:
/[A-Za-z0-9]/.test('a'); // true
/[A-Za-z0-9]/.test('1'); // true
/[A-Za-z0-9]/.test('A'); // true
Matching a Range Item Multiple Times
To match a character or pattern more than once, use quantifiers, which are placed right after the item they repeat:
- +: matches one or more occurrences
- *: matches zero or more occurrences
- {n}: matches exactly n occurrences
- {n,m}: matches between n and m occurrences (inclusive)
For example, the pattern /^\d+$/ matches strings that consist entirely of one or more digits. Here \d is shorthand for any digit, equivalent to [0-9]:
/^\d+$/.test('12'); // true
/^\d+$/.test('144343'); // true
/^\d+$/.test(''); // false
/^\d+$/.test('1a'); // false
The * quantifier also accepts zero occurrences, so the empty string now matches:
/^\d*$/.test('12'); // true
/^\d*$/.test('144343'); // true
/^\d*$/.test(''); // true
/^\d*$/.test('1a'); // false
The {n} quantifier matches exactly n occurrences:
/^\d{3}$/.test('123'); // true
/^\d{3}$/.test('12'); // false
/^\d{3}$/.test('1234'); // false
/^[A-Za-z0-9]{3}$/.test('Abc'); // true
The {n,m} quantifier matches between n and m occurrences:
/^\d{3,5}$/.test('123'); // true
/^\d{3,5}$/.test('1234'); // true
/^\d{3,5}$/.test('12345'); // true
/^\d{3,5}$/.test('123456'); // false
To match at least n occurrences with no upper limit, omit m:
/^\d{3,}$/.test('12'); // false
/^\d{3,}$/.test('123'); // true
/^\d{3,}$/.test('12345'); // true
/^\d{3,}$/.test('123456789'); // true
Optional Items
To make a character or pattern optional, follow it with the ? quantifier, which matches zero or one occurrence.
For example, the pattern /^\d{3}\w?$/ matches strings of exactly three digits, optionally followed by one word character (\w matches a letter, a digit, or an underscore):
/^\d{3}\w?$/.test('123'); // true
/^\d{3}\w?$/.test('123a'); // true
/^\d{3}\w?$/.test('123ab'); // false
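A couple of everyday uses of ? may help; the patterns below are illustrative only. The quantifier applies to the single item immediately before it, so it can make one letter of a word optional, or the “s” in “https”. The second pattern escapes the forward slashes, which would otherwise end the literal.

```javascript
// `u?` makes the "u" optional, so both spellings match.
const colour = /^colou?r$/;
console.log(colour.test('color'));   // true
console.log(colour.test('colour'));  // true
console.log(colour.test('colouur')); // false: at most one "u"

// `s?` makes the "s" in "https" optional.
const scheme = /^https?:\/\//;
console.log(scheme.test('http://example.com'));  // true
console.log(scheme.test('https://example.com')); // true
console.log(scheme.test('ftp://example.com'));   // false
```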
Groups
Groups in regular expressions are enclosed in parentheses (...). They let you define subpatterns within the overall pattern and apply quantifiers to a subpattern as a whole.
For example, the pattern /^(\d{3})(\w+)$/ matches strings that start with exactly three digits followed by one or more word characters. The groups (\d{3}) and (\w+) capture the digits and the remaining characters separately.
You can access the captured groups with the match or exec methods. The first item in the returned array is the entire matched string, followed by the content of each captured group in order. (With the g flag, match instead returns every full match and omits the groups.)
'123s'.match(/^(\d{3})(\w+)$/);
// Array [ "123s", "123", "s" ]
Capturing Groups
Capturing groups are useful for extracting specific parts of a string. They allow you to capture and store substrings for later use.
To capture a group, enclose the desired pattern within parentheses. The captured content can then be read from the array returned by match or exec.
For example, consider the pattern /(\S+)@(\S+)\.(\S+)/, where \S matches any non-whitespace character. It captures the username, domain, and top-level domain of an email address, so with exec or match you can extract each part separately:
/(\S+)@(\S+)\.(\S+)/.exec('example@domain.com');
// Array [ "example@domain.com", "example", "domain", "com" ]
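In application code, the returned array is often destructured so that each capture gets a readable name. The sketch below assumes a simple email-like input; the variable names are illustrative. It also shows that exec returns null when nothing matches and that the array carries an index property with the match position.

```javascript
const emailPattern = /(\S+)@(\S+)\.(\S+)/;
const result = emailPattern.exec('example@domain.com');

// exec() returns null when there is no match, so check before destructuring.
if (result !== null) {
  // Skip index 0 (the full match) and give each captured group a name.
  const [, user, domain, tld] = result;
  console.log(user, domain, tld); // example domain com
  console.log(result.index);      // 0: the position where the match starts
}

console.log(emailPattern.exec('not an email')); // null
```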
Noncapturing Groups
By default, groups in regular expressions are capturing groups. However, there may be cases where you want to perform a match without capturing the result.
Noncapturing groups are written (?:...). They group a subpattern without storing its match in the resulting array.
For example, /^(\d{3})(?:\s)(\w+)$/ matches strings that start with three digits, followed by a whitespace character (in a noncapturing group), and then one or more word characters. Only the digits and the trailing characters are captured:
/^(\d{3})(?:\s)(\w+)$/.exec('123 s');
// Array [ "123 s", "123", "s" ]
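A common reason to use a noncapturing group is to apply a quantifier to several characters at once without adding an entry to the result. This illustrative sketch compares a capturing and a noncapturing version of the same repetition.

```javascript
// The group lets the + quantifier apply to "ab" as a unit.
const capturing = /^(ab)+$/;
const nonCapturing = /^(?:ab)+$/;

const withGroup = capturing.exec('ababab');
const withoutGroup = nonCapturing.exec('ababab');

console.log(withGroup.length);    // 2: full match plus one capture
console.log(withGroup[1]);        // ab: a repeated group keeps only its last iteration
console.log(withoutGroup.length); // 1: only the full match
```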
Flags
Regular expressions in JavaScript can include flags that modify their behavior. Flags go after the closing slash of a regex literal (/hey/gi) or are passed as a string in the second argument of the RegExp constructor (new RegExp('hey', 'gi')).
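The following flags can be used with regular expressions:
- g: global search; finds all matches instead of stopping at the first one
- i: performs a case-insensitive match
- m: enables multiline mode, in which ^ and $ match at the start and end of each line
- u: enables full Unicode support
- s: enables dotAll mode, in which . also matches newline characters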
Flags can be combined, allowing you to customize regex matching according to your needs.
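The short sketch below illustrates the effect of several flags and how they combine; the input strings are arbitrary examples.

```javascript
// g + i: find every occurrence, ignoring case.
const allHeys = 'Hey hey HEY'.match(/hey/gi);
console.log(allHeys); // [ 'Hey', 'hey', 'HEY' ]

// m: ^ and $ match at the start and end of each line.
const lines = 'one\ntwo';
const perLine = lines.match(/^\w+$/gm);
console.log(perLine); // [ 'one', 'two' ]
// Without m, ^ and $ refer to the whole string, which contains a newline.
const wholeString = lines.match(/^\w+$/g);
console.log(wholeString); // null

// s: . also matches line terminators.
console.log(/a.b/.test('a\nb'));  // false
console.log(/a.b/s.test('a\nb')); // true

// The same flags passed to the constructor as a string.
const re = new RegExp('hey', 'gi');
console.log(re.flags); // gi
```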
Inspecting a Regex
You can inspect a regular expression through its properties:
- source: the pattern string
- multiline: true if the m flag is set
- global: true if the g flag is set
- ignoreCase: true if the i flag is set
- lastIndex: the index at which to start the next search (relevant when the g flag is set)
For example:
/(\w{3})/i.source; // "(\w{3})"
/(\w{3})/i.multiline; // false
/(\w{3})/i.lastIndex; // 0
/(\w{3})/i.ignoreCase; // true
/(\w{3})/i.global; // false
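lastIndex matters once the g flag is set, because exec and test then resume from it and update it after each call. The following sketch, using an arbitrary sample string, walks through all matches with exec and also shows a common pitfall when a global regex is reused with test.

```javascript
const re = /\d+/g;
const text = 'a1 b22 c333';
const found = [];

let match;
// With the g flag, each exec() call starts at re.lastIndex and then updates it.
while ((match = re.exec(text)) !== null) {
  found.push({ value: match[0], index: match.index, lastIndex: re.lastIndex });
}
// found holds: "1" at 1 (lastIndex 2), "22" at 4 (lastIndex 6), "333" at 8 (lastIndex 11)
console.log(re.lastIndex); // 0: reset after the final, failed search

// Pitfall: a global regex remembers where it stopped between test() calls.
const globalRe = /a/g;
const first = globalRe.test('a');  // true, lastIndex becomes 1
const second = globalRe.test('a'); // false, because the search started at index 1
console.log(first, second);        // true false
```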
Escaping
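Certain characters have special meanings in regular expressions. To match one of them literally, escape it with a backslash (\).
The following characters must be escaped to be matched literally: \, /, [, ], (, ), {, }, ?, +, *, |, ., ^, $. The forward slash only needs escaping inside a regex literal, where it would otherwise end the pattern.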
For example:
/^\$$/.test('$'); // true
/^\^$/.test('^'); // true
/^\\$/; // regex pattern to match a single backslash
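Escaping becomes important when a pattern is built from user input with the RegExp constructor. Below is a minimal sketch of a widely used helper approach; escapeRegExp is a hypothetical name defined here, not a built-in. It prefixes every metacharacter with a backslash, using $& in the replacement string to insert the matched character.

```javascript
// Hypothetical helper: prefixes every regex metacharacter with a backslash.
// `$&` in the replacement string inserts the matched character itself.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const userInput = '1+1=2';
const unsafe = new RegExp(userInput);             // + is treated as a quantifier
const safe = new RegExp(escapeRegExp(userInput)); // + is matched literally

console.log(escapeRegExp(userInput));  // prints 1\+1=2
console.log(unsafe.test('is 1+1=2?')); // false: the pattern means "one or more 1s, then 1=2"
console.log(safe.test('is 1+1=2?'));   // true
```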
Word Boundaries
For more precise matching, you can use \b and \B, which match positions rather than characters:
- \b matches a word boundary: the position between a word character (\w) and a non-word character, or the start or end of the string when it is next to a word character
- \B matches any position that is not a word boundary
For example:
/\bbear/.test('I saw a bear'); // true
/\bbear/.test('I saw a beard'); // true
/\bbear\b/.test('I saw a beard'); // false
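As a further illustration, \b on both sides of a word is a common way to count whole-word occurrences, while \B requires the match to start inside a word. The sample strings are arbitrary.

```javascript
const text = 'cat concat cat.';

// \b on both sides: only standalone "cat" words, not the end of "concat".
console.log(text.match(/\bcat\b/g)); // [ 'cat', 'cat' ]

// \B before "cat": the match must start inside a word.
console.log(/\Bcat/.test('concat')); // true
console.log(/\Bcat/.test('cat'));    // false: the start of the string is a boundary
```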
Replacing Using Regular Expressions
JavaScript strings provide the replace method for replacing parts of a string. Its first argument can be a string or a regular expression. A string, or a regular expression without the g flag, replaces only the first occurrence; add the g flag to replace every occurrence.
For example:
"Hello world!".replace('world', 'dog'); // "Hello dog!"
"My dog is a good dog!".replace(/dog/g, 'cat'); // "My cat is a good cat!"
You can also use capturing groups and functions for more advanced replacements. In the replacement string, $1, $2, and so on refer to the captured groups. Passing a function as the second argument gives you full control: it is called for each match, and its return value is used as the replacement.
For example:
"Hello, world!".replace(/(\w+), (\w+)!/, '$2: $1!!!'); // "world: Hello!!!"
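Here is an illustrative sketch of the function form. The callback receives the full match first and then each captured group; with the g flag it runs once per match. The last line uses replaceAll, which, when given a regular expression, requires the g flag (it throws a TypeError otherwise).

```javascript
// The callback receives the full match first, then each captured group.
const swapped = 'John Smith'.replace(/(\w+)\s(\w+)/, (match, first, last) => {
  return `${last.toUpperCase()}, ${first}`;
});
console.log(swapped); // SMITH, John

// With the g flag, the callback runs once per match.
const doubled = 'price: 10, tax: 2'.replace(/\d+/g, (digits) => String(Number(digits) * 2));
console.log(doubled); // price: 20, tax: 4

// replaceAll() with a regex requires the g flag.
const plussed = 'a-b-c'.replaceAll(/-/g, '+');
console.log(plussed); // a+b+c
```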
Greediness
By default, regular expressions are greedy, which means they match as much as possible. This behavior can sometimes lead to unexpected results.
For example, consider the pattern /\$(.+)\s?/, meant to extract a dollar amount from a string. Because .+ is greedy, it consumes everything up to the end of the string, and the optional \s? then matches nothing:
/\$(.+)\s?/.exec('This costs $100 and it is less than $200')[1];
// "100 and it is less than $200"
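To make a quantifier lazy, so that it matches as few characters as possible, add a ? after it. Note that the trailing \s is no longer optional here: with \s?, the lazy .+? would stop after a single digit and capture only “1”: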
/\$(.+?)\s/.exec('This costs $100 and it is less than $200')[1];
// "100"
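The same effect shows up whenever a pattern has a closing delimiter that can appear more than once. In this illustrative sketch, the greedy version runs from the first double quote to the last one, while the lazy version stops at the first closing quote, so the g flag can pick out each quoted string.

```javascript
const text = 'say "hi" and "bye"';

// Greedy: .+ runs to the last double quote in the string.
const greedy = text.match(/".+"/)[0];
console.log(greedy); // "hi" and "bye"

// Lazy: .+? stops at the first closing quote.
const lazy = text.match(/".+?"/g);
console.log(lazy); // [ '"hi"', '"bye"' ]
```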
Lookaheads: Match a String Depending on What Follows It
Lookaheads let you match a string only if it is (or is not) followed by a specific substring. The lookahead itself is not part of the match. The syntax is:
- (?=...): positive lookahead
- (?!...): negative lookahead
For example, /Roger(?= Waters)/ matches “Roger” only if it is followed by “ Waters”:
/Roger(?= Waters)/.test('Roger is my dog'); // false
/Roger(?= Waters)/.test('Roger is my dog and Roger Waters is a famous musician'); // true
The negative lookahead (?!...) performs the inverse operation, matching a string only if it is not followed by the given substring:
/Roger(?! Waters)/.test('Roger is my dog'); // true
/Roger(?! Waters)/.test('Roger Waters is a famous musician'); // false
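Because lookaheads check a condition without consuming characters, several of them can be stacked at the start of a pattern. The sketch below uses a hypothetical rule (at least 8 characters, including a digit and a lowercase letter) purely to illustrate the technique; it is not a recommended password policy.

```javascript
// Hypothetical rule: at least 8 characters, including a digit and a lowercase letter.
// Each lookahead scans ahead from the start without consuming any characters.
const strongEnough = /^(?=.*\d)(?=.*[a-z]).{8,}$/;

console.log(strongEnough.test('abcdefg1')); // true
console.log(strongEnough.test('abcdefgh')); // false: no digit
console.log(strongEnough.test('abc1'));     // false: too short

// The lookahead is not part of the match: only "Roger" is returned.
console.log('Roger Waters'.match(/Roger(?= Waters)/)[0]); // Roger
```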
Lookbehinds: Match a String Depending on What Precedes It
Lookbehinds, introduced in ES2018, are similar to lookaheads but match a string based on what precedes it. They use (?<=...) for positive lookbehinds and (?<!...) for negative lookbehinds.
For example, /(?<=Roger) Waters/ matches “ Waters” only if it is preceded by “Roger”:
/(?<=Roger) Waters/.test('Pink Waters is my dog'); // false
/(?<=Roger) Waters/.test('Roger is my dog and Roger Waters is a famous musician'); // true
The negative lookbehind (?<!...) matches a string only if it is not preceded by the given substring:
/(?<!Roger) Waters/.test('Pink Waters is my dog'); // true
/(?<!Roger) Waters/.test('Roger is my dog and Roger Waters is a famous musician'); // false
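A practical use of lookbehinds is extracting values that follow a marker without including the marker. This illustrative sketch pulls out dollar amounts. The negative version also excludes digits preceded by another digit; otherwise it would match the tail “00” of “$100”.

```javascript
const text = 'Costs $100 or €200 or $300';

// Positive lookbehind: digits preceded by "$"; the "$" itself is not included.
const dollars = text.match(/(?<=\$)\d+/g);
console.log(dollars); // [ '100', '300' ]

// Negative lookbehind: digit runs not preceded by "$" or by another digit.
const others = text.match(/(?<![$\d])\d+/g);
console.log(others); // [ '200' ]
```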
Regular Expressions and Unicode
When working with Unicode strings in JavaScript, use the u flag, especially when handling characters outside the Basic Multilingual Plane (the “astral planes”). Without it, such characters may not be matched correctly.
For example, /^.$/ should match any string made of a single character (other than a line terminator), but it fails on the emoji “🐶” because JavaScript stores it as two UTF-16 code units (a surrogate pair), and without the u flag . matches only one of them. With the u flag, . matches the full emoji:
/^.$/.test('a'); // true
/^.$/.test('🐶'); // false
/^.$/u.test('🐶'); // true
Include the u flag whenever your input may contain such characters, so that they are matched as single characters.
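The underlying cause can be seen by comparing string length with code point count. In this illustrative sketch, length counts UTF-16 code units, while spreading the string iterates over code points, which is the unit . uses when the u flag is set.

```javascript
const dog = '🐶';

console.log(dog.length);      // 2: two UTF-16 code units (a surrogate pair)
console.log([...dog].length); // 1: string iteration works on code points

console.log(/^.$/.test(dog));  // false: . matches a single code unit
console.log(/^..$/.test(dog)); // true: the two halves of the surrogate pair
console.log(/^.$/u.test(dog)); // true: with u, . matches a whole code point
```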
Additionally, Unicode property escapes, introduced in ES2018, match characters based on their Unicode properties. They are written \p{...}, or \P{...} for the negation, with the property name inside the curly braces, and they require the u flag.
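For example, \p{ASCII} matches any ASCII character, while \p{Emoji} matches characters that have the Unicode Emoji property (which, note, also includes the digits 0 to 9 and the characters # and *):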
/^\p{ASCII}+$/u.test('abc'); // true
/^\p{ASCII}+$/u.test('ABC@'); // true
/^\p{ASCII}+$/u.test('ABC🙃'); // false
/^\p{Emoji}+$/u.test('H'); // false
/^\p{Emoji}+$/u.test('🙃🙃'); // true
Unicode property escapes provide a powerful tool for matching specific Unicode characters or character sets based on their properties.
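Two further property escapes may be useful as illustrations: \p{L} matches a letter in any script, which [A-Za-z] cannot do, and \P{ASCII} (the negated form) finds characters outside ASCII. The sample strings assume precomposed accented characters.

```javascript
// \p{L} matches a letter from any script.
console.log(/^\p{L}+$/u.test('café'));   // true
console.log(/^\p{L}+$/u.test('Straße')); // true
console.log(/^\p{L}+$/u.test('abc1'));   // false: "1" is not a letter

// \P{...} is the negation: here, any character outside ASCII.
const nonAscii = 'naïve café'.match(/\P{ASCII}/gu);
console.log(nonAscii); // [ 'ï', 'é' ]
```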
Examples
Here are a few practical examples to demonstrate the power of regular expressions:
Extract a Number from a String
To extract a number from a string, use the \d+ pattern, which matches one or more digits:
'Test 123123329'.match(/\d+/);
// Array [ "123123329" ]
Match an Email Address
Fully validating email addresses with a regular expression is a complex task, but a simple approach is often enough. For example, the pattern (\S+)@(\S+)\.(\S+) captures the username, domain, and top-level domain of an email address:
/(\S+)@(\S+)\.(\S+)/.exec('example@domain.com');
// Array [ "example@domain.com", "example", "domain", "com" ]
Capture Text Between Double Quotes
To capture text between double quotes, use the pattern /"([^"]+)"/. The negated character class [^"] matches any character except a double quote, so the group captures the enclosed text without the quotes themselves:
const hello = 'Hello "nice flower"';
const result = /"([^"]+)"/.exec(hello);
// Array [ "\"nice flower\"", "nice flower" ]
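When a string contains several quoted parts, matchAll can collect all of them along with their groups. The sketch below is illustrative; matchAll requires the g flag and yields one match array per occurrence.

```javascript
const text = 'He said "hi" and then "bye"';

// matchAll() requires the g flag and yields one match array per occurrence.
const quoted = [...text.matchAll(/"([^"]+)"/g)].map((m) => m[1]);

console.log(quoted); // [ 'hi', 'bye' ]
```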
Get the Content Inside an HTML Tag
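To extract the content of a specific HTML tag, write a pattern that matches the opening tag, its content, and the closing tag. For example, to capture the content of a <span> tag in a simple, single-line snippet (for full HTML documents, a real HTML parser is more reliable):
const spanPattern = /<span\b[^>]*>(.*?)<\/span>/;
spanPattern.exec('<p><span class="note">Hello</span></p>')[1]; // "Hello"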
Using regular expressions in JavaScript opens up a world of possibilities for manipulating and extracting information from strings. Spend time practicing and experimenting with different patterns to become proficient in using regular expressions effectively.