Learn everything you need to know about JavaScript Regular Expressions in this comprehensive guide. Discover the most important concepts and explore practical examples to master this powerful tool.

Introduction to Regular Expressions

A regular expression, also known as a regex, is a pattern that describes a set of strings. Regular expressions allow you to search text, replace substrings, and extract information from strings. They are supported by almost every programming language, including JavaScript.

Regular expressions have been around since the 1950s and were initially used as a conceptual search pattern for string processing algorithms. Over time, they grew in popularity and were implemented in various tools and programming languages.

Hard but Useful

Regular expressions can be challenging to understand and maintain, especially for beginners. However, they are an invaluable tool for certain string manipulations that are otherwise difficult to achieve using simple string methods.

While simple regular expressions are easy to read and write, complex regular expressions can quickly become confusing. It is essential to grasp the basic concepts before diving into more advanced patterns and features.

What Does a Regular Expression Look Like

In JavaScript, a regular expression is represented by an object. There are two ways to define a regular expression:

  1. Using the RegExp object constructor:

    const re1 = new RegExp('hey');
    
  2. Using the regular expression literal form:

    const re2 = /hey/;
    

The pattern, “hey” in the examples above, is the main component of a regular expression. In the literal form it is delimited by forward slashes; in the constructor form it is passed as an ordinary string, so any backslash in the pattern must itself be escaped. This is one of the key differences between the two forms.
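
As a rough illustration of the difference between the two forms, the sketch below builds the same pattern both ways and then assembles a pattern at runtime. The variable names (literal, constructed, word, dynamic) are made up for this example.

```javascript
// Both forms describe the same pattern: one or more digits.
const literal = /\d+/;
// Inside a string the backslash must be escaped, hence '\\d+'.
const constructed = new RegExp('\\d+');

console.log(literal.source === constructed.source); // true

// The constructor is useful when the pattern is only known at runtime.
// `word` is a hypothetical value used for illustration.
const word = 'hey';
const dynamic = new RegExp('^' + word + '$');

console.log(dynamic.test('hey'));     // true
console.log(dynamic.test('hey you')); // false
```

When the pattern is fixed, the literal form is usually easier to read because it avoids the doubled backslashes.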

How Does It Work

Let’s start with a simple regular expression that searches for the word “hey” in a string:

const re1 = /hey/;

You can test this regular expression using the test method, which returns a boolean value indicating whether the pattern is found in the given string:

re1.test('hey'); // true
re1.test('blablabla hey blablabla'); // true
re1.test('he'); // false
re1.test('blablabla'); // false

The regular expression /hey/ matches the string “hey” wherever it appears. It will return true even if “hey” is part of a larger string.

Anchoring

To match strings that start or end with a specific pattern, you can use anchoring operators. The ^ operator matches the start of a string, while the $ operator matches the end of a string.

For example, to match strings that start with “hey”, use the pattern /^hey/:

/^hey/.test('hey'); // true
/^hey/.test('bla hey'); // false

To match strings that end with “hey”, use the pattern /hey$/:

/hey$/.test('hey'); // true
/hey$/.test('bla hey'); // true
/hey$/.test('hey you'); // false

By combining the ^ and $ operators, you can match strings that exactly match a specific pattern:

/^hey$/.test('hey'); // true

Match Items in Ranges

Instead of matching a specific string, you can match any character within a range. The following regex patterns match characters within specific ranges:

  • [a-z]: matches lowercase letters from “a” to “z”
  • [A-Z]: matches uppercase letters from “A” to “Z”
  • [a-c]: matches lowercase letters “a”, “b”, or “c”
  • [0-9]: matches any digit from 0 to 9

For example, the regex pattern /[a-z]/ matches strings that contain at least one lowercase letter:

/[a-z]/.test('a'); // true
/[a-z]/.test('1'); // false
/[a-z]/.test('A'); // false

Ranges can also be combined. For example, the regex pattern /[A-Za-z0-9]/ matches alphanumeric characters:

/[A-Za-z0-9]/.test('a'); // true
/[A-Za-z0-9]/.test('1'); // true
/[A-Za-z0-9]/.test('A'); // true

Matching a Range Item Multiple Times

To match one or more occurrences of a character or pattern, you can use quantifiers. The following quantifiers can be used with regular expressions:

  • +: matches one or more occurrences
  • *: matches zero or more occurrences
  • {n}: matches exactly n occurrences
  • {n,m}: matches between n and m occurrences

For example, the regex pattern /^\d+$/ matches strings that consist entirely of one or more digits (\d is shorthand for [0-9]):

/^\d+$/.test('12'); // true
/^\d+$/.test('144343'); // true
/^\d+$/.test(''); // false
/^\d+$/.test('1a'); // false

The * quantifier allows zero or more occurrences:

/^\d*$/.test('12'); // true
/^\d*$/.test('144343'); // true
/^\d*$/.test(''); // true
/^\d*$/.test('1a'); // false

The {n} quantifier matches exactly n occurrences:

/^\d{3}$/.test('123'); // true
/^\d{3}$/.test('12'); // false
/^\d{3}$/.test('1234'); // false
/^[A-Za-z0-9]{3}$/.test('Abc'); // true

The {n,m} quantifier matches between n and m occurrences:

/^\d{3,5}$/.test('123'); // true
/^\d{3,5}$/.test('1234'); // true
/^\d{3,5}$/.test('12345'); // true
/^\d{3,5}$/.test('123456'); // false

If you want to match at least n occurrences without an upper limit, you can omit m:

/^\d{3,}$/.test('12'); // false
/^\d{3,}$/.test('123'); // true
/^\d{3,}$/.test('12345'); // true
/^\d{3,}$/.test('123456789'); // true

Optional Items

To make a character or pattern optional, you can use the ? quantifier. It matches zero or one occurrence.

For example, the regex pattern /^\d{3}\w?$/ matches strings that have exactly three digits followed by an optional word character (\w matches a letter, a digit, or an underscore):

/^\d{3}\w?$/.test('123'); // true
/^\d{3}\w?$/.test('123a'); // true
/^\d{3}\w?$/.test('123ab'); // false

Groups

Groups in regular expressions are enclosed in parentheses (...). They allow you to define subpatterns within the overall regex pattern.

For example, the regex pattern /^(\d{3})(\w+)$/ matches strings that start with exactly three digits followed by one or more word characters (letters, digits, or underscores). The groups (\d{3}) and (\w+) capture the matched digits and word characters separately.

You can access the captured groups using the match or exec methods. The first item in the returned array is the entire matched string, followed by each captured group’s content.

'123s'.match(/^(\d{3})(\w+)$/);
// Array [ "123s", "123", "s" ]
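
One possible way to work with the returned array is to destructure it into named variables. The names below (pattern, whole, digits, rest, noMatch) are chosen for illustration only. Note that match returns null when nothing matches, so a check is needed before destructuring.

```javascript
const pattern = /^(\d{3})(\w+)$/;

const match = '123s'.match(pattern);
if (match !== null) {
  // Index 0 is the whole match, the following items are the groups.
  const [whole, digits, rest] = match;
  console.log(whole);  // "123s"
  console.log(digits); // "123"
  console.log(rest);   // "s"
}

// Only two leading digits, so the pattern does not match at all.
const noMatch = '12s'.match(pattern);
console.log(noMatch); // null
```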

Capturing Groups

Capturing groups are useful for extracting specific parts of a string. They allow you to capture and store substrings for later use.

To capture a group, enclose the desired pattern within parentheses. The captured group content can then be accessed using the match or exec methods.

For example, consider the regex pattern /(\S+)@(\S+)\.(\S+)/. This pattern captures the username, domain, and top-level domain of an email address. By using exec or match, you can extract each part separately.

/(\S+)@(\S+)\.(\S+)/.exec('example@domain.com');
// Array [ "example@domain.com", "example", "domain", "com" ]

Noncapturing Groups

By default, groups in regular expressions are capturing groups. However, there may be cases where you want to perform a match without capturing the result.

Noncapturing groups can be created using (?:...). These groups allow you to specify a subpattern without storing the match in the resulting array.

For example, /^(\d{3})(?:\s)(\w+)$/ matches strings that start with three digits, followed by a space (noncapturing group), and then one or more alphanumeric characters.

/^(\d{3})(?:\s)(\w+)$/.exec('123 s');
// Array [ "123 s", "123", "s" ]
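
To make the effect visible, the sketch below compares a capturing and a noncapturing version of the same pattern, and then uses a noncapturing group purely to apply a quantifier to a sequence. The variable names are illustrative.

```javascript
const capturing = /^(\d{3})(\s)(\w+)$/.exec('123 s');
const nonCapturing = /^(\d{3})(?:\s)(\w+)$/.exec('123 s');

console.log(capturing.length);    // 4: whole match plus three groups
console.log(nonCapturing.length); // 3: whole match plus two groups

// A noncapturing group can also group a sequence for a quantifier.
const repeated = /^(?:ab)+$/;
console.log(repeated.test('ababab')); // true
console.log(repeated.test('aba'));    // false
```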

Flags

Regular expressions in JavaScript can include flags that modify their behavior. Flags are used after the trailing slash in regex literals or as the second parameter in the RegExp constructor.

The following flags can be used with regular expressions:

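  • g: global search; finds all matches instead of stopping at the first one
  • i: performs a case-insensitive match
  • m: enables multiline mode, in which ^ and $ match at the start and end of each line
  • u: enables Unicode mode, which treats the pattern as a sequence of Unicode code points
  • s: enables dotall mode, in which . also matches newline characters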

Flags can be combined, allowing you to customize regex matching according to your needs.
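
The following sketch shows the effect of several of these flags on small inputs. The sample strings and variable names are invented for illustration.

```javascript
// g: return every match instead of only the first one
const allDigits = 'a1b22c333'.match(/\d+/g);
console.log(allDigits); // [ "1", "22", "333" ]

// i: ignore case
console.log(/hey/i.test('HEY')); // true

// m: ^ and $ also match at line breaks
console.log(/^second$/.test('first\nsecond'));  // false
console.log(/^second$/m.test('first\nsecond')); // true

// s: the dot also matches a newline
console.log(/a.b/.test('a\nb'));  // false
console.log(/a.b/s.test('a\nb')); // true

// Flags passed as the second argument of the constructor
const combined = new RegExp('hey', 'gi');
console.log(combined.flags); // "gi"
```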

Inspecting a Regex

To inspect the properties of a regular expression, you can access its various properties:

  • source: the pattern string
  • multiline: true if the m flag is set
  • global: true if the g flag is set
  • ignoreCase: true if the i flag is set
  • lastIndex: the index at which to start the next search

For example:

/(\w{3})/i.source; // "(\w{3})"
/(\w{3})/i.multiline; // false
/(\w{3})/i.lastIndex; // 0
/(\w{3})/i.ignoreCase; // true
/(\w{3})/i.global; // false
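
The lastIndex property becomes relevant once the g flag is set, because methods such as test then resume from that index. A small sketch with a made-up input string:

```javascript
const re = /a/g;
const text = 'banana';

console.log(re.lastIndex);  // 0
console.log(re.test(text)); // true, found the "a" at index 1
console.log(re.lastIndex);  // 2
console.log(re.test(text)); // true, found the "a" at index 3
console.log(re.lastIndex);  // 4
console.log(re.test(text)); // true, found the "a" at index 5
console.log(re.test(text)); // false, no more matches
console.log(re.lastIndex);  // 0, reset after the failed search
```

This is why reusing one global regex across several test calls can give results that look inconsistent at first glance.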

Escaping

Certain characters have special meanings in regular expressions and need to be escaped using a backslash (\).

The following characters require escaping:

  • \, /, [, ], (, ), {, }, ?, +, *, |, ., ^, $

For example:

/^\$$/.test('$'); // true
/^\^$/.test('^'); // true
/^\\$/; // regex pattern to match a single backslash
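
Escaping matters most when a pattern is built from text you do not control. The sketch below uses escapeRegExp, a hypothetical helper written for this example (it is not a built-in), to put a backslash in front of every special character before passing the text to the RegExp constructor.

```javascript
// Hypothetical helper: prefixes each special character with a backslash.
// In the replacement string, $& stands for the matched character.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const userInput = '1+1=2?';

const escaped = new RegExp(escapeRegExp(userInput));
console.log(escaped.source);         // "1\+1=2\?"
console.log(escaped.test('1+1=2?')); // true

// Without escaping, + and ? are treated as quantifiers.
const unescaped = new RegExp(userInput);
console.log(unescaped.test('1+1=2?')); // false
```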

Word Boundaries

For more precise pattern matching, you can use \b and \B to match positions relative to word boundaries. Neither of them consumes any characters.

  • \b matches a position at the beginning or end of a word, that is, between a word character and a non-word character (or the start or end of the string)
  • \B matches any position that is not at the beginning or end of a word

For example:

/\bbear/.test('I saw a bear'); // true
/\bbear/.test('I saw a beard'); // true
/\bbear\b/.test('I saw a beard'); // false
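
The examples above only exercise \b. As a complement, here is a short sketch of \B, plus a whole-word replacement that relies on \b. The sample sentences are invented for illustration.

```javascript
// \B before "ear": the match must start inside a word.
console.log(/\Bear/.test('I saw a bear'));  // true, "ear" sits inside "bear"
console.log(/\Bear/.test('I have an ear')); // false, "ear" starts a word

// \B after "ear": the word must continue after the match.
console.log(/ear\B/.test('beard')); // true
console.log(/ear\B/.test('bear'));  // false

// \b on both sides restricts a replacement to whole words.
const replaced = 'cat concat cat'.replace(/\bcat\b/g, 'dog');
console.log(replaced); // "dog concat dog"
```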

Replacing Using Regular Expressions

JavaScript provides the replace method for replacing parts of a string based on a regular expression pattern. You can use this method to perform single or multiple replacements.

The replace method can accept a string or a regular expression as its first argument. When using a regular expression, the g flag is needed to replace multiple occurrences.

For example:

"Hello world!".replace('world', 'dog'); // "Hello dog!"
"My dog is a good dog!".replace(/dog/g, 'cat'); // "My cat is a good cat!"

You can also use capturing groups and functions for more advanced replacements. By using capturing groups, you can refer to the matched groups in the replacement string. By passing a function as the second argument, you have more flexibility in manipulating the matched substrings.
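
The function form might look like the sketch below. The callback receives the whole match first, followed by each captured group, and its return value is used as the replacement. The variable names and sample strings are made up for this example.

```javascript
// Capitalize the first letter of every word.
const capitalized = 'hello world'.replace(
  /\b(\w)(\w*)/g,
  (match, first, rest) => first.toUpperCase() + rest
);
console.log(capitalized); // "Hello World"

// Double every number found in the string.
const doubled = 'Items: 3 apples, 10 pears'.replace(
  /\d+/g,
  (digits) => String(Number(digits) * 2)
);
console.log(doubled); // "Items: 6 apples, 20 pears"
```

For simpler cases, a replacement string with $1, $2, and so on is usually enough.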

For example:

"Hello, world!".replace(/(\w+), (\w+)!/, '$2: $1!!!'); // "world: Hello!!!"

Greediness

By default, regular expressions are greedy, which means they match as much as possible. This behavior can sometimes lead to unexpected results.

For example, consider the regex pattern /\$(.+)\s?/ used to extract a dollar amount from a string. Because .+ is greedy, it consumes everything up to the end of the string, so the captured group holds far more than the amount:

/\$(.+)\s?/.exec('This costs $100 and it is less than $200')[1];
// "100 and it is less than $200"

To make the regex lazy and match the minimum number of characters possible, you can add a ? after the quantifier:

/\$(.+?)\s/.exec('This costs $100 and it is less than $200')[1];
// "100"

Lookaheads: Match a String Depending on What Follows It

Lookaheads allow you to match a string based on the presence or absence of specific characters that follow the matched string. Lookaheads have the following syntax:

  • ?=: positive lookahead
  • ?!: negative lookahead

For example, /Roger(?= Waters)/ matches “Roger” only if it is followed by “ Waters”:

/Roger(?= Waters)/.test('Roger is my dog'); // false
/Roger(?= Waters)/.test('Roger is my dog and Roger Waters is a famous musician'); // true

The negative lookahead ?! performs the inverse operation, matching a string only if it is not followed by a specific substring:

/Roger(?! Waters)/.test('Roger is my dog'); // true
/Roger(?! Waters)/.test('Roger Waters is a famous musician'); // false
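
A lookahead only checks what comes next; the text it inspects is not part of the match. The sketch below shows this with match, and then uses a lookahead as a condition inside a larger pattern. The hasDigit rule (at least eight characters, at least one digit) is an invented example, not a recommendation.

```javascript
const text = 'Roger Waters';

// The lookahead requires " Waters" to follow but does not consume it.
console.log(text.match(/Roger(?= Waters)/)[0]); // "Roger"
console.log(text.match(/Roger Waters/)[0]);     // "Roger Waters"

// A lookahead used as a condition: at least 8 characters, one of them a digit.
const hasDigit = /^(?=.*\d).{8,}$/;
console.log(hasDigit.test('abcdefg1')); // true
console.log(hasDigit.test('abcdefgh')); // false, no digit
console.log(hasDigit.test('abc1'));     // false, too short
```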

Lookbehinds: Match a String Depending on What Precedes It

Lookbehinds, introduced in ES2018, are similar to lookaheads but match a string based on its preceding characters. Lookbehinds use ?<= for positive lookbehinds and ?<! for negative lookbehinds.

For example, /(?<=Roger) Waters/ matches “Waters” only if it is preceded by “Roger”:

/(?<=Roger) Waters/.test('Pink Waters is my dog'); // false
/(?<=Roger) Waters/.test('Roger is my dog and Roger Waters is a famous musician'); // true

The negative lookbehind ?<! matches a string if it is not preceded by a specific substring:

/(?<!Roger) Waters/.test('Pink Waters is my dog'); // true
/(?<!Roger) Waters/.test('Roger is my dog and Roger Waters is a famous musician'); // false

Regular Expressions and Unicode

When working with Unicode strings in JavaScript, it is important to use the u flag in regular expressions, especially when handling characters in the astral planes. If the u flag is not used, regular expressions may not match Unicode characters correctly.

For example, the regex /^.$/ matches any single character except newline, but it fails to match the emoji “🐶” because JavaScript represents it internally as two UTF-16 code units (a surrogate pair). With the u flag, the regex matches the full emoji:

/^.$/.test('a'); // true
/^.$/.test('🐶'); // false
/^.$/u.test('🐶'); // true

Always include the u flag when working with Unicode characters to ensure accurate matching.
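
To see why the flag makes a difference, the sketch below inspects the emoji directly. The variable name dog is illustrative.

```javascript
const dog = '🐶';

// The string holds two UTF-16 code units but a single code point.
console.log(dog.length);      // 2
console.log([...dog].length); // 1

// Without u, the dot matches one code unit at a time, so two dots fit.
console.log(/^..$/.test(dog));  // true
// With u, the dot matches a whole code point, so two dots are too many.
console.log(/^..$/u.test(dog)); // false
```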

Additionally, you can use Unicode property escapes, a feature introduced in ES2018, to match characters based on their Unicode properties. Property escapes are enclosed in \p{} or \P{}, where the property name is specified within the curly braces.

For example, \p{ASCII} matches any ASCII character, while \p{Emoji} matches any emoji character:

/^\p{ASCII}+$/u.test('abc'); // true
/^\p{ASCII}+$/u.test('ABC@'); // true
/^\p{ASCII}+$/u.test('ABC🙃'); // false

/^\p{Emoji}+$/u.test('H'); // false
/^\p{Emoji}+$/u.test('🙃🙃'); // true

Unicode property escapes provide a powerful tool for matching specific Unicode characters or character sets based on their properties.

Examples

Here are a few practical examples to demonstrate the power of regular expressions:

Extract a Number from a String

To extract a number from a string, you can use the \d+ pattern, which matches one or more digits:

'Test 123123329'.match(/\d+/);
// Array [ "123123329" ]
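
If the string contains several numbers, one possible approach is to add the g flag and convert the results. The sample text and variable names below are invented for illustration.

```javascript
const text = 'Order 66: 3 apples and 12 pears';

// Without g, match returns only the first match plus extra details.
const first = text.match(/\d+/);
console.log(first[0]);    // "66"
console.log(first.index); // 6

// With g, match returns every match (or null, hence the fallback).
const all = text.match(/\d+/g) || [];
console.log(all); // [ "66", "3", "12" ]

// The matches are strings; convert them when numbers are needed.
const numbers = all.map(Number);
console.log(numbers); // [ 66, 3, 12 ]
```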

Match an Email Address

Matching valid email addresses is a complex task, but a simple approach can be taken using regular expressions. For example, using the pattern (\S+)@(\S+)\.(\S+), you can capture the username, domain, and top-level domain of an email address:

/(\S+)@(\S+)\.(\S+)/.exec('example@domain.com');
// Array [ "example@domain.com", "example", "domain", "com" ]

Capture Text Between Double Quotes

To capture text between double quotes, you can use the pattern /"([^"]+)"/. The character class [^"] matches any character except a double quote, so the group captures the text enclosed in double quotes while excluding the quotes themselves:

const hello = 'Hello "nice flower"';
const result = /"([^"]+)"/.exec(hello);
// Array [ "\"nice flower\"", "nice flower" ]

Get the Content Inside an HTML Tag

To extract the content within a specific HTML tag, you can create a regex pattern to match the desired tag and its contents. For example, to capture the content within a <span> tag:

/<span\b[^>]*>(.*?)<\/span>/;
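
Applied to a small piece of markup, the pattern could be used as sketched below. The html string is an invented sample. Keep in mind that a regex like this only suits simple, predictable markup; nested <span> tags, for instance, would not be handled correctly.

```javascript
const html = '<p>Hello <span class="name">world</span>!</p>';

const spanPattern = /<span\b[^>]*>(.*?)<\/span>/;
const result = spanPattern.exec(html);

if (result !== null) {
  console.log(result[0]); // '<span class="name">world</span>'
  console.log(result[1]); // "world"
}
```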

Using regular expressions in JavaScript opens up a world of possibilities for manipulating and extracting information from strings. Spend time practicing and experimenting with different patterns to become proficient in using regular expressions effectively.