Basics of Regular Expressions
A regular expression is a pattern used to match character combinations in strings. According to the MDN Web Docs, in JavaScript, regular expressions are also objects. They are used with methods on both RegExp and String to search, test, and replace text.
Creating a Regular Expression
There are two syntaxes. The literal syntax uses forward slashes:
const regex = /hello/;
The constructor syntax uses new RegExp:
const regex = new RegExp('hello');
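Use the literal syntax when the pattern is fixed at the time you write the code. Use the constructor when the pattern must be built at runtime, for example from a variable or user input. Because the constructor takes a string, backslashes must be doubled: new RegExp('\\d+') is equivalent to /\d+/.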
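When a runtime value is passed to the constructor, characters such as . or $ in that value are read as regex syntax rather than as literal text. Below is a minimal sketch of one way to handle this, assuming a hand-written escaping step. Both escapeRegExp and countOccurrences are hypothetical helper names, not built-in functions.

```javascript
// Hypothetical helper: prefixes every regex metacharacter with a backslash
// so the input string is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: counts case-insensitive, literal occurrences of a term
// by building the pattern at runtime with the RegExp constructor.
function countOccurrences(haystack, term) {
  const pattern = new RegExp(escapeRegExp(term), 'gi');
  const matches = haystack.match(pattern);
  return matches === null ? 0 : matches.length; // match() gives null when nothing is found
}

console.log(escapeRegExp('1.5+2')); // 1\.5\+2
console.log(countOccurrences('Cost: $1.50, then $1.50 again', '$1.50')); // 2
```

Without the escaping step, new RegExp('a.c') would also match 'abc', because . matches any character.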
Flags
Flags modify how a pattern is applied. They are placed after the closing slash in literal syntax, or as the second argument in the constructor.
i makes the search case-insensitive. g finds all matches in the string, not just the first. m makes ^ and $ match the start and end of each line, not just the whole string.
const regex = /hello/gi; // case-insensitive, global
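The following sketch compares the same pattern with and without each flag, using arbitrary sample strings. It also shows that the constructor accepts the flags as a string in its second argument.

```javascript
const text = 'Hello hello HELLO';

// g alone: every match, but case-sensitive.
console.log(text.match(/hello/g));                  // ['hello']
// g and i: every match, ignoring case.
console.log(text.match(/hello/gi));                 // ['Hello', 'hello', 'HELLO']
// Same flags passed to the constructor.
console.log(text.match(new RegExp('hello', 'gi'))); // ['Hello', 'hello', 'HELLO']

const lines = 'first\nsecond';
// Without m, ^ and $ refer to the whole string, which contains a newline.
console.log(lines.match(/^\w+$/g));                 // null
// With m, ^ and $ also match at the start and end of each line.
console.log(lines.match(/^\w+$/gm));                // ['first', 'second']
```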
Character Classes
Character classes match a set of characters at a single position.
\d matches any digit, equivalent to [0-9]. \w matches any word character (ASCII letters, digits, and the underscore), equivalent to [a-zA-Z0-9_]. \s matches any whitespace character, including spaces, tabs, and newlines. \D, \W, and \S are the negations of \d, \w, and \s: each matches any single character that its lowercase counterpart does not. . matches any single character except a line terminator such as a newline.
const digits = /\d+/;
'Order 42'.match(digits); // ['42']
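A few more one-liners illustrate the negated classes, \s, the ASCII-only scope of \w, and the newline exception for the dot. The input strings are arbitrary samples.

```javascript
// \D is the negation of \d: remove everything that is not a digit.
console.log('(555) 010-4477'.replace(/\D/g, '')); // 5550104477

// \s+ splits on any run of spaces, tabs, or newlines.
console.log('a \t b\nc'.split(/\s+/));            // ['a', 'b', 'c']

// \w covers ASCII letters only, so the accented é (written as \u00e9) is not matched.
console.log('caf\u00e9'.match(/\w+/)[0]);          // caf

// . matches any single character except a newline.
console.log(/^.$/.test('x'));                      // true
console.log(/^.$/.test('\n'));                     // false
```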
Quantifiers
Quantifiers specify how many times a character or group must appear.
* matches zero or more occurrences. + matches one or more occurrences. ? matches zero or one occurrence, making the preceding element optional. {n} matches exactly n occurrences. {n,m} matches between n and m occurrences, inclusive.
By default, quantifiers are greedy: they match as many characters as possible. Adding ? after a quantifier makes it lazy, matching as few characters as possible.
/\d{4}/.test('2021'); // true
/\d{4}/.test('202'); // false
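The greedy and lazy behaviour described above can be seen with an HTML-like string. It is used here only as sample text, since regular expressions are not a reliable way to parse real HTML.

```javascript
const html = '<b>bold</b> and <i>italic</i>';

// Greedy: .+ runs as far as it can, then backtracks only to the last '>'.
console.log(html.match(/<.+>/)[0]);        // <b>bold</b> and <i>italic</i>

// Lazy: .+? stops at the first '>' that completes a match.
console.log(html.match(/<.+?>/)[0]);       // <b>

// With the g flag, the lazy pattern finds each tag separately.
console.log(html.match(/<.+?>/g));         // ['<b>', '</b>', '<i>', '</i>']

// The same applies to {n,m}: greedy takes 3 digits, lazy takes 2.
console.log('12345'.match(/\d{2,3}/)[0]);  // 123
console.log('12345'.match(/\d{2,3}?/)[0]); // 12
```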
Anchors
Anchors match a position in the string rather than a character.
^ matches the start of the string. $ matches the end of the string.
/^hello/.test('hello world'); // true
/^hello/.test('say hello'); // false
/world$/.test('hello world'); // true
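Anchors are often combined so that the whole string, not just part of it, must fit the pattern. For example, the unanchored /\d{4}/ from the quantifier examples also accepts strings that merely contain four digits:

```javascript
// Without anchors, four digits anywhere in the string are enough.
console.log(/\d{4}/.test('year 20215'));  // true: '2021' is found inside

// With ^ and $, the entire string must be exactly four digits.
console.log(/^\d{4}$/.test('2021'));      // true
console.log(/^\d{4}$/.test('20215'));     // false
console.log(/^\d{4}$/.test('year 2021')); // false
```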
Groups
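Parentheses create a capturing group. The text each group matches is stored in the result alongside the full match. Index 0 holds the full match, and indexes 1, 2, 3, and so on hold the groups in the order of their opening parentheses.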
const date = '2021-08-16';
const match = date.match(/(\d{4})-(\d{2})-(\d{2})/);
console.log(match[1]); // '2021'
console.log(match[2]); // '08'
console.log(match[3]); // '16'
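Captured groups can also be destructured from the result or referenced in a replacement string as $1, $2, and so on. The sketch below assumes a hypothetical helper, parseIsoDate, that also handles the null returned when nothing matches.

```javascript
// Hypothetical helper: returns the date parts as numbers, or null if the
// input is not a complete YYYY-MM-DD string.
function parseIsoDate(text) {
  const match = text.match(/^(\d{4})-(\d{2})-(\d{2})$/);
  if (match === null) return null; // match() returns null when nothing matches
  // Skip index 0 (the full match) and take the three groups.
  const [, year, month, day] = match;
  return { year: Number(year), month: Number(month), day: Number(day) };
}

console.log(parseIsoDate('2021-08-16')); // { year: 2021, month: 8, day: 16 }
console.log(parseIsoDate('16/08/2021')); // null

// In a replacement string, $1, $2 and $3 refer to the captured groups.
console.log('2021-08-16'.replace(/(\d{4})-(\d{2})-(\d{2})/, '$3/$2/$1')); // 16/08/2021
```

The sketch checks only the shape of the string, so an impossible date such as 2021-13-45 would still be accepted.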
Methods
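Regular expressions work with the following methods. test() is called on the regular expression itself, while match(), replace(), and split() are String methods that take a regular expression as their argument.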
test() returns true or false, indicating whether the pattern was found.
/\d+/.test('abc123'); // true
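match() returns an array, or null if nothing is found. With the g flag, the array holds every match. Without it, the array holds the first match followed by its capturing groups.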
'hello world'.match(/\w+/g); // ['hello', 'world']
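replace() returns a new string in which the matched portion is replaced, leaving the original string unchanged. Without the g flag, only the first match is replaced.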
'hello world'.replace(/world/, 'JS'); // 'hello JS'
split() divides a string using a pattern as the delimiter.
'one,two,,three'.split(/,+/); // ['one', 'two', 'three']
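The effect of the g flag on replace() and match() is easiest to see side by side. The input strings here are arbitrary samples.

```javascript
const s = 'a-b-c';

// replace() returns a new string; without g only the first match changes.
console.log(s.replace(/-/, '+'));  // a+b-c
console.log(s.replace(/-/g, '+')); // a+b+c
console.log(s);                    // a-b-c (the original is unchanged)

// match() without g: the first match, then its capturing groups.
const first = 'x1 y2'.match(/([a-z])(\d)/);
console.log(first[0], first[1], first[2]);  // x1 x 1

// match() with g: every full match, without group details.
console.log('x1 y2'.match(/([a-z])(\d)/g)); // ['x1', 'y2']
```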
What to Do Now
Open your browser console and test a simple pattern that checks the basic shape of an email address:
const emailPattern = /^\w+@\w+\.\w+$/;
console.log(emailPattern.test('user@example.com')); // true
console.log(emailPattern.test('notanemail')); // false
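Break the pattern down: ^ anchors the start, \w+ matches the username, @ is a literal character, \w+ matches the domain name, \. matches a literal dot, \w+ matches the extension (such as com), and $ anchors the end. The pattern is deliberately simple. It rejects valid addresses such as first.last@example.com, because \w does not match a dot, and user@mail.example.com, because only one dot is allowed after the @.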
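To probe the same pattern further, run it over several sample inputs at once. The addresses below are made-up examples, and the variable names are illustrative.

```javascript
const pattern = /^\w+@\w+\.\w+$/; // same pattern as above

const candidates = [
  'user@example.com',     // accepted
  'user_42@example.org',  // accepted: \w includes digits and the underscore
  'user+tag@example.com', // rejected: \w does not match '+'
  'user@example',         // rejected: no dot followed by an extension
];

for (const address of candidates) {
  console.log(address, pattern.test(address));
}
```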