Regular Expressions - freeCodeCamp.org

How to Parse S-expressions in JavaScript

Jakub T. Jankiewicz — Thu, 04 Apr 2024 22:22:06 +0000

S-expressions are the base of the Lisp family of programming languages. In this article, I will show you how to create a simple S-expression parser step by step. This can serve as the base for a Lisp parser.

Lisp is one of the easiest languages to implement, and creating a parser is the first step. We could use a parser generator for this, but it's easier to write the parser yourself. We'll use JavaScript.

What are S-expressions?

If you're not familiar with the Lisp language, S-expressions look like this:

(+ (second (list "xxx" 10)) 20)

This is a data format where everything is built from atoms or lists surrounded by parentheses (where atoms or other lists are separated by spaces).

S-expressions can have different data types, just like JSON:

numbers
strings
symbols, which are like strings but without quotes, and which can be interpreted as variable names in other languages.

Additionally, you can use a special dot operator that creates a pair.

(1 . b)

You can represent a list as dotted pairs (which shows that lists are in fact a linked list data structure).

This list:

(1 2 3 4)

Can be written as:

(1 . (2 . (3 . (4 . nil))))

nil is the special symbol that indicates the end of the list or an empty list. With this format, you can create any binary tree. But we won't use this dotted notation in our parser, so we don't complicate things.
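
To make the linked-list idea concrete, here is a minimal sketch that builds (1 2 3 4) out of nested pairs. The names cons, nil, and listToArray are assumptions chosen for this illustration only, and null stands in for the empty list.

```javascript
// Hypothetical helpers, used only to illustrate dotted pairs.
const nil = null; // stand-in for the empty list
const cons = (head, tail) => ({ head, tail });

// (1 2 3 4) written out as (1 . (2 . (3 . (4 . nil))))
const list = cons(1, cons(2, cons(3, cons(4, nil))));

// Walk the chain of pairs until the end-of-list marker is reached.
function listToArray(pair) {
    const out = [];
    for (let node = pair; node !== nil; node = node.tail) {
        out.push(node.head);
    }
    return out;
}

console.log(listToArray(list)); // [ 1, 2, 3, 4 ]
```

Each pair holds one value and a reference to the rest of the list, which is exactly how a singly linked list works.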

What are S-expressions Used For?

Lisp code is created from S-expressions, but you can also use them as a data exchange format.

They are also part of the text representation of WebAssembly, probably because the parser is simple and you don't need to come up with your own format. You can also use them for communication between server and browser, instead of JSON.

How to Implement S-expression Parser in JavaScript

Tokenizer

The tokenizer is the part of the parser that splits the text into tokens that can then be parsed.

Usually, a parser is accompanied by a lexer or a tokenizer that generates the tokens. This is how some parser generators work (like lex and Yacc, or flex and bison; the second pair is free and open source software, part of the GNU project).

The simplest way of tokenizing is to use regular expressions. If you're not familiar with regular expressions (or Regex for short) you can read this article:
A Practical Guide to Regular Expressions – Learn Regex with Real Life Examples.

This is the simplest way of tokenization:

'(foo bar (baz))'.split(/(\(|\)|\n|\s+|\S+)/);

This is a union (using the pipe operator) of the different cases we need to handle. Parentheses are special characters in regex, so they need to be escaped with a backslash.
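
The reason split can act as a tokenizer at all is the outer capturing group: when the separator regex contains one, String.prototype.split keeps the matched separators in the result. A small sketch of the difference (the variable names are only for illustration):

```javascript
// With a capturing group, the matched parentheses are kept in the output.
const withGroup = '(a)'.split(/(\(|\))/);
// Without the group, the separators are dropped.
const withoutGroup = '(a)'.split(/\(|\)/);

console.log(withGroup);    // [ '', '(', 'a', ')', '' ]
console.log(withoutGroup); // [ '', 'a', '' ]
```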

It almost works. The first problem is that there are empty strings between the regex matches, as in this expression:

'(('.split(/(\(|\)|\n|\s+|\S+)/);
// ==> [ '', '(', '', '(', '' ]

We have 5 tokens instead of 2. We can solve this problem with an Array::filter.

'(('.split(/(\(|\)|\n|\s+|\S+)/).filter(token => token.length);
// ==> ["(", "("]

If the token is empty, the length will return 0 and will be converted to false, which means that it will filter out all empty strings.

We'll also not need spaces, so we can also filter them out:

'(   ('.split(/(\(|\)|\n|\s+|\S+)/).filter(token => token.trim().length);
// ==> ["(", "("]

The second, bigger problem is with baz)) as the last token. Here is an example:

'(foo bar (baz))'.split(/(\(|\)|\n|\s+|\S+)/).filter(token => token.trim().length);
// ==> ["(", "foo", "bar", "(", "baz))"]

The problem is the expression \S+, which is greedy and matches everything that is not a space. To fix the problem, we can use this expression: [^\s()]+.

It will match everything that is not whitespace and not a parenthesis (the same as \S+, but also excluding parentheses).

'(foo bar (baz))'.split(/(\(|\)|\n|\s+|[^\s()]+)/).filter(token => token.trim().length);
// ==> ["(", "foo", "bar", "(", "baz", ")", ")"]

As you can see, the output is correct. Let's write this tokenizer as a function:

const tokens_re = /(\(|\)|\n|\s+|[^\s()]+)/;
function tokenize(string) {
    string = string.trim();
    if (!string.length) {
        return [];
    }
    return string.split(tokens_re).filter(token => token.trim());
}

We don't need to use length after token.trim() because an empty string is also converted to false value and the filter will remove those values.
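
As a side note, a different technique avoids the filtering step altogether: String.prototype.match with a global regex returns only the matches, so no empty strings or whitespace tokens appear. This is a sketch of that alternative, not the approach used in the rest of the article; the names token_re and tokenizeWithMatch are assumptions, and string literals are not handled here.

```javascript
// Only parentheses and atoms are listed; whitespace is simply never matched.
const token_re = /\(|\)|[^\s()]+/g;

function tokenizeWithMatch(string) {
    // match() returns null when nothing matches, so fall back to an empty array.
    return string.match(token_re) || [];
}

console.log(tokenizeWithMatch('(foo bar (baz))'));
// [ '(', 'foo', 'bar', '(', 'baz', ')', ')' ]
```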

But what about string expressions (those in quotes)? Let's see what will happen:

tokenize(`(define (square x)
            "Function calculate square of a number"
            (* x x))`);
// ==> ["(", "define", "(", "square", "x", ")", "\"Function", "calculate", "square",
// ==>  "of", "a", "number\"", "(", "*", "x", "x", ")", ")"]

NOTE: This is a function in the Scheme dialect of Lisp. We used template literals so we could add newline characters inside the Lisp code.

As you can see from the output, the string literals are split on spaces. Let's fix that:

Regular Expressions for Strings

We need to add string literals as an exception to our tokenizer. It's best to make them the first item in the union in our regex, so they are tried before the other alternatives.

The expression that handles string literals looks like this:

/"[^"\\]*(?:\\[\S\s][^"\\]*)*"/

It handles escaped quotes inside a string.
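
Here is a quick check of that expression on an input with escaped quotes. The sample input is made up for illustration, and String.raw is used so the backslashes reach the regex unchanged.

```javascript
const string_re = /"[^"\\]*(?:\\[\S\s][^"\\]*)*"/;

// The Lisp source contains the string literal "say \"hi\"".
const input = String.raw`(print "say \"hi\"" x)`;
const match = input.match(string_re);

// The whole literal is matched, including the escaped quotes inside it.
console.log(match[0]); // "say \"hi\""
```

The pattern reads as: an opening quote, any run of characters that are neither a quote nor a backslash, then zero or more repetitions of "a backslash plus any character, followed by another such run", and finally the closing quote.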

This is what the full regular expression should look like:

const tokens_re = /("[^"\\]*(?:\\[\S\s][^"\\]*)*"|\(|\)|\n|\s+|[^\s()]+)/;

NOTE: We could also add Lisp comments, but because this is an S-expression parser and not a Lisp parser, we won't do that here. JSON doesn't support comments either. If you want to create a Lisp parser, you can add them as an exercise.

Our tokenizer now works correctly:

tokenize(`(define (square x)
            "Function calculate square of a number"
            (* x x))`);
// ==> ["(", "define", "(", "square", "x", ")",
// ==>  "\"Function calculate square of a number\"",
// ==>  "(", "*", "x", "x", ")", ")"]

Parser

We'll create our parser using a stack data structure (LIFO - Last In First Out).

To fully understand how the parser works, it is good to know about data structures, like linked lists, binary trees, and stacks.

Here is the first version of our parser:

function parse(string) {
    const tokens = tokenize(string);
    const result = []; // as normal array
    const stack = []; // as stack
    tokens.forEach(token => {
        if (token == '(') {
            stack.push([]); // add new list to stack
        } else if (token == ')') {
            if (stack.length) {
                // top of the stack is already constructed list
                const top = stack.pop();
                if (stack.length) {
                    // add constructed list to previous list
                    var last = stack[stack.length - 1];
                    last.push(top);
                } else {
                    result.push(top); // fully constructed list
                }
            } else {
                throw new Error('Syntax Error - unmatched closing paren');
            }
        } else {
            // found atom add to the top of the stack
            // top is used as an array we only add at the end
            const top = stack[stack.length - 1];
            top.push(token);
        }
    });
    if (stack.length) {
        throw new Error('Syntax Error - expecting closing paren');
    }
    return result;
}

The function returns an array of our structures in the form of arrays. If we need to parse more than one S-expression, we will have more items in the array:

parse(`(1 2 3) (1 2 3)`)
// ==> [["1", "2", "3"], ["1", "2", "3"]]
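
To see how the stack grows and shrinks, the sketch below runs the same push/pop logic over a fixed token list and records the stack depth after each token. The variable names (depths, result) are assumptions for this illustration, and error handling is left out.

```javascript
// Tokens for the input "(1 (2 3))".
const tokens = ['(', '1', '(', '2', '3', ')', ')'];
const stack = [];
const depths = [];
let result;

for (const token of tokens) {
    if (token === '(') {
        stack.push([]); // start a new, still open list
    } else if (token === ')') {
        const top = stack.pop(); // the list is now complete
        if (stack.length) {
            stack[stack.length - 1].push(top); // nest it in its parent
        } else {
            result = top; // outermost list finished
        }
    } else {
        stack[stack.length - 1].push(token); // atom goes into the open list
    }
    depths.push(stack.length);
}

console.log(depths); // [ 1, 1, 2, 2, 2, 1, 0 ]
console.log(result); // [ '1', [ '2', '3' ] ]
```

The depth equals the number of lists that are currently open, and it returns to zero exactly when the parentheses are balanced.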

Although we don't need to handle dots, S-expressions can be in this form:

((foo . 10) (bar . 20))

We don't need to create a special structure for our lists to have a working parser. But it's a good idea to have this structure from the beginning (so you can use this as a base for a Lisp interpreter). We will use a Pair class, so we'll be able to create any binary tree.

class Pair {
    constructor(head, tail) {
        this.head = head;
        this.tail = tail;
    }
}

We will also need something that represents the end of the list (or an empty list). In Lisp, it's usually nil:

class Nil {}
const nil = new Nil();

We can create a static method that will convert an array into our structure:

class Pair {
    constructor(head, tail) {
        this.head = head;
        this.tail = tail;
    }
    static fromArray(array) {
        if (!array.length) {
            return nil;
        }
        let [head, ...rest] = array;
        if (head instanceof Array) {
            head = Pair.fromArray(head);
        }
        return new Pair(head, Pair.fromArray(rest));
    }
}

To add this to our parser, all we have to do is to add it at the end:

result.map(Pair.fromArray);

NOTE: If you would like to add a dot operator later, you will need to create pairs by hand, inside the parser.

We didn't convert the whole array because this will be the container for our S-expressions. Each element in an array should be a list, that's why we used Array::map.
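
The recursive version copies the rest of the array on every step. If that ever matters, one possible alternative is to build the list from the right with reduceRight. This is a sketch under the same Pair and nil definitions as above; the standalone function name fromArray is an assumption for this example.

```javascript
class Nil {}
const nil = new Nil();

class Pair {
    constructor(head, tail) {
        this.head = head;
        this.tail = tail;
    }
}

// Build the list back to front: the accumulator is always the tail built so far.
function fromArray(array) {
    return array.reduceRight((tail, item) => {
        const head = Array.isArray(item) ? fromArray(item) : item;
        return new Pair(head, tail);
    }, nil);
}

const list = fromArray(['1', ['2', '3']]);
console.log(list.head);           // 1
console.log(list.tail.head.head); // 2
```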

Let's see how it works:

parse('(1 (1 2 3))')

The output will be a structure like this (this is the output of JSON.stringify with inserted value of nil).

{
    "head": "1",
    "tail": {
        "head": {
            "head": "1",
            "tail": {
                "head": "2",
                "tail": {
                    "head": "3",
                    "tail": nil
                }
            }
        },
        "tail": nil
    }
}

The last thing we can add is a way to stringify the list, by adding a toString method to our Pair class:

class Pair {
    constructor(head, tail) {
        this.head = head;
        this.tail = tail;
    }
    toString() {
        const arr = ['('];
        if (this.head) {
            const value = this.head.toString();
            arr.push(value);
            if (this.tail instanceof Pair) {
                // replace hack for the nested list
                // because the structure is a tree
                // and here tail is next element
                const tail = this.tail.toString().replace(/^\(|\)$/g, '');
                arr.push(' ');
                arr.push(tail);
            }
        }
        arr.push(')');
        return arr.join('');
    }
    static fromArray(array) {
        // ... same as before
    }
}

Let's see how it works:

parse("(1 (1 2 (3)))")[0].toString()
// ==> "(1 (1 2 (3)))"

The last problem is that the output structure doesn't have numbers. Everything is a string.

Parsing of Atoms

We'll use the regular expressions below:

const int_re = /^[-+]?[0-9]+([eE][-+]?[0-9]+)?$/;
const float_re = /^([-+]?((\.[0-9]+|[0-9]+\.[0-9]+)([eE][-+]?[0-9]+)?)|[0-9]+\.)$/;
if (atom.match(int_re) || atom.match(float_re)) {
    // in javascript every number is float but if it's slow you can use parseInt for int_re
    return parseFloat(atom);
}
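
To get a feel for what these two expressions accept, the sketch below runs them over a few sample atoms. The helper name isNumber and the sample list are assumptions made for this illustration.

```javascript
const int_re = /^[-+]?[0-9]+([eE][-+]?[0-9]+)?$/;
const float_re = /^([-+]?((\.[0-9]+|[0-9]+\.[0-9]+)([eE][-+]?[0-9]+)?)|[0-9]+\.)$/;

// Hypothetical helper: true when the atom should be parsed as a number.
function isNumber(atom) {
    return int_re.test(atom) || float_re.test(atom);
}

const samples = ['10', '-5', '1e3', '1.5', '.5', '10.', '-2.5e-3', 'abc', '1.2.3', '+'];
for (const atom of samples) {
    // Numbers are printed as parsed values, everything else is reported as is.
    console.log(atom, isNumber(atom) ? parseFloat(atom) : 'not a number');
}
```

The first seven samples are recognized as numbers, while abc, 1.2.3 and a lone + are not, so they would end up as symbols.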

Next, we can parse strings. Our strings are almost the same as those in JSON, the only difference is that they can have newlines (this is usually how strings are handled in Lisp dialects). So we can use JSON.parse and only replace \n with \\n (escape the new line).

if (atom.match(/^"[\S\s]*"$/)) { // [\S\s] also matches newlines, unlike "."
   return JSON.parse(atom.replace(/\n/g, '\\n'));
}

So with this, we can have all escape characters for free (that is: \t or Unicode characters \u).
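
A small sketch of that trick, using a made-up atom that contains a real newline plus the escape sequences \t and \u0041:

```javascript
// The token as it appears in the source text: quotes included, one raw newline
// inside, and the two escape sequences written with literal backslashes.
const atom = '"line one\nline two\\ttabbed \\u0041"';

// A raw newline is not allowed inside a JSON string, so escape it first.
const value = JSON.parse(atom.replace(/\n/g, '\\n'));

console.log(JSON.stringify(value)); // "line one\nline two\ttabbed A"
```

JSON.parse decodes \t and \u0041 on its own, so only the raw newline needs special treatment.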

The next elements of S-expressions are symbols. They are any character sequences that are not numbers or strings. We can create an LSymbol class to distinguish them from JavaScript's built-in Symbol.

class LSymbol {
    constructor(name) {
        this.name = name;
    }
    toString() {
        return this.name;
    }
}

The function for parsing atoms can look like this:

function parseAtom(atom) {
    if (atom.match(int_re) || atom.match(float_re)) { // numbers
        return parseFloat(atom);
    } else if (atom.match(/^"[\S\s]*"$/)) { // [\S\s] also matches newlines
       return JSON.parse(atom.replace(/\n/g, '\\n')); // strings
    } else {
       return new LSymbol(atom); // symbols
    }
}

Here is our parser function with parseAtom added:

function parse(string) {
    const tokens = tokenize(string);
    const result = [];
    const stack = [];
    tokens.forEach(token => {
        if (token == '(') {
           stack.push([]);
        } else if (token == ')') {
           if (stack.length) {
               const top = stack.pop();
               if (stack.length) {
                  const last = stack[stack.length - 1];
                  last.push(top);
               } else {
                  result.push(top);
               }
           } else {
               throw new Error('Syntax Error - unmatched closing paren');
           }
        } else {
           const top = stack[stack.length - 1];
           top.push(parseAtom(token)); // this line was added
        }
    });
    if (stack.length) {
        throw new Error('Syntax Error - expecting closing paren');
    }
    return result.map(Pair.fromArray);
}

We can also improve the toString method on Pair to use JSON.stringify for strings to distinguish from symbols:

class Pair {
    constructor(head, tail) {
        this.head = head;
        this.tail = tail;
    }
    toString() {
        const arr = ['('];
        if (this.head !== undefined && this.head !== null) { // keep 0 and ""
            let value;
            if (typeof this.head === 'string') {
                value = JSON.stringify(this.head).replace(/\\n/g, '\n');
            } else {
                // any object including Pair and LSymbol
                value = this.head.toString(); 
            }
            arr.push(value);
            if (this.tail instanceof Pair) {
                // replace hack for the nested list because
                // the structure is a tree and here tail
                // is next element
                const tail = this.tail.toString().replace(/^\(|\)$/g, '');
                arr.push(' ');
                arr.push(tail);
            }
        }
        arr.push(')');
        return arr.join('');
    }
    static fromArray(array) {
        // ... same as before
    }   
}

And this is the whole parser. What's left are true and false values (and maybe null), but they are left as an exercise for the reader. The full code can be found on GitHub.

Different Approaches to Lisp parser in JavaScript

The above code is good for a simple Lisp implementation. I used similar code as the initial implementation of LIPS Scheme, which can still be found on CodePen.

Right now, LIPS uses a more advanced lexer (using a state machine) instead of a tokenizer. The lexer was rewritten because the stack-based approach was too difficult to modify.

NOTE: This article first appeared on the Polish blog Głównie JavaScript (in English: Mostly JavaScript), where it was titled: Parser S-Wyrażeń (języka LISP) w JavaScript.

A Practical Guide to Regular Expressions – Learn RegEx with Real Life Examples

Tasnim Ferdous — Tue, 01 Aug 2023 20:42:27 +0000

What are Regular Expressions?

Regular expressions, also known as regex, work by defining patterns that you can use to search for certain characters or words inside strings.

Once you define the pattern you want to use, you can make edits, delete certain characters or words, substitute one thing for another, extract relevant information from a file or any string that contains that particular pattern, and so on.

Why Should You Learn Regex?

Regex lets you do text processing in a way that can save you a lot of time. It can also introduce some fun in the process.

Using regex can make locating information much easier. Once you find your target, you can batch edit, replace, or delete it, or do whatever processing you need.

Some practical examples of using regex are batch file renaming, parsing logs, validating forms, making mass edits in a codebase, and recursive search.

In this tutorial, we're going to cover regex basics with the help of this site. Later on, I will introduce some regex challenges that you'll solve using Python. I'll also show you how to use tools like sed and grep with regex.

Like many things in life, regular expressions are one of those things that you can only truly understand by doing. I encourage you to play around with regex as you are going through this article.

Regex Basics
How to use regex with command line tools
- Recursive regex search with grep
- Substitution with sed
Advanced Regex: Lookarounds
- Lookbehinds
- Lookaheads
Practical Examples of Regex
Final words

Regex Basics

A regular expression is nothing but a sequence of characters that match a pattern. Besides using literal characters (like 'abc'), there are some meta characters (*,+,? and so on) which have special purposes. There are also features like character classes which can help you simplify your regular expressions.

Before writing any regex, you'll need to learn about all the basic cases and edge cases for the pattern you are looking for.

For instance, if you want to match 'Hello World', do you want the line to start with 'Hello' or can it start with anything? Do you want exactly one space between 'Hello' and 'World' or can there be more? Can other characters come after 'World' or should the line end there? Do you care about case sensitivity? And so on.

These are the kind of questions you must have the answer to before you sit down to write your regex.

Exact match

The most basic form of regex involves matching a sequence of characters in a similar way as you can do with Ctrl-F in a text editor.

On the top you can see the number of matches, and on the bottom an explanation is provided for what the regex matches character by character.

Character set

Regex character sets allow you to match any one character from a group of characters. The group is surrounded by square brackets [].

For example, t[ah]i matches "tai" and "thi". Here 't' and 'i' are fixed but between them can occur 'a' or 'h'.

Match ranges in regex

Sometimes you may want to match a group of characters which are sequential in nature, such as any uppercase English letter. But writing all 26 letters would be quite tedious.

Regex solves this issue with ranges. The "-" acts as a range operator. Some valid ranges are shown below:

Range	Matches
[A-Z]	uppercase letters
[a-z]	lowercase letters
[0-9]	Any digit

You can also specify partial ranges, such as [b-e] to match any of the letters 'bcde' or [3-6] to match any of the numbers '3456'.

You are not limited to specifying only one range inside a character set. You can use multiple ranges and also combine them with any other additional character(s). Here, [3-6u-w;] will match any of '3456uvw' or semicolon ';'.

Match any character not in the set

If you prefix the set with a '^', the inverse operation will be performed. For example, [^A-Z0-9] will match anything except uppercase letters and digits.
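
The character sets above can be tried out in JavaScript. This is a short sketch with made-up sample strings, using RegExp.prototype.test for a yes/no answer and String.prototype.match with the g flag to list every match.

```javascript
// A fixed 't' and 'i' with either 'a' or 'h' in between.
console.log(/t[ah]i/.test('tai')); // true
console.log(/t[ah]i/.test('thi')); // true
console.log(/t[ah]i/.test('tbi')); // false

// Two ranges plus a literal semicolon in one set.
const mixed = 'a3u;z7'.match(/[3-6u-w;]/g);
console.log(mixed); // [ '3', 'u', ';' ]

// A negated set: everything except uppercase letters and digits.
const negated = 'aB3-'.match(/[^A-Z0-9]/g);
console.log(negated); // [ 'a', '-' ]
```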

Character classes

While writing regex, you'll need to match certain groups such as digits quite often and multiple times in the same expression as well.

So for example, how would you match a pattern like 'letter-digit-letter-digit'?

With what you've learned up until now, you can come up with [a-zA-Z]-[0-9]-[a-zA-Z]-[0-9]. This works, but you can see how the expression can get quite messy as the pattern length gets bigger.

To make the expression simpler, classes have been assigned to well-defined character groups such as digits. The following table shows these classes and their equivalent expression with character sets:

Class	Matches	Equivalent expression
.	anything except newline	[^\n\r]
\w	word character	[a-zA-Z0-9_]
\W	non-word character	[^\w]
\d	digits	[0-9]
\D	non-digits	[^\d]
\s	space, tab, newlines	[ \t\r\n\f]
\S	non whitespace characters	[^\s]

Character classes are quite handy and make your expressions much cleaner. We will use them extensively throughout this tutorial, so you can use this table as a reference point and come back here if you forget any of the classes.

Most of the time, we won't care about all the positions in a pattern. The "." class saves us from writing all possible characters in a set.

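For example, t.. matches anything that starts with t and any two characters afterwards. This may remind you of the SQL LIKE operator, which would use t__ (a 't' followed by two underscores) to accomplish the same thing, since '_' matches exactly one character in LIKE while '%' matches any number of characters.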
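
Here is a brief JavaScript sketch of the classes from the table, with sample strings made up for illustration. The ^ and $ anchors in the first pattern are an addition so that the whole string has to match.

```javascript
// 'letter-digit-letter-digit' written with \d instead of [0-9].
const pattern = /^[a-zA-Z]-\d-[a-zA-Z]-\d$/;
console.log(pattern.test('a-1-B-2')); // true
console.log(pattern.test('a-1-B-x')); // false

// 't' followed by any two characters.
const tWords = 'the tea to'.match(/t../g);
console.log(tWords); // [ 'the', 'tea' ]

// Word characters versus non-word characters.
const wordChars = 'a1 b_2!'.match(/\w/g);
const nonWordChars = 'a1 b_2!'.match(/\W/g);
console.log(wordChars);    // [ 'a', '1', 'b', '_', '2' ]
console.log(nonWordChars); // [ ' ', '!' ]
```

Note that 'to' at the end is not matched by t.. because only one character follows the 't'.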

Quantifiers

The words "pattern" and "repetition" go hand in hand. If you want to match a 3 digit number you can use \d\d\d. But what if you need to match 11 digits? You could write '\d' 11 times, but a general rule of thumb while writing regex or just doing any kind of programming is that if you find yourself repeating something more than twice, you are probably unaware of some feature.

In regex, you can use quantifiers for this purpose. To match 11 digits, you can simply write the expression \d{11}.

The table below lists the quantifiers you can use in regex:

Quantifier	Matches
*	0 or more
?	0 or 1
+	1 or more
{n}	exactly n times
{n,}	n or more times
{n,m}	n to m times inclusive

In this example, the expression can\s+write matches can followed by 1 or more whitespaces followed by write. But you can see 'canwrite' is not matched as \s+ means at least one whitespace needs to be matched. This is useful when you are searching through text which is not trimmed.
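
A few of these quantifiers in JavaScript, as a sketch with made-up inputs. The ^ and $ anchors are added so the digit patterns must cover the whole string.

```javascript
// Exactly 11 digits.
const elevenDigits = /^\d{11}$/;
console.log(elevenDigits.test('01234567890')); // true
console.log(elevenDigits.test('12345'));       // false

// One or more whitespace characters between the two words.
const canWrite = /can\s+write/;
console.log(canWrite.test('can write'));    // true
console.log(canWrite.test('can    write')); // true
console.log(canWrite.test('canwrite'));     // false

// Between 2 and 4 digits, inclusive.
const twoToFour = /^\d{2,4}$/;
console.log(twoToFour.test('1'));     // false
console.log(twoToFour.test('123'));   // true
console.log(twoToFour.test('12345')); // false
```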

Can you guess what can\s?write will match?

Capture groups

Capture groups are sub-expressions enclosed in parentheses (). You can have any number of capture groups, and even nested capture groups.

The expression (The ){2} matches 'The ' twice. But without a capture group, the expression The {2} would match 'The' followed by 2 spaces, as the quantifier will be applied on the space character and not on 'The ' as a group.

You can match any pattern inside capture groups as you would with any valid regex. Here (is\s+){2} matches if it finds 'is' followed by 1 or more spaces twice.

How to use logical OR in regex

You can use "|" to match multiple patterns. This is (good|bad|sweet) matches 'This is ' followed by any of 'good' or 'bad' or 'sweet'.

Again, you must understand the importance of capture groups here. Think about what the expression This is good|bad|sweet would match?

With a capture group, good|bad|sweet is isolated from This is. But if it's not inside a capture group, the entire regex is only one group. So the expression This is good|bad|sweet will match if the string contains 'This is good' or 'bad' or 'sweet'.
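
The effect of grouping on quantifiers and on "|" can be checked directly. This sketch uses made-up sample strings.

```javascript
// The quantifier applies to the whole group 'The '.
console.log(/(The ){2}/.test('The The end')); // true
// Without the group it applies only to the space.
console.log(/The {2}/.test('The  end'));      // true ('The' plus two spaces)
console.log(/The {2}/.test('The The end'));   // false

// With a group, the alternatives are limited to the last word.
const grouped = /This is (good|bad|sweet)/;
// Without it, the alternatives are 'This is good', 'bad', and 'sweet'.
const ungrouped = /This is good|bad|sweet/;

console.log(grouped.test('This is bad')); // true
console.log(grouped.test('bad'));         // false
console.log(ungrouped.test('bad'));       // true
```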

How to reference capture groups

Capture groups can be referenced in the same expression or while performing replacements as you can see on the Replacement tab.

Most tools and languages allow you to reference the nth captured group with '\n'. On this site '$n' is used while referencing on replacement. The syntax for replacement will vary depending on the tools or language you're using. For JavaScript, for example, it's '$n', while for Python it's '\n'.

In the expression (This) is \1 power, 'This' is captured and then referenced with '\1', effectively matching This is This power.

How to name capture groups

You can name your capture groups with the syntax (?<name>pattern) and backreference them in the same expression with \k<name>.

On replacement, referencing is done with $<name>. This is the syntax for JavaScript and can vary among languages. You can learn about the differences here. Also note that this feature might not be available in some languages.

In the expression (?<lang>[\w+]+) is the best but \k<lang> .*, the pattern [\w+]+ is captured with the name 'lang' and backreferenced with \k<lang>. This pattern will match any word character or '+' character 1 or more times. The .* at the end of the regex matches any character 0 or more times. And finally on replacement, the referencing is done with $<lang>.
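
Both kinds of references can be tried in JavaScript. In this sketch the sample sentences and the replacement texts are made up; the syntax (\1, (?<name>...), \k<name>, $1 and $<name>) is standard JavaScript.

```javascript
// Numbered backreference: \1 must repeat whatever group 1 captured.
const numbered = /(This) is \1 power/;
console.log(numbered.test('This is This power')); // true
console.log(numbered.test('This is That power')); // false
console.log('This is This power'.replace(numbered, '$1!')); // This!

// Named group with a named backreference and a named replacement.
const named = /(?<lang>[\w+]+) is the best but \k<lang> .*/;
const replaced = 'C++ is the best but C++ takes time'.replace(named, '$<lang> wins');
console.log(replaced); // C++ wins
```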

How to Use Regex with Command Line Tools

There are good CLI tools available that let you use regex from your terminal. These tools save you even more time as you can easily test different regex without writing code in some language and then compiling or interpreting it.

Some of the well-known tools are grep, sed, and awk. Let's look at a few examples to give you some ideas on how you can leverage these tools.

Recursive regex search with grep

You can harness the power of regex through grep. Grep can search for patterns in a file or perform a recursive search.

If you are on Windows, you can install grep using winget. Run this command in PowerShell:

winget install -e --id GnuWin32.Grep

I will show you the solution to a challenge I created for a CTF competition at my university.

The file attached to the challenge is a zip file that contains multiple levels of directories and a lot of files in it. The name of the competition was Coderush with flag format coderush{flag is here}. So you have to search for the pattern coderush{.*} which will match the flag format coderush{any character here}.

Unzip the file with unzip ripG.zip and cd into it with cd ripG.

There are 358 directories and 8731 files. Instead of searching the pattern in the files one by one, you can employ grep like this:

grep --color -R "coderush{.*}"

The "-R" flag enables recursive search.

You can learn more about grep and its command line options here.

Substitution with sed

You can use sed to perform insertion, deletion, and substitution on text files by specifying a regex. If you are on Windows, you can get sed from here. Or if you use WSL, tools like grep and sed will already be available.

This is the most common usage of sed:

sed 's/pattern/replacement/g' filename
echo "${text}" | sed 's/pattern/replacement/g'

Here, the option "g" is specified to replace all occurrences.

Some other useful options are -n to suppress the default behaviour of printing all lines and using p instead of g to print only the lines which are affected by the regex.

Let's take a look at the content of texts.txt.

Hello rand chars World 56 rand chars
Henlo 52 rand chars W0rld rand chars
GREP rand chars Henlo 62 rand chars
Henlo 10 rand chars Henlo rand chars
GREP rand chars Henlo 45 rand chars

Our task is replacing Henlo number with Hello number only in the lines where "GREP" is present. So, we are searching for the pattern Henlo ([0-9]+) which will match 'Henlo ' followed by 1 or more digits and all the digits are captured. Then our replacement string will be Hello \1 – the '\1' is referencing the capture group containing the digits.

One way to accomplish that would be using grep to grep the lines which have "GREP" present then perform the replacement with sed.

grep "GREP" texts.txt | sed -En 's/Henlo ([0-9]+)/Hello \1/p'

The "-E" option enables extended regex without which you would need to escape the parentheses.

Or you could just use sed. Use /pattern/ to restrict substitution on only the lines where pattern is present.

sed -En '/GREP/ s/Henlo ([0-9]+)/Hello \1/p' texts.txt
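
For comparison, the same filter-then-substitute idea can be sketched in JavaScript. This is not a replacement for sed, just an illustration of the logic; the file contents are inlined as a string, and the variable names are assumptions.

```javascript
const text = `Hello rand chars World 56 rand chars
Henlo 52 rand chars W0rld rand chars
GREP rand chars Henlo 62 rand chars
Henlo 10 rand chars Henlo rand chars
GREP rand chars Henlo 45 rand chars`;

const pattern = /Henlo ([0-9]+)/;

const fixed = text
    .split('\n')
    // Like the /GREP/ address plus the p flag: keep only lines that
    // contain GREP and where the substitution actually applies.
    .filter(line => /GREP/.test(line) && pattern.test(line))
    // JavaScript uses $1 where sed uses \1.
    .map(line => line.replace(pattern, 'Hello $1'));

console.log(fixed.join('\n'));
// GREP rand chars Hello 62 rand chars
// GREP rand chars Hello 45 rand chars
```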

Advanced Regex: Lookarounds

Lookaheads and Lookbehinds (together known as lookarounds) are features of regex that allow you to check the existence of a pattern without including it in the match.

You can think of them as zero width assertions – they assert the existence of a pattern but do not consume any characters in the match. These are very powerful features, but they're also computationally expensive. So make sure you keep an eye on performance if you are using them often.

Lookbehinds

Let's say you want to match the word 'linux', but you have 2 conditions.

The word 'GNU' must occur before 'linux' occurs. If a line contains 'linux' but doesn't have 'GNU' before it, we want to discard that line.
We want to match only linux and nothing else.

We already know how to satisfy the 1st condition. GNU.* will match 'GNU' followed by any number of characters. Then finally we match the word linux. This will match all of GNU-any-characters-linux.

But how do we prevent matching GNU.* while still maintaining the 1st condition?

That's where a positive lookbehind comes in. You can mark a capture group as a positive lookbehind by prefixing it with ?<=. In this example, the expression becomes (?<=GNU.*)linux.

Now only linux is matched and nothing else.

Note that the expressions (?<=GNU.*)linux and linux(?<=GNU.*) will behave exactly the same. In the 2nd expression, although linux is before the lookbehind, there is .* after 'GNU' which matches linux. This means it satisfies the lookbehind.

To make it simpler, think about the pattern without the lookbehind. The pattern GNU.* will match 'GNU' and anything after it, in our case matching linux.

Now we can derive a generalized statement that the expression (?<=C)X will match the pattern X – only if pattern C came before X (and C must not be included in the match).

You can also reverse the 1st condition: match lines that contain the word linux only if GNU never came before it. This is called a negative lookbehind. The prefix in this case is ?<!. The inverse of the previous expression would be (?<!GNU.*)linux.
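
Both kinds of lookbehind can be tried in JavaScript, whose regex engine accepts .* inside a lookbehind (many other engines only allow fixed-length lookbehinds). The sample sentences below are made up for illustration.

```javascript
const positive = /(?<=GNU.*)linux/;
const negative = /(?<!GNU.*)linux/;

// Only 'linux' is part of the match, not 'GNU' or the text in between.
const m = 'GNU is not linux'.match(positive);
console.log(m[0], m.index); // linux 11

// No 'GNU' before 'linux', so the positive lookbehind fails.
console.log(positive.test('linux then GNU')); // false

// The negative lookbehind accepts exactly the opposite cases.
console.log(negative.test('linux then GNU'));   // true
console.log(negative.test('GNU is not linux')); // false
```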




Lookaheads

Lookaheads are also assertions, like the lookbehinds you saw in the previous example. The only difference is that lookbehinds make an assertion about what comes before, and lookaheads make an assertion about what comes after.

Let's say you have these two conditions:

Match Hello only if World comes somewhere after it.
Match only Hello and nothing else.

The prefix for a positive lookahead is ?=. The expression Hello(?=.*World) will meet both conditions. This is similar to Hello.*World except that only Hello will be matched whereas Hello.*World will match 'Hello', 'World' and anything in between.

Similar to the example for a positive lookbehind, the expressions Hello(?=.*World) and (?=.*World)Hello are equivalent, because the .* before 'World' matches Hello, satisfying the 1st condition.

A negative lookahead is the complement of a positive lookahead. You can use it by prefixing the group with ?!. The expression Hello(?!.*World) will match Hello only if there is no World anywhere after it.
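
A short JavaScript sketch of positive and negative lookaheads, using made-up sample strings:

```javascript
const positive = /Hello(?=.*World)/;
const negative = /Hello(?!.*World)/;

// Only 'Hello' is consumed; 'World' is checked but not included.
const m = 'Hello big World'.match(positive);
console.log(m[0]); // Hello

// 'World' must come after 'Hello' for the positive lookahead.
console.log(positive.test('Hello there')); // false
console.log(positive.test('World Hello')); // false

// The negative lookahead matches when no 'World' follows.
console.log(negative.test('Hello there'));     // true
console.log(negative.test('Hello big World')); // false
```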

Here is a summary of the syntax for lookarounds when you want to match the pattern X with assertion C.




Operation	RegEx
positive lookahead	(?=C)X
negative lookahead	(?!C)X
positive lookbehind	(?<=C)X
negative lookbehind	(?<!C)X



Practical Examples of Regex

Logs parsing

In this log file, these are the lines which we care about:

[1/10000] Train loss: 11.30368, Valid loss: 8.95446, Elapsed_time: 7.58941
[500/10000] Train loss: 0.96180, Valid loss: 0.20098, Elapsed_time: 82.48651
[1000/10000] Train loss: 0.04051, Valid loss: 0.11927, Elapsed_time: 156.86243

Our task is to extract the training loss and validation loss for purposes such as plotting loss over the epochs. We need to extract the training loss values like 11.30368, 0.96180, 0.04051 and put them in an array.

All the relevant values are prefixed with 'Train loss:', so we can use this in our regex as it is. To match the float numbers we have to match some digits followed by a "." and then followed by more digits. You can do this with \d+\.\d+. Because we want to keep track of these numbers, they should be inside a capture group.

As "." has a special purpose in regex, when you want to match a "." character you have to escape it with a backslash. This applies to all characters with a special purpose. But you don't have to escape it inside a character set.

Putting it all together, the expression for extracting training loss is Train loss: (\d+\.\d+). We can use the same logic to extract validation loss with Valid loss: (\d+\.\d+).

Here is one way to extract this information using Python:

import re

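# Read the whole log into one string; the with block closes the file afterwards
with open("log_train.txt", "r") as log_file:
    f = log_file.read()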

train_loss = re.findall(r'Train loss: (\d+\.\d+)', f)
valid_loss = re.findall(r'Valid loss: (\d+\.\d+)', f)

train_loss = [float(i) for i in train_loss]
valid_loss = [float(i) for i in valid_loss]

print("train_loss =", train_loss)
print("")
print("valid_loss =", valid_loss)

When there is one capture group, re.findall searches all the lines and returns the values inside the capture group in a list.

Regex functions only return strings, so the values are converted to floats and printed out. Then you can directly use them in another Python script as a list of floats.

You could also use sed, save the output in train_losses.txt, and read from the file. First we use '/Train/' to target only the lines with 'Train' present then we are applying the same regex as before.
sed -En '/Train/ s/.*Train loss: ([0-9]+\.[0-9]+).*/\1/p' log_train.txt | tee train_losses.txt

".*" is added at the start and end so that sed matches the contents of all the relevant lines. Then the entire line is replaced by the value of the capture group. The tee command is used to redirect the output of sed into train_losses.txt while also printing the contents in the terminal.

Take a moment to think about what you would need to extract the epochs. You have to extract 500 from [500/10000] for all such lines. The array should look like [1, 500, 1000, 1500, ...]. You can follow the same approach as we used for the previous example. 
Note that if you want to match "[" or "]", you have to escape it. The answer is given here.
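One possible approach can be sketched in Python as follows. The sample_log string is a hypothetical stand-in for the contents of log_train.txt, and this is only one way to write the pattern, not necessarily the linked answer.

```python
import re

# Hypothetical sample lines in the same format as the training log above.
sample_log = """\
[500/10000] Train loss: 0.96180, Valid loss: 0.20098, Elapsed_time: 82.48651
[1000/10000] Train loss: 0.04051, Valid loss: 0.11927, Elapsed_time: 156.86243
"""

# \[ and \] match literal brackets. The capture group keeps only the
# first number (the current step), not the total after the slash.
epochs = [int(n) for n in re.findall(r'\[(\d+)/\d+\]', sample_log)]

print("epochs =", epochs)
```

Because there is a single capture group, re.findall returns only the captured digits, which are then converted to integers.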

Bulk File Renaming
You have these files with some random values as prefixes. You have to rename all files as 1.mp4, 2.mp4 and so on.
This is how the files were generated.

This is a common scenario where you have a list of files which have their sequence number in the name but there are also some other characters that you don't want.
The pattern has to match anything up to Episode then an underscore and then the number and .mp4 at the end. 
The relevant value is the number before '.mp4' which we will put inside a capture group. .*Episode_ will match everything up to the number. Then we can capture the number with ([0-9]+) and also match .mp4 with \.mp4. 
So the final regex is .*Episode_([0-9]+)\.mp4. As we want to keep the .mp4 the replacement string will be \1.mp4.
This is one way to solve it using sed.
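for i in *.mp4; do
    newname=$(echo "$i" | sed -En 's/.*Episode_([0-9]+)\.mp4/\1.mp4/p')
    # Rename only when the pattern matched; the quotes keep names with spaces intact
    [ -n "$newname" ] && mv "$i" "$newname"
done; ls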

First the new name is saved in a variable and then the mv command is used to rename the file.

Could we have just used .* in place of .*Episode_ ? In this example, yes. But there might be filenames such as Steins_Gate0.mp4 where the 0 is part of the movie name and you don't want to rename this file, so it's always better to be as specific as possible.
What if some files were named as "Random_Episode6.mp4"? The difference being, there is no underscore after Episode. What change will you need to make?
The answer is that you'll need to add a "?" after the "_" to make it optional. The regex will be .*Episode_?([0-9]+)\.mp4.
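To see the optional underscore in action without touching any files, here is a small Python sketch. The file names and the new_name helper are hypothetical and only illustrate how the pattern behaves.

```python
import re

# Hypothetical file names; only the first two follow the Episode pattern.
names = ["abc123_Episode_1.mp4", "Random_Episode6.mp4", "Steins_Gate0.mp4"]

# Same regex as above, with "_?" making the underscore optional.
pattern = re.compile(r'.*Episode_?([0-9]+)\.mp4')

def new_name(name):
    """Return '<number>.mp4' for episode files, or None if the name does not match."""
    match = pattern.fullmatch(name)
    return f"{match.group(1)}.mp4" if match else None

for name in names:
    print(name, "->", new_name(name))
```

Steins_Gate0.mp4 is left alone because the pattern insists on the word Episode before the number.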

Email validation
There are all sorts of complicated regex for validating email.
Here is a simple one: ^[^@ ]+@[^@.]+\.\w+$. It matches the format A@B.C
The table below breaks down this pattern into smaller pieces:




Pattern	Matches
`^`	start of line
`[^@ ]+`	anything except "@" and space character
`@[^@.]+`	@ followed by anything except "@" and "." characters
`\.\w+`	"." followed by word characters
`$`	end of line



On the regexr site, you can enable the multiline flag from the Flags tab in the upper right corner. The 'gm' at the end indicates that the global and multiline flags are enabled.
We can see that lines 2, 3, 5, and 6 didn't match. Can you find out the reason and which part of the regex is responsible for disqualifying each of them?
The answer is given here
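If you would rather experiment in code than in a regex tester, the pattern can be tried out with a short Python sketch. The addresses below are hypothetical examples, not the lines from the screenshot.

```python
import re

email_re = re.compile(r'^[^@ ]+@[^@.]+\.\w+$')

# Hypothetical addresses chosen to exercise each part of the pattern.
candidates = [
    "user@example.com",        # fits the A@B.C format
    "user@mail.example.com",   # [^@.]+ stops at the first dot, so "$" is not reached
    "first last@example.com",  # a space before "@" is rejected by [^@ ]+
    "user@@example.com",       # a second "@" is rejected by [^@.]+
    "user@example",            # no "." followed by word characters
]

results = {address: bool(email_re.search(address)) for address in candidates}

for address, ok in results.items():
    print(f"{address!r}: {'match' if ok else 'no match'}")
```

Note that this simple pattern also rejects real-world addresses with subdomains, which is one reason production email regexes tend to be more complicated.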

Password constraints
You can also use regex to impose constraints. Here we will uncover the power of positive lookaheads. 
Let's say we want to accept a string only if there is a digit in it. You already know how to find a digit with the '\d' class. To accomplish that, we can use [^\d]*\d. This will match any non-digit character 0 or more times and then match a digit. 
We can also use the expression .*\d to match one digit. When it is placed inside a lookahead and there is no digit in the string, the lookahead fails and none of the characters of that string will be matched, returning an empty string "". 
When we are using a programming language, we can check if the regex returned an empty string and determine that the constraints are not satisfied.
We will create a regex which imposes the following criteria:

Minimum 8 characters and maximum 16 characters.
At least one lower case letter.
At least one upper case letter.
At least one number.

To achieve this, you can use positive lookaheads. This is the regex:
^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,16}$
The table below explains which part of the regex imposes which constraint:




Pattern	Imposed Constraint
`.{8,16}`	min 8 and max 16 characters
`(?=.*[a-z])`	minimum one lower case letter
`(?=.*[A-Z])`	minimum one upper case letter
`(?=.*\d)`	minimum one digit



What modification would you need to require at least 5 upper case letters?
You may think (?=.*[A-Z]{5,}) will do the job. But this expression requires all the 5 letters to be together. A string like rand-ABCDE-rand will be matched but 0AxBCDxE0 will not be matched even though it has 5 upper case letters (as they are not adjacent).
Yet again, we have capture groups coming to the rescue. We want to match 5 uppercase letters anywhere in the string. We already know that we can match 1 uppercase letter with .*[A-Z]. Now we will put them inside a capture group and attach a quantifier of minimum 5. The expression will be (.*[A-Z]){5,}.
Here is the final answer:
In place of (?=.*[A-Z]) you will need (?=(.*[A-Z]){5,}). The expression becomes ^(?=.*[a-z])(?=(.*[A-Z]){5,})(?=.*\d).{8,16}$.
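The difference between the three variants is easier to see when they run side by side. The following Python sketch uses hypothetical passwords and a small is_valid helper that is not part of the article.

```python
import re

# The three variants discussed above.
basic = re.compile(r'^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,16}$')
adjacent = re.compile(r'^(?=.*[a-z])(?=.*[A-Z]{5,})(?=.*\d).{8,16}$')
five_upper = re.compile(r'^(?=.*[a-z])(?=(.*[A-Z]){5,})(?=.*\d).{8,16}$')

def is_valid(pattern, password):
    """True when the whole password satisfies every constraint in the pattern."""
    return pattern.search(password) is not None

# Hypothetical passwords, not taken from the article.
for password in ["Passw0rdX", "rand-ABCDE-rand0", "0AxBCDxE0", "short1A"]:
    print(password,
          is_valid(basic, password),
          is_valid(adjacent, password),
          is_valid(five_upper, password))
```

0AxBCDxE0 passes the basic and five_upper checks but fails the adjacent one, because its five upper case letters are not next to each other.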

You could also require that the password not contain certain words to enforce stronger passwords. 
For example, we want to reject the password if it contains pass or 1234. A negative lookahead is the tool for this job. The regex would be ^(?!.*(pass|1234)).*$.

In this regex, we put pass and 1234 inside a capture group and combined them with the logical OR operator (|). This capture group is nested inside another group that starts with ?!, which makes the outer group a negative lookahead: the match fails if pass or 1234 is present anywhere in the string. As written, the trailing .* accepts a string of any length. To also require at least 8 characters, replace it with .{8,}, which gives ^(?!.*(pass|1234)).{8,}$.
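Here is a minimal Python sketch of the negative lookahead, both in its any-length form and combined with a minimum length of 8 via .{8,}. The passwords and the accepted helper are hypothetical.

```python
import re

# Rejects any string containing "pass" or "1234"; accepts any length.
any_length = re.compile(r'^(?!.*(pass|1234)).*$')
# Same check, but the string must also be at least 8 characters long.
min_eight = re.compile(r'^(?!.*(pass|1234)).{8,}$')

def accepted(pattern, password):
    """True when the pattern matches, meaning no forbidden word was found."""
    return pattern.search(password) is not None

# Hypothetical passwords for illustration.
for password in ["mypassword", "abc1234xyz", "Tr0ub4dor", "short"]:
    print(password, accepted(any_length, password), accepted(min_eight, password))
```

The check is case-sensitive as written, so a string containing Pass with a capital P would not be rejected unless you also enable case-insensitive matching.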

Final Words
I hope you got a good amount of practice while going through this article. It's ok if you forget some syntax. What's important is understanding the core concepts and having a good idea of what's possible with regex. Then, if you forget a pattern, you can just google it or reference a cheatsheet. 
The more you practice, the more you will get by without outside help. Eventually you will be able to write complex and effective regex completely offline. 
There are already some good regex cheatsheets out there, so I wanted to create something more in-depth here that you can reference for the core concepts and common use cases. 
If you're looking for a cheatsheet, the one from QuickRef is helpful. It's a good place to recall the syntax and they also provide some basic overview of regex related functions in various programming languages.
Most regex techniques are the same in all programming languages and tools – but certain tools might offer additional features. So do some research on the tool you are using to pick the best one for you.
My final suggestion would be not to force using regex just because you can. A lot of the time a regular string.find() is enough to get the job done. But if you live in the terminal, you really can do a lot with regex alone.
If you like this type of article, you may keep an eye on my blog or twitter.

Operation	RegEx
positive lookahead	`(?=C)X`
negative lookahead	`(?!C)X`
positive lookbehind	`(?<=C)X`
negative lookbehind	`(?<!C)X`




 The Regular Expressions Book – RegEx for JavaScript Developers [Full Book] 
Kolade Chris — Wed, 26 Jul 2023 15:27:12 +0000
 If you want to master regular expressions and understand how they work in JavaScript, this book's for you.
Regular expressions can be intimidating when you first encounter them. When I started learning to code, I gave up on regular expressions twice.
While that was partly because I was intimidated by regular expressions at first, the tutorials and courses I used never taught them in a way I could understand. 
In fact, before some tutorials start teaching regex, they complain about regex and how tough they can be. And there's no better way to discourage a learner than that.
In this book, you won't just see how to use regex in a regex testing tool like regexpal or regex101. You'll also see how they work in JavaScript. This is what many courses and tutorials tailored for regex in JavaScript lack.
You can also apply what you learn in this book to other programming languages like Python, PHP, and so on. All you need to do is to know about how the regex engine of that language works. You'll also need to understand the methods and functions the language uses for working with regular expressions.
To get the most out of this book, make sure you read it in order because each chapter builds upon the previous ones. I have also arranged the chapters according to how difficult they are. So, you will find simpler concepts first and more advanced concepts later.
Happy reading!
Table Of Contents

Chapter 1: Introduction to Regular Expressions
What are Regular Expressions?
A Brief History of Regular Expressions
What are the Uses of Regular Expressions?
Flavors of Regular Expressions
Tools for Working with Regular Expressions
Basic Concepts of Regular Expressions


Chapter 2: How to Match Literal Characters and Character Sets in Regular Expressions
What are Literal Characters in Regular Expressions?
How to Match Literal Characters in RegEx Testers
Character Set Matching


Chapter 3: Regular Expressions Flags
The global Flag
The case-insensitive Flag
The multi-line and single-line Flags
The unicode Flags
The sticky Flags


Chapter 4: How to Use Regular Expressions in JavaScript 
How to Create Regular Expressions in JavaScript
Methods of the RegExp() Constructor
Properties of the RegExp() Constructor
String Methods for Working with Regular Expressions
How to Match Literal Characters in JavaScript Regular Expressions
How to Use Character Sets in JavaScript Regular Expressions


Chapter 5: Metacharacters, Quantifiers, Repeated Matches, and Optional Matches
What are Metacharacters?
The Word and Non-word Metacharacters
The Anchor Metacharacters
The Digit and Non-digit Metacharacters
The Square Brackets Metacharacter
The Word Boundary and Non-word Boundary Metacharacters
The Parenthesis Metacharacter
The Space and Non-space Metacharacters
The Pipe Metacharacter
How to Match Repeated Characters With Quantifiers
How to Specify Match Quantity with the Curly Braces Metacharacter
The Wildcard Metacharacter
Greediness and Laziness in Regular Expressions


Chapter 6: Grouping and Capturing in Regex
How to Reference Captured Groups with Backreferences
How to Use the d Flag and hasIndices Property With Groups


Chapter 7: Lookaround Groups: Lookaheads and Lookbehinds 
What are Lookaround Groups?
What is a Lookahead Group?
What is a Lookbehind Group?


Chapter 8: Regex Best Practices and Troubleshooting
Best Practices to Consider While Writing Regular Expressions
How to Write Accurate Regular Expressions


Chapter 9: Applications of Regular Expressions
A Better Way to Match Dates 
How to Match US Zip Codes
How to Match Email Addresses
How to Match Passwords
Form Validation with Regex
Article Table of Contents Generator


Glossary and References
Glossary of Terms 
Quick Reference of Metacharacters and Quantifiers 



Chapter 1: Introduction to Regular Expressions
What are Regular Expressions?
You might see this written as regular expressions, regex, or RegExp – but all refer to the same thing. 
Regex are a sequence of characters for matching a part of a string or the whole string. Matching strings with regular expressions might require more than just "characters". Many times, you will need to use a special set of characters called "metacharacters" and "quantifiers". 
Because regular expressions are a powerful tool, you can use them to do much more than just "matching strings" when you combine regex with programming languages. 
Almost all the main programming languages of the modern era have built-in support for regular expressions. Some programming languages might even have specific libraries that help you work more conveniently with regex.
Apart from using regular expressions in programming languages, other tools that let you use regular expressions are:

Text Editors and IDEs: for search and search and replace in VS Code, Visual Studio, Notepad++, Sublime Text, and others.

Browser Developer Tools: mostly in-browser search (with extensions or add-ons) and search within the developer tools.

Database Tools: for data mining.

RegEx Testers: you can paste in text and write the regular expressions to match them – which is a very good way to learn regular expressions. This book explores that option quite a bit.


A Brief History of Regular Expressions
Regular expressions have a rich and fascinating history that has already spanned more than seven decades. This history continues to evolve alongside the development of computer science and programming languages.
The concept of regular expressions traces back to the 1950s. American mathematician Stephen Cole Kleene introduced them as a notation for defining patterns in formal languages. Kleene's work also formed the foundation for theoretical computer science.
In the late 1960s, the first implementations of regular expressions emerged. Ken Thompson, a computer scientist at Bell Labs, built regular expression matching into his version of the QED text editor. QED's capabilities provided a way to search and manipulate texts more efficiently.
The concept gained further popularity when Thompson and Dennis Ritchie created the Unix operating system in the early 1970s. 
They incorporated regular expressions into various Unix utilities, most notably the ed text editor and later the sed stream editor. These tools allowed users to perform complex text manipulation tasks, significantly enhancing the efficiency and power of text processing.
In 1973, Thompson developed a new tool called grep as part of the Unix toolkit. Its name comes from the ed command g/re/p (globally search for a regular expression and print the matching lines). 
Grep allowed users to search files for specific patterns using regular expressions. The simplicity and effectiveness of grep made it a widely adopted tool. It also established regular expressions as a standard feature in Unix-based systems.
As computer systems and programming languages evolved, regular expressions became integrated into various software development environments. In the late 1970s, the AWK programming language was created. AWK inspired Larry Wall to create Perl and make it available to the public in 1987.
Wall recognized the value of regular expressions for text manipulation and integrated regex into Perl. 
Perl's integration of regular expressions into its syntax made it a popular language for text matching and data extraction tasks. This integration later inspired PCRE (Perl Compatible Regular Expressions), a library that implements Perl-style regular expressions. PHP and many other tools use PCRE directly, while languages such as Python and Java ship their own engines with similar Perl-style syntax.
Regular expressions continued to evolve and find applications beyond Unix and Perl. Starting in the late 1980s, the IEEE developed the POSIX family of standards (later also adopted by ISO), which includes a specification for regular expressions. This standardization helped ensure compatibility and consistency across different implementations and systems.
With the rise of the internet and the World Wide Web in the 1990s, regular expressions found widespread use in web development and data processing. They became an essential component of many scripting languages, providing developers with powerful tools for text processing, form validation, and data extraction from web pages. 
For example, JavaScript gained Perl-inspired regular expressions early in its history, and in 1999 the ECMAScript 3 specification standardized the RegExp() constructor and regex literals. This gave JavaScript developers a standard way to use regular expressions directly in their code, in the JavaScript way.
From the late 1990s onward, tools and libraries specifically focused on regular expressions emerged, making it easier for developers to work with them. Libraries like PCRE (Perl Compatible Regular Expressions) provided enhanced features and better performance, further expanding the usage and capabilities of regular expressions.
Today, regular expressions are an integral part of programming languages and text-processing tools like your code editor. They are supported by almost all major programming languages, including Java, C#, Ruby, and PHP. 
Integrated development environments (IDEs) and code editors like Visual Studio, VS Code, and Notepad++ also now include regex-based search and search and replace functionalities, simplifying the process of finding and manipulating texts in code. 
The history of regular expressions demonstrates their evolution from theoretical concepts to practical tools that have revolutionized text processing and pattern matching. 
From the early developments at Bell Labs and Unix to their integration into popular programming languages, regular expressions have become an essential tool in the hands of developers and system administrators. Regex empowers them to handle complex text-based tasks efficiently. 
With the ongoing advancements in computing and the continuous demand for efficient text processing, regular expressions will likely remain a fundamental part of the technology landscape for years to come.
What are the Uses of Regular Expressions?
Regular expressions are quite versatile and flexible. This makes it possible to apply them to various tasks in various domains such as computer programming, data processing, text editing, and web development.
Those applications and uses include but are not limited to the following:
String Matching: This is one of the most common ways developers use regular expressions. This is also a good way to learn regular expressions. 
You can paste some texts into a regex engine and write the regex to match a part of the text or the whole text. You can also search for strings that contain specific character sequences, start or end with certain characters, or match complex patterns. 
This makes regular expressions valuable for tasks like searching for keywords, validating input against specific patterns, or filtering data based on string patterns.
Password Strength Validation: You can use regular expressions for validating the strength of passwords in websites and applications. 
By defining a set of rules using regular expressions, developers can enforce specific password requirements, such as a minimum number of characters, a combination of uppercase and lowercase letters, numbers, and special characters. 
Form Validation: Validating inputs of a form or standalone inputs is another popular way developers use regular expressions. 
Regular expressions provide a concise and efficient way to ensure that input data follows specific patterns or formats. Whether it's validating usernames, email addresses, phone numbers, credit card numbers, postal codes, or other inputs, regular expressions can help you enforce validation rules and maintain data integrity.
Text Search and Manipulation: Regular expressions excel at searching for specific patterns within text and performing manipulations based on those matches. They are a powerful tool for tasks such as data mining, log analysis, and text processing. 
Whether you need to find occurrences of particular words or phrases, extract structured data from text, analyze content, or perform string matching, regular expressions offer efficient pattern-matching capabilities.
Working with URLs and URIs: Since URLs and URIs are an integral part of web development, regular expressions can help in validating, parsing, and manipulating them. This enables developers to ensure the correctness and structure of web addresses, validate whether a string is a valid URL, and help extract specific components such as the domain, path, query parameters, or fragments. 
This functionality is particularly useful in tasks like URL routing, rewriting, or extracting data from query parameters.
Search and Replace in IDEs and Text Editors: Regular expressions offer sophisticated search capabilities. This enables developers to locate specific patterns (such as words with specific prefixes or sequences of characters) and then replace the matches with a specified text. This is built into modern text editors like VS Code and Notepad++.
Data Extraction and Scraping: Regular expressions play a significant role in data extraction and web scraping. They allow developers to extract specific information from unstructured or semi-structured text by defining patterns to match desired data. 
They are also valuable when extracting data from sources like HTML or XML documents, as they enable efficient retrieval of information based on defined patterns.
Syntax Highlighting: Regular expressions are commonly used in IDEs and text editors to provide syntax highlighting. This ends up helping users to visually distinguish different parts of a code or document by assigning colors or formatting to keywords, strings, comments, and other language-specific constructs. 
Regular expressions are used to identify and match these language-specific patterns, making code more readable and enhancing the overall editing experience.
Flavors of Regular Expressions
The term "flavors of regular expressions" refers to the specific implementation and syntax variations of regular expressions in different programming languages, libraries, or tools. 
While the core concept of regular expressions remains the same, the details of how regular expressions are written and interpreted can vary between different environments.
Each flavor of regular expressions may have its own set of metacharacters, syntax rules, and additional features beyond the basic functionality. 
These differences can include variations in the syntax for character classes, metacharacters, capturing groups, and assertions, as well as additional capabilities like named capturing groups, look-ahead, and look-behinds.
There are many flavors of regular expressions available today. Some of them are:

Basic Regular Expressions (BRE): this flavor is commonly found in Unix tools such as sed and grep. It uses a limited set of metacharacters and features. The wildcard (.) and zero or more (*) metacharacters are available in it.

Extended Regular Expressions (ERE): ERE is an extension of BRE. It provides additional metacharacters and features. In addition to the metacharacters available in BRE, ERE lets you write grouping parentheses (( )) and repetition braces ({}) without escaping them, and it introduces alternation with the pipe symbol (|) as well as the + and ? quantifiers.

Perl-Compatible Regular Expressions (PCRE): PCRE is a popular flavor modeled on Perl's regular expressions. PHP and many other tools use it directly, and languages such as Python and JavaScript have their own flavors that share much of the same syntax. PCRE extends the basic regular expression syntax with powerful features like lookahead and look-behind assertions, backreferences, non-capturing groups, and the use of \b for word boundaries.

JavaScript Regular Expressions: JavaScript has its own regular expression flavor, which is similar to PCRE but with a few differences. It supports basic features like character classes with square brackets ([ ]), metacharacters (*, +, ?, and others), and capturing groups (( )). JavaScript also provides additional features like the global flag /g to perform multiple matches, and the ignore case flag /i for case-insensitive matching.

Python Regular Expressions: Python's re module implements a flavor that is similar to PCRE but with a few variations. It supports features such as character classes [ ], metacharacters (*, +, and ?), and capturing groups (( )). Patterns are usually written with Python's raw string syntax (r' '), which simplifies working with backslashes.


It's important to be aware of the flavor of regular expressions you are using when working with regular expressions in different programming languages or tools. This ensures that you use the correct syntax and take advantage of any unique features or capabilities provided by that particular flavor.
N.B.: Don’t worry too much about the metacharacters (and quantifiers) mentioned in this part. You will see them in action in chapter 5 of this book.
Tools for Working with Regular Expressions
Regular expression tools are the programming languages, libraries and frameworks, command line utilities, online regex testers, text editors and IDEs, and applications designed to help you create, test, and apply regular expressions in your day-to-day work life.
There are many tools available for working with regular expressions. Let me take you through them under regex testers, programming languages, libraries, text editors and IDEs, and command line tools.
RegEx Testers
RegEx testers are the online testing environments specifically built for creating and testing regular expressions against some test strings. Examples include regex101.com, regexr.com, and regexpal.com.
The UIs of these regex testers usually have an input for the regular expressions you want to write, and another for the text you want to test the regex against. 
This is how the UI of regexpal.com looks:

More advanced ones like regex101.com let you select the flavor of regular expressions you want to work with, an explanation of the regex, and match information. 
Here’s what the UI of regex101.com looks like:

One of the good things about these online regex testers is that they are helpful for learning regular expressions. A lot of them provide real-time matching and cheatsheets you can quickly look at. Many devs who use regex have used them. 
Apart from learning, you can also use them by creating your regex with them and pasting them into wherever you want to use the regex. This is how I create my regex.
Programming Languages
Almost all modern programming languages have built-in support for regular expressions. And so they all have methods for creating and testing regular expressions. 
For example, JavaScript has the RegExp() constructor for working with regular expressions, Python has the re module, Java has the java.util.regex package, and Perl has regex built into it directly.
Libraries and Frameworks
Many programming languages have standalone libraries and frameworks that make it easier to create regular expressions. 
There is XRegExp for JavaScript, PCRE (Perl Compatible Regular Expressions), a C library that many languages and tools embed, Go-Restructure for Golang, and Verbal Expressions, a cross-platform regex library.
Text Editors and IDEs
Many text editors and IDEs such as VS Code, Visual Studio, Notepad++, Atom, Sublime Text, IntelliJ IDEA, and others have built-in support for regular expressions. 
The most common thing developers use this for is search, and search and replace. Also, the syntax highlighting in those text editors and IDEs is often implemented with regular expressions.
Command Line Tools
Unix command line tools like grep and sed allow you to perform regex operations on text files and streams. With this, you can search, filter, and manipulate multiple files. 
Using these Unix tools, options for customizing search behaviors and customizing complex text transformations are also available to you.
Basic Concepts of Regular Expressions
The basic concepts and syntax of regular expressions are the building blocks involved in creating, testing, and applying patterns for searching, matching, and manipulating strings. 
This includes concepts like literal characters, metacharacters, quantifiers, character classes, anchors and boundaries, and escape characters. The more advanced ones are groupings, backreferences, look-ahead assertions, and look-behind assertions.
Regular expressions users utilize many of these concepts to construct efficient regular expressions for working with text. On many occasions, the basic ones are enough. But if you want to create more advanced regular expressions, then the more advanced ones will also be useful for you.
This book won’t leave any of the concepts behind. I will show you how you can utilize them in regex testers and how you can use them in JavaScript since that’s what this book is meant for.
Chapter 2: How to Match Literal Characters and Character Sets in Regular Expressions
What are Literal Characters in Regular Expressions?
Literal characters are characters you can match as they appear in a test string. They could be letters, numbers, spaces, or even symbols. In other words, they are non-special characters that represent themselves.
This means if you want to match literal characters, you should construct your regex pattern in the same way as the test string appears. 
For example, if you want to match the word hello, your regex pattern can be hello. And if you want to match the h in the word hatch, all you need as the pattern is h. 
This h would match the first occurrence of the letter h in the test string hatch. If you want it to match the other letter h as well, you need the "g" flag, or global flag. You will learn about the flags and modifiers in the next chapter of this book.
That is not the case for some symbols, though. That’s because some symbols are special characters of regular expressions (metacharacters and quantifiers). So, if you want to match those characters, you have to escape them with a backslash (\). This book will also teach you all you need to know about metacharacters because there's a whole chapter for them.
How to Match Literal Characters in RegEx Testers
Provided you want to match the word hello, then hello should be your regex pattern:

If you want to match the text freeCodeCamp, you can construct your regex to be freeCodeCamp:

So, what if you want to match hello freeCodeCamp? Then you just use hello freeCodeCamp as the pattern:

If you want to match the letter e in the text freeCodeCamp, e is the pattern to use:

And if you want to match h in the text hatch, h is the pattern you should use:

You can see that in the text freeCodeCamp, the other es after the first occurrence were not returned as matches – same with the last h in the word hatch. You will learn how to match every occurrence of a letter in a text in the next chapter.
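The book's own examples use regex testers and JavaScript. As a language-neutral illustration of the same behavior, here is a minimal sketch in Python, where re.search stops at the first match and re.findall returns every match (roughly what turning on the g flag does in a tester). The sample sentence used for escaping is a hypothetical example.

```python
import re

# A literal pattern matches itself; re.search stops at the first occurrence.
first_h = re.search(r'h', 'hatch')
print(first_h.span())   # position of the first "h" only

# re.findall returns every occurrence.
all_h = re.findall(r'h', 'hatch')
all_e = re.findall(r'e', 'freeCodeCamp')
print(all_h, all_e)

# Symbols that are metacharacters must be escaped to be matched literally.
escaped = re.search(r'1\+1', 'what is 1+1?')
print(escaped.group())
```

Without the backslash, the + would act as a quantifier on the first 1 instead of matching a plus sign.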
Character Set Matching
A character set, also called character class, is a set of characters that will successfully match a certain character in a test string. This set of characters is enclosed in square brackets. 
For instance, the pattern [abc] will match any of a, b, and c, while [xyz] will match any of x, y, and z.
Here are some examples of character sets and what they do:

[abc]: matches either a, b, or c
[aeiou]: matches any vowel character
[a-z]: matches any lowercase letter from a to z
[A-Z]: matches any uppercase letter from A to Z
[0-9]: matches any digit from 0 to 9

Inside the square brackets, most metacharacters lose their special meaning, so you don’t need to escape them. The main exceptions are the hyphen (-), which you can use to specify ranges, as I have done with some examples of character sets, the caret (^), which negates the set when it is the first character, and the closing square bracket (]), which ends the set. 
You will also learn about ranges in this book. On some occasions, a backslash \ does not lose its special meaning in a character set.
As with literal character matching, only the first occurrence of the character set will return as a match, and every other occurrence will be ignored. In the next chapter, you will learn how to match multiple occurrences of a character with the g flag.
Here’s how each of the above character sets works in a regex testing tool:
[abc]:

[aeiou]:

[a-z]:

[A-Z]:

[0-9]:

You can also define your unique character class based on what you want. Character sets are useful when you want to match some characters in a particular position in a text.
For instance, the pattern br[ao]ke will match both brake and broke:

The pattern gr[ae]y will match both gray and grey:

N.B.: I turned on the g flag so you can see all the matches, and how powerful character sets are. We will take a look at the g and other flags in the next chapter.
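Here is a rough JavaScript equivalent of the two examples above. It is only a sketch: the test strings are made up, and the words brike and griy are included to show what does not match.

```javascript
const brakeMatches = 'brake broke brike'.match(/br[ao]ke/g);
const grayMatches = 'gray grey griy'.match(/gr[ae]y/g);

console.log(brakeMatches); // [ 'brake', 'broke' ]
console.log(grayMatches); // [ 'gray', 'grey' ]
```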
Since there is usually more than one way to do the same thing in programming, there are also "shorthand character sets" that you can use in place of some common character sets.
Since these shorthand character sets are a subset of metacharacters, you will learn about them in the chapter dedicated to metacharacters.
Chapter 3: Regular Expressions Flags
Also called modifiers, flags are special characters you can place at the end of, or within, a regular expression pattern to alter its default behavior.
JavaScript developers tend to call these characters "flags", while in Python the terms "flags" and "modifiers" are used interchangeably.
In Python, you can place flags within a regex pattern, but in JavaScript, flags are always placed at the end of the regex pattern.
Here are the flags you can use in regular expressions:

global flag
case insensitive flag
multi-line flag
single-line flag
unicode flag
sticky flag

In many regex testing tools, you can turn on any flag you want to use. In regex101.com, you can turn on a flag by clicking on the slash symbol (/) right inside the pattern input:

You can then select any flag you want to use:

N.B.: If the flavor of regex you selected in regex101.com is not ECMAScript, the set of flags presented to you might be different.
If you are using regexpal.com, click on "flags" above the regex pattern input:

Select any flag you want by clicking on it:

Now, let's take a detailed look at each of the regex flags and how they work in a regex engine.
The global Flag
The global flag is denoted by the letter g. With it, you get to perform a global match with your pattern. 
Remember that in the previous chapter, some patterns I defined stopped at the first match, even when there were more. That's because, by default, regular expressions only find the first match in a text. With the g flag, all the matches are returned.
Another good thing about using the g flag is that you can iterate over the matches you get with the pattern in JavaScript. The iteration continues until there’s nothing to match. You will learn about multiple ways you can iterate over matches soon.
To let you see how the g flag works, I’ll use the hatch and freeCodeCamp examples from the previous chapter.
If you want to match every h in the word hatch with the pattern h, both the first and the last h will be returned as matches as long as you have the g flag on:

And if you want to match e in freeCodeCamp with the pattern e and you turn on the g flag, the second and third es are returned as a match too:
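In JavaScript, the effect of the g flag on these two examples can be sketched with the match() method, which is covered in the next chapter. The variable names are illustrative.

```javascript
// With the g flag, match() returns every occurrence.
const allH = 'hatch'.match(/h/g);
const allE = 'freeCodeCamp'.match(/e/g);

console.log(allH); // [ 'h', 'h' ]
console.log(allE); // [ 'e', 'e', 'e' ]
```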

The case-insensitive Flag
The case insensitive flag is denoted by i. As the name implies, it lets you perform case-insensitive matching.
By default, regular expressions perform case-sensitive matching. With the i flag on, you don't have to worry about casing in your patterns.
With this, uppercase or lowercase will be ignored. That means Hello and hello will be treated as the same thing:

freeCodeCamp and freecodecamp are treated the same, too:

RegEx and regex are also the same thing:

Another thing to note is that a character class such as [a-z] also matches uppercase letters when the case-insensitive flag is on:
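Here is a minimal JavaScript sketch of the i flag, using made-up test strings. It shows both a literal pattern and the [a-z] character class.

```javascript
const strict = /hello/.test('Hello');
const relaxed = /hello/i.test('Hello');

console.log(strict); // false
console.log(relaxed); // true

// With i, [a-z] matches uppercase letters too.
const letters = 'fCC'.match(/[a-z]/gi);
console.log(letters); // [ 'f', 'C', 'C' ]
```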

The multi-line and single-line Flags
Denoted by m, the multi-line flag tells the regular expression engine to treat the test string as multiple lines. With it, the start and end anchor metacharacters (^ and $) match at the start and end of each line instead of only at the start and end of the whole string. You'll learn more about it in the anchors and word boundaries chapter.
The single-line flag is denoted by s. It changes the behavior of the wildcard metacharacter (.) so that it also matches line breaks. You will see the single-line flag in action in the chapter on metacharacters.
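As a quick preview, here is a small sketch of what the two flags change. The test string is an assumption made up for this example, and the anchors and wildcard it uses are explained in later chapters.

```javascript
const twoLines = 'first line\nsecond line';

// Without m, ^ only matches at the start of the whole string.
const caretPlain = /^second/.test(twoLines);
// With m, ^ also matches right after a line break.
const caretMulti = /^second/m.test(twoLines);

// Without s, the wildcard does not match a line break.
const dotPlain = /line.second/.test(twoLines);
// With s, it does.
const dotAll = /line.second/s.test(twoLines);

console.log(caretPlain, caretMulti); // false true
console.log(dotPlain, dotAll); // false true
```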
The Unicode Flag
The Unicode flag enables full Unicode matching in the regular expressions engine that supports it. It is denoted by u.
By default, JavaScript and many other programming languages treat strings as a sequence of 16-bit code units. With the u flag, regex patterns can match against Unicode code points instead of code units. This allows handling characters like emojis, certain symbols, and some characters from non-Latin scripts. So, when you set the flag, it modifies the behavior of certain escape sequences and metacharacters so that they work with whole Unicode code points.
For example, the escape sequence \u{1F602} will match the literal text u{1F602} if you don't turn on the u flag:

But if you turn on the u flag, the same pattern matches the face with tears of joy emoji:

That is one way to match emojis and other Unicode characters. Take the Unicode code point of the emoji, put its hexadecimal value in curly braces, then precede it with \u.
For instance, the code point of the growing heart emoji is U+1F497, so the pattern to match it is \u{1F497}:
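The same checks can be sketched in JavaScript. To keep the snippet independent of file encoding, the emojis are built with String.fromCodePoint() instead of being pasted in. That is a choice made for this example, not a requirement.

```javascript
const joy = String.fromCodePoint(0x1f602); // face with tears of joy
const heart = String.fromCodePoint(0x1f497); // growing heart

const withoutU = /\u{1F602}/;
const withU = /\u{1F602}/u;

console.log(withoutU.test(joy)); // false
console.log(withoutU.test('u{1F602}')); // true (matches the literal text)
console.log(withU.test(joy)); // true
console.log(/\u{1F497}/u.test(heart)); // true
```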

You will see more examples of how the flag works in the chapter on how to use regular expressions in JavaScript.
The sticky Flag
The sticky flag is denoted by y. It's a feature of JavaScript regular expressions introduced in ECMAScript 2015 (ES6). The y flag limits matching to the current position in the string, which you can specify with the lastIndex property of the regular expression object.
When you use the y flag, the engine uses the lastIndex property to determine where the next search starts. The pattern matches only if it occurs exactly at the lastIndex position, which is 0 (the beginning of the string) by default.
Unlike the global (g) flag, the y flag does not scan ahead for later matches. Each attempt either matches at lastIndex or fails.
In a regex tester like regex101.com, the y flag usually anchors the match to the start of the test string and stops there:

Since the y flag works with the lastIndex property of JavaScript regular expressions, we will look at more examples in the chapter on how to use regular expressions in JavaScript, specifically when we look at the sticky property of regular expression objects.
You can also combine multiple flags to get their combined behavior. For example, you can use the g flag with the i flag for global and case-insensitive matching:
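As a small sketch of combining flags in JavaScript (the test string is made up), the gi combination returns every match regardless of casing:

```javascript
const greetings = 'Hello, hello, HELLO';
const allGreetings = greetings.match(/hello/gi);

console.log(allGreetings); // [ 'Hello', 'hello', 'HELLO' ]
```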

Chapter 4: How to Use Regular Expressions in JavaScript
How to Create Regular Expressions in JavaScript
There are two ways you can create regular expressions in JavaScript. The first is with regex literal syntax and the second is with the RegExp() constructor.
To create a regular expression with the regex literal syntax, you have to enclose the pattern inside two forward slashes (/) like this:
/regex pattern/

If you want to use one or more flags, they have to come after the second slash:
/regex pattern/flag

Depending on your use case, you might have to assign the regex to a variable:
const regex = /regex pattern/flag

The flag could be any of the flags available in the JavaScript regular expressions engine.
If you want to create regular expressions with the RegExp() constructor, you have to use the new keyword, then put the pattern and the flag inside the RegExp() brackets. 
This is what the syntax looks like:
const regex = new RegExp("regex pattern", "flag");

Since RegExp() is a constructor, the regular expression objects it creates come with methods and properties you can use to work with them. Whether you create your pattern with the literal syntax // or the RegExp() constructor, the same methods and properties are available.
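To make the comparison concrete, here is a short sketch that builds the same pattern both ways. One thing to watch for is that a backslash inside a string has to be doubled. The word variable stands in for a hypothetical value that is only known at runtime, which is a common reason to reach for the constructor.

```javascript
const literalRe = /free\w+/g;
// In a string, the backslash itself has to be escaped.
const constructedRe = new RegExp('free\\w+', 'g');

console.log(literalRe.source === constructedRe.source); // true
console.log(literalRe.flags === constructedRe.flags); // true

// Building a pattern from a value known only at runtime.
const word = 'Camp'; // hypothetical input
const dynamicRe = new RegExp(word, 'i');
console.log(dynamicRe.test('freecodecamp')); // true
```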
Methods of the RegExp() Constructor
The methods of the RegExp() constructor are defined on the RegExp.prototype. You can quickly check the methods (and properties) by typing RegExp().__proto__ and hitting ENTER in your browser console. These methods include test(), exec(), and toString(). 
Apart from those three, some methods take regular expressions as a parameter. But it is better to discuss them under "string methods for working with regular expressions" because, at their core, they are string methods that take regular expressions as a parameter.
Let’s take a look at what test(), exec(), and toString() do.
The test() Method
The test() method tests for a match between a regular expression and the test string and returns a boolean as the result. If there's a match, it returns true, and if there's no match, it returns false.
In the example below, there's a match for the pattern /freeCodeCamp/:
const re = /freeCodeCamp/;
const testStr =
  "freeCodeCamp doesn't charge you any money, that's why it's called freeCodeCamp.";

console.log(re.test(testStr)); //true

But in the example below, there's no match for the pattern /fcc/, so the test() method returns false:
const re = /fcc/;
const testStr =
  "freeCodeCamp doesn't charge you any money, that's why it's called freeCodeCamp.";

console.log(re.test(testStr)); //false

Apart from testing random patterns against a string, the test() method can be useful in form validation. 
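Here is a minimal sketch of what that could look like. The rule (3 to 16 letters, digits, or underscores) and the isValidUsername() helper are assumptions made up for this example, and the pattern uses anchors and a quantifier that are explained in later chapters.

```javascript
// Hypothetical rule: 3 to 16 letters, digits, or underscores.
const usernameRe = /^[A-Za-z0-9_]{3,16}$/;

function isValidUsername(value) {
  return usernameRe.test(value);
}

console.log(isValidUsername('camper_01')); // true
console.log(isValidUsername('no spaces allowed')); // false
console.log(isValidUsername('ab')); // false
```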
The exec() Method
The exec() method executes a search for a match in a test string and returns an array containing detailed information about the first match. If there's no match, it returns null.
That detailed information contains the matched text, any captured groups, the index of the match, and the input string.
Here's an example:
const re = /freeCodeCamp/;
const testStr =
  "freeCodeCamp doesn't charge you any money, that's why it's called freeCodeCamp.";

console.log(re.exec(testStr));

And here’s a screenshot of the result:
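You can also read the individual parts of the result directly. The sketch below repeats the example with different variable names (chosen only for illustration) and logs each field.

```javascript
const execRe = /freeCodeCamp/;
const execStr =
  "freeCodeCamp doesn't charge you any money, that's why it's called freeCodeCamp.";

const result = execRe.exec(execStr);

console.log(result[0]); // freeCodeCamp
console.log(result.index); // 0
console.log(result.input === execStr); // true
console.log(result.groups); // undefined (the pattern has no named groups)
```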

If you want to make the exec() method return all the matches, you can use the g flag on the pattern and then loop through with a while loop:
const re = /freeCodeCamp/g;
const testStr =
  "freeCodeCamp is a great place to start learning to code from scratch. freeCodeCamp doesn't charge you any money, that's why it's called freeCodeCamp.";

let match;

while ((match = re.exec(testStr)) !== null) {
  console.log(match[0]);
}

Here's what the result looks like in the console:

You can go further by accessing the index of the matches this way:
const re = /freeCodeCamp/g;
const testStr =
  "freeCodeCamp is a great place to start learning to code from scratch. freeCodeCamp doesn't charge you any money, that's why it's called freeCodeCamp.";

let match;

while ((match = re.exec(testStr)) !== null) {
  console.log(match[0]);

  //   Access the indices of the matches
  console.log(match.index);
}


If there's no match, exec() returns null:
const re = /fcc/;
const testStr =
  "freeCodeCamp doesn't charge you any money, that's why it's called freeCodeCamp.";

console.log(re.exec(testStr)); //null

The toString() Method
The toString() method converts a regex pattern to a string. In JavaScript, every object has a toString() method. Regular expressions are objects behind the scenes, which is why you can create them with the new keyword.
Using this method on a regex pattern converts the pattern to a string:
const pattern = /freeCodeCamp/;
const strPattern = pattern.toString();

console.log(strPattern, typeof strPattern); // /freeCodeCamp/ string

Even if you create the pattern with the RegExp() constructor, you get the result the same way:
const pattern = new RegExp('freeCodeCamp');
const strPattern = pattern.toString();

console.log(strPattern, typeof strPattern); // /freeCodeCamp/ string

And if you have a flag in the pattern, it would be returned as a part of the string:
const pattern = /freeCodeCamp/gi;
const strPattern = pattern.toString();

console.log(strPattern, typeof strPattern); // /freeCodeCamp/gi string

Properties of the RegExp() Constructor
The properties of the RegExp() constructor are defined on the RegExp.prototype. They include:

RegExp.prototype.global
RegExp.prototype.source
RegExp.prototype.flags
RegExp.prototype.multiline
RegExp.prototype.ignoreCase
RegExp.prototype.dotAll
RegExp.prototype.sticky
RegExp.prototype.unicode

In short, there are the global, source, flags, multiline, ignoreCase, dotAll, sticky, and unicode.
Most of the properties check whether a certain flag is used or not. Let's take a look at how each of the properties works. 
The global Property
The global property checks whether the g flag is used with a regex pattern or not. If the pattern has the g flag, it returns true, otherwise it returns false. 
Remember the global (g) flag indicates that the regex pattern should not just return the first match but all the matches.
Here's how the global property works in code:
const re1 = /freeCodeCamp/g;
const re2 = /freeCodeCamp/;
const re3 = new RegExp('freeCodeCamp');
const re4 = new RegExp('freeCodeCamp', 'g');

console.log(re1.global); //true
console.log(re2.global); //false
console.log(re3.global); //false
console.log(re4.global); //true

The flags Property
The flags property returns the flags you use in the regex pattern in alphabetical order. That is, g before i, i before m, m before y, and so on.
In the code below, you can see that the g flag comes before i, and m comes before y:
const re1 = /freeCodeCamp/gi;
const re2 = new RegExp('freeCodeCamp', 'my');

console.log(re1.flags); //gi
console.log(re2.flags); //my

The source Property
The source property returns the regex pattern as a string. So, it acts like the toString() method.
The difference between the source property and the toString() method is that the source property excludes the flag you use with the pattern. Also, the source property does not show the literal forward slashes you use for creating the regex.
In the code below, you can see the forward slashes don’t get printed, the flags are omitted too, and the type is a string:
const re1 = /freeCodeCamp/gi;
const re2 = new RegExp('freeCodeCamp', 'my');

const re1Source = re1.source;
const re2Source = re2.source;

console.log(re1Source, typeof re1Source); // freeCodeCamp string
console.log(re2Source, typeof re2Source); // freeCodeCamp string

The multiline Property
The multiline property is another boolean property of regular expression objects. It specifies whether the multiline flag is used with the pattern or not by returning true or false.
Remember the multiline (m) flag indicates that the test string should be treated as a text that has more than one line.
Here's how the multiline property works in action:
const re1 = /freeCodeCamp/gi;
const re2 = new RegExp('freeCodeCamp', 'my');

const re1Multiline = re1.multiline;
const re2Multiline = re2.multiline;

console.log(re1Multiline); // false
console.log(re2Multiline); // true

The ignoreCase Property
The ignoreCase property specifies whether the case-insensitive flag (i) is used in the regex pattern. It returns true if you use the i flag and false if you don’t use it.
const re1 = /freeCodeCamp/i;
const re2 = /freeCodeCamp/;
const re3 = new RegExp('freeCodeCamp', 'i');
const re4 = new RegExp('freeCodeCamp');

console.log(re1.ignoreCase); //true
console.log(re2.ignoreCase); // false
console.log(re3.ignoreCase); // true
console.log(re4.ignoreCase); // false

The Unicode Property
The unicode property helps you check whether the Unicode (u) flag is used in the regex pattern or not. If it finds the u flag, it returns true, otherwise it returns false.
const re1 = /\u{1F1F3}\u{1F1EC}/u; //matches the Nigerian flag emoji
const re2 = /\u{1F1F3}\u{1F1EC}/;
const re3 = new RegExp('\u{1F1F3}\u{1F1EC}', 'u');
const re4 = new RegExp('\u{1F1F3}\u{1F1EC}');

console.log(re1.unicode); //true
console.log(re2.unicode); // false
console.log(re3.unicode); // true
console.log(re4.unicode); // false

The sticky Property
The sticky property indicates whether the sticky (y) flag is set in the regular expression or not. Even though that's what it does, it's still a bit tricky to understand because of the lastIndex property.
When the y flag is set, the regex engine in use will attempt to match the pattern starting at the exact position specified by the lastIndex property (without using the g flag). If a match is found, the lastIndex property is updated to the position immediately after the end of the match.
To help you understand that better, here's a code snippet with comments:
const re = /xyz/y;
const str = 'xyzxyz';

re.lastIndex = 0;
console.log(re.test(str)); // true – there's a match at index 0 to 2
console.log(re.lastIndex); // 3

re.lastIndex = 1;
console.log(re.test(str)); // false – no match at the specified index
console.log(re.lastIndex); // 0 – resets to 0 because there's no match at the specified index

re.lastIndex = 3;
console.log(re.test(str)); // true – there's a match at index 3 to 5
console.log(re.lastIndex); // 6

re.lastIndex = 6;
console.log(re.test(str)); // false
console.log(re.lastIndex); // 0 – resets to 0 because there's no match at the specified index

N.B.: The dotAll property works with the wildcard (.) metacharacter. Due to that, you will see how it works in detail in the chapter on metacharacters. Also, hasIndices works with captures. So, you will see how to use it under the chapter on grouping and capturing.
String Methods for Working with Regular Expressions
JavaScript provides some built-in methods for working with strings. Some of these methods take regular expressions as a parameter. These methods include match(), matchAll(), replace(), replaceAll(), split(), and search().
Let's look at each of them one by one.
The search() Method
The search() method searches for a match of a regular expression in a string and returns the index of the first match.
const myStr =
  "fCC is the abbreviation for freeCodeCamp. freeCodeCamp doesn't charge you any money, that's why it's called freeCodeCamp. Learn to code for free today.";
const re = /freeCodeCamp/;
const searchFCC = myStr.search(re);

console.log(searchFCC); //28

If the search() method finds no match, it returns -1:
const myStr =
  "fCC is the abbreviation for freeCodeCamp. freeCodeCamp doesn't charge you any money, that's why it's called freeCodeCamp. Learn to code for free today.";
const re = /FCC/;
const searchFCC = myStr.search(re);

console.log(searchFCC); //-1

You might think that using the g flag with the pattern would return the indices of all the matches, but this isn't the case. The g flag does not affect the search() method:
const myStr =
  "fCC is the abbreviation for freeCodeCamp. freeCodeCamp doesn't charge you any money, that's why it's called freeCodeCamp. Learn to code for free today.";
const re = /freeCodeCamp/g; //pattern with g flag
const searchFCC = myStr.search(re);

console.log(searchFCC); //28

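If you want to get the indices of all the matches, you should use the matchAll() method or loop with exec(). The match() method with the g flag returns only the matched substrings, not their positions.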
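As a minimal sketch, here is one way to collect every index with matchAll() and Array.from(). The test string is shortened and made up for this example so the positions are easy to verify by hand.

```javascript
const text = 'free code, free food';

// search() only ever reports the first position, even with g.
const firstIndex = text.search(/free/g);

// matchAll() exposes the index of every match.
const allIndices = Array.from(text.matchAll(/free/g), (m) => m.index);

console.log(firstIndex); // 0
console.log(allIndices); // [ 0, 11 ]
```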
The match() Method
The match() method lets you specify a regex pattern as the parameter, then it runs through the string you use it against and returns an array containing the substring(s) that match the regex pattern.
const my_str = 'freeCodeCamp';
const match = my_str.match(/free/);

console.log(match); // [ 'free', index: 0, input: 'freeCodeCamp', groups: undefined ]

You can also separate the regex pattern into a separate variable:
const my_str = 'freeCodeCamp';
const re = /free/;
const match = my_str.match(re);

console.log(match); // [ 'free', index: 0, input: 'freeCodeCamp', groups: undefined ]

If match() finds multiple matches, it returns all of them in the array, provided you use the g flag in the pattern: 
const my_str =
  "freeCodeCamp doesn't charge you any money, that's why it's called freeCodeCamp. Learn to code for free today.";
const re = /free/g;
const match = my_str.match(re);

console.log(match); // ['free', 'free', 'free']

If you expand the array, this is what it looks like:

Since the result is an array, you can use console.table() instead of console.log() to see each match next to its position in the array (its array index, not its position in the string):
const my_str =
  "freeCodeCamp doesn't charge you any money, that's why it's called freeCodeCamp. Learn to code for free today.";
const re = /free/g;
const match = my_str.match(re);

console.table(match);


If the match() method finds no match, it returns null:
const my_str = 'freeCodeCamp';
const re = /ref/;
const match = my_str.match(re);

console.log(match); // null

The matchAll() Method
matchAll() is a variant of the match() method. It returns an iterator of all the matches for the regular expression you provide, including their indices and captured groups. You have to use it with the global (g) flag, otherwise it throws a TypeError.
Because it returns the iterator of all matches, matchAll() is a great option for looping through the matches of regular expressions. 
An alternative to iterating through the matches of a regular expression is using the exec() method and g flag, then looping with a while loop this way:
const my_str =
  "freeCodeCamp doesn't charge you any money, that's why it's called freeCodeCamp. Learn to code for free today.";
const re = /free/g;

let match;
while ((match = re.exec(my_str))) {
  console.log(match[0]);
}

// free
// free
// free

With the matchAll() method, you don’t need the exec() and while loop. All you need is a for…of loop to get the matches:
const my_str =
  "freeCodeCamp doesn't charge you any money, that's why it's called freeCodeCamp. Learn to code for free today.";
const re = /free/g;
const matches = my_str.matchAll(re);

console.log(matches); // RegExpStringIterator {}

//loop through the matches with a for...of loop
for (const match of matches) {
  console.log(match);
}

Each iteration logs an array that holds the match along with its index, the input string, and the groups:

You can modify the console log to get only the matches and their index this way:
const my_str =
  "freeCodeCamp doesn't charge you any money, that's why it's called freeCodeCamp. Learn to code for free today.";
const re = /free/g;
const matches = my_str.matchAll(re);

console.log(matches); // RegExpStringIterator {}

//loop through the matches with a for...of loop
for (const match of matches) {
  console.log(`Found a match ${match[0]} at index ${match.index}`);
}

/*
Output:
Found a match free at index 0
Found a match free at index 66
Found a match free at index 98
*/

You can also use the Array.from() method to do the same thing:
const my_str =
  "freeCodeCamp doesn't charge you any money, that's why it's called freeCodeCamp. Learn to code for free today.";
const re = /free/g;

Array.from(my_str.matchAll(re), (match) =>
  console.log(`Found a match ${match[0]} at index ${match.index}`)
);

/*
Output:
Found a match free at index 0
Found a match free at index 66
Found a match free at index 98
*/

If the matchAll() method finds no match, it returns an empty iterator. And if you decide to loop through that empty iterator, there'll be nothing to see in the console.
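Both edge cases can be sketched quickly: a pattern with no match yields nothing to iterate over, and a pattern without the g flag makes matchAll() throw. The variable names are illustrative.

```javascript
// No match: spreading the iterator gives an empty array.
const noMatches = [...'freeCodeCamp'.matchAll(/xyz/g)];
console.log(noMatches.length); // 0

// No g flag: matchAll() throws a TypeError.
let errorName = null;
try {
  'freeCodeCamp'.matchAll(/free/);
} catch (err) {
  errorName = err.name;
}
console.log(errorName); // TypeError
```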
The replace() Method
The replace() method does what its name implies. It searches for matches of a specified string or regular expression in a string and replaces them with the specified replacement string. It returns a new string with the replacements applied.
The replace() method is not as straightforward as match() and matchAll() because it accepts two parameters – a regular expression and the replacement string. Any substring of the test string that matches the regular expressions is then replaced with the replacement string.
If the regular expression does not include the global (g) flag, only the first match is replaced:
const myStr =
  'Elephants are very large animals. They are large to the extent that they can uproot a large tree.';
const re = /large/;
const replaceLarge = myStr.replace(re, 'massive');

console.log(replaceLarge); // Elephants are very massive animals. They are large to the extent that they can uproot a large tree.

If you use the g flag in the pattern, all the matches are replaced:
const myStr =
  'Elephants are very large animals. They are large to the extent that they can uproot a large tree.';
const re = /large/g;
const replaceLarge = myStr.replace(re, 'massive');

console.log(replaceLarge); // Elephants are very massive animals. They are massive to the extent that they can uproot a massive tree.

The replaceAll() Method
The replaceAll() method is relatively new because it became available in ECMAScript 2021. It is a variant of replace().
Both replace() and replaceAll() take a pattern and a replacement string as parameters and return a new string with the replacements applied.
But unlike replace(), which only replaces the first match if you don't use the g flag, replaceAll() always replaces every match. When the pattern is a regular expression, it must have the g flag:
const myStr =
  'Elephants are very large animals. They are large to the extent that they can uproot a large tree.';
const re = /large/g;
const replaceLarge = myStr.replaceAll(re, 'massive');

console.log(replaceLarge); // Elephants are very massive animals. They are massive to the extent that they can uproot a massive tree.

If you don’t use the g flag with replaceAll(), it throws a TypeError:
const myStr =
  'Elephants are very large animals. They are large to the extent that they can uproot a large tree.';
const re = /large/;
const replaceLarge = myStr.replaceAll(re, 'massive');

console.log(replaceLarge); // Uncaught TypeError: String.prototype.replaceAll called with a non-global RegExp argument
//    at String.replaceAll ()
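The difference between the two methods is easiest to see with a plain string pattern, where no flag is involved. The sketch below uses a made-up string.

```javascript
const dashed = 'a-b-c';

const firstOnly = dashed.replace('-', '+');
const everyDash = dashed.replaceAll('-', '+');
const everyDashRegex = dashed.replaceAll(/-/g, '+');

console.log(firstOnly); // a+b-c
console.log(everyDash); // a+b+c
console.log(everyDashRegex); // a+b+c
```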

The split() Method
The split() method takes a string or regex and splits the string you use it against into an array based on the string or regex you pass into it. The split() method also takes an optional limit parameter, a non-negative integer. When you specify the limit, the resulting array contains at most that many items.
Wherever the split() finds a match, it creates a new item in the array. Here's how it works:
const myStr = "Codes don't lie. You're the one doing something wrong.";
const re = /\s/; // "\s" matches a whitespace character: space, tab, line break, and so on.

const splitStr = myStr.split(re);
console.log(splitStr);

/* 
Output:
[
  'Codes',  "don't",
  'lie.',   "You're",
  'the',    'one',
  'doing',  'something',
  'wrong.'
]
*/

Here's how to use the split() method with the limit parameter:
const myStr = "Codes don't lie. You're the one doing something wrong.";
const re = /\s/; // "\s" matches a whitespace character: space, tab, line break, and so on.

const splitStr = myStr.split(re, 5); // 5 is the limit here
console.log(splitStr);

/*
output: [ 'Codes', "don't", 'lie.', "You're", 'the' ]
*/
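A regex separator is most useful when the delimiter is not always written the same way. As a sketch, the made-up string below is split on commas with optional whitespace around them. The * quantifier it relies on is explained in a later chapter.

```javascript
const skills = 'html, css ,javascript,  regex';
const separatorRe = /\s*,\s*/; // a comma with optional whitespace on either side

const skillList = skills.split(separatorRe);
const firstTwo = skills.split(separatorRe, 2);

console.log(skillList); // [ 'html', 'css', 'javascript', 'regex' ]
console.log(firstTwo); // [ 'html', 'css' ]
```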

How to Match Literal Characters in JavaScript Regular Expressions
As I pointed out earlier, literal characters are characters you write in the pattern exactly as they appear in the text you want to match.
If you want to match the text hello, /hello/ should be your pattern. You can then use the i flag with it to match both hello and Hello:
const testString = 'hello';
const re = /hello/;
const re2 = /hello/i;

console.log(re.test(testString)); // true
console.log(re2.test(testString)); // true

If you want to match freeCodeCamp, the pattern should be just that. You can also create a pattern that matches freeCodeCamp in any case:
const testString = 'freeCodeCamp';
const re = /freeCodeCamp/;
const re2 = /freeCodeCamp/i; // match freeCodeCamp in any case

console.log(re.test(testString)); // true
console.log(re2.test(testString)); // true

You can also match digits using literal characters:
const num = 10234;
const re = /2/;

console.log(re.test(num)); // true (test() converts the number to the string "10234" first)

How to Use Character Sets in JavaScript Regular Expressions
As a reminder, a character set is a group of characters enclosed in square brackets. They provide a way to specify a set of characters from which the regex engine can match a single character at a specific position in a test string. 
Character sets allow you to specify a range of characters, individual characters, or a combination of both.
Here are some common character sets in regular expressions:

[abc]: matches either a, b, or c
[aeiou]: matches any lowercase vowel
[a-z]: matches any lowercase letter from a to z
[A-Z]: matches any uppercase letter from A to Z
[0-9]: matches any digit from 0 to 9

Let's look at how to match each of the above character sets in JavaScript regular expressions:
// uppercase character set
const hcaseRe = /[A-Z]/;
const hcaseStr = 'freeCodeCamp is cool';

console.log(hcaseRe.test(hcaseStr)); //true

// vowels character set
const vowelsRe = /[aeiou]/;
const vowelsStr = 'Imagine how pronunciation would have been without vowels';

console.log(vowelsRe.test(vowelsStr)); //true

// [abc] character set
const abcSetRe = /[abc]/;
const abcSetStr = 'freeCodeCamp is totally free';

console.log(abcSetRe.test(abcSetStr)); //true

// number character set
const numRe = /[0-9]/;
const numStr = 'Thank God for Arabic numerals 0 to 9.';

console.log(numRe.test(numStr)); //true
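The test() method only tells you whether there is a match. To see which characters a set actually picks out, you can combine it with the g flag and match(). Here is a small sketch with a made-up string:

```javascript
const sample = 'freeCodeCamp 2024';

const upperLetters = sample.match(/[A-Z]/g);
const vowels = sample.match(/[aeiou]/g);
const digits = sample.match(/[0-9]/g);

console.log(upperLetters); // [ 'C', 'C' ]
console.log(vowels); // [ 'e', 'e', 'o', 'e', 'a' ]
console.log(digits); // [ '2', '0', '2', '4' ]
```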

Chapter 5: Metacharacters, Quantifiers, Repeated Matches, and Optional Matches
What are Metacharacters?
In regular expressions, metacharacters are characters that have special meanings beyond their literal meaning. 
Metacharacters are the backbone of regular expressions. They are the building blocks for constructing more expressive regex patterns and for controlling how the regular expression engine behaves, though they come with an extra learning curve.
This part of the book is where you will learn about topics such as:

Anchors
Word boundaries
How to specify character ranges
How to match every occurrence with the wildcard
Alternation
Greediness and laziness of regular expressions and how to prevent greediness

And lots more.
If you want to match any metacharacter as a literal character, you have to escape it with a backslash (\). And if a metacharacter is represented by a letter, such as \w or \s, the backslash is what gives the letter its special meaning. So, the backslash is itself a metacharacter.
Many metacharacters have a negated counterpart. For instance, \b and \s represent the word boundary and whitespace metacharacters. If you want to negate them, you can use \B and \S respectively. That's the pattern these metacharacters follow: the lowercase letter is the metacharacter and the uppercase letter negates it.
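Here is a short sketch of both ideas, escaping and negation. The test strings are made up, and the + quantifier used in the first part is covered later in this chapter.

```javascript
// An unescaped + is a quantifier, so escape it to match a literal plus sign.
const escapedPlus = /1\+1/.test('1+1=2');
const unescapedPlus = /1+1/.test('1+1=2'); // looks for "11", "111", and so on

console.log(escapedPlus); // true
console.log(unescapedPlus); // false

// \s matches whitespace and \S matches everything else.
const phrase = 'a b';
const whitespace = phrase.match(/\s/g);
const nonWhitespace = phrase.match(/\S/g);

console.log(whitespace); // [ ' ' ]
console.log(nonWhitespace); // [ 'a', 'b' ]
```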
Metacharacters are categorized into single and double metacharacters. As the names imply, a single metacharacter is written with one character, such as ^, and a double metacharacter is written with two, such as \w.
Many metacharacters are also called shorthand character classes. As we look at each metacharacter, you will see whether it is a single or double metacharacter.
The Word and Non-word Metacharacters
Represented by \w, the word metacharacter is a shorthand character class that matches all word characters. Word characters are alphanumeric characters and underscores. So, they are a-z, A-Z, 0-9, and underscore (_).
Here's what happens when you use \w in a regex tester:

And here’s how it works in JavaScript:
const testStr =
  'Every alphanumeric character (a to z and 0 to 9) and underscore (_) is a word character';
const wordCharacterRe = /\w/g;

console.log(testStr.match(wordCharacterRe));

Since word characters are alphanumeric characters and underscores, you can simulate the \w metacharacter by putting all the examples in a character set:
const testStr =
  'Every alphanumeric character (a to z and 0 to 9) and underscore (_) is a word character';
const wordCharacterRe = /[a-zA-Z0-9_]/g; // no spaces inside the set, or spaces would match too

console.log(testStr.match(wordCharacterRe));

The non-word metacharacter is the opposite of the word metacharacter and it is represented by an escaped capital letter W (\W). 
The non-word metacharacter matches every other character apart from alphanumeric characters and the underscore. That includes spaces, punctuation marks, and symbols:

Here it is in action in some JavaScript code:
const testStr =
  'Every character apart from alphanumeric characters (a to z and 0 to 9) and underscore (_) is a non-word character';
const nonWordCharacterRe = /\W/g;

console.log(testStr.match(nonWordCharacterRe));

Since you can represent the word metacharacter by putting all the characters in a character set, you may be wondering how you can do the same for the non-word metacharacter. 
That's where the negated character set comes in. The caret (^) is used for negation. It is one of the two anchor metacharacters, which we'll look at next.
The Anchor Metacharacters
Caret (^) and dollar sign ($) are the two anchor metacharacters. They are both single metacharacters.
The caret anchors the regex pattern to the start of a line or string, so you can call it a "start of line anchor". 
For example, if you want to match the text "freeCodeCamp" and you want to make sure it's at the start of the line or a string, you can use the caret this way:

If the freeCodeCamp text is not at the start of the line, there won't be a match:

Here are the two cases in JavaScript code:
const testStr =
  "freeCodeCamp doesn't charge you any money. That's why it's called freeCodeCamp because. Learn to code for free today."; // has "freeCodeCamp" at the start of the line

const testStr2 =
  "It's called freeCodeCamp because freeCodeCamp doesn't charge you any money. Learn to code for free today."; // does not have "freeCodeCamp" at the start of the line

const startAnchorRe = /^freeCodeCamp/;

console.log(startAnchorRe.test(testStr)); //true
console.log(startAnchorRe.test(testStr2)); //false

The dollar sign metacharacter is the opposite of the caret. It anchors the regex pattern to the end of the line or string. So, there will only be a match if the target text is at the end of the line.
To use the $ metacharacter, it has to be the last character in your pattern:

If the target string has more than one line and the target text is at the end of each line, the last one matches:

To correct this behavior, you have to use both the g and m flags:

Here are all the cases in JavaScript code:
const testStr =
  "The lion is not the king of the jungle because of its strength, the lion is the king of the jungle because it's never intimidated";

const testStr2 = `The lion is not the king of the jungle because of its strength, the lion is the king of the jungle because it's never intimidated

This is another line that ends with intimidated

And this is the last line that ends with intimidated

And this is the last line that ends with intimidated`;

const re = /intimidated$/;
const re2 = /intimidated$/gm;

console.log(re.test(testStr)); // true
console.log(re.test(testStr2)); // true
console.log(re2.test(testStr2)); // true

If the target text is not at the end of the line, there won't be any match:
const testStr =
  "A lion can never be intimidated because it's the king of the jungle";
const re = /intimidated$/;

console.log(re.test(testStr)); // false

When you use the dollar and caret metacharacters with the g and m flags, they don’t just match at the start and end of the whole string. They find the matches at the start and end of each line:
//dollar with g and m flags
const testStr1 = `The lion is not the king of the jungle because of its strength, the lion is the king of the jungle because it's never intimidated

Another line with intimidated

And another line with intimidated`;

const re1 = /intimidated$/gm;
const matches1 = testStr1.match(re1);

console.log(matches1); // [ 'intimidated', 'intimidated', 'intimidated' ]

// caret with g and m flags
const testStr = `freeCodeCamp doesn't charge you any money. That's why it's called freeCodeCamp because. Learn to code for free today.

freeCodeCamp starts this line

freeCodeCamp starts this line too
`;

const re2 = /^freeCodeCamp/gm;
const matches2 = testStr.match(re2);

console.log(matches2); // [ 'freeCodeCamp', 'freeCodeCamp', 'freeCodeCamp' ]

As I pointed out earlier, the caret metacharacter is also used for negating a character set. When it is the first character inside the square brackets, it tells the regex engine to match any character that is not in the set.
For example, if you have the pattern [^a], then all letters "a" in the test string won't be returned as matches:

If you have the pattern [^aeiou], all the vowels in the test string won't be returned as matches:

If you have the pattern [^a-zA-Z0-9_], that's equivalent to the non-word metacharacter (\W):
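Here is a minimal sketch of those three negated character sets in JavaScript. The test strings and variable names are made up for illustration:

```javascript
// [^a] matches every character that is not a lowercase "a"
const notA = 'banana'.match(/[^a]/g);

// [^aeiou] matches every character that is not a lowercase vowel
const notVowel = 'regex'.match(/[^aeiou]/g);

// [^a-zA-Z0-9_] and \W should return the same matches
const sample = 'Regex is fun_2024!';
const negatedSet = sample.match(/[^a-zA-Z0-9_]/g);
const nonWord = sample.match(/\W/g);

console.log(notA); // [ 'b', 'n', 'n' ]
console.log(notVowel); // [ 'r', 'g', 'x' ]
console.log(negatedSet); // [ ' ', ' ', '!' ]
console.log(nonWord); // [ ' ', ' ', '!' ]
```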

The Digit and Non-digit Metacharacters
The digit metacharacter is represented by \d. You can negate it with \D, so \D is the non-digit metacharacter.
\d matches all numbers (0 to 9), so it is a shorthand character class for [0-9]. So, if you have a string and you want to extract the numbers from it, you can use the \d metacharacter. But you have to use it with the g flag so it matches every number in the test string:

You can use the match() method to extract the numbers in JavaScript too:
const testStr =
  'Arabic numerals are 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9. From those ten numbers, you can write any number you want, including nonillion and decillion.';

const re = /\d/g;

console.log(testStr.match(re));

/* output
[
  '0', '1', '2', '3',
  '4', '5', '6', '7',
  '8', '9'
]
*/

A more practical example is matching dates since dates are mostly in numbers. For example, if you want to match a date in the format dd/mm/yyyy, you can match it with the pattern /\d\d\/\d\d\/\d\d\d\d/:
const date = '22/04/2023';
const re = /\d\d\/\d\d\/\d\d\d\d/;

console.log(re.test(date)); // true

Since you can also have a period or hyphen as the separator of a date, you can account for those too by putting all the possible separators in a character set:
const slashSeparatedDate = '22/04/2023';
const hyphenSeparatedDate = '22-04-2023';
const periodSeparatedDate = '22.04.2023';

const re = /\d\d[/.-]\d\d[/.-]\d\d\d\d/;

console.log(re.test(slashSeparatedDate)); // true
console.log(re.test(hyphenSeparatedDate)); // true
console.log(re.test(periodSeparatedDate)); // true

N.B.: The pattern above matches a date but also an invalid date like 99/45/2022. A better way to match dates is provided in the applications of the regex chapter.
Another example is matching phone numbers. For example, US phone numbers are in the format (123) 456-7890. You can use the pattern /\(\d\d\d\) \d\d\d-\d\d\d\d/:
const USPhone = '(123) 456-7890';
const re = /\(\d\d\d\) \d\d\d-\d\d\d\d/;

console.log(re.test(USPhone)); // true

The non-digit metacharacter is the opposite of the digit metacharacter. It matches all non-digit characters. That is, letters, spaces, and symbols. In other words, it is the shorthand character class for [^0-9].
If you want to extract all non-digit characters in a string, you can use the \D metacharacter:

This is it in JavaScript code:
const testStr =
  'Arabic numerals are 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9. From those ten numbers, you can write any number you want, including nonillion and decillion.';

const re = /\D/g;

console.log(testStr.match(re));

/* output
A total of 137 matches is too much to show here, but you can test it out yourself.
*/

The Square Brackets Metacharacter
You've already seen the square brackets ([]) metacharacter in action. Square brackets are used for specifying a character class, or character set. And if you want to match them as a literal character, then you have to escape them.
One thing to have in mind is that some metacharacters lose their meanings inside the character set. The exceptions to this are:

The caret (^) which you can use to negate a character set
The hyphen (-) which you can use to specify ranges

N.B.: Sometimes, you might encounter a situation where you have to escape these metacharacters inside a character set.
If you want to match a caret or hyphen literally in a position where it would otherwise have its special meaning, you have to escape it. If you are just passing those two characters in directly, you don't need to escape them as long as the caret is not the first character and the hyphen is the first or last character.
const testStr =
  'If you want to match the caret (^) and hyphen (-) symbols in a character set, you might not have to escape them.';

const re = /[-^]/g;

console.log(testStr.match(re)); // [ '^', '-' ]

But if the caret is the first character in the character set alongside some word and non-word character, you should escape it, otherwise it will negate all other characters:
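Here is a small sketch of the difference between an unescaped and an escaped caret in the first position. The test string is hypothetical and chosen only to make the contrast visible:

```javascript
const text = 'a^b-c';

// Unescaped caret in first position: it negates the set, so this matches
// every character that is NOT "a" or "b"
const negated = text.match(/[^ab]/g);

// Escaped caret in first position: it is a literal caret, so this matches
// "^", "a", or "b"
const literal = text.match(/[\^ab]/g);

console.log(negated); // [ '^', '-', 'c' ]
console.log(literal); // [ 'a', '^', 'b' ]
```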

The Word Boundary and Non-word Boundary Metacharacters
The word boundary metacharacter is represented by \b and the non-word boundary metacharacter is represented by \B. Both let you match a position in a string rather than a character, depending on whether a word character and a non-word character sit next to each other at that position.
Word boundary (\b) matches a position between a word character (\w) and a non-word character (\W), and vice versa. It also matches at the start or end of the string if the first or last character is a word character. It can be useful when you want to match a certain word in a string, or if you want to make sure a particular word or character is in a string.
Here's an example in a regex tester:

And the same example in JavaScript code:
const myStr =
  'A Tiger can do everything a lion does, apart from being a family man.';
const re = /\blion\b/;

console.log(myStr.match(re));

/*
Output:
[
  'lion',
  index: 28,
  input: 'A Tiger can do everything a lion does, apart from being a family man.',
  groups: undefined
]
*/

If you use a g flag with the pattern and use the match() method, all the matches will be returned – as expected:
const myStr =
  'A Tiger can do everything a lion does, apart from being a family man. Not even a tiger can intimidate a lion within his family.';
const re = /\blion\b/g;

console.log(myStr.match(re)); // [ 'lion', 'lion' ]

On the other hand, the non-word boundary (\B) is the opposite of the word boundary (\b). So, it matches everywhere a word boundary won't return a match. For example, "thin" in "everything":

And also "code" in "freeCodeCamp" when you use the case insensitive (i) flag:

You can see that the first "code" in the text wasn't the match returned. That's the power of word and non-word boundary metacharacters.
Here's what the two reveal in JavaScript code:
const myStr1 =
  'A Tiger can do everything a lion does, apart from being a family man.';
const myStr2 = 'Learn to code for free on freeCodeCamp.';

const re1 = /\Bthin\B/;
const re2 = /\Bcode\B/i;

console.log(myStr1.match(re1));
console.log(myStr2.match(re2));

/*
Output:
[
  'thin',
  index: 20,
  input: 'A Tiger can do everything a lion does, apart from being a family man.',
  groups: undefined
]
[
  'Code',
  index: 30,
  input: 'Learn to code for free on freeCodeCamp.',
  groups: undefined
]
*/

The Parenthesis Metacharacter
The parenthesis metacharacters (( and )) let you create groups and capture what they match. With them, you can treat any group of characters as a single unit and apply a common modifier or quantifier to them. 
Parentheses are also used for creating both lookahead and lookbehind assertions.
When you create a capturing group, you can reference it later in the same pattern with a backslash and the position of the group in the pattern. For example, you can reference the first group by specifying \1 in the pattern.
In this book, a whole chapter is dedicated to grouping and capturing. There, you will learn more about grouping and capturing so you can see the parenthesis metacharacters in action.
The Space and Non-space Metacharacters
It is impossible for text to make sense without spaces. Not just a "space", but also other space characters like tabs, carriage returns, and new lines. This is why the space and non-space metacharacters are made available in regular expressions.
The space metacharacter is represented by \s and the non-space metacharacter is represented by \S.
\s matches all space characters:

And \S matches all non-space characters:

Here's how both the \s and \S metacharacters work in JavaScript code:
const myStr = 'Learn to code for free on freeCodeCamp';
const spaceRe = /\s/g;
const nonSpaceRe = /\S/g;

console.log(myStr.match(spaceRe)); // [' ', ' ', ' ', ' ', ' ', ' '];

console.log(myStr.match(nonSpaceRe)); 
// [
// 'L', 'e', 'a', 'r', 'n', 't',
// 'o', 'c', 'o', 'd', 'e', 'f',
// 'o', 'r', 'f', 'r', 'e', 'e',
// 'o', 'n', 'f', 'r', 'e', 'e',
// 'C', 'o', 'd', 'e', 'C', 'a',
// 'm', 'p'
// ]

One cool thing you can do with \s in JavaScript is to replace all spaces with say, a hyphen, or any other thing you want:
const myStr = 'Learn to code for free on freeCodeCamp';
const spaceRe = /\s/g;
const replaceHyphen = myStr.replace(spaceRe, '-');

console.log(replaceHyphen); // Learn-to-code-for-free-on-freeCodeCamp

The space metacharacter does not just match the spacebar you press on the keyboard of your device. It also matches:

A tab character
A carriage return character
A new line character
A vertical tab character
And a form feed character

Here's an example:

You can't see the match for the carriage return but it's there:

If you want to match each of those space characters, they also have their unique metacharacters:

\t for tab
\r for carriage return
\n for new line
\v for vertical tab
\f for form feed.

You should be aware that most of the time, \s is all you need because it can do the matching for any space character.
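To see that for yourself, here is a minimal sketch with a made-up string that contains one of each space character listed above. The variable names are only for illustration:

```javascript
// Contains a space, tab, new line, carriage return, vertical tab, and form feed
const mixed = 'a b\tc\nd\re\vf\fg';

const allSpaces = mixed.match(/\s/g); // every kind of space character
const onlyTabs = mixed.match(/\t/g); // tabs only
const onlyNewLines = mixed.match(/\n/g); // new lines only

console.log(allSpaces.length); // 6
console.log(onlyTabs.length); // 1
console.log(onlyNewLines.length); // 1
```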
The Pipe Metacharacter
Also known as the OR operator, the pipe metacharacter is represented by the pipe symbol (|). It lets you specify multiple alternatives for matching. 
The pipe matches the whole expression on its left, or the whole expression on its right. For example, if you have website|web\sapp as your pattern, then one or both of website and web app will be returned as the match:

The evaluation goes from left to right. At each position, the alternative on the left is tried first. If it matches, that match is returned. And if there's no match on the left, the alternative on the right-hand side is tried. If both alternatives are in the test string and you use the g flag, then both are returned as matches.
You can also have more than two alternatives separated by pipe symbols. For instance, the pattern /o|a|i|re/ would match o, a, i, and re:

There's no limit to the number of alternatives you can separate with it.
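Here is a short sketch of the left-to-right evaluation in JavaScript. The strings are hypothetical, and the second pair of patterns is my own addition to show that the order of the alternatives matters:

```javascript
const parts = 'fire'.match(/o|a|i|re/g);

// At the same position, the leftmost alternative that matches wins,
// even when a later alternative would match more text
const shortFirst = 'website'.match(/web|website/g);
const longFirst = 'website'.match(/website|web/g);

console.log(parts); // [ 'i', 're' ]
console.log(shortFirst); // [ 'web' ]
console.log(longFirst); // [ 'website' ]
```

So if one alternative is a prefix of another, it is usually safer to put the longer one first.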
You can see I used the g flag in those examples. If you don’t use the g flag and both the left and right alternatives are in the test string, only the first match in the test string will be returned:

Here's a clearer example:

Here's how using the OR operator works with the g flag in code:
const myStr = 'The website and web app are running fine';
const re = /website|web\sapp/g;

console.log(myStr.match(re)); // returns [ 'website', 'web app' ] because of the g flag

And here's how it works without the g flag:
const myStr = 'The website and web app are running fine';
const re = /website|web\sapp/;

const matches = myStr.match(re);

for (const match of matches) {
  console.log(match); // returns "website" and ignores web app because there's no g flag
}

How to Match Repeated Characters With Quantifiers
Repeated characters occur when the same character exists in multiple numbers consecutively.
When you have a repeated character in your test string, you don't need to repeat a particular character in your pattern to match it. That's because there are metacharacters available for one or more matches, zero or more matches, and zero or one matches, AKA optional matches.
One or More Matches with the Addition Sign Metacharacter
As you can guess, the addition sign metacharacter is represented with a plus (+). You can also call it the "one or more quantifier".
If you want a particular character to be repeated one or many times, that's what the addition sign metacharacter does. 
For example, the pattern, /fe+d/ will match any word with one letter e or multiple letters e that occur consecutively. For instance, fed and feed:

A practical example in JavaScript is extracting vowels from a test string so that vowels that follow one another are returned together as a single match:
const myStr = 'You should plant trees to save mother earth';
const re = /[aeiou]+/gi;

console.log(myStr.match(re));

/*
Output:
[
  'ou', 'ou', 'a',
  'ee', 'o',  'a',
  'e',  'o',  'e',
  'ea'
]
*/

You can also append the addition sign metacharacter to other metacharacters. For example, /\d+/ would match one or more digits:

You can also add the + metacharacter to a character set to repeat it one or more times. For example, the pattern /f[a-z]+/ would match a letter f followed by one or more lowercase letters:
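Here is a minimal sketch of both cases, /\d+/ and /f[a-z]+/, in JavaScript. The test strings are made up for illustration:

```javascript
const prices = 'Room 12 costs 1500 dollars for 3 nights';
const numbers = prices.match(/\d+/g); // runs of one or more digits

const phrase = 'f is a letter, but free, from, and fun are words';
const fWords = phrase.match(/f[a-z]+/g); // f plus at least one lowercase letter

console.log(numbers); // [ '12', '1500', '3' ]
console.log(fWords); // [ 'free', 'from', 'fun' ]
```

The lone f at the start of the second string is not a match because + needs at least one letter after it.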

Zero or More Matches with the Asterisk Metacharacter
The asterisk metacharacter (*) matches zero or many occurrences of the character it comes after. You can also call it a "zero or more quantifier". 
So, if you want a character to be repeated zero or more times, you can use the asterisk metacharacter. A basic example is the pattern /go*d/, which would match any text that starts with the letter g, followed by any number of the letter o (including none), and ending with the letter d:
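Here is what /go*d/ does in JavaScript, using a hypothetical list of words:

```javascript
const words = 'gd god good goood gold';
const goMatches = words.match(/go*d/g);

console.log(goMatches); // [ 'gd', 'god', 'good', 'goood' ]
```

The word gold is left out because the l sits between the o and the d.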

Just like you can do with the plus metacharacter, you can also append the asterisk metacharacter to any other metacharacter. For example, you can match empty strings with the pattern /\s*/:

Doubting that? Here it is in JavaScript code:
const re = /\s*/;
const emptyString = '';

console.log(re.test(emptyString)); // true

I didn’t know matching empty strings was as straightforward as this until I got to this point in the book!
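One thing worth keeping in mind: because the asterisk also accepts zero occurrences, an unanchored /\s*/ finds a (possibly empty) match in any string, not only in empty ones. A minimal sketch, assuming what you want is to check that a string is empty or contains only spaces, anchors the pattern with ^ and $:

```javascript
const loose = /\s*/; // matches zero spaces at the start of any string
const strict = /^\s*$/; // the whole string must be empty or only spaces

const looseOnText = loose.test('hello');
const strictOnText = strict.test('hello');
const strictOnEmpty = strict.test('');
const strictOnBlank = strict.test('   \t');

console.log(looseOnText); // true
console.log(strictOnText); // false
console.log(strictOnEmpty); // true
console.log(strictOnBlank); // true
```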
Again, like the plus metacharacter, you can also add the * metacharacter to a character set to repeat it zero or more times:

Here's the same thing in JavaScript code:
const myStr = 'You can make yourself free from diseases';
const re = /f[a-z]*/g;

console.log(myStr.match(re)); // [ 'f', 'free', 'from' ]

You can see the f in the word yourself is even a match too. That's because the asterisk (*) also accepts zero occurrences, so it can return more matches than the addition sign (+) metacharacter, which needs at least one. You will learn about greediness of a regular expression in the closing part of this chapter.
Zero or One Matches with the Question Mark Metacharacter
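The question mark metacharacter (?) is also known as the zero or one quantifier. It lets you make the character that precedes it optional. When it comes right after another quantifier, it plays a different role: it makes that quantifier lazy, so it also plays an important role in preventing greediness.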
For example, the pattern /ab?c/ will match abc and ac, but never abbbc or any other numbers of b between the a and c:

This is not the case with the other two metacharacters for matching repeated characters (+ and *). The pattern /ab*c/ will match all of  abc, ac, abbbc, and abbbbbbbc while /ab+c/ will leave out ac:
const myStr = 'abc ac abbbc abbbbbbbc';
const re1 = /ab*c/g;
const re2 = /ab+c/g;
const re3 = /ab?c/g;

console.log(myStr.match(re1)); // [ 'abc', 'ac', 'abbbc', 'abbbbbbbc' ]
console.log(myStr.match(re2)); // [ 'abc', 'abbbc', 'abbbbbbbc' ]
console.log(myStr.match(re3)); // [ 'abc', 'ac' ]

A better example is tailoring a regex pattern to match words that have different spellings due to the small variations in British and American English. For example, color and colour:

There's also centre and center:

You can extract those words in JavaScript. You can use the match() method with the g flag for that, or call the exec() method in a loop, which also gives you access to the details of each match.
Here's how I was able to do it for center and centre with exec():
const myStr = 'The words center and centre are homophones';
const re = /cente?re?/g;

let match;
const matches = [];

while ((match = re.exec(myStr)) !== null) {
  matches.push(match[0]);
}

console.log(matches); // ["center", "centre"]

I used the same approach to extract color and colour:
const myStr =
  'It is "colour" in British English and "color" in American English';
const re = /colou?r/g;

let match;
const matches = [];

while ((match = re.exec(myStr)) !== null) {
  matches.push(match[0]);
}

console.log(matches); // [ 'colour', 'color' ]
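As a cross-check, here is a short sketch that runs the same two patterns through match() with the g flag. The variable names are made up for illustration:

```javascript
const centerMatches =
  'The words center and centre are homophones'.match(/cente?re?/g);
const colorMatches =
  'It is "colour" in British English and "color" in American English'.match(
    /colou?r/g
  );

console.log(centerMatches); // [ 'center', 'centre' ]
console.log(colorMatches); // [ 'colour', 'color' ]
```

The results are the same as the ones the exec() loops collected.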

Many times, it's challenging knowing which to use for character repetition between these three metacharacters – *, +, and ?. It can even be hard to get used to what each of them does if you're just starting out with regular expressions.
Be aware that identifying them and knowing which to use between them is not a herculean task. Here are some things to note about the three of them:

Asterisk (*) means "zero or many": use it if a character may be missing from the target string or may appear any number of times
Plus (+) means "one or many": use it if you want a character to appear once or more than once in the target string
Question mark (?) means "zero or one": use it if you want a character to be optional in the target string.

How to Specify Match Quantity with the Curly Braces Metacharacter
Quantifiers let you indicate the quantity or frequency of a preceding character in a pattern with curly braces ({}). With those braces, you can specify an exact quantifier, a minimum quantifier, and a range quantifier.
The Range Quantifier
The general syntax for the range quantifier looks like this:
char{n1,n2}


char is any character you're applying the quantifier to
n1 is the minimum number of times you want the character to repeat
n2 is the maximum number of times you want the character to repeat

An example is the pattern /a{3,6}/. This means you want to match between three and six letters a:

If you have more than six letters a in the test string, the first six will match:

To fix this, you can surround the pattern with word boundaries:
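Here is a minimal sketch of the range quantifier with and without word boundaries. The test string is hypothetical:

```javascript
// Runs of 2, 3, 6, and 8 letters "a"
const runs = 'aa aaa aaaaaa aaaaaaaa';

const unbounded = runs.match(/a{3,6}/g);
const bounded = runs.match(/\ba{3,6}\b/g);

console.log(unbounded); // [ 'aaa', 'aaaaaa', 'aaaaaa' ]
console.log(bounded); // [ 'aaa', 'aaaaaa' ]
```

Without the boundaries, the first six letters of the run of eight are returned as a match. With them, that run is skipped entirely.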

You can also attach the range quantifier to metacharacters. For example, you can extract any number that has between three and six digits this way:
const myStr =
  'The marathon had 500 participants, with 251 finishing under 3 hours, and the winner crossed the line at 4800 seconds.';
const re = /\b\d{3,6}\b/g;

console.log(myStr.match(re)); // [ '500', '251', '4800' ]

The Minimum Quantifier
The minimum quantifier lets you specify the minimum number of times you want the character that precedes it to match. You can do this by putting a comma right after the number in the curly brace. The general syntax looks like this: {n,}. 
For example, the pattern /a{3,}/ means you want a minimum of three letters a. In this case, one letter a and two letters a won't be a match, but three letters a and upward would be returned as matches:

Let's extract those matches with the match() method:
const myStr =
  '"a" won\'t match here. "aa" won\'t match too, but "aaa" is a match, "aaaa" is also a match, and every other number of "a"';
const re = /a{3,}/g;

console.log(myStr.match(re)); // [ 'aaa', 'aaaa' ]

The Exact Quantifier
The exact quantifier is represented by {n}. In this case, n stands for the exact number of times you want that character to be repeated. For instance, the pattern /a{3}/ means you want a to be repeated three times:

Unfortunately, a match is returned anywhere there are three letters a that follow one another. You can prevent this behavior with word boundary (\b):

That way, you can extract the abbreviation AAA from a string using the match() method. Below is an example: 
const myStr =
  "There is American automobile association (AAA) and there is Australian automobile association (AAA). What I've never seen is AAAA or AAAAAA.";
const re = /\ba{3}\b/gi;

console.log(myStr.match(re)); // [ 'AAA', 'AAA' ]

Remember the pattern I wrote to match dates in the dd/mm/yyyy format? You can make it better and easier to read with the exact quantifier like this:
\d{2}[/.-]\d{2}[/.-]\d{4}

Everything still works fine:
const slashSeparatedDate = '22/04/2023';
const hyphenSeparatedDate = '22-04-2023';
const periodSeparatedDate = '22.04.2023';

const re = /\d{2}[/.-]\d{2}[/.-]\d{4}/;

console.log(re.test(slashSeparatedDate)); // true
console.log(re.test(hyphenSeparatedDate)); // true
console.log(re.test(periodSeparatedDate)); // true

You can also make the pattern that matches the US phone number better and shorter with the same approach:
\(\d{3}\) \d{3}-\d{4}

Everything still works fine too:
const USPhone = '(123) 456-7890';
const re = /\(\d{3}\) \d{3}-\d{4}/;

console.log(re.test(USPhone)); // true

The Wildcard Metacharacter
The wildcard metacharacter is represented by a dot (.), so you can also call it the dot metacharacter. 
The wildcard lets you match any character apart from line terminators such as a new line (\n) or carriage return (\r). That means you can use it to match alphanumeric characters, spaces, and symbols.
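Here is a quick sketch with a made-up string that contains a letter, a digit, a space, an underscore, a tab, a symbol, and a new line:

```javascript
const line = 'a1 _\t!\nb';
const dotMatches = line.match(/./g);

console.log(dotMatches.length); // 7
console.log(dotMatches.includes('\n')); // false
```

The string has eight characters, and the new line is the only one the wildcard skips.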

You can also attach the wildcard metacharacter to another metacharacter. For example, the pattern /\d./g matches a digit and the single character that follows it, whatever that character is:
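A minimal sketch of that pattern in JavaScript, with a hypothetical test string:

```javascript
const rooms = 'Rooms 20 and 100 are free. Room 7 is taken.';
const pairs = rooms.match(/\d./g);

console.log(pairs); // [ '20', '10', '0 ', '7 ' ]
```

Each match is exactly two characters long: a digit, then whatever comes after it, which can be another digit or a space.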

You can see that the pattern goes beyond the digits by also matching the spaces after them, because the wildcard accepts whatever character comes next.
The pattern /\d.*/g goes much further because the asterisk is greedy. It will match everything on the line after it encounters the first number:

It’s the same in code:
const myStr =
  'An example of a two-digit number is 20. 100 is a three-digit number. 300 and 900 are also three-digit numbers.';
const re = /\d.*/g;

console.log(myStr.match(re)); // [ '20. 100 is a three-digit number. 300 and 900 are also three-digit numbers.']

If you want the wildcard to match a new line too, you can use the s flag. Here's an example:
let codeBlock = `
  function add(x, y) {
    /* This is a function
    that takes two numbers
    and adds them together. */
    return x + y;
  }
`;

let commentRegex = /\/\*(.*)\*\//s; // gets everything between /* and */

const match = codeBlock.match(commentRegex);
console.log(match);

Here's the result:

You can use the dotAll property to check if the s flag is used in the pattern:
let codeBlock = `
  function add(x, y) {
    /* This is a function
    that takes two numbers
    and adds them together. */
    return x + y;
  }
`;

let commentRegex = /\/\*(.*)\*\//s; // gets everything between /* and */
const match = codeBlock.match(commentRegex);

console.log(commentRegex.dotAll); // true

You can extract the match with an if statement:
let codeBlock = `
  function add(x, y) {
    /* This is a function
    that takes two numbers
    and adds them together. */
    return x + y;
  }
`;

let commentRegex = /\/\*(.*)\*\//s; // gets everything between /* and */

const match = codeBlock.match(commentRegex);

if (match) {
  console.log(match[1]);
}

/*
Output:  
This is a function
    that takes two numbers
    and adds them together.
*/

Because the wildcard always matches any character it encounters apart from a new line, it is better not to use it unless it is absolutely necessary. For every character the wildcard matches, there is always another way to match it.
Greediness and Laziness in Regular Expressions
By default, regular expression quantifiers are greedy, meaning they always try to match as many characters as possible. The concept of greediness applies to the quantifiers (*, +, ?, and {}), and it is most visible when they are attached to the wildcard (.).
For example, the pattern /f.*h/gi will match as many characters as possible after encountering an f in the target string:

Same for the pattern /f.+h/gi:

It’s the same in code: 
const myStr = 'The fresh fish was caught in the Finnish lake';
const re = /f.*h/gi;

console.log(myStr.match(re)); // [ 'fresh fish was caught in the Finnish' ]

Laziness is the opposite of greediness and it’s the way you stop greediness. On many occasions, if you want to stop greediness, all you need is to put a question mark (?) right after the quantifier causing the greediness. The quantifier then matches as few characters as possible.
Here's how I stopped the greediness of the asterisk metacharacter:

I stopped it for the plus metacharacter the same way:

I can now safely extract every word that starts with f and ends with h:
const myStr = 'The fresh fish was caught in the Finnish lake';
const re = /f.*?h/gi;

console.log(myStr.match(re)); // [ 'fresh', 'fish', 'Finnish' ]
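The same applies to the plus metacharacter. Here is a minimal sketch that compares the greedy /f.+h/gi with the lazy /f.+?h/gi on the same sentence. The variable names are made up for illustration:

```javascript
const sentence = 'The fresh fish was caught in the Finnish lake';

const greedyPlus = sentence.match(/f.+h/gi);
const lazyPlus = sentence.match(/f.+?h/gi);

console.log(greedyPlus); // [ 'fresh fish was caught in the Finnish' ]
console.log(lazyPlus); // [ 'fresh', 'fish', 'Finnish' ]
```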

Chapter 6: Grouping and Capturing in Regex
What is Grouping?
Grouping means treating a regex pattern or a part of a regex pattern as a single unit. To achieve grouping, you surround the pattern or the part of the pattern you want to group in parentheses (( and )).
After you've grouped the part of the pattern you want to, you can then refer back to it through a process we call "backreferencing" in regular expressions.
The groups you define in a pattern refer to the target string or text and not the pattern itself. You'll see this in action when it's time to discuss backreferencing. 
After grouping, you can then apply a quantifier to that group since all the patterns in it are a unit.
Let's say you have a group of the ids z8g4g4 ga1v4g f4k7f9 bb3b2b d6b4t5 d4cm3d e9f5y6 ggj64 mgtyqg m0foh9 and you want to find out which of them follow the pattern letter number letter number letter number. The pattern [a-z]\d[a-z]\d[a-z]\d can do that for you:

Using grouping, you can make the pattern shorter by grouping the [a-z]\d sequence and applying an exact quantifier of 3 to it:
([a-z]\d){3}
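Here is a small sketch that runs both versions of the pattern against the ids above with the match() method and the g flag, to check that they find the same ids:

```javascript
const ids =
  'z8g4g4 ga1v4g f4k7f9 bb3b2b d6b4t5 d4cm3d e9f5y6 ggj64 mgtyqg m0foh9';

const longForm = ids.match(/[a-z]\d[a-z]\d[a-z]\d/g);
const grouped = ids.match(/([a-z]\d){3}/g);

console.log(longForm); // [ 'z8g4g4', 'f4k7f9', 'd6b4t5', 'e9f5y6' ]
console.log(grouped); // [ 'z8g4g4', 'f4k7f9', 'd6b4t5', 'e9f5y6' ]
```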


When you use grouping in a pattern, especially if you have multiple groups in the same pattern, you can use the exec() method to extract each of the groups.
A good example to illustrate this is a date in any acceptable format, for example dd/mm/yyyy.
Here's how I group the pattern \d\d[/.-]\d\d[/.-]\d\d\d\d into dd, mm, and yyyy:
(\d\d)[/.-](\d\d)[/.-](\d\d\d\d)

I used the exec() method this way:
const re = /(\d\d)[/.-](\d\d)[/.-](\d\d\d\d)/;
const date = '22-03-2023';

const execRes = re.exec(date);
console.log(execRes);

This is what the result looks like in the console:

In the array, you can see that:

there is the whole date in the index 0
the index 1 has the day
the index 2 has the month
and the index 3 has the year

You can then use array referencing to get all of those figures:
const re = /(\d\d)[/.-](\d\d)[/.-](\d\d\d\d)/;
const date = '22-03-2023';

const execRes = re.exec(date);

console.log(`The full date is ${execRes[0]}`); // The full date is 22-03-2023
console.log(`The day is ${execRes[1]}`); // The day is 22
console.log(`The month is ${execRes[2]}`); // The month is 03
console.log(`The year is ${execRes[3]}`); // The year is 2023

You can also use this approach to extract a username and domain from an email:
function extractUsernameAndDomain(email) {
  const re = /([a-z]{2,})@([a-z]{3,}\.com)/;
  const result = re.exec(email);

  console.log(`The username is ${result[1]}`);
  console.log(`The domain is ${result[2]}`);
  console.log(`The full email is ${result[0]}`);
}

extractUsernameAndDomain('janedoe@gmail.com');

/*
Output:
The username is janedoe
The domain is gmail.com
The full email is janedoe@gmail.com
*/

This behavior of grouping, in which each part of the match is placed in an array according to the groups, is the reason groups are also called "capturing" groups. This way, you don’t need the split() method of JavaScript or any other programming hacks to get each part of those dates.
How to Reference Captured Groups with Backreferences
Since groups are captured by default, you can refer back to them. To do this, you use a backslash (\) and then the order of the group in the pattern. For example, you can reference the first group with \1 and the third group with \3. The numbering starts from 1, not 0.
Let's say you want to match "tsetse" fly in the text There are many tsetse flies in the tropics. If you group the text "tse" first and use the g flag, you'll get two matches:

You can refer back to that tse group with \1 and you'll have a single match: 

It's very important to note that when you use a capturing group, the grouping refers to the target string (or text) and not the pattern itself. The reason why the pattern /(tse)\1/ returns a match in the last example is because of the "tse" in the text and not the "tse" in the pattern.
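Here is the tsetse example as a minimal JavaScript sketch. The variable names are made up for illustration:

```javascript
const flies = 'There are many tsetse flies in the tropics.';

// The group alone matches each "tse" separately
const withoutBackref = flies.match(/(tse)/g);

// \1 must repeat whatever text the first group captured
const withBackref = flies.match(/(tse)\1/g);

console.log(withoutBackref); // [ 'tse', 'tse' ]
console.log(withBackref); // [ 'tsetse' ]
```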
To illustrate this, let's use a date again, since the day and month or the separators can repeat and can be different. I will use the pattern (\d\d)[/.-](\d\d)[/.-](\d\d\d\d) for matching dates that I grouped in one of the previous examples. Remember the pattern successfully matches a date:

I can group the separator too and refer back to it for the second separator. I can also refer back to the day part of the date to match the month, since they both look for two digits. 
Here's the new pattern now:
(\d\d)([/.-])\1\2(\d\d\d\d)

I can make the pattern shorter with an exact quantifier:
(\d{2})([/.-])\1\2(\d{4})

The new pattern successfully matches the same date:

But the reason there's a match in the example above is that the separators are the same and the day and month are the same.
If the day is different from the month, there won't be a match:

If the separators are different too, there also won't be a match:

But remember that if both are the same, there will be a match:

That is the reason why the groups in a pattern refer to the target string (or text) and not the pattern itself. 
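The three cases above can be checked quickly in JavaScript. This is a minimal sketch, and the sample dates are made up for illustration:

```javascript
const sameDayMonthRe = /(\d{2})([\/.-])\1\2(\d{4})/;

// Day equals month and both separators are the same, so \1 and \2 succeed
console.log(sameDayMonthRe.test('22-22-2023')); // true

// The month differs from the captured day, so \1 fails
console.log(sameDayMonthRe.test('22-03-2023')); // false

// The second separator differs from the captured one, so \2 fails
console.log(sameDayMonthRe.test('22-22/2023')); // false
```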
It is also possible to make a group non-capturing. That way, you won't be able to refer to it in the pattern. To create a non-capturing group, you use a question mark and a colon right after the opening parenthesis.
The syntax for that looks like this:
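(?:chars)

For example, if I make the day group in the date pattern non-capturing, the remaining groups are renumbered: \1 now refers to the separator and \2 to the year.
Because of this, the text does not match the pattern anymore. To make it match again I have to:

remove the first backreference (\1)
write \d{2} explicitly for the month
change the reference to the separator from \2 to \1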

Here’s the new pattern:
(?:\d{2})([/.-])\d{2}\1(\d{4})

And now the date matches the pattern:
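In JavaScript, you can see the effect of the non-capturing group in the match array. A minimal sketch with a sample date:

```javascript
const re = /(?:\d{2})([\/.-])\d{2}\1(\d{4})/;
const match = re.exec('22-03-2023');

// Only the separator and the year are captured; the day is grouped but not stored
console.log(match[0]); // 22-03-2023
console.log(match[1]); // -
console.log(match[2]); // 2023
console.log(match.length); // 3
```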

How to Use the d Flag and hasIndices Property with Groups
The d flag adds index information to match objects for capture groups. This way, you won't just know what was matched by each capture group, but also where that match was found in the input string.
Let's look at how this works with the grouping for matching dates:
const re = /(\d\d)[/.-](\d\d)[/.-](\d\d\d\d)/d;
const date = '22-03-2023';

const match = re.exec(date);
console.log(match);

The result is the usual match array plus an indices property: an array of [start, end] pairs giving the position of the full match and the position of each captured group:

If you want to see only those indices, you can log match.indices:
const re = /(\d\d)[/.-](\d\d)[/.-](\d\d\d\d)/d;
const date = '22-03-2023';

const match = re.exec(date);

console.log(match.indices);


You can also extract those indices separately:
const re = /(\d\d)[/.-](\d\d)[/.-](\d\d\d\d)/d;
const date = '22-03-2023';

const match = re.exec(date);

console.log(`The full index range is ${match.indices[0]}`); //The full index range is 0,10
console.log(`The day index range is ${match.indices[1]}`); // The day index range is 0,2
console.log(`The month index range is ${match.indices[2]}`); // The month index range is 3,5
console.log(`The year index range is ${match.indices[3]}`); // The year index range is 6,10
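Because each entry in indices is a [start, end] pair, you can pass it straight to slice() to pull the text back out of the input. This is a small sketch that assumes a runtime with support for the d flag; the sample sentence is made up:

```javascript
const re = /(\d\d)[\/.-](\d\d)[\/.-](\d\d\d\d)/d;
const text = 'Due on 22-03-2023';
const match = re.exec(text);

// indices[3] is the [start, end] pair for the year group
const [yearStart, yearEnd] = match.indices[3];
console.log(yearStart, yearEnd); // 13 17
console.log(text.slice(yearStart, yearEnd)); // 2023
```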

And finally, you can check if the d flag is really used with the hasIndices property:
const re = /(\d\d)[/.-](\d\d)[/.-](\d\d\d\d)/d;
const date = '22-03-2023';

console.log(re.hasIndices); // true

Chapter 7: Lookaround Groups: Lookaheads and Lookbehinds
What are Lookaround Groups?
Lookaround assertions are non-capturing groups that return matches only if the text you're matching is followed or preceded by a particular character or sequence of characters.
Lookaround assertions do not consume the characters in the input string or text. This makes them "zero-width assertions", and that's why lookaround groups are also called "lookaround assertions".
There are two types of lookaround groups: lookahead and lookbehind. The two also have their positive and negative forms, so there are positive lookahead, negative lookahead, positive lookbehind, and negative lookbehind groups.
What is a Lookahead Group?
A lookahead group is a non-capturing group that lets you match a part of a string only if it is followed by another character in the string, without including that following character in the match.
A lookahead group is useful when you want to match a string based on a condition. So, look at it like an if statement in a programming language.
There are two types of lookaheads, namely positive lookahead and negative lookahead.
Because you're still dealing with groupings, a positive lookahead is specified by an opening parenthesis followed by a question mark, an equal sign, the characters, and a closing parenthesis:
(?=chars)

For example, the pattern x(?=y) means match x only if it is followed by y.
In the syntax of negative lookahead, you replace the equal sign with an exclamation mark:
(?!chars)

For example, the pattern x(?!y) means match x only if it is not followed by y.
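As a quick sketch of both forms, here is what x(?=y) and x(?!y) do to a made-up sample string. Note that the y itself is never replaced, because the lookahead does not consume it:

```javascript
const sample = 'xy xz';

// Positive lookahead: only the x that is followed by y is replaced
const positiveResult = sample.replace(/x(?=y)/g, 'X');
console.log(positiveResult); // Xy xz

// Negative lookahead: only the x that is NOT followed by y is replaced
const negativeResult = sample.replace(/x(?!y)/g, 'X');
console.log(negativeResult); // xy Xz
```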
Let's look at an example of a positive lookahead assertion.
Say you want to match the domain name of domains that have only the .org extension within a string of domains with other extensions. This pattern would do it:
[a-zA-Z]+(?=\.org)

In the pattern, [a-zA-Z]+ represents one or more letters, and (?=\.org) checks whether those letters are immediately followed by the .org extension.
In the screenshot below, you can see that domain names that have a .org extension were matched:

You can also see that the matches are just the words "freeCodeCamp" and "catholic". The .org extension is required by the lookahead, but it is not included in the match.
Domains without the .org extension don't match, so if there are no domains with the .org extension in the target string, there won't be any match.
That way, you can extract text like that in JavaScript and do whatever you want with it:
const domains = 'koladechris.com freeCodeCamp.org mdn.com catholic.org';
const re = /[a-zA-Z]+(?=\.org)/g;

const charityWebsitesArr = domains.match(re);
const charityWebsites = charityWebsitesArr.join(',').replace(/,/, ' and ');

console.log(charityWebsites, 'are examples of charity organizations.'); //freeCodeCamp and catholic are examples of charity organizations.

If you want to match the .org as well so the whole domain gets matched, you have to include the .org in the pattern:
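In JavaScript, the difference looks like this. A short sketch using the same kind of domains string as the example above:

```javascript
const domains = 'koladechris.com freeCodeCamp.org mdn.com catholic.org';

// Lookahead: .org is required but left out of the match
const namesOnly = domains.match(/[a-zA-Z]+(?=\.org)/g);
console.log(namesOnly); // [ 'freeCodeCamp', 'catholic' ]

// Plain pattern: .org is consumed, so it is part of the match
const fullDomains = domains.match(/[a-zA-Z]+\.org/g);
console.log(fullDomains); // [ 'freeCodeCamp.org', 'catholic.org' ]
```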

Since lookahead groups don’t consume characters, you will see a lot of developers use positive lookaheads to validate passwords.
Let's say you want the password to be at least six characters that includes a lowercase letter, an uppercase letter, a number, and a symbol. You can use lookaheads to define all of those conditions:

(?=.{6,}) – at least 6 characters
(?=.*[a-z]) – at least one lowercase character, with zero or more characters allowed before it
(?=.*[A-Z]) – at least one uppercase character, with zero or more characters allowed before it
(?=.*[0-9]) – at least one number, with zero or more characters allowed before it
(?=.*[!@#$%^&*()+=-]) – at least one of the accepted symbols, with zero or more characters allowed before it
.* – matches the password itself once all the lookahead conditions pass

Here's the full regular expression:
(?=.{6,})(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])(?=.*[!@#$%^&*()+=-]).*

And here is what matches the pattern and what does not:

To use that pattern in JavaScript, you can test it against a password string and do something from there:
const password = 'Tse23*';
const passwordRe =
  /(?=.{6,})(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])(?=.*[!@#$%^&*()+=-]).*/;

if (passwordRe.test(password)) {
  console.log('Welcome to your dashboard!');
} else {
  throw new Error('Incorrect password!');
}

/**
output: Welcome to your dashboard!
*/

A negative lookahead is useful when you want a match only if certain characters do not come after the current position in the string.
Let's say you want to extract all items of an array that do not contain the capitalized article "The". In that case, you can use the pattern below:
/^(?!.*\bThe\b).*$/

In the pattern above:

^ ensures the regex pattern matches from the start of the line
(?!.*\bThe\b) is the negative lookahead that ensures that the article "The" does not appear anywhere in the target string
\bThe\b uses word boundaries so that "The" matches only as a whole word, and not as part of a longer word like "Theory"
.* the wildcard that matches any character apart from a new line

let docTitles = [
  'The Incredible Dr. Poll',
  'Born in Africa',
  "America's Funniest Home Videos",
  'The Lion Queen',
  'Snake in the City',
];
let re = /^(?!.*\bThe\b).*$/;

for (let title of docTitles) {
  if (re.test(title)) {
    console.log(`A Title without "The": ${title}`);
  }
}

/*
Output:
A Title without "The": Born in Africa
A Title without "The": America's Funniest Home Videos
A Title without "The": Snake in the City
*/

What is a Lookbehind Group?
A lookbehind group is similar to lookahead group. But instead of checking if a certain character(s) follows what you're trying to match, it checks whether the character(s) precedes what you're trying to match.
So, a lookbehind group is a non-capturing group that lets you match a part of a string only if it is preceded by another character in the string, without including that preceding character in the match.
Like lookaheads, there are also positive and negative lookbehind assertions. A positive lookbehind returns a match only if the character you want to match is preceded by another character you specify in your pattern. On the other hand, a negative lookbehind returns a match only if the character you want to match is not preceded by another character.
A positive lookbehind is represented by an opening parenthesis, a question mark, a less than symbol, an equals sign, the character(s), and a closing parenthesis:
(?<=chars)

For example, the pattern (?<=x)y indicates you want to match y only if there's x before it. In this case, xx or yx won't match, but xy would match.

For a negative lookbehind, an exclamation mark replaces the equals sign:
(?<!chars)

For example, the pattern (?<!x)y means match y only if there's no x before it. In this case, the y in by would match and the y in my would match, but never the y in xy.
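Here is a quick sketch of the positive form (?<=x)y and the negative form (?<!x)y on made-up strings. It assumes a JavaScript engine with lookbehind support:

```javascript
// Positive lookbehind: replace y only when x comes right before it
const positiveResult = 'xy by'.replace(/(?<=x)y/g, 'Y');
console.log(positiveResult); // xY by

// Negative lookbehind: replace y only when x does NOT come right before it
const negativeResult = 'xy by my'.replace(/(?<!x)y/g, 'Y');
console.log(negativeResult); // xy bY mY
```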


Positive lookbehind groups can be useful for matching numbers preceded only by a certain currency symbol, for example numbers preceded by the dollar sign.
The regex pattern below has a positive lookbehind that matches a number only if it is preceded by a dollar sign:
(?<=\$)\d+(\.\d*)?

In the pattern above, the lookbehind ((?<=\$)) checks whether there's a dollar sign before one or more digits (represented by \d+). The other group, (\.\d*), and the zero or one quantifier (?) optionally match a decimal point and the digits after it.
Here's what matches and what does not:

In JavaScript, what you can do with the numbers that match is to calculate the total with the reduce() method:
const myStr =
  '10 pieces of the items cost $102.99, but you can get 15 for a discount of $2, and 20 for a discount of $3.99';

const re = /(?<=\$)\d+(\.\d*)?/g;

// put all the prices in an array
const allPrices = myStr.match(re); // [ '102.99', '2', '3.99' ]

// convert each of the prices to a number with map() and unary plus
const allPricesToNum = allPrices.map((price) => +price); // [ 102.99, 2, 3.99 ]

// add all the numbers with reduce()
const sumOfAllPrices = allPricesToNum.reduce((acc, curr) => acc + curr, 0); // 108.97999999999999

// add a dollar sign to the number and use toFixed() to round it to two decimal places
console.log(`$${sumOfAllPrices.toFixed(2)}`); // $108.98

For the example of negative lookbehind, let's say you want to match a digit as long as it is not preceded by the dollar sign. This pattern does it:
(?<!\$)\d+

But unfortunately, it still looks out for a number inside another number and matches it even if there's a dollar sign before the whole number:
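The problem can be reproduced with a negative lookbehind such as (?<!\$)\d+ on a made-up string. This is only a sketch of the behavior described above:

```javascript
const sampleAmounts = '$123 456';

// "1" is skipped because "$" precedes it, but "2" is preceded by "1", so "23" still matches
const partialMatches = sampleAmounts.match(/(?<!\$)\d+/g);
console.log(partialMatches); // [ '23', '456' ]
```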

To correct that behavior, you can surround the whole pattern with a word boundary (\b):

Negative lookbehind groups are supported in JavaScript as well:
const monies = '$123 456 $789 £12 ₦568 $8903 £345';
const re = /\b(?<!\$)\d+\b/g;

console.log('Monies without dollar sign:', monies.match(re)); // Monies without dollar sign: [ '456', '12', '568', '345' ]

Chapter 8: Regex Best Practices and Troubleshooting
Best Practices to Consider While Writing Regular Expressions
Over time, regular expressions can become complex and hard to understand, depending on the use case and purpose. Things may become more complicated if it takes you a long time to come back to them or you work in a team.
Luckily, there are a few best practices to consider while writing regular expressions so you can make things easier for yourself and your team members.
Here are those best practices:

Keep it simple and readable: a simple, easy-to-read, and effective regex is better than a complex and effective regex. If you can make the regex efficient without using the complex concept of non-capturing groups like lookarounds (lookaheads and lookbehinds), then don’t use them.

Avoid greedy matches: quantifiers like * and + are greedy by default, and they are often paired with the wildcard (.). It's hard to do without them, but when they match more than you want, make them lazy by putting a question mark (?) right after the quantifier. In addition, avoid using the wildcard unless it's necessary.

Use comments to describe what a regex does: if you're working in a team, try to explain what the regexes you write do so others can understand them without wasting time.

Use online regex testers: instead of writing your regular expressions in your code editor, write them in regex testers where you can test what they match without writing some more code. Free online regex testers like regex101 and regexpal.com also play a role in debugging because they can highlight errors and tell you what's wrong. 

Escape special characters: if you want to perform a literal match on metacharacters like ., *, +, {, }, and others, don’t forget to escape them unless you're using them inside a character set. Sometimes, you even have to escape hyphens in a character set. 
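Two of the practices above, making greedy quantifiers lazy when they match too much and escaping metacharacters, can be sketched as follows. The sample strings are made up for illustration:

```javascript
const html = '<b>one</b> and <b>two</b>';

// Greedy: .* runs to the last </b>, so the whole string is one match
const greedy = html.match(/<b>.*<\/b>/g);
console.log(greedy); // [ '<b>one</b> and <b>two</b>' ]

// Lazy: .*? stops at the first </b> it can, giving two matches
const lazy = html.match(/<b>.*?<\/b>/g);
console.log(lazy); // [ '<b>one</b>', '<b>two</b>' ]

// Escaping: an unescaped . matches any character, an escaped \. only a literal dot
console.log(/^3.14$/.test('3x14')); // true
console.log(/^3\.14$/.test('3x14')); // false
```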


How to Write Accurate Regular Expressions
Writing accurate regular expressions with precision requires understanding what you want to match, the pattern to use, attention to detail, and an understanding of the underlying syntax and behavior of regular expressions in general.
This is crucial in order to ensure there are no avoidable errors and make sure the regexes you write effectively match the desired string.
Here are some tips to help you write accurate regular expressions:

Understand the string you want to match: before you write the regex pattern to match a string, examine the string closely. Determine if you're targeting the whole string or a particular part of the string. If you're targeting a part of the string or you want to strip out some of it, look out for the pattern it follows. If you get familiar with the string, you can write a more accurate regex.

Be specific: avoid using the wildcard unless it's necessary. For instance, do not use the wildcard to match a number since you can use \d or [0-9], or uppercase letters since you can use [A-Z].

Use quantifiers to shorten patterns: if you want a particular part of your regex to match repeated occurrences, try to use quantifiers like +, *, {n,m}, {n,}, and {n}. For instance, if you want to match a date with / as the separator, you can use the pattern \d{2}\/\d{2}\/\d{4} instead of \d\d\/\d\d\/\d\d\d\d, or \d{1,2}\/\d{1,2}\/\d{4} if single-digit days and months should match too.

Use online regex testers: online regex testers like regexpal.com and regex101.com help you write more accurate regexes by giving you a live match preview, highlighting matches, and showing you errors their engines encounter while processing the regexes.

Use word boundary to prevent unwanted matches: surrounding your pattern with the word boundary (\b) can help you prevent unnecessary and unwanted matches. For example, if you want to match a 6-digit zip code, \d{6} can do it for you but will also match any part of the string that has 6 digits that follow one another. What would do it better is \b\d{6}\b.
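The word boundary tip can be checked with a short sketch. The sample sentence and the 6-digit code in it are made up:

```javascript
const text = 'Order 12345678 ships to 560034';

// Without boundaries, the first six digits of the longer number also match
const loose = text.match(/\d{6}/g);
console.log(loose); // [ '123456', '560034' ]

// With \b on both sides, only a standalone run of exactly six digits matches
const strict = text.match(/\b\d{6}\b/g);
console.log(strict); // [ '560034' ]
```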


Anchors (^ and $) can also help prevent unwanted matches since they "anchor" a pattern to the start or end of the line. You can use them to make sure the match is found at the end or start of the line, or both. 
For example:

/^Hello/i would only match Hello or hello at the start of a line
/Hello$/i would only match Hello or hello at the end of a line
/^Hello$/i would only match Hello or hello if it’s the only target string unless you have the multiline flag turned on and there's Hello or hello on another line.
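The effect of the anchors, and of the multiline flag on them, can be seen in this small sketch. The sample strings are made up:

```javascript
const oneLine = 'hello';
const twoLines = 'Hi there\nhello';

console.log(/^Hello$/i.test(oneLine)); // true
console.log(/^Hello$/i.test('Hello world')); // false

// Without m, ^ and $ only match at the very start and end of the whole string
console.log(/^Hello$/i.test(twoLines)); // false

// With m, ^ and $ also match at the start and end of each line
console.log(/^Hello$/im.test(twoLines)); // true
```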

If you have issues getting things right with a regex pattern, online testing tools like regex101.com and regexpal.com can also help you step through the pattern bit by bit. There are also regex visualizers you can use to check what's wrong with your regex patterns. 
One of those tools that I find amazing is Regulex (jex.im/regulex). It helps you put your regular expressions in a visual perspective you can export.

And it can show you what goes wrong with your pattern:

Chapter 9: Applications of Regular Expressions
A Better Way to Match Dates
You've seen several patterns you can use to match dates in the dd/mm/yyyy format such as \d\d\/\d\d\/\d\d\d\d, \d\d[/.-]\d\d[/.-]\d\d\d\d, and \d{1,2}\/\d{1,2}\/\d{4}.
The problem is that those three patterns just check for the occurrence of a number, and not a valid date. For example, invalid dates 99/89/2022 or 42/32/1909 would still match those patterns:

The solution is that you must account for the acceptable day, month, and year:

the day can be 1 or 2 digits
the day cannot exceed 31
the month cannot exceed 12
the year could be 2 or 4 digits, but never 1, 3, or greater than 4 digits

You should also account for:

a day that could start with 0, 1, 2, or 3, but never 4 or greater
a month that could start with 0, or 1, but never 2 or greater

Here's the regex pattern that satisfies those conditions:
/^(3[01]|[12][0-9]|0?[1-9])[-./](1[0-2]|0?[1-9])[-./](20[0-9]{2}|[0-9]{4}|[0-9]{2})$/gm

The image below is an illustration that labels each part of the pattern and explains what they do:

Here are the dates that match the pattern and those that do not:

You can take the pattern and test it against some dates in JavaScript:
const re =
  /^(3[01]|[12][0-9]|0?[1-9])[-./](1[0-2]|0?[1-9])[-./](20[0-9]{2}|[0-9]{4}|[0-9]{2})$/;

function testDate(date) {
  const dateTester = re.test(date);
  console.log(dateTester);
}

testDate('12-01-2022'); // true
testDate('31.11.1999'); // true
testDate('02-01-21'); // true
testDate('42-01-2021'); // false
testDate('22-91-23'); // false

You can see the day, month, and year parts of the pattern are in their respective groups, with a separator character class between them. If you want to match other formats like mm/dd/yyyy or yyyy/mm/dd, you can rearrange the groups.
You can even make the pattern a little shorter by putting the first separator in a group and referencing it for the second separator:
^(3[01]|[12][0-9]|0?[1-9])([-./])(1[0-2]|0?[1-9])\2(20[0-9]{2}|[0-9]{4}|[0-9]{2})$
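One side effect worth knowing: with the backreference, both separators must now be the same character, so the shorter pattern is also slightly stricter. A minimal sketch comparing the two versions (the names looseRe and strictRe are only for illustration):

```javascript
// Original version: each separator is matched independently
const looseRe =
  /^(3[01]|[12][0-9]|0?[1-9])[-.\/](1[0-2]|0?[1-9])[-.\/](20[0-9]{2}|[0-9]{4}|[0-9]{2})$/;

// Backreference version: \2 must repeat the first separator
const strictRe =
  /^(3[01]|[12][0-9]|0?[1-9])([-.\/])(1[0-2]|0?[1-9])\2(20[0-9]{2}|[0-9]{4}|[0-9]{2})$/;

console.log(looseRe.test('12-01/2022')); // true
console.log(strictRe.test('12-01/2022')); // false
console.log(strictRe.test('12/01/2022')); // true
```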

How to Match US Zip Codes
The zip codes in the US are a 5-digit number, but they may also have a 4-digit extension, for example, 56893 or 56893-9232.
The pattern \b\d{5}\b would match a 5-digit zip-code:

You also need to account for the other 4 digits and the hyphen between the two sets of numbers. The pattern, \b\d{5}(\-\d{4})?\b would do that for you.
Here's an image that labels each part of the pattern and explains what they do:

You can also take the regex and extract all the zip codes that are matches:
const re = /\b\d{5}(\-\d{4})?\b/g;
const zipCodes = [
  '56893',
  'ca58392bn',
  '29043',
  '90342-9014',
  '89435',
  '75034',
  '90453-3056',
  '12345-6789',
  'b458923',
  '589323',
];

const matchedZipCodes = [];

for (const zipCode of zipCodes) {
  const matches = zipCode.match(re);
  if (matches) {
    matchedZipCodes.push(matches[0]);
  }
}

console.log(matchedZipCodes);

/*
Output:
[
  '56893',
  '29043',
  '90342-9014',
  '89435',
  '75034',
  '90453-3056',
  '12345-6789'
]
*/

And if you want the zip codes that are invalid, you can use the filter() array method to keep only those that do not match the pattern:
const re = /\b\d{5}(\-\d{4})?\b/g;
const zipCodes = [
  '56893',
  'ca58392bn',
  '29043',
  '90342-9014',
  '89435',
  '75034',
  '90453-3056',
  '12345-6789',
  'b458923',
  '589323',
];

const invalidZipCodes = zipCodes.filter((zipCode) => !zipCode.match(re));

console.log(invalidZipCodes); // [ 'ca58392bn', 'b458923', '589323' ]

How to Match Email Addresses
Email addresses could be as simple as john@email.com, and as complex as you can ever imagine. So, there's no "one pattern" for validating email addresses. This also makes email validation a complex thing to do.
Validating emails with regex can also be a bit questionable because you can't stop anyone from making an email up. But still, there's a format you generally want the email address to be in whether it is made up or not. This is why you may want to use regular expressions to validate an email.
A pattern like /^\w{4,}@\w{3,}\.\w{3,}$/ could be enough for validating simple and straightforward email addresses like john@example.com.
Here's an image that labels each part of the pattern and explains what they do:

And here are the emails that match:

As you can see, the pattern did not even match all the emails provided. That's because the pattern does not account for:

emails with a period within usernames like jane.doe@email.com 
second-level domain (SLD) extensions like john@example.abc.com 
and country code second-level domains (ccSLDs) like jane@email.co.uk

In fact, a single email can even combine all the criteria listed above.
A better pattern for matching emails is /^[\w.-]+@[a-zA-Z\d.-]+\.[a-zA-Z]{2,}$/. 
I also prepared an illustration that labels each part of the pattern and shows what they do:

This pattern matches an email address better than the first one:
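To compare the two patterns in JavaScript, you can run both against the same addresses. This is a minimal sketch; the sample addresses are made up, and simpleRe and betterRe are just illustrative names:

```javascript
const simpleRe = /^\w{4,}@\w{3,}\.\w{3,}$/;
const betterRe = /^[\w.-]+@[a-zA-Z\d.-]+\.[a-zA-Z]{2,}$/;

const emails = [
  'john@example.com',
  'jane.doe@email.com',
  'jane@email.co.uk',
  'john@example',
];

// Print each address with the result of the simple and the better pattern
for (const email of emails) {
  console.log(email, simpleRe.test(email), betterRe.test(email));
}
// john@example.com true true
// jane.doe@email.com false true
// jane@email.co.uk false true
// john@example false false
```

The better pattern is still permissive. For example, it does not reject consecutive dots in the domain part.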

A widely shared pattern based on the RFC 5322 specification, often quoted as working about 99% of the time for validating email, is this:
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

N.B.: You should surround the pattern with anchors so it has to match the whole string. Without them, it can match the valid-looking part of an invalid email and ignore the rest.
This is what I'm trying to point out:

You can take that pattern into JavaScript and test it against some email addresses:
const emailRe =
  /^(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])$/;

function matchEmail(email) {
  if (emailRe.test(email)) {
    console.log('Valid email!');
  } else {
    console.log('Invalid email');
  }
}

matchEmail('janedoe@email.com');
matchEmail('john.doe@email.com');
matchEmail('7@koala@email.com!');
matchEmail('kayla.simpson@email.co.uk');
matchEmail('kayla.simpson@email.co..uk');

As I pointed out earlier, matching email addresses with regex is a complex task. If you know the kind of email you'll be working with, it is better to tailor your regex for them. 
Sometimes, to match an email, all you might need is a simple regex. Some other times, the pattern you need might be as complex as the one above.
How to Match Passwords
To match passwords, you can use a lookahead – since lookaround groups generally don’t consume characters. But there are always multiple ways of doing the same thing in regular expressions, and programming in general of course.
You've seen a lookahead for matching 6-character passwords already. This time around, let's say the password should not be less than 8 characters with at least one uppercase, one lowercase, one digit, and one symbol.
Here's the regex pattern that does just that:
^(?=.{8,})(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])(?=.*[!@#$?%"';^}{&:*()∞+=-]).*$

Here are the passwords it matches:

You can take that into JavaScript and test it against possible passwords:
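// No g flag here: with g, test() resumes from lastIndex on the next call and can wrongly return false
const passwordRe =
  /^(?=.{8,})(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])(?=.*[!@#$?%"';^}{&:*()∞+=-]).*$/;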

function matchPassWord(password) {
  if (passwordRe.test(password)) {
    console.log(true);
  } else {
    console.log(false);
  }
}

matchPassWord('johnDoe21^');
matchPassWord('Strong@123');
matchPassWord('weakpassword');
matchPassWord('ABcd12$');
matchPassWord('Longpassword1234!');
matchPassWord('Short@1');
matchPassWord('janEdoe34$');
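One thing to watch when a single regex is reused across several test() calls: with the g (or y) flag, the regex remembers where the last match ended in its lastIndex property and resumes from there, so a later call can fail on a string that should match. A minimal sketch of the pitfall, using a made-up digit pattern rather than the password pattern:

```javascript
const globalRe = /\d+/g;

console.log(globalRe.test('123')); // true
console.log(globalRe.lastIndex); // 3

// The search resumes at index 3, which is already the end of '456'
console.log(globalRe.test('456')); // false

// Without the g flag, every call starts from index 0
const plainRe = /\d+/;
console.log(plainRe.test('123')); // true
console.log(plainRe.test('456')); // true
```

For validation with test(), leaving the g flag off is usually the simplest way to avoid this.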

You can also extract each of those groups into its own variable and test a password against each one. This would let you show an error for the particular condition the password fails:
const passwordLength = /(?=.{8,})/,
  lowercaseChar = /(?=.*[a-z])/,
  uppercaseChar = /(?=.*[A-Z])/,
  numberChar = /(?=.*[0-9])/,
  specialChar = /(?=.*[!@#$?%"';^}{&:*()∞+=-])/;

function validatePassword(password) {
  if (
    passwordLength.test(password) &&
    lowercaseChar.test(password) &&
    uppercaseChar.test(password) &&
    numberChar.test(password) &&
    specialChar.test(password)
  ) {
    console.log('Valid password!');
  } else {
    console.log('Invalid Password');
  }
}

validatePassword('johnDoe21^');
validatePassword('Strong@123');
validatePassword('weakpassword');
validatePassword('ABcd12$');
validatePassword('Longpassword1234!');
validatePassword('Short@1');
validatePassword('janEdoe34$');
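To actually report which condition failed, one option is to pair each lookahead with its own message. The sketch below is an assumption about how that could look, not part of the original code; passwordRules and getPasswordErrors are hypothetical names:

```javascript
// Hypothetical rule list: each rule pairs one lookahead with its own message
const passwordRules = [
  { re: /(?=.{8,})/, message: 'at least 8 characters' },
  { re: /(?=.*[a-z])/, message: 'a lowercase letter' },
  { re: /(?=.*[A-Z])/, message: 'an uppercase letter' },
  { re: /(?=.*[0-9])/, message: 'a digit' },
  { re: /(?=.*[!@#$?%"';^}{&:*()∞+=-])/, message: 'a special character' },
];

// Returns the messages for every rule the password fails (empty array means valid)
function getPasswordErrors(password) {
  return passwordRules
    .filter((rule) => !rule.re.test(password))
    .map((rule) => rule.message);
}

console.log(getPasswordErrors('johnDoe21^')); // []
console.log(getPasswordErrors('ABcd12$')); // [ 'at least 8 characters' ]
console.log(getPasswordErrors('weakpassword'));
// [ 'an uppercase letter', 'a digit', 'a special character' ]
```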

Form Validation with Regex
One of the most popular ways developers use regular expressions is form validation. Since a form usually has input fields like name, email, password, and others, you can write a regular expression for what you expect the user to put in those input fields.
I prepared a little website where I show you how to validate the name, username, email, and password fields of a form with regex. 
Here's the HTML:
<!DOCTYPE html>
<html lang="en">

<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <link rel="stylesheet" href="styles.css">
    <script src="form-validate.js" defer></script>
    <title>Form Validation with RegEx</title>
</head>

<body>

    <div id="error-message"></div>

    <form action="">

        <h1>Sign Up</h1>
        <p>Fill in the form fields</p>

        <div class="form-control">
            <label for="name">Name</label>
            <input type="text" name="name" id="name">
        </div>
        <div class="form-control">
            <label for="username">Username</label>
            <input type="text" name="username" id="username">
        </div>
        <div class="form-control">
            <label for="email">Email</label>
            <input type="email" name="email" id="email">
        </div>
        <div class="form-control">
            <label for="password">Password</label>
            <input type="password" name="password" id="password">
        </div>
        <input type="submit" value="Submit" id="submit">
    </form>

</body>

</html>

The CSS:
@import url('https://fonts.googleapis.com/css2?family=Roboto&display=swap');

* {
  margin: 0;
  padding: 0;
  box-sizing: border-box;
}

body {
  background-color: #d0d0d5;
  color: #fff;
  font-family: 'Roboto', sans-serif;
}

form {
  position: absolute;
  top: 50%;
  left: 50%;
  transform: translate(-50%, -50%);
  background-color: #3b3b4f;
  padding: 0.4rem 3rem 1rem;
  border-radius: 2px;
}

p {
  margin: 0.5rem 0;
}

#error-message {
  background-color: crimson;
  color: #fff;
  max-width: 80%;
  margin: 0.5rem auto 0;
  padding: 0.2rem 0.5rem;
  border-radius: 4px;
}

#error-message p {
  font-size: 14px;
  text-align: center;
}

.form-control {
  display: flex;
  flex-direction: column;
}

.form-control label {
  margin-bottom: 0.2rem;
}

.form-control input {
  width: 14rem;
  margin-bottom: 1.2rem;
  padding: 0.2rem;
  border: 2px solid #d0d0d5;
  border-radius: 2px;
}

.form-control input:focus {
  outline: none;
}

input[type='submit'] {
  background-color: #fecc4c;
  border-color: #f1a02a;
  font-family: 'Roboto', sans-serif;
  padding: 0.3rem;
  border-width: 1px;
  cursor: pointer;
}

input[type='submit']:hover {
  background-color: #e3bd53;
}

.hide {
  display: none;
}

.show {
  display: block;
}

@media screen and (max-width: 768px) {
  #error-message {
    margin: 0.5rem auto 0;
    padding: 0.1rem 0.2rem;
  }
}

@media screen and (max-width: 667px) {
  form {
    top: 61%;
  }

  #error-message {
    margin: 0.2rem auto 0;
    padding: 0.1rem 0.4rem;
  }
}

Most importantly, some well-commented JavaScript that contains the patterns I used, and how I tested each of the patterns against the respective fields they correlate with:
// Get the form element
const form = document.querySelector('form');
// Get the div element that shows the error(s)
const errorMessageDiv = document.querySelector('#error-message');

// The RegEx patterns in a "patterns" object
const patterns = {
  nameRe: /^[a-zA-Z]{2,35}\s[a-zA-Z]{2,35}$/, // validates the name field
  usernameRe: /^[a-zA-Z]{3,30}(\d{1,4})?$/, // validates the username field
  emailRe: /^[\w.-]+@[a-zA-Z\d.-]+\.[a-zA-Z]{2,}$/, // validates the email field
  passwordRe:
    /^(?=.{8,})(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])(?=.*[!@#$?%"';^}{&:*()∞+=-]).*$/, // validates the password field
};

// Hide error message div when the page loads
errorMessageDiv.style.display = 'none';

// Add a submit event to the form
form.addEventListener('submit', validateAndSubmitForm);

// Form validation and submit function
function validateAndSubmitForm(e) {
  e.preventDefault();

  // Clear previous error messages
  errorMessageDiv.innerHTML = '';

  let nameInputValue = document.querySelector('#name').value;
  let usernameInputValue = document.querySelector('#username').value;
  let emailInputValue = document.querySelector('#email').value;
  let passwordInputValue = document.querySelector('#password').value;

  // Validate Name
  if (!patterns.nameRe.test(nameInputValue)) {
    showError('Name must have first name and last name separated by a space');
  }

  // Validate Username
  if (!patterns.usernameRe.test(usernameInputValue)) {
    showError(
      'Username must have between 3 and 30 characters and can include up to 4 digits at the end'
    );
  }

  // Validate Email
  if (!patterns.emailRe.test(emailInputValue)) {
    showError('Enter a valid email address');
  }

  // Validate Password
  if (!patterns.passwordRe.test(passwordInputValue)) {
    showError(
      'Password must contain at least 8 characters, one lowercase letter, one uppercase letter, one digit, and one special character.'
    );
  }

  // If there are no error messages, the form is valid, so you can submit it
  if (errorMessageDiv.innerHTML === '') {
    console.log(nameInputValue);
    console.log(usernameInputValue);
    console.log(emailInputValue);
    console.log(passwordInputValue);

    // Hide the errorMessageDiv element since there are no errors
    errorMessageDiv.style.display = 'none';

    // Greet user
    alert(`Hi ${usernameInputValue} 👋🏽 \nThanks for filling this form`);

    // Clear input fields with the reset() method
    document.forms[0].reset();
  } else {
    // Show the errorMessageDiv element if there are errors
    errorMessageDiv.style.display = 'block';
  }
}

// The function responsible for showing error(s)
function showError(message) {
  const errorMessageElement = document.createElement('p');

  errorMessageElement.innerText = message;
  errorMessageDiv.appendChild(errorMessageElement);
}

This is what the form does:

You can grab all the code in this GitHub repo.
Article Table of Contents Generator
You can leverage the power of regular expressions to create a markdown table of contents generator.
Markdown tables of contents are made up of h2 headings at the top level. Those h2 headings have an id attribute you can use as the link. If you take a look at those id attributes, they are in the format below:
[How to Do ABC on XYZ!!!](##howtodoabconxyz)

This means you need to:

use the text as it is as the link text and surround it with square brackets
replace all spaces with an empty string
replace all symbols with an empty string
convert all the letters to lowercase
surround the new link with parentheses

The replace() and toLowerCase() string methods will help you achieve those things.
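Before looking at the app, here is a rough sketch of those steps for a single heading. The helper name toTocEntry is hypothetical, and stripping everything that is not a letter or digit is a simplifying assumption rather than the exact symbol list the app uses. The output follows the sample format shown above:

```javascript
// Hypothetical helper: turns one heading into a TOC entry in the sample format above
function toTocEntry(heading) {
  const anchor = heading
    .toLowerCase() // convert all the letters to lowercase
    .replace(/\s+/g, '') // replace all spaces with an empty string
    .replace(/[^a-z0-9]/g, ''); // assumption: drop everything except letters and digits

  return `[${heading}](##${anchor})`;
}

console.log(toTocEntry('How to Do ABC on XYZ!!!'));
// [How to Do ABC on XYZ!!!](##howtodoabconxyz)
```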
Here's the HTML for the app:




    
    
    
    
TOC Generator
Please enter some heading texts!
Markdown Table of Content Generator for your Next Article
Paste in your headings to generate table of content

The CSS:
@import url('https://fonts.googleapis.com/css2?family=Poppins&family=Roboto&display=swap');

* {
  margin: 0;
  padding: 0;
  box-sizing: border-box;
}

body {
  font-family: 'Poppins', sans-serif;
  background-color: #3b3b4f;
  color: #fff;
}

h1 {
  margin-top: 2rem;
}

h1,
h2 {
  text-align: center;
  margin-bottom: 1rem;
  color: white;
}

form {
  max-width: 90%;
  margin: 0 auto;
  background-color: #d0d0d5;
  padding: 2rem;
  border-radius: 2px;
}

.form-control {
  text-align: center;
}

textarea {
  padding: 0.2rem 2rem 1rem 0.2rem;
}

textarea:focus {
  outline: 1px solid #3b3b4f;
}

input[type='submit'] {
  font-family: 'Poppins', sans-serif;
  font-size: 1.1rem;
  border: none;
  background-color: #03732e;
  color: #fff;
  padding: 0.5rem 1rem;
  border-radius: 4px;
  margin-top: 1rem;
  transition: 0.3s;
}

input[type='submit']:hover {
  cursor: pointer;
  background-color: #00471b;
}

#generated-toc {
  max-width: 60%;
  margin: 1rem auto;
  background-color: #d0d0d5;
  color: black;
  padding: 2rem;
  border-radius: 2px;
  text-align: left;
  font-size: 1.1rem;
  display: none;
}

.alert {
  display: none;
  margin: 1rem auto;
  max-width: 20%;
  text-align: center;
  padding: 1rem 0;
  border-radius: 2px;
  background-color: #eb7189;
  color: black;
}

@media screen and (max-width: 768px) {
  textarea {
    width: 16rem;
  }

  .alert {
    max-width: 50%;
  }
}

And the well-commented JavaScript:
const form = document.querySelector('form');
const generatedToc = document.querySelector('#generated-toc');
const alert = document.querySelector('.alert');

// Regular expressions to remove spaces and special characters
const spaceRe = /\s+/g;
const symRe = /[°?+*$∞^%$#@!.,©:&;"=%'_\[\]–\/\\<>|÷™®)£(}{€¥¢—“”‘•~]/g;

function generateToc(e) {
  e.preventDefault();

  // Get the heading texts from the textarea
  const headingTexts = document.querySelector('#toc').value;

  if (headingTexts === '') {
    // Alert the user to enter heading texts
    alert.style.display = 'block';

    // hide the alert after 3 seconds
    setTimeout(() => {
      alert.style.display = 'none';
    }, 3000);

    // hide generated table of content (if any) since the user is trying to paste in another one
    generatedToc.style.display = 'none';
    return;
  }

  // Split the heading texts into an array of lines
  const headingLines = headingTexts.split('\n');

  // Create an initial empty variable to save the table of content inside later
  let tocContent = '';

  // Loop through each line and generate the table of content items
  headingLines.forEach((headingLine) => {
    // Remove any leading and/or trailing spaces from the line
    headingLine = headingLine.trim();

    // skip empty lines
    if (headingLine === '') {
      return;
    }

    // Generate the TOC link based on the heading text(s)
    const markdownLink = headingLine
      .replace(spaceRe, '') // replace spaces with an empty string
      .replace(symRe, '') // replace special characters (symbols)
      .toLowerCase(); // convert the link texts to lowercase characters

    // Create the table of contents item and append it to the tocContent variable
    tocContent += `• [${headingLine}](#${markdownLink})
`;
  });

  // Insert the generated table of contents into the "generated-toc" div element
  generatedToc.innerHTML = tocContent;

  // hide alert since there's currently no error at this point
  alert.style.display = 'none';

  // show the "generated-toc" div
  generatedToc.style.display = 'block';

  // clear the heading texts in the text area
  document.querySelector('#toc').value = '';
}

// Add a submit event to the form
form.addEventListener('submit', generateToc);

/*
What is HTML?
How to Contribute$ To Open Source Like a Boss!!
Why you should Learn to C$ode in Java?

Why you should get into Web3!
Don't Attach Question Mark(?) to Hows!
Stop Scaring Newbies!
Why are you too cold&
*/

Here's what's happening in the app:

You can look through the code to have more understanding of how I was able to do that. The code is available on this GitHub repo and the app is live here.
Glossary and References
Glossary of Terms

Regular Expression or RegEx: A pattern you can use for matching, searching, and manipulating text.
Pattern or regex pattern: A sequence of characters that defines a search criterion in a regular expression.
Literal Character: A character that matches itself in a regular expression (for example, "a" matches the character "a").
Flag: Modifiers added after the closing delimiter of a regex to change matching behavior, such as i (case-insensitive) or g (global).
Metacharacter: A character with a special meaning in a regular expression. Examples include . (any character), * (zero or more), and | (alternation).
Quantifier: A metacharacter that specifies the number of repetitions of the preceding element. For example, * matches zero or more occurrences, and {n} matches exactly n occurrences.
Anchors: Metacharacters that represent positions in the input string, such as ^ (start of line) and $ (end of line).
Grouping: Using parentheses () to create a subexpression you can repeat or reference as a single unit.
Capture Group: A group in a regular expression that captures and stores the matched text for later use.
Non-Capturing Group: A group in a regular expression that matches the pattern but does not capture the matched text.
Greedy: A matching behavior where quantifiers try to match as much as possible.
Lazy: Another matching behavior where quantifiers match as little as possible. It is the opposite of greedy.
Lookahead: A zero-width assertion that looks ahead to see if a pattern exists without including it in the match.
Lookbehind: A zero-width assertion that looks behind to see if a pattern exists without including it in the match.
Escape Sequence and Character: Using a backslash \ to escape a metacharacter so it is treated as a literal character, or putting it before an ordinary character to give that character a special meaning. For example, \d.
Word Boundary: A zero-width assertion that matches the position between a word character and a non-word character.
Negated Character Class: A character class with ^ as the first character, matching any character not in the class.
Regex Engine: The underlying software component that processes regular expressions and performs matching.
Case Sensitive: A matching behavior where letters' cases must exactly match in the regex pattern and the input string.
Case Insensitive: A flag (i) that enables case-insensitive matching in the regular expression.
Shorthand Character Class: Shortcuts for common character classes, such as \d (digit), \w (word character), and \s (whitespace).
Backreference: Referring to a captured group's content in the regex pattern. For example, \1.
Alternation: Using the | metacharacter to match either of two patterns.
JavaScript RegExp Object: The built-in JavaScript object that represents a regular expression. It has methods like test() and exec() for working with regular expressions.
Regular Expression Literals: Regular expressions defined using slashes /.../, e.g., /regex-pattern/.
RegExp Constructor: The RegExp constructor for creating regular expressions dynamically.

Quick Reference of Metacharacters and Quantifiers

\d: matches any digit (0-9).
\D: matches any non-digit character.
\w: matches any word character (alphanumeric characters and underscore).
\W: matches any non-word character.
\s: matches any whitespace character (space, tab, newline, carriage return).
\S: matches any non-whitespace character.
\b: matches a word boundary position.
\B: matches a non-word boundary position.
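^: matches the start of the input, or the start of each line when the m flag is set.
$: matches the end of the input, or the end of each line when the m flag is set.
.: matches any character except line terminators such as newline (unless the s flag is set).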
*: matches zero or more occurrences.
+: matches one or more occurrences.
?: matches zero or one occurrence.
{n}: matches exactly n (number) occurrences.
{n,}: matches n or more occurrences.
{n,m}: matches at least n and at most m (another number) occurrences.
|: matches either the left or right expression.
(...): capturing group.
(?:...): non-capturing group.
\: escapes a metacharacter in order to match it literally (for example, \.), or gives an ordinary character a special meaning (for example, \d).
[...]: character class.
[^...]: negated character class.
(?=...): positive lookahead.
(?!...): negative lookahead.
(?<=...): positive lookbehind.
(?<!...): negative lookbehind.


Thank you for reading!
 


 How to Use Regular Expressions in YAML File – RegEx in YAML Tutorial 
Kolade Chris — Wed, 17 May 2023 14:48:51 +0000
 YAML does not have built-in support for regular expressions. But you can still include regex patterns as part of a YAML file's contents, access those patterns, and create a regex out of them.
You can do this, for example, with the JavaScript RegExp constructor.
So, in YAML, regular expressions are typically represented as strings, using a specific syntax to define the pattern. For example, a YAML key-value pair that includes a regular expression pattern might look like this:
example:
  pattern: ^[A-Za-z]+$

In this article, I'll show you how to write regular expressions inside a YAML file and access its entries in a JavaScript file. Let's take a look at what the YAML file is first.
What We'll Cover

What is a YAML File?

How to Write Regular Expressions in a YAML File

How to Import a YAML File in JavaScript and Use it

Conclusion


What is a YAML File?
YAML stands for YAML Ain't Markup Language. It is a human- and machine-readable data serialization format. It is often used for configuration files, for data exchange, and for representing structured data in DevOps engineering.
YAML files use indentation and a concise syntax to define data structures such as lists, dictionaries (key-value pairs), and scalars (strings, numbers, booleans).
Each value in a YAML file is either a scalar (a string, number, Boolean, or null) or a collection (a sequence or a mapping). Here's a YAML file containing those data types:
# YAML Data Types Example
# -----------------------

# Scalars
null_example: null           # Null Scalar
bool_example: true           # Boolean Scalar
int_example: 42              # Integer Scalar
float_example: 3.14          # Float Scalar
str_example: "Hello, YAML!"  # String Scalar

# Sequences (Arrays)
seq_example:                 # Sequence (Array)
  - Apple
  - Orange
  - Banana

# Mappings (Dictionaries)
map_example:                 # Mapping (Dictionary)
  key1: value1
  key2: value2
  key3: value3

# List (Sequence of Mappings)
list_example:                # List of Mappings (Sequence of Dictionaries)
  - name: John
    age: 30
  - name: Jane
    age: 28
  - name: Bob
    age: 35

You can also put regular expressions right inside a YAML file. And that's what we'll look at next.
How to Write Regular Expressions in a YAML File
You can represent specific values in a YAML file as regular expressions. Below are some validation regex patterns:
# validator.yaml file
password:
  pattern: ^(?!.*[\s])(?=.*[A-Z])(?=.*[a-z])(?=.*\d)[A-Za-z\d@$!%*#?&]{8,}$
  description: |
    - At least 8 characters
    - At least one uppercase letter
    - At least one lowercase letter
    - At least one digit
    - Allowed special characters: @$!%*#?&

nigerianPhoneNumber:
  pattern: ^(\+?234|0)[789]\d{9}$
  description: |
    - Nigerian phone number format
    - Starts with +234 or 0
    - Followed by 7, 8, or 9
    - Total of 11 digits

email:
  pattern: ^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$
  description: |
    - Valid email address format
    - Example: example@example.com

username:
  pattern: ^[a-zA-Z0-9_-]{3,16}$
  description: |
    - Allowed characters: letters (upper and lower case), numbers, underscore (_), and hyphen (-)
    - Minimum length: 3 characters
    - Maximum length: 16 characters

You can then import the YAML file into your JavaScript file and do what you want with it – for instance, create regular expressions out of those patterns and use them.
But that process is not straightforward. So that's the next thing you'll learn in this article.
How to Import a YAML File in JavaScript and Use it
If you attempt to import any YAML file into a JavaScript file with the import syntax, like import abc from './file.yaml', this is the kind of error you'll get:

Instead of doing it that way, you should create a package.json in your project directory by running npm init -y and install the js-yaml package by running npm install js-yaml.
After that, import the fs module of Node.js and the js-yaml package this way:
const fs = require('fs');
const yaml = require('js-yaml');

The next thing you should do is read the validator.yaml file with the readFileSync method of the fs module and parse the YAML file with the load() method:
const yamlData = fs.readFileSync('validator.yaml', 'utf8');
const parsedData = yaml.load(yamlData);

All that's left to do is to access any of the patterns, create a RegEx out of it, and use it. This is how I used the password pattern:
const passwordPattern = parsedData.password.pattern;
const pwordValidator = new RegExp(passwordPattern);

const myPassword = 'reallyStrongPassword21!';
console.log(pwordValidator.test(myPassword)); //true

Here's how I used the Nigerian phone number validator pattern:
const phonePattern = parsedData.nigerianPhoneNumber.pattern;

const phoneValidator = new RegExp(phonePattern);

const myPhoneNum = '08133333333';
console.log(phoneValidator.test(myPhoneNum)); // true

Here's the full code:
// import the fs module to be able to access the YAML file
const fs = require('fs');

// import the YAML package
const yaml = require('js-yaml');

// Read the validator.yaml file with the FS module
const yamlData = fs.readFileSync('validator.yaml', 'utf8');

// parse the YAML file
const parsedData = yaml.load(yamlData);

// Access the password validator pattern from the YAML file
const passwordPattern = parsedData.password.pattern;

// Create a regex out of the password pattern
const pwordValidator = new RegExp(passwordPattern);

const myPassword = 'reallyStrongPassword21!';
console.log(pwordValidator.test(myPassword)); //true

// Access the nigerianPhoneNumber validator pattern from the YAML file
const phonePattern = parsedData.nigerianPhoneNumber.pattern;

// Create a regex out of the phonePattern
const phoneValidator = new RegExp(phonePattern);

const myPhoneNum = '08133333333';
console.log(phoneValidator.test(myPhoneNum)); // true

// Access the email validator pattern from the YAML file
const emailPattern = parsedData.email.pattern;

// Create a regex out of the emailPattern
const emailValidator = new RegExp(emailPattern);

const emailAddress = 'chris@gmail.com';
console.log(emailValidator.test(emailAddress)); // true

// Access the username validator pattern from the YAML file
const usernamePattern = parsedData.username.pattern;

// Create a regex out of the usernamePattern
const usernameValidator = new RegExp(usernamePattern);

const username = 'ksound22';
console.log(usernameValidator.test(username)); // true
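Since every entry in the YAML file has the same shape, the repeated "read the pattern, build the regex" steps can be folded into one loop. The sketch below is only an illustration: buildValidators is a hypothetical helper, and the parsedData object is a hand-written stand-in for what yaml.load() would return for two of the entries in validator.yaml, so the example runs without the js-yaml package.

```javascript
// Stand-in for the object yaml.load() returns (assumed shape, two entries only).
// String.raw keeps the backslashes exactly as they appear in the YAML file.
const parsedData = {
  nigerianPhoneNumber: { pattern: String.raw`^(\+?234|0)[789]\d{9}$` },
  username: { pattern: String.raw`^[a-zA-Z0-9_-]{3,16}$` },
};

// Hypothetical helper: compile every entry that has a string `pattern` property.
function buildValidators(data) {
  const validators = {};
  for (const [name, entry] of Object.entries(data)) {
    if (entry && typeof entry.pattern === 'string') {
      validators[name] = new RegExp(entry.pattern);
    }
  }
  return validators;
}

const validators = buildValidators(parsedData);

console.log(validators.nigerianPhoneNumber.test('08133333333')); // true
console.log(validators.username.test('ab')); // false (shorter than 3 characters)
```

With the real file, you would pass the result of yaml.load(yamlData) to the helper instead of the stand-in object.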

Conclusion
This article showed you how to put regular expressions in a YAML file, import it into a JavaScript file with the js-yaml package, and access any of the values in it.
We also looked at how you can create regular expressions out of the patterns in the YAML file and test them with some strings.
Thanks for reading. If you find the article helpful, kindly share it with your friends and family.
 


 How to Use Regular Expressions in JavaScript – Tutorial for Beginners 
freeCodeCamp — Tue, 16 Aug 2022 17:51:48 +0000
 By Chinwendu Enyinna
Regular expressions (regex) are a useful programming tool. They are key to efficient text processing. Knowing how to solve problems using regex is helpful to you as a developer and improves your productivity. 
In this article, you will learn about the fundamentals of regular expressions, regular expression pattern notation, how you can interpret a simple regex pattern, and how to write your own regex pattern. Let’s get to it!
What Are Regular Expressions?
Regular expressions are patterns that allow you to describe, match, or parse text. With regular expressions, you can do things like find and replace text, verify that input data follows the required format, and other similar things.
Here's a scenario: you want to verify that the telephone number entered by a user on a form matches a format, say, ###-###-#### (where # represents a number). One way to solve this could be:
function isPattern(userInput) {
  if (typeof userInput !== 'string' || userInput.length !== 12) {
    return false;
  }
  for (let i = 0; i < userInput.length; i++) {
    let c = userInput[i];
    switch (i) {
      case 0:
      case 1:
      case 2:
      case 4:
      case 5:
      case 6:
      case 8:
      case 9:
      case 10:
      case 11:
        if (c < '0' || c > '9') return false; // compare as characters so that letters are rejected
        break;
      case 3:
      case 7:
        if (c !== '-') return false;
        break;
    }
  }
  return true;
}

Alternatively, we can use a regular expression here like this:
function isPattern(userInput) {
  return /^\d{3}-\d{3}-\d{4}$/.test(userInput);
}

Notice how we’ve refactored the code using regex. Amazing right?  That is the power of regular expressions.
How to Create A Regular Expression
In JavaScript, you can create a regular expression in either of two ways:

Method #1: using a regular expression literal. This consists of a pattern enclosed in forward slashes. You can write this with or without a flag (we will see what flag means shortly). The syntax is as follows:

const regExpLiteral = /pattern/;          // Without flags

const regExpLiteralWithFlags = /pattern/gi; // With flags (here g and i)

The forward slashes /…/ indicate that we are creating a regular expression pattern, just the same way you use quotes “ ” to create a string.

Method #2: using the RegExp constructor function. The syntax is as follows:

new RegExp(pattern [, flags])

Here, the pattern is enclosed in quotes, the same as the flag parameter, which is optional.
So when do you use each of these methods?
You should use a regex literal when you know the regular expression pattern at the time of writing the code.
On the other hand, use the RegExp constructor if the regex pattern is to be created dynamically. Also, the RegExp constructor lets you build a pattern from a string or a template literal, which is not possible with the regex literal syntax.
What are Regular Expression Flags?
Flags or modifiers are characters that enable advanced search features including case-insensitive and global searching. You can use them individually or collectively. Some commonly used ones are:

g is used for global search which means the search will not return after the first match.
i is used for case-insensitive search meaning that a match can occur regardless of the casing.
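m is used for multiline search, which makes ^ and $ match at the start and end of every line instead of only the start and end of the whole string.
u is used for Unicode search, which makes the regex work on whole Unicode code points instead of 16-bit code units.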
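The m and u flags are easier to understand with a quick experiment. This is a small sketch with made-up sample strings, showing how the same pattern behaves with and without each flag:

```javascript
const text = 'first line\nsecond line';

// Without m, ^ only matches at the very start of the string
console.log(/^second/.test(text)); // false

// With m, ^ also matches right after a line break
console.log(/^second/m.test(text)); // true

const emoji = '😀';

// Without u, the emoji counts as two UTF-16 code units, so a single dot is not enough
console.log(/^.$/.test(emoji)); // false

// With u, the dot matches the emoji as one code point
console.log(/^.$/u.test(emoji)); // true
```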

Let’s look at some regular expression patterns using both syntaxes.
How to use a regular expression literal:
// Syntax: /pattern/flags

const regExpStr = 'Hello world! hello there';

const regExpLiteral = /Hello/gi;

console.log(regExpStr.match(regExpLiteral));

// Output: ['Hello', 'hello']

Note that if we did not flag the pattern with i, only Hello will be returned. 
The pattern /Hello/ is an example of a simple pattern. A simple pattern consists of characters that must appear literally in the target text. For a match to occur, the target text must follow the same sequence as the pattern. 
For example, if you re-write the text in the previous example and try to match it:
const regExpLiteral = /Hello/gi;

const regExpStr = 'oHell world, ohell there!';

console.log(regExpStr.match(regExpLiteral));

// Output: null

We get null because the characters in the string do not appear as specified in the pattern. So a literal pattern such as /hello/, means h followed by e followed by l followed by l followed by o, exactly like that.
How to use a regex constructor:
// Syntax: RegExp(pattern [, flags])

const regExpConstructor = new RegExp('xyz', 'g'); // With the g flag

const str = 'xyz xyz';

console.log(str.match(regExpConstructor));

// Output: ['xyz', 'xyz']

Here, the pattern xyz is passed in as a string, and so is the flag. Both occurrences of xyz got matched because we passed in the g flag. Without it, only the first match would be returned.
We can also pass in dynamically created patterns as template literals using the constructor function. For example:
const pattern = prompt('Enter a pattern');
// Suppose the user enters 'xyz'

const regExpConst = new RegExp(`${pattern}`, 'gi');

const str = 'xyz XYZ';

console.log(str.match(regExpConst)); // Output: ['xyz', 'XYZ']
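One caveat with patterns that come from user input: any special characters the user types are interpreted as regex syntax. If you want the input to be matched as plain text, you can escape it first. The escapeRegExp helper below is not built into JavaScript. It is a commonly used utility, shown here as a sketch:

```javascript
// Escape every character that has a special meaning in a regex,
// so the text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const userInput = '1+1'; // pretend this came from prompt()
const sample = '1+1=2, 11';

// Unescaped: + is a quantifier, so the pattern means "one or more 1s, then a 1"
console.log(sample.match(new RegExp(userInput, 'g'))); // ['11']

// Escaped: the + is matched as a literal plus sign
console.log(sample.match(new RegExp(escapeRegExp(userInput), 'g'))); // ['1+1']
```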

How to Use Regular Expression Special Characters
A special character in a regular expression is a character with a reserved meaning. Using special characters, you can do more than just find a direct match. 
For example, if you want to match a character in a string that may or may not appear once or multiple times, you can do this with special characters. These characters fit into different subgroups that perform similar functions.
Let's take a look at each subgroup and the characters that go with them.
Anchors and Boundaries:
Anchors are metacharacters that match the start and end of a line of text they are examining. You use them to assert where a boundary should be. 
The two characters used are ^ and $.

^ matches the start of a line and anchors a literal at the beginning of that line. For example:

const regexPattern1 = /^cat/;

console.log(regexPattern1.test('cat and mouse')); // Output: true

console.log(regexPattern1.test('The cat and mouse')); // Output: false because the line does not start with cat

// Without the ^ in the pattern, the output will return true
// because we did not assert a boundary.

const regexPattern2 = /cat/;

console.log(regexPattern2.test('The cat and mouse')); // Output: true


$ matches the end of a line and anchors a literal at the end of that line. For example:

const regexPattern = /cat$/;

console.log(regexPattern.test('The mouse and the cat')); // Output: true

console.log(regexPattern.test('The cat and mouse')); // Output: false

Note that the anchor characters ^ and $ match only a position in the text and not any actual characters.
Word boundaries are metacharacters that match the start and end position of a word, that is, a sequence of word characters (letters, digits, and underscores). You can think of them as a word-based version of ^ and $. You use the metacharacters \b and \B to assert a word boundary or its absence.

\b matches the start or end of a word. The word is matched according to the position of the metacharacter. Here's an example:

// Syntax 1: /\b.../ where .... represents a word.

// Search for a word that begins with the pattern ward
const regexPattern1 = /\bward/gi;

const text1 = 'backward Wardrobe Ward';

console.log(text1.match(regexPattern1)); // Output: ['Ward', 'Ward']

// Syntax 2: /...\b/

// Search for a word that ends with the pattern ward
const regexPattern2 = /ward\b/gi;

const text2 = 'backward Wardrobe Ward';

console.log(text2.match(regexPattern2)); // Output: ['ward', 'Ward']

// Syntax 3: /\b....\b/

// Search for a stand-alone word that begins and end with the pattern ward
const regexPattern3 = /\bward\b/gi;

const text3 = 'backward Wardrobe Ward';

console.log(text3.match(regexPattern3)); // Output: ['Ward']


\B is the opposite of \b. It matches every position that \b doesn't.
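To see \B in action, here is a short sketch that reuses the same sample text as the \b examples above. The variable names are just for illustration:

```javascript
const text = 'backward Wardrobe Ward';

// ward that does NOT start a word (it must be preceded by a word character)
console.log(text.match(/\Bward/gi)); // ['ward'] (from "backward")

// ward that does NOT end a word (it must be followed by a word character)
console.log(text.match(/ward\B/gi)); // ['Ward'] (from "Wardrobe")
```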

Shortcodes for Other Metacharacters:
In addition to the metacharacters we have looked at, here are some of the most commonly used ones:

\d – matches any decimal digit and is shorthand for [0-9].
\w – matches any alphanumeric character which could be a letter, a digit, or an underscore. \w is shorthand for [A-Za-z0-9_].
\s – matches any white space character.
\D – matches any non-digit and is the same as [^0-9].
\W – matches any non-word character (anything that is not a letter, a digit, or an underscore) and is shorthand for [^A-Za-z0-9_].
\S – matches a non-white space character.
. – matches any character except line terminators such as \n (unless the s flag is set).

What is a Character Class?
A character class is used to match any one of several characters in a particular position. To denote a character class, you use square brackets [] and then list the characters you want to match inside the brackets. 
Let's look at an example:
// Find and match a word with two alternative spellings

const regexPattern = /ambi[ea]nce/;

console.log(regexPattern.test('ambience')); // Output: true

console.log(regexPattern.test('ambiance')); // Output: true

// The regex pattern interprets as:  find a followed by m, then b,
// then i, then either e or a, then n, then c, and then e.

What is a Negated Character Class?
If you add a caret symbol inside a character class like this [^...], it will match any character that is not listed inside the square brackets. For example:
const regexPattern = /[^bc]at/;

console.log(regexPattern.test('bat')); // Output: false

console.log(regexPattern.test('cat')); // Output: false

console.log(regexPattern.test('mat')); // Output: true

What is a Range?
A hyphen - indicates a range when used inside a character class. Suppose you want to match a set of numbers, say [0123456789], or a set of characters, say [abcdefg]. You can write them as ranges like this: [0-9] and [a-g], respectively.
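You can also put several ranges in the same character class. As a small sketch, this pattern (the name hexDigit is just for illustration) checks for a single hexadecimal digit:

```javascript
// One character that is either 0-9 or a-f (the i flag also allows A-F)
const hexDigit = /^[0-9a-f]$/i;

console.log(hexDigit.test('7')); // true
console.log(hexDigit.test('C')); // true
console.log(hexDigit.test('g')); // false
```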
What is Alternation?
Alternation is yet another way you can specify a set of options. Here, you use the pipe character | to match any of several subexpressions. Each of the subexpressions is called an alternative.
The pipe symbol means ‘or’, so it matches a series of options. It allows you to combine subexpressions as alternatives.
For example, (x|y|z)a will match xa or ya, or za.  In order to limit the reach of the alternation, you can use parentheses to group the alternatives together. 
Without the parentheses, x|y|za  would mean x or y or za. For example:
const regexPattern = /(Bob|George)\sClan/;

console.log(regexPattern.test('Bob Clan')); // Output: true

console.log(regexPattern.test('George Clan')); // Output: true
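The effect of the parentheses is easier to see once anchors are involved. In this sketch (the variable names are made up), the ungrouped version splits into three separate alternatives: ^x, y, and za$.

```javascript
const grouped = /^(x|y|z)a$/; // x, y, or z, followed by a
const ungrouped = /^x|y|za$/; // starts with x, OR contains y, OR ends with za

console.log(grouped.test('ya')); // true
console.log(grouped.test('y')); // false
console.log(grouped.test('xylophone')); // false

console.log(ungrouped.test('y')); // true
console.log(ungrouped.test('xylophone')); // true (it starts with x)
```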

What are Quantifiers and Greediness?
Quantifiers denote how many times a character, a character class, or group should appear in the target text for a match to occur. Here are some common ones:

+ will match any character it is appended to if the character appears at least once. For example:

const regexPattern = /hel+o/;

console.log(regexPattern.test('helo'));          // Output: true

console.log(regexPattern.test('hellllllllllo')); // Output: true

console.log(regexPattern.test('heo'));           // Output: false


* is similar to the + character but with a slight difference. When you append * to a character, it means you want to match any number of that character including none. Here’s an example:

const regexPattern = /hel*o/;

console.log(regexPattern.test('helo'));    // Output: true

console.log(regexPattern.test('hellllo')); // Output: true

console.log(regexPattern.test('heo'));     // Output: true

// Here the * matches 0 or any number of 'l'


? implies "optional". When you append it to a character, it means the character may or may not appear. For example:

const regexPattern = /colou?r/;

console.log(regexPattern.test('color'));  // Output: true

console.log(regexPattern.test('colour')); // Output: true

// The ? after the character u makes u optional


{N}, when appended to a character or character class, specifies how many of the character we want. For example /\d{3}/ means match three consecutive digits.
{N,M} is called the interval quantifier and is used to specify a range for the minimum and maximum possible match. For example /\d{3,6}/ means match a minimum of 3 and a maximum of 6 consecutive digits. Don't put a space after the comma: in JavaScript, {3, 6} is not treated as a quantifier.
{N,} denotes an open-ended range. For example /\d{3,}/ means match any 3 or more consecutive digits.
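Here is a short sketch of the three brace quantifiers. The patterns are anchored with ^ and $ so that the whole string has to fit the count. The last pattern shows what happens when a space sneaks into the braces (the variable names are made up for this example):

```javascript
const exactlyThree = /^\d{3}$/;
const threeToSix = /^\d{3,6}$/;
const threeOrMore = /^\d{3,}$/;

console.log(exactlyThree.test('123')); // true
console.log(exactlyThree.test('1234')); // false

console.log(threeToSix.test('12345')); // true
console.log(threeToSix.test('1234567')); // false

console.log(threeOrMore.test('1234567')); // true

// With a space, the braces are matched as literal text instead
const withSpace = /\d{3, 6}/;
console.log(withSpace.test('12345')); // false
console.log(withSpace.test('1{3, 6}')); // true
```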

What is Greediness in Regex?
All quantifiers are greedy by default. This means that they will try to match as many characters as possible while still allowing the rest of the pattern to match.
To remove this default state and make them non-greedy (lazy), you append a ? to the quantifier like this: +?, *?, ??, {N,}?, {N,M}?, and so on.
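A quick way to see the difference is to match an HTML-like tag. In this sketch (the sample string is made up), the greedy version runs to the last > it can find, while the lazy version stops at the first one:

```javascript
const html = '<b>bold</b>';

// Greedy: .+ grabs as much as it can before the final >
console.log(html.match(/<.+>/)[0]); // '<b>bold</b>'

// Lazy: .+? grabs as little as it can before the first >
console.log(html.match(/<.+?>/)[0]); // '<b>'
```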
What are Grouping and Backreferencing?
We previously looked at how we can limit the scope of alternation using the parentheses. 
What if you want to use a quantifier like + or * on more than one character at a time – say a character class or group? You can group them together as a whole using the parentheses before appending the quantifier, just like in this example:
const regExp = /abc+(xyz+)+/i;

console.log(regExp.test('abcxyzzzzXYZ')); // Output: true

Here's what the pattern means: The first + applies to the c of abc, the second + applies to the z of xyz, and the third + applies to the whole group (xyz+), so the group can repeat one or more times.
Backreferencing allows you to match a new pattern that is the same as a previously matched pattern in a regular expression. You also use parentheses for backreferencing because they remember the subexpression they enclose once it has matched (that is, the captured group).
However, it is possible to have more than one captured group in a regular expression. So, to backreference any of the captured groups, you use a number to identify the parentheses.
Suppose you have 3 captured groups in a regex and you want to backreference any of them. You use \1, \2, or \3, to refer to the first, second, or third parentheses. To number the parentheses, you start counting the open parentheses from the left.
Let's look at some examples:
(x) matches x and remembers the match.
const regExp = /(abc)bar\1/i;

// \1 must match the same text that the first group (abc) captured
console.log(regExp.test('abcbarAbc')); // Output: true

console.log(regExp.test('abcbar')); // Output: false

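(?:x) matches x but does not remember the match. Because the pattern below has no capturing group, \1 has nothing to refer to. In a regex without the u flag, JavaScript then treats \1 as a legacy octal escape that matches the character with code 1. Using an example:
const regExp = /(?:abc)bar\1/i;

console.log(regExp.test('abcbarabc')); // Output: false

// '\x01' is the character with code 1 (the '\1' string escape is rejected in strict mode)
console.log(regExp.test('abcbar\x01')); // Output: true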
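A common practical use of backreferences is spotting a word that was typed twice in a row. The pattern below is a sketch of that idea, with made-up names and sample sentences:

```javascript
// (\w+) captures a word, \s+ matches the gap, and \1 requires the same word again
const repeatedWord = /\b(\w+)\s+\1\b/i;

const match = 'This is is a test'.match(repeatedWord);
console.log(match[0]); // 'is is'
console.log(match[1]); // 'is'

console.log(repeatedWord.test('This is a test')); // false
```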

The Escape Rule
A metacharacter has to be escaped with a backslash if you want it to appear as a literal in your regular expression. By escaping a metacharacter in regex, the metacharacter loses its special meaning.
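For instance, the dot normally matches almost any character, so you have to escape it when you mean a literal decimal point. Here is a minimal sketch with made-up names:

```javascript
const escapedDot = /^\d+\.\d+$/; // digits, a literal dot, digits
const unescapedDot = /^\d+.\d+$/; // digits, ANY character, digits

console.log(escapedDot.test('3.14')); // true
console.log(escapedDot.test('3x14')); // false

console.log(unescapedDot.test('3x14')); // true, because . matches the x
```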
Regular Expression Methods
The test() method
We have used this method a number of times in this article. The test() method compares the target text with the regex pattern and returns a boolean value accordingly. If there is a match, it returns true, otherwise it returns false.
const regExp = /abc/i;

console.log(regExp.test('abcdef')); // Output: true

console.log(regExp.test('bcadef')); // Output: false

The exec() method
The exec() method compares the target text with the regex pattern. If there's a match, it returns an array with the match – otherwise it returns null. For example:
const regExp = /abc/i;

console.log(regExp.exec('abcdef'));
// Output: ['abc', index: 0, input: 'abcdef', groups: undefined]

console.log(regExp.exec('bcadef'));
// Output: null

Also, there are string methods that accept regular expressions as a parameter like [match()](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/match), [replace()](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace), [replaceAll()](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replaceAll), [matchAll()](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/matchAll), [search()](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/search), and [split()](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/split). 
Regex Examples
Here are some examples to reinforce some of the concepts we've learned in this article.
First example: How to use a regex pattern to match an email address:
const regexPattern = /^[(\w\d\W)+]+@[\w+]+\.[\w+]+$/i;

console.log(regexPattern.test('abcdef123@gmailcom'));
// Output: false, missing dot

console.log(regexPattern.test('abcdef123gmail.'));
// Output: false, missing @ and the text after the dot

console.log(regexPattern.test('abcdef123@gmail.com'));
// Output: true, the input matches the pattern correctly

Let's interpret the pattern. Here's what's happening:

/ represents the start of the regular expression pattern.
^ anchors the match at the start of the input (or the start of a line when the m flag is set).
**[(\w\d\W)+]+** matches one or more characters from the character class. Inside square brackets, the parentheses and the + are literal characters, not a group and a quantifier. Because \w and \W together cover every character, this class matches any character at all. Only the + after the closing bracket is a quantifier.
**@** matches the literal @ in the email format.
**[\w+]+** matches one or more characters that are either word characters or a literal +.
**\.** escapes the dot so it appears as a literal character.
**[\w+]+$** matches one or more word characters (or literal + signs), and the $ anchors this class at the end of the input.
**/** - ends the pattern

Alright, next example: how to match a URL with format http://example.com or https://www.example.com:
const pattern = /^[https?]+:\/\/((w{3}\.)?[\w+]+)\.[\w+]+$/i;

console.log(pattern.test('https://www.example.com'));
// Output: true

console.log(pattern.test('http://example.com'));
// Output: true

console.log(pattern.test('https://example'));
// Output: false

Let's also interpret this pattern. Here's what's happening:

/...../ represents the start and end of the regex pattern
^ asserts for the start of the line
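[https?]+ matches one or more of the characters h, t, p, s, or a literal ?. Inside a character class the ? is not a quantifier, so it does not make 's' optional. The class accepts http and https, but also strings such as sptth.
: matches a literal colon.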
\/\/ escapes the two forward slashes.
(w{3}\.) matches the character w 3 times and the dot that follows immediately. However, this group is optional.
[\w+]+ matches character in this class at least once.
\. escapes the dot
[\w+]+$ matches any word character in this class. Also this character class is anchored at the end of the line.
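
Because a character class matches its members in any order, the pattern above is more permissive than it looks. The sketch below compares it with a stricter variant. strictPattern is a hypothetical alternative written for this comparison, not the article's pattern, and it still only covers the two URL shapes mentioned above.

```javascript
// Pattern from the example above: [https?]+ accepts any run of h, t, p, s or ?.
const loosePattern = /^[https?]+:\/\/((w{3}\.)?[\w+]+)\.[\w+]+$/i;

// Hypothetical stricter variant: literal 'http', an optional 's',
// an optional 'www.', then name-dot-suffix.
const strictPattern = /^https?:\/\/(www\.)?\w+\.\w+$/i;

console.log(loosePattern.test('sptth://example.com'));      // true
console.log(strictPattern.test('sptth://example.com'));     // false
console.log(strictPattern.test('http://example.com'));      // true
console.log(strictPattern.test('https://www.example.com')); // true
console.log(strictPattern.test('https://example'));         // false
```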

Conclusion
In this article, we looked at the fundamentals of regular expressions. We also explained some regular expression patterns, and practiced with a few examples.
There's more to regular expressions beyond this article. To help you learn more about regular expressions, here are some resources you can read through:

Regular Expression
Learn Regex crash course
Regular Expression Tutorial
Regular Expression Cheatsheet

And that's all for this tutorial. Happy coding :)
 


 Google Sheets Tutorial – How to Use Regex and VLOOKUP to Display Images from Google Drive 
Eamonn Cottrell — Wed, 03 Aug 2022 17:03:25 +0000
 Images make many things better. And Google Sheets is one of those things. 
The easiest way to add an image to Google Sheets is to simply insert one into your sheet. 
But if you have added many images this way, you'll quickly tire of the multiple clicks it takes to do so. Especially if you have to add images often, or if you have to add the same images to multiple sheets.
In this article, you'll learn how to add many images from their URLs that you can dynamically toggle between in a dropdown list. We'll cover:

Data Validation for creating a dropdown list
Named Ranges to make formula references easier and cleaner
The VLOOKUP function to display the right image from the dropdown list
The REGEXEXTRACT function to extract a string from a URL (don't worry, it'll make sense 😉)
The IMAGE function to display the image from a URL address
We'll use the ampersand (&) operator as well as regular expressions (Regex)
We'll also make our sheet look good by removing gridlines, changing the font, adding borders, colors, and a drop shadow effect behind tables

How to Set Up the Project 📐
You can follow along with the sheet I'm using for everything we'll discuss:
https://docs.google.com/spreadsheets/d/1rFU2gPy6rU8IKFDmsxKHYCf0KGVHkcumQ5O5QCf156M/edit?usp=sharing
Make a copy if you want to edit it yourself.

All cell and range references below will be from this sheet so you can easily look and see what I'm talking about.
I've also made a folder of images here that is publicly shared so all this works. You don't have to make a copy of this unless you just want to 😀.

How to Use Named Ranges in Google Sheets 📛
Named ranges make life easier. 
You don't have to use them, but it makes references in functions easier since you'll be writing the name of something instead of a sterile cell reference.
We'll use three of them:

B4 = itemSelect This is the cell where our dropdown list will live.
B8:G13 = pictureMatch This is the range for our VLOOKUP function. It contains the names of the pictures we'll display followed by their respective URLs.
B8:B16 = pictureName This is the first column of the pictureMatch range for referencing just the names in our data validation cell.

To create a named range, simply highlight the range, select Data -> Named ranges from the toolbar, and name it.

How to Perform Data Validation 📃
We'll use data validation to create a dropdown list in B4. Same deal here – just highlight the cell (or range) and select Data -> Data validation from the toolbar:

Select List from a range, and then =pictureName (because we named that range) for the range. Alternatively, you can declare the range explicitly.
There are additional options to configure if you want to change anything:

If you select reject input, you can have a custom message pop up whenever an invalid choice is entered:

You might want to make your message more helpful than this one.
How to Use VLOOKUP 📊
VLOOKUP is an incredibly useful function. It takes four arguments: 
=VLOOKUP(search_key, range, index, [is_sorted])

=VLOOKUP(itemSelect,pictureMatch,3,0)
We'll use itemSelect for our search_key and pictureMatch for the range because we want to find itemSelect in that range. Then the 3 for index gets the value in the third column in that range. 
(It's 3 in our example because we merged the cells in columns B & C for our formatting, but VLOOKUP still counts both of them).
Finally, the zero sets is_sorted to FALSE. Our data is not sorted, and we want an exact match.
How to Use REGEXEXTRACT 💾
It happened: I found a real world use for Regular Expressions. 😳
This section of freeCodeCamp's JavaScript certification was particularly confusing for me, and it was good to revisit a small portion of it here in the wild.
Because Google Drive is quirky, and we're sort of hacking a free option here, we need to alter the URLs to our images in order for the IMAGE function to work properly.
This Stack Overflow answer was helpful for me.
We need to build a URL by taking this:
https://drive.google.com/uc?export=download&id=###

and replacing the ### part at the end with the ID we extract with the REGEXEXTRACT function.
Looking at the URLs we copied over, we can see a pattern. Everything after the /d/ and then before the next / is the ID. 
Here's an example of one of our image URLs: https://drive.google.com/file/d/1IaO08gj3GWIUQDAnzKEob62Gcl87ufuN/view?usp=sharing
You can see this at work by itself in B26 of the example spreadsheet as the function grabs everything between those two markers:
=REGEXEXTRACT(D9,".*/d/(.*)/")


This extracts everything between the /d/ and the /
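If it helps to see the same idea outside of Sheets, here is a rough JavaScript sketch of the extraction and the URL rebuild. It is not part of the spreadsheet. The variable names are invented for illustration, and the pattern uses [^\/]+ instead of .* so the capture stops at the first slash after the ID.

```javascript
// Example sharing URL taken from the article.
const shareUrl =
  'https://drive.google.com/file/d/1IaO08gj3GWIUQDAnzKEob62Gcl87ufuN/view?usp=sharing';

// Capture everything after '/d/' up to (not including) the next '/'.
const idPattern = /\/d\/([^\/]+)\//;
const idMatch = shareUrl.match(idPattern);
const fileId = idMatch ? idMatch[1] : null;

// Prefix from the article, with the extracted ID appended in place of ###.
const imageUrl = 'https://drive.google.com/uc?export=download&id=' + fileId;

console.log(fileId);
console.log(imageUrl);
```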
How to Use the IMAGE Function 📷
Okay. We've got the disparate pieces figured out. I know the pieces fit. 🎵 
Let's put them together.
All of our work was to get one cell ( B4 ) to provide data to the IMAGE function.
Image takes one argument and three other optional ones: 
 IMAGE(url, [mode], [height], [width])

We build the URL by combining the required beginning of the URL which I've got in J17 using the ampersand (&) operator with our REGEXEXTRACT function. And within our REGEXEXTRACT function we use our VLOOKUP function to get the URL of whatever image we've selected in the itemSelect cell.
Whew. 
But, cool, right!?

If you feel lost in a recursive nightmare, I encourage you to pull up the example spreadsheet and examine the parts of the function in F4 piece by piece. 👍
How to Format Your Sheet FTW 💯
These few details can turn up the volume 📣 on an otherwise mundane spreadsheet.

This is likely the only place you'll find a NIN gif in an article about spreadsheets today.
I love a hard drop shadow, and we can achieve this by manipulating the row and column sizes around a particular cell or range, using the merge cell option for our main range, and then using a fill color around the right side and bottom.
Click the lines between the column headers to drag and adjust the widths and heights of the columns and rows.

Cells are the main appeal of spreadsheets, but in some cases hiding the gridlines can make your sheet stand out. I opted for this approach in this project. 
Select View->Show->Gridlines.

As much as I appreciate Arial, I will typically opt out of the default font immediately. 
Click the Font Dropdown in the Toolbar. It's usually smack dab in the middle:

And just choose whatever font you'd like.
There you have it!
Thanks for Reading! 🙏
Follow me on Twitter to see more content like this: https://twitter.com/EamonnCottrell
Thanks!

 


 JavaScript Regex Match Example – How to Use JS Replace on a String 
Kristofer Koishigawa — Mon, 04 Jan 2021 10:21:00 +0000
 Regular expressions, abbreviated as regex, or sometimes regexp, are one of those concepts that you probably know is really powerful and useful. But they can be daunting, especially for beginning programmers.
It doesn't have to be this way. JavaScript includes several helpful methods that make using regular expressions much more manageable. Of the included methods, the .match(), .matchAll(), and .replace() methods are probably the ones you'll use most often.
In this tutorial, we'll go over the ins and outs of those methods, and look at some reasons why you might use them over the other included JS methods.
A quick introduction to regular expressions
According to MDN, regular expressions are "patterns used to match character combinations in strings".
These patterns can sometimes include special characters (*, +), assertions (\W, ^), groups and ranges ((abc), [123]), and other things that make regex so powerful but hard to grasp.
At its core, regex is all about finding patterns in strings – everything from testing a string for a single character to verifying that a telephone number is valid can be done with regular expressions.
If you're brand new to regex and would like some practice before reading on, check out our interactive coding challenges.
How to use the .match() method
So if regex is all about finding patterns in strings, you might be asking yourself what makes the .match() method so useful?
Unlike the .test() method which just returns true or false, .match() will actually return the match against the string you're testing. For example:
const csLewisQuote = 'We are what we believe we are.';
const regex1 = /are/;
const regex2 = /eat/;

csLewisQuote.match(regex1); // ["are", index: 3, input: "We are what we believe we are.", groups: undefined]

csLewisQuote.match(regex2); // null

This can be really helpful for some projects, especially if you want to extract and manipulate the data that you're matching without changing the original string.
If all you want to know is if a search pattern is found or not, use the .test() method – it's much faster.
There are two main return values you can expect from the .match() method:

If there's a match, the .match() method will return an array with the match. We'll go into more detail about this in a bit.
If there isn't a match, the .match() method will return null.

Some of you might have already noticed this, but if you look at the example above, .match() is only matching the first occurrence of the word "are".
A lot of times you'll want to know how often a pattern is matched against the string you're testing, so let's take a look at how to do that with .match().
Different matching modes
If there's a match, the array that .match() returns has two different modes, for lack of a better term.
The first mode is when the global flag (g) isn't used, like in the example above:
const csLewisQuote = 'We are what we believe we are.';
const regex = /are/;

csLewisQuote.match(regex); // ["are", index: 3, input: "We are what we believe we are.", groups: undefined]

In this case, .match() returns an array with the first match along with the index of the match in the original string, the original string itself, and any matching groups that were used.
But say you want to see how many times the word "are" occurs in a string. To do that, just add the global search flag to your regular expression:
const csLewisQuote = 'We are what we believe we are.';
const regex = /are/g;

csLewisQuote.match(regex); // ["are", "are"]

You won't get the other information included with the non-global mode, but you'll get an array with all the matches in the string you're testing.
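Since .match() returns null rather than an empty array when nothing matches, counting occurrences needs a small guard. Here is one possible sketch. countMatches is a hypothetical helper name, not a built-in.

```javascript
// Hypothetical helper: counts how often a global regex matches in a string.
function countMatches(text, globalRegex) {
  // .match() returns null when nothing matches, so fall back to [].
  return (text.match(globalRegex) || []).length;
}

const quote = 'We are what we believe we are.';
console.log(countMatches(quote, /are/g)); // 2
console.log(countMatches(quote, /eat/g)); // 0
```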
Case sensitivity
An important thing to remember is that regex is case sensitive. For example, say you wanted to see how many times the word "we" occurs in your string:
const csLewisQuote = 'We are what we believe we are.';
const regex = /we/g;

csLewisQuote.match(regex); // ["we", "we"]

In this case, you're matching a lowercase "w" followed by a lowercase "e", which only occurs twice.
If you'd like all instances of the word "we" whether it's upper or lowercase, you have a couple of options.
First, you could use the .toLowerCase() method on the string before testing it with the .match() method:
const csLewisQuote = 'We are what we believe we are.'.toLowerCase();
const regex = /we/g;

csLewisQuote.match(regex); // ["we", "we", "we"]

Or if you want to preserve the original case, you could add the case-insensitive search flag (i) to your regular expression:
const csLewisQuote = 'We are what we believe we are.';
const regex = /we/gi;

csLewisQuote.match(regex); // ["We", "we", "we"]

The new .matchAll() method
Now that you know all about the .match() method, it's worth pointing out that the .matchAll() method was recently introduced.
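Unlike the .match() method which returns an array or null, .matchAll() requires the global search flag (g) when you pass it a regular expression, and always returns an iterator. Spreading that iterator into an array gives you every match, or an empty array if there are none: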
const csLewisQuote = 'We are what we believe we are.';
const regex1 = /we/gi;
const regex2 = /eat/gi;

[...csLewisQuote.matchAll(regex1)]; 
// [
//   ["We", index: 0, input: "We are what we believe we are.", groups: undefined],
//   ["we", index: 12, input: "We are what we believe we are.", groups: undefined],
//   ["we", index: 23, input: "We are what we believe we are.", groups: undefined]
// ]

[...csLewisQuote.matchAll(regex2)]; // []

While it seems like just a more complicated .match() method, the main advantage that .matchAll() offers is that it works better with capture groups.
Here's a simple example:
const csLewisRepeat = "We We are are";
const repeatRegex = /(\w+)\s\1/g;

csLewisRepeat.match(repeatRegex); // ["We We", "are are"]

const csLewisRepeat = "We We are are";
const repeatRegex = /(\w+)\s\1/g;

[...csLewisRepeat.matchAll(repeatRegex)];

// [
//   ["We We", "We", index: 0, input: "We We are are", groups: undefined],
//   ["are are", "are", index: 6, input: "We We are are", groups: undefined],
// ]

While that just barely scratches the surface, keep in mind that it's probably better to use .matchAll() if you're using the g flag and want all the extra information that .match() provides for a single match (index, the original string, and so on).
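Because .matchAll() returns an iterator, you can also loop over it directly instead of spreading it. Here is a minimal sketch that reuses the repeated-word pattern. The found array and the variable names are additions for this example.

```javascript
const repeated = 'We We are are';
const pairRegex = /(\w+)\s\1/g;

const found = [];
for (const m of repeated.matchAll(pairRegex)) {
  // m[0] is the full match, m[1] is the first capture group,
  // and m.index is where the match starts in the string.
  found.push({ word: m[1], index: m.index });
}

console.log(found);
// [ { word: 'We', index: 0 }, { word: 'are', index: 6 } ]
```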
How to use the .replace() method
So now that you know how to match patterns in strings, you'll probably want to do something useful with those matches.
One of the most common things you'll do once you find a matching pattern is replace that pattern with something else. For example, you might want to replace "paid" in "paidCodeCamp" with "free". Regex would be a good way to do that.
Since .match() and .matchAll() return information about the index for each matching pattern, depending on how you use it, you could use that to do some fancy string manipulation. But there's an easier way – by using the .replace() method.
With .replace(), all you need to do is pass it a string or regular expression you want to match as the first argument, and a string to replace that matched pattern with as the second argument:
const campString = 'paidCodeCamp';
const fCCString1 = campString.replace('paid', 'free');
const fCCString2 = campString.replace(/paid/, 'free');

console.log(campString); // "paidCodeCamp"
console.log(fCCString1); // "freeCodeCamp"
console.log(fCCString2); // "freeCodeCamp"

The best part is that .replace() returns a new string, and the original remains the same.
Similar to the .match() method, .replace() will only replace the first matched pattern it finds unless you use regex with the g flag:
const campString = 'paidCodeCamp is awesome. You should check out paidCodeCamp.';
const fCCString1 = campString.replace('paid', 'free');
const fCCString2 = campString.replace(/paid/g, 'free');

console.log(fCCString1); // "freeCodeCamp is awesome. You should check out paidCodeCamp."
console.log(fCCString2); // "freeCodeCamp is awesome. You should check out freeCodeCamp."

And similar to before, whether you pass a string or a regular expression as the first argument, it's important to remember that the matching pattern is case sensitive:
const campString = 'PaidCodeCamp is awesome. You should check out PaidCodeCamp.';
const fCCString1 = campString.replace('Paid', 'free');
const fCCString2 = campString.replace(/paid/gi, 'free');

console.log(fCCString1); // "freeCodeCamp is awesome. You should check out PaidCodeCamp."
console.log(fCCString2); // "freeCodeCamp is awesome. You should check out freeCodeCamp."

How to use the .replaceAll() method
Just like how .match() has a newer .matchAll() method, .replace() has a newer .replaceAll() method.
The only real difference between .replace() and .replaceAll() is that you need to use the global search flag if you use a regular expression with .replaceAll():
const campString = 'paidCodeCamp is awesome. You should check out paidCodeCamp.';
const fCCString1 = campString.replaceAll('paid', 'free');
const fCCString2 = campString.replaceAll(/paid/g, 'free');

console.log(fCCString1); // "freeCodeCamp is awesome. You should check out freeCodeCamp."
console.log(fCCString2); // "freeCodeCamp is awesome. You should check out freeCodeCamp."

The real benefit with .replaceAll() is that it's a bit more readable, and replaces all matched patterns when you pass it a string as the first argument.
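One place where the string form is handy is when the text you want to replace contains regex metacharacters. The sketch below uses a made-up version string to show the difference, and also shows what happens if the g flag is left off a regex passed to .replaceAll().

```javascript
const version = '1.2.3';

// A string pattern is matched literally, so '.' means a real dot.
const viaString = version.replaceAll('.', '-');
console.log(viaString); // "1-2-3"

// In a regex, an unescaped '.' matches any character.
const viaBareDot = version.replace(/./g, '-');
console.log(viaBareDot); // "-----"

// Escaping the dot restores the literal meaning.
const viaEscapedDot = version.replace(/\./g, '-');
console.log(viaEscapedDot); // "1-2-3"

// .replaceAll() throws a TypeError for a regex without the g flag.
let threwTypeError = false;
try {
  version.replaceAll(/\./, '-');
} catch (error) {
  threwTypeError = error instanceof TypeError;
}
console.log(threwTypeError); // true
```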
That's it! Now you know the basics of matching and replacing parts of strings with regex and some built-in JS methods. These were pretty simple examples, but I hope it still showed how powerful even a little bit of regex can be.
Was this helpful? How do you use the .match(), .matchAll(), .replace(), and .replaceAll() methods? Let me know over on Twitter.
 


 Learn regular expressions in this free crash course 
Beau Carnes — Thu, 18 Apr 2019 15:25:41 +0000
 Regular expressions, or just regex, are used in almost all programming languages to define a search pattern that can be used to search for things in a string.
You can learn the basics of regular expressions for free in this complete crash course. It focuses on using regex in JavaScript, but the principles apply in many other programming languages you might choose to use.
This course follows along with the free regular expressions curriculum on freeCodeCamp.org.
You can watch the full video course on the freeCodeCamp.org YouTube channel (45 minute watch).

From [https://xkcd.com/208/](https://xkcd.com/208/)
 


 Exploring the Linguistics Behind Regular Expressions 
freeCodeCamp — Mon, 20 Nov 2017 15:38:09 +0000
 By Alaina Kafkes
How a linguistic breakthrough ended up in code

Image Credit: [xkcd](https://xkcd.com/)
Regular expressions inspire fear in new and experienced programmers alike. When I first saw a regular expression — often abbreviated as “regex” — I remember feeling dizzy from looking at the litany of parentheses, asterisks, letters, and numbers. Regular expressions seemed nonsensical, impenetrable.
I expected regular expressions to crop up again in my upper-level computer science coursework — maybe by then I’d finally feel ready to tackle them — but I encountered them in an introductory class that I had put off until my senior year. The purpose of this course was to draw students who had never written a line of code into CS by introducing them to concepts like cryptography, human-computer interaction, machine learning — you know, only the latest and greatest of tech buzzwords.
I didn’t attend more than a handful of lectures, but one of the assignments stuck with me. I had to write an essay about a famous computer scientist or academic whose work impacted computer science. I chose Noam Chomsky.
Little did I know that learning about Chomsky would drag me down a rabbit hole back to regular expressions, and then magically cast regular expressions into something that fascinated me. What enchanted me about regular expressions was the homonymous linguistic concept that powered them.
I hope to spellbind you, too, with the linguistics behind regular expressions, a backstory unknown to most programmers. Though I won’t teach you how to use regular expressions in any particular programming language, I hope that my linguistic introduction will inspire you to dive deeper into how regular expressions work in your programming language of choice.
To begin, let’s return to Chomsky: what does he have to do with regular expressions? Hell, what does he even have to do with computer science?
A Computer Scientist By Accident
Wikipedia christens Noam Chomsky as a linguist, philosopher, cognitive scientist, historian, social critic, and political activist, but not as a computer scientist. Because he is so highly regarded in all of these fields, his indirect contributions to the field of computer science often fall by the wayside.
The more I researched Chomsky’s academic work, the more accidental Chomsky’s foray into computing seemed. This affirmed my belief that all fields — even those that appear disparate from computer science — have something to offer to computing and the tech industry.
His contributions to the field of linguistics in particular exemplify the impact of interdisciplinary research on computer science. The Chomsky hierarchy transformed the code that computer scientists, software engineers, and hobbyists write today.
Yes, it was this hierarchy that brought regular expressions to computer science. But, before we can understand the jump from Chomsky to regular expressions, I’ll outline the Chomsky hierarchy.
Linguistic Law & Order
The Chomsky hierarchy is an ordering of formal grammars — think syntactic rules for formal languages — such that each grammar exists as a proper subset of the grammars above it in the hierarchy. Some formal languages have stricter grammars than others, so Chomsky sought to organize formal grammars into his eponymous hierarchy.
I briefly mentioned that formal grammars are syntactic rules: rules that give all possible valid phrases for a given formal language. Grammars provide the rules that build languages. In linguist-speak, a language’s formal grammar provides a framework with which nonterminals (input or intermediate string values) can be converted into terminals (output string values).
To elucidate this new vocabulary, I’ll walk through an example of converting a set of nonterminals into terminals using a made-up formal grammar. Let’s say that our pretend formal language, Parseltongue, has the following formal grammar:

Terminals: {s, sh, ss}
Nonterminals: {snake, I, am}
Production rules: {I → sh, am → s, snake → ss}

Using the production rules, I can convert the input sentence “I am snake” into “sh s ss.” This conversion happens piece by piece: “I am snake” → “sh am snake” → “sh s snake” → “sh s ss.”
As my Parseltongue example illustrates, formal grammars parse strings of nonterminals into terminal-only strings (grammatically correct phrases). But formal grammars act not only as generators of a language, but also as recognizers of whether a string fits the formal grammar. Whereas the example string “I am snake” can be fully converted into terminals, the string “I am not snake” cannot be written in Parseltongue because the grammar has no production rule that translates “not” into a Parseltongue terminal.
To re-emphasize something I stated earlier: formal grammars generate formal languages. That means that, by creating a hierarchy of formal grammars, Chomsky also categorized languages themselves.
With that sobering introduction, let’s look at the four formal grammars in Chomsky’s hierarchy. From most to least strict, they are:

Regular grammars, which retain no past state knowledge from input string to output string
Context-free grammars, which retain only recent state knowledge from input string to output string
Context-sensitive grammars, which keep all past state knowledge from input string to output string
Unrestricted (or recursively enumerable) grammars, which have all state knowledge and thus can create every output string imaginable from a given input string

What is this “state knowledge” that I speak of? Think of knowledge in terms of scope. Regular grammars, for example, have no knowledge of the string’s past states in their “scope” in the process of converting an input string into an output string. This suggests that once the grammar makes an individual conversion of nonterminal to terminal (plus a series of zero or more nonterminals), the grammar “forgets” the previous state of the string.
On the other hand, unrestricted grammars hold onto every possible state of the string-in-translation. Context-free and context-sensitive grammars fall somewhere in the middle.

If you’re looking for a more detailed explanation of the grammars in the Chomsky hierarchy, you’ll have to take a peek at automata theory. I’ll focus on the grammar that brings us back to regular expressions, fittingly called the regular grammar.
On the Regular Expressions
Regular expressions and regular grammars are equivalent. They communicate the same set of syntactic rules, albeit using different formalisms, and both produce the same regular languages.
In linguistics, a regular expression is recursively defined as follows:

The empty set is a regular expression.
The empty string is a regular expression.
For any character x in the input alphabet, x is a regular expression that produces the regular language {x}.
Alternation: If x and y are regular expressions, then x | y is a regular expression. For example, the regular expression 0|1 produces the regular language {0,1}.
Concatenation: If x and y are regular expressions, then x • y is a regular expression. For example, the regular expression 0•1 produces the regular language {01}.
Repetition (also known as Kleene star): If x is a regular expression, then x* is a regular expression. For example, the regular expression 0•1* produces the regular language {0, 01, 011, 0111, ...}, ad infinitum.
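
For readers who think better in code, the three operations can be loosely mirrored with JavaScript regexes. This is only a sketch: the ^ and $ anchors are an addition so that the whole string has to belong to the language, and JavaScript writes concatenation by juxtaposition rather than with •.

```javascript
const alternation = /^(0|1)$/;  // 0|1  produces {0, 1}
const concatenation = /^01$/;   // 0•1  produces {01}
const repetition = /^01*$/;     // 0•1* produces {0, 01, 011, 0111, ...}

console.log(alternation.test('0'), alternation.test('1'));   // true true
console.log(alternation.test('01'));                         // false
console.log(concatenation.test('01'));                       // true
console.log(concatenation.test('10'));                       // false
console.log(repetition.test('0'), repetition.test('0111'));  // true true
console.log(repetition.test('010'));                         // false
```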

A regular grammar is composed of rules like those of Parseltongue. Just as a regular grammar can be utilized to parse an input string into an output string, a regular expression converts strings quite similarly. You can see examples of this parsing for the alternation, concatenation, and repetition operations — or, to use my prior analogy, rules — that regular expressions adopt.
Let’s return to our friend Noam Chomsky for a moment. According to his hierarchy of grammars, regular grammars retain no information about intermediate steps in converting from an input string to an output string. What does this tell us about regular expressions?
The “forgetfulness” of regular grammars implies that translations in one part of the string do not impact how the other nonterminals in the string are translated in future steps. There is no coordination between different parts of the string in the creation of the output string.
Looking at the linguistics behind regular grammars gives us insight into why programmers first brought regular expressions into code. Although I’ve only discussed formal grammars as generators and recognizers of language, the fact that regular grammars convert input string to output string piece by piece makes them pattern-matchers. In programming, regular expressions use production rules to convert an input string — a pattern — into a regular language — a set of strings that match that pattern.
But I would have never written this blog post if programming language creators implemented regular expressions exactly as they are defined in the field of linguistics. Computational regular expressions are a far cry from their linguistic precursor, but the linguistic regular expressions that I covered provide a useful framework for understanding regular expressions in code.
Two Regular Expressions, Both Alike in Dignity
Hereafter, I will use the term regular expression to mean a linguistic regular expression and the term regex to signify a programmatic regular expression. In the wild, both linguistic and programmatic regular expressions are referred to as “regular expressions” even though they are quite different from one another — how confusing!
The difference between regular expressions and regexes stems from how they are used. Regular expressions — or regular grammars — are part of formal language theory, which exists to describe shared elements of natural languages — languages that evolved over time without human premeditation. Linguists use regular expressions for theoretical purposes, like the categorization of formal grammars in the Chomsky hierarchy. Regular expressions help linguists understand the languages that humans speak.
Regexes, on the other hand, are utilized by everyday programmers who want to search for strings that match a given pattern. While regular expressions are theoretical, regexes are pragmatic. Programming languages are formal languages: languages designed by people (here, programmers) for specific purposes. As you might imagine, programming language creators augmented the functionality of regexes in code. Let’s examine these enhancements.
Remember that regular expressions have three operations: alternation, concatenation, and repetition. I’m no regex expert — regexpert? — but all it takes is a peek at the regular expression Wikipedia page to notice that regexes implement more than just three operations.
For example, using POSIX regex syntax, the pattern .ork matches all four-character strings that end with the three characters “ork.” That period is more powerful than simple alternation, concatenation, and repetition, right?
Nope. Truth be told, even the fanciest of regex metacharacters (characters that invoke a regex operation) derive from regular expression operations. Assuming that the twenty-six lowercase letters of the alphabet are the only characters in the regular grammar, the regex pattern .ork could be written using only regular expression operations as (a|b|c|...|z)ork.
Though the sheer volume of metacharacters suggests that regex has a more powerful set of operations than regular expressions themselves, metacharacters are merely shortcuts for various permutations of the operations that define regular expressions. Regex metacharacters provide a programmer-friendly abstraction for common combinations of alternation, concatenation, and repetition.
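To make the shortcut idea concrete, here is a small JavaScript sketch that builds the long alternation and compares it with the dot form. It assumes, as above, an alphabet of only the twenty-six lowercase letters. The anchors and variable names are additions for this example.

```javascript
const letters = 'abcdefghijklmnopqrstuvwxyz'.split('');

// The shortcut form, anchored so it matches exactly four characters.
const dotForm = /^.ork$/;

// The same idea spelled out with alternation and concatenation only:
// (a|b|c|...|z) followed by o, r, k.
const alternationForm = new RegExp('^(' + letters.join('|') + ')ork$');

for (const word of ['work', 'fork', 'cork']) {
  console.log(dotForm.test(word), alternationForm.test(word)); // true true
}

// Outside the assumed alphabet the two forms differ, because the dot
// is not limited to lowercase letters.
console.log(dotForm.test('1ork'), alternationForm.test('1ork')); // true false
```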
So far, I’ve portrayed regexes as regular expressions with amazing shortcuts and clear-cut use cases. However, as you may recall from Chomsky’s hierarchy, regular grammars have the strictest rules and no scope. Luckily, regexes have a little more leeway than their linguistic precursor, thereby bestowing them with more practical power.
Breaking the Regular Grammar Rules
Recall that, according to the Chomsky hierarchy, regular grammars retain no knowledge in converting an input string to an output string. Since regular expressions are equivalent to regular grammars, this means that regular expressions also have no memory of the intermediate states of a string as it changed from input to output. It also means that translating a nonterminal in one part of a regular expression has no bearing on the translation of a nonterminal in another part of the expression.
For regexes, it’s a different story. Regexes violate this key regular grammar characteristic by supporting the ability to backreference. Backreferencing allows the programmer to parenthetically separate a subsection of a regular expression and refer to it using a metacharacter. To give an example, the pattern (la)\1 matches “lala” by employing the \1 metacharacter to repeat the search for “la.”
Because different parts of the string cannot influence one another in regular expressions, backreferencing gives regexes a lot more power than their predecessor. More importantly, backreferencing facilitates practical uses of regex such as searching for typos in which the same word was accidentally typed twice in a row. Pragmatism gives insight into why regular expressions were tweaked to create regexes in programming.
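Here is a small sketch of both backreference examples in Python's re module. The helper name find_doubled_words and the sample sentence are my own inventions for illustration, and the doubled-word pattern is one reasonable way to write it, not the only one.

```python
import re

# (la)\1 : capture "la", then require the very same text again
lala = re.fullmatch(r"(la)\1", "lala")
lalo = re.fullmatch(r"(la)\1", "lalo")

# \b(\w+)\s+\1\b : capture a whole word, skip whitespace,
# then require that same word again as a whole word
doubled = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)

def find_doubled_words(text):
    """Return each word that appears twice in a row in text."""
    return [m.group(1) for m in doubled.finditer(text)]

typos = find_doubled_words("This is is a test of the the parser.")
print(typos)
```

The word boundaries matter here: without them, the "is" at the end of "This" followed by the next "is" would be reported as a doubled word.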
Another feature that increases the functionality of regex is the ability to alter the greediness of the matching. Different quantifiers (the metacharacters that say how many times a pattern may repeat) can look similar but match drastically different parts of a string. A greedy quantifier (*) will attempt to match as much of the string as possible, whereas a reluctant quantifier (*?) will try to match the minimum number of characters. Given the string “corgi corgi,” the pattern .*corgi would match the entire string but the pattern .*?corgi would only match the first “corgi.”
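To see greediness in action, here is a short Python sketch. It assumes the reluctant form of the star quantifier is written *?, as it is in Python and Java. A lone ? after a character means "zero or one," which is why .?corgi finds only “bcorgi” inside “abcorgi.”

```python
import re

text = "corgi corgi"

# Greedy: .* grabs everything, then backs off just enough to fit "corgi"
greedy = re.search(r".*corgi", text).group(0)

# Reluctant: .*? starts with nothing and grows only when it has to
reluctant = re.search(r".*?corgi", text).group(0)

# Optional: .? allows at most one character in front of "corgi"
optional = re.search(r".?corgi", "abcorgi").group(0)

print(repr(greedy), repr(reluctant), repr(optional))
```

The greedy search returns the whole string, the reluctant search stops at the first “corgi,” and the optional search returns “bcorgi.”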
A possessive quantifier (*+) also attempts to match as much of the string as possible, but, unlike the greedy quantifier, it will not backtrack to give up characters it has already matched. Given the string “abcorgi,” the pattern .*corgi would match the entire string, but the pattern .*+corgi would not match at all, because .*+ consumes the whole string and refuses to hand “corgi” back. (The pattern .+corgi, by contrast, is simply a greedy “one or more” and matches the entire string.) When possessive and greedy quantifiers do produce the same result, possessive quantifiers tend to be more efficient because they avoid backtracking.
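In Java's syntax the possessive form of the star is written *+, and Python 3.11 and later accept it too. To keep this sketch runnable on older Pythons, it emulates the possessive behavior with a lookahead plus a backreference. That emulation is an assumption of this example, chosen for portability, and not how you would normally write it.

```python
import re

target = "abcorgi"

# Greedy: .* takes the whole string, then backtracks to free up "corgi"
greedy = re.search(r".*corgi", target)

# Emulated possessive .*+corgi : the lookahead captures the longest run,
# \1 consumes exactly that run, and the engine cannot shrink it afterwards
possessive = re.search(r"(?=(.*))\1corgi", target)

# Emulated possessive [^c]*+corgi : the run stops before "c" on its own,
# so nothing needs to be given back and the match succeeds
narrow = re.search(r"(?=([^c]*))\1corgi", target)

print(greedy.group(0))
print(possessive)
print(narrow.group(0))
```

The greedy pattern matches the whole string, while the possessive version of the same pattern finds nothing because it never returns the trailing “corgi.” The narrower character class succeeds without any backtracking, which is the case where possessive quantifiers pay off.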
Because quantifiers are metacharacters, they can technically be built from alternation, concatenation, and repetition: the three operations of regular expressions. However, quantifiers create a simple abstraction that allows programmers to quickly specify what type of match they would like.
Conclusion & Further Reading
What a journey we’ve undertaken! We learned about Chomsky and his eponymous hierarchy, then dove deeper into regular grammars. From regular grammars, we explored the linguistic definition of a regular expression. Finally, we used the differences between regular expressions and regexes to motivate how programmers use regex today.
Although I trace the history of regular expressions from Chomsky to modern programming languages, this blog post is not the end of the regex story. If you’d like to learn more about linguistic and computational regular expressions, I have some motivating questions for you.

What is automata theory and how does it relate to the Chomsky hierarchy?
How are regexes implemented? What are the tradeoffs of various regex algorithms?
When is it appropriate to use regexes instead of built-in string match and manipulation libraries?

I also have a list of resources that I used to study up on the linguistic and computational elements of regular expressions. Happy regex-ing!

Regular-Expressions.info
Wikipedia: Regular Expressions
StackOverflow: Chomsky Hierarchy in plain English
Introduction to Automata Theory, Languages, and Computation by Hopcroft et al.
StackOverflow: Difference between regular expression and grammar in automata
How to Think like a Computer Scientist: Formal and Natural Languages
Oracle’s Java Tutorials: Quantifiers
StackOverflow: Compare regex in programming languages with regular expression from automata/formal language
Quora: How are regular expressions implemented?

Thank you Miles Hinson for proofreading this piece!