by Andrei Chernikov
Simple RegEx tricks for beginners
Always wanted to learn Regular Expressions but got put off by their complexity? In this article, I will show you five easy-to-learn RegEx tricks which you can start using immediately in your favorite text editor.
Text Editor Setup
Almost every text editor supports Regular Expressions now. I will use Visual Studio Code for this tutorial, but you can use any editor you like. Also, note that you usually need to turn on RegEx somewhere near the search input. In VS Code, you do this by clicking the .* button (“Use Regular Expression”) inside the search box.
1) . — Match Any Character
Let’s start simple. The dot symbol . matches any single character. For example, the RegEx b.t matches “bat” and any other three-character sequence which starts with b and ends in t. But if you want to search for the dot symbol itself, you need to escape it with \, so the RegEx \. will only match an actual dot.
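If you want to try this outside the editor, here is a minimal TypeScript sketch. It assumes JavaScript-flavoured regular expressions, which, as far as I know, treat these basic symbols the same way the editor does. The sample text is made up for illustration.

```typescript
// Hypothetical sample text, invented for this sketch.
const sampleWords: string = "bat bit bt boot b.t";

// b.t: a "b", then exactly one character of any kind, then a "t".
const anyCharMatches: string[] = sampleWords.match(/b.t/g) ?? [];
console.log(anyCharMatches); // [ 'bat', 'bit', 'b.t' ]

// b\.t: the backslash makes the dot literal, so only "b.t" itself matches.
const literalDotMatches: string[] = sampleWords.match(/b\.t/g) ?? [];
console.log(literalDotMatches); // [ 'b.t' ]
```

Note that “bt” and “boot” are skipped: the dot stands for exactly one character, not zero and not two.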
2) .* — Match Anything
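The . symbol means “any character” and the * symbol means “whatever comes before this symbol, repeated any number of times.” Together, .* means “any number of any characters.” Let’s suppose we have a method with the following signature: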
loadScript(scriptName: string, pathToFile: string)
And we want to find all calls of this method where pathToFile points to any file in the folder “lua”. You can use the Regular Expression loadScript.*lua for this, which means “match all text starting with “loadScript” followed by anything up to the last occurrence of “lua”.”
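Here is a small TypeScript sketch of that greedy behaviour. The call below is a hypothetical example (the script name and path are my own invention), chosen so that “lua” appears twice in the line.

```typescript
// Hypothetical call: the script name and path are invented for this sketch.
const greedyLine: string = 'loadScript("init", "lua/init.lua");';

// .* is greedy, so the match runs on to the LAST "lua" in the line.
const greedyResult: RegExpExecArray | null = /loadScript.*lua/.exec(greedyLine);
const greedyText: string = greedyResult === null ? "" : greedyResult[0];

console.log(greedyText); // loadScript("init", "lua/init.lua
```

The match stops after the final “lua”, so the closing quote, the bracket and the semicolon are left out.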
3) ? — Non-Greedy Match
A ? symbol after .* and some other RegEx sequences means “match as little as possible.” If the text “lua” appears twice in a call, the previous RegEx matches everything up to the second “lua”. If you wanted to match everything up to the first occurrence of “lua” instead, you would use the RegEx loadScript.*?lua, which means “match everything starting with “loadScript” followed by anything up to the first occurrence of “lua”.”
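The same idea as a TypeScript sketch, again with a made-up call in which “lua” appears twice:

```typescript
// Hypothetical call: the script name and path are invented for this sketch.
const lazyLine: string = 'loadScript("init", "lua/init.lua");';

// .*? is non-greedy, so the match stops at the FIRST "lua" it can reach.
const lazyResult: RegExpExecArray | null = /loadScript.*?lua/.exec(lazyLine);
const lazyText: string = lazyResult === null ? "" : lazyResult[0];

console.log(lazyText); // loadScript("init", "lua
```

Without the ? the same pattern would keep going until the “lua” at the end of the file name.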
4) ( ) $ — Capture Groups and Backreferences
Okay, now we can match some text. But what if we want to change parts of the text we found? We often have to make use of capture groups for that.
Let’s suppose we changed our loadScript method and now it needs another argument inserted between its two existing arguments. Let’s name this new argument id, so the new function signature should look like this: loadScript(scriptName, id, pathToFile). We can’t use the normal replace feature of our text editor here, but a Regular Expression is exactly what we need.
Let’s start with the Regular Expression loadScript\(.*?,.*?\), which means: “match everything starting with “loadScript(” followed by anything up to the first “,”, then followed by anything up to the first “)”.”
The only things which might seem strange here are the \ symbols. They are used to escape brackets. We need to escape the symbols ( and ) because they are special characters used by RegEx to capture parts of the matched text, whereas here we need to match actual bracket characters.
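To see why the escaping matters, here is a TypeScript sketch that runs the pattern with and without the \ symbols. The call is a hypothetical example of my own.

```typescript
// Hypothetical call: the script name and path are invented for this sketch.
const sampleCall: string = 'loadScript("init", "lua/init.lua");';

// Escaped brackets match the literal "(" and ")" of the call.
const escapedResult: RegExpExecArray | null = /loadScript\(.*?,.*?\)/.exec(sampleCall);
const escapedText: string = escapedResult === null ? "" : escapedResult[0];
console.log(escapedText); // loadScript("init", "lua/init.lua")

// Unescaped brackets form a capture group instead, so no literal bracket
// is required and the lazy match stops right after the comma.
const unescapedResult: RegExpExecArray | null = /loadScript(.*?,.*?)/.exec(sampleCall);
const unescapedText: string = unescapedResult === null ? "" : unescapedResult[0];
console.log(unescapedText); // loadScript("init",
```

Only the escaped version reaches the closing bracket of the call.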
In the previous RegEx, we defined the two arguments of our method call with the .*? symbols. Let’s make each of our arguments a separate capture group by adding unescaped ( and ) symbols around them: loadScript\((.*?),(.*?)\)
If you run this RegEx, you will see that nothing changed, because it matches the same text. But now we can refer to the first argument as $1 and to the second argument as $2. This is called a backreference, and it will help us do what we want: add another argument in the middle of the call.
In the search input, use the RegEx with capture groups. It means the same thing as the previous RegEx but maps the arguments to capture groups 1 and 2 respectively.
In the replace input, use a replacement such as loadScript($1,id,$2), which means “replace every matched text with the text “loadScript(” followed by capture group 1, “id”, capture group 2 and “)”.” Note that you do not need to escape brackets in the replace input.
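The whole search-and-replace can be sketched in TypeScript, where String.replace understands $1 and $2 in the same way. The two calls are hypothetical examples, and I add a space after the first comma in the replacement purely to keep the output tidy.

```typescript
// Hypothetical calls: script names and paths are invented for this sketch.
const beforeText: string = [
  'loadScript("init", "lua/init.lua");',
  'loadScript("ui", "lua/ui.lua");',
].join("\n");

// Search input: group 1 is the first argument, group 2 is the second.
const searchPattern: RegExp = /loadScript\((.*?),(.*?)\)/g;

// Replace input: $1 and $2 insert the captured arguments. Group 2 keeps
// the space that followed the comma, so no space is added before $2.
const afterText: string = beforeText.replace(searchPattern, "loadScript($1, id,$2)");

console.log(afterText);
// loadScript("init", id, "lua/init.lua");
// loadScript("ui", id, "lua/ui.lua");
```

The g flag plays the role of “Replace All” in the editor: every call is rewritten, not only the first one.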
5) [ ] — Character Classes
You can list the characters you want to match at a specific position by placing [ and ] symbols around these characters. For example, the class [0-9] matches all digits from 0 to 9. You can also list all digits explicitly: [0123456789]. The meaning is the same. You can use a dash with letters too: [a-z] will match any lowercase Latin letter, [A-Z] will match any uppercase Latin letter, and [a-zA-Z] will match both.
You can also use * after a character class, just like after ., which in this case means: “match any number of occurrences of the characters in this class.”
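A short TypeScript sketch of these classes follows. The sample text is made up, and the last two patterns show that “any number of occurrences” includes zero.

```typescript
// Hypothetical sample text, invented for this sketch.
const sampleText: string = "Room 42, Seat b7";

// [0-9] and the explicit list [0123456789] match the same characters.
const rangeDigits: string[] = sampleText.match(/[0-9]/g) ?? [];
const explicitDigits: string[] = sampleText.match(/[0123456789]/g) ?? [];
console.log(rangeDigits); // [ '4', '2', '7' ]
console.log(explicitDigits); // [ '4', '2', '7' ]

// [A-Z] matches uppercase Latin letters only.
const upperLetters: string[] = sampleText.match(/[A-Z]/g) ?? [];
console.log(upperLetters); // [ 'R', 'S' ]

// [0-9]* matches as many digits as follow "Room ".
const roomResult: RegExpExecArray | null = /Room [0-9]*/.exec(sampleText);
const roomText: string = roomResult === null ? "" : roomResult[0];
console.log(roomText); // Room 42

// After "Seat " comes a letter, so [0-9]* matches zero digits here.
const seatResult: RegExpExecArray | null = /Seat [0-9]*/.exec(sampleText);
const seatText: string = seatResult === null ? "" : seatResult[0];
console.log(seatText.length); // 5
```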
I urge you to open your text editor and start using some of these tricks right now. You will see that you can now complete many refactoring tasks much faster than before. Once you are comfortable with these tricks, you can start researching more into regular expressions.