In your day to day job as a Software Engineer or Database Administrator, you'll likely have to write long complex queries, often with some subqueries.
These queries over time become less performant, difficult to read and understand, and even more difficult to manage. And no one wants to do the hard job of refactoring them, so they just live on.
Or you have probably had to fetch similar data based on a set of data or parameters. To achieve this, you write many similar, sometimes exactly identical subqueries and bring them together using the UNION keyword.
Well, you can make your life easier and solve these problems efficiently using a Common Table Expression.
A common table expression (CTE) is a named temporary result set that exists within the scope of a single statement and that can be referred to later within that statement, possibly multiple times. – MySQL.com
Using a Common Table Expression, you can write more readable and more performant queries very easily. It's actually easier than it is to write multiple subqueries that could make your queries unreadable and less performant.
You'll primarily use a common table expression for two reasons:
- To write queries without using subqueries (or using fewer subqueries)
- To write recursive functions
In this tutorial, I'll show you how to write your own common table expressions.
How to Create a Common Table Expression
You can create a Common Table Expression (CTE) using the
WITH keyword. You can specify multiple common table expressions at the same time by comma-separating the queries making up each common table expression.
The general shape of a Common Table Expression is like so:
WITH keyword is followed by the CTE name. After the name, you introduce the query to be run in the CTE using the
AS keyword. You need to enclose the query must in parentheses. The CTE cannot be followed by a semicolon like other SQL queries. Instead it is followed by another query that uses it.
After creating a CTE, you can easily use the result of the queries run in the CTE by referencing the CTE in other queries, other CTEs, or even in itself.
If you have a table of world_cup players, for example, you can create a CTE like this:
WITH barca_players AS ( SELECT id, player_name, nationality, position, TIMESTAMPDIFF (YEAR, player_dob, CURRENT_DATE) age FROM wc_players WHERE club = 'Barcelona' ) SELECT * FROM barca_players;
Here, we've created a CTE named
barca_players. This CTE will return the name, position, age, and nationality of every Barcelona player that was at the world cup. It contains the subquery:
SELECT id, player_name, nationality, position, TIMESTAMPDIFF (YEAR, player_dob, CURRENT_DATE) age FROM wc_players WHERE club = 'Barcelona';
This subquery is what produces the CTE result. Next, it is followed by a query that uses this result. You can see the result of selecting every record in the CTE below.
You can also select only specific fields from the CTE, for example:
WITH barca_players AS ( SELECT id, player_name, nationality, position, TIMESTAMPDIFF (YEAR, player_dob, CURRENT_DATE) age FROM wc_players WHERE club = 'Barcelona' ) SELECT player_name, position FROM barca_players;
This query is almost the same as the first one, except that it selects only the player names and positions from the list.
How to Use Common Table Expressions With Parameters
You can also pass arguments to the CTE. These are aliases you can use for referencing columns of the query results. The number of parameters passed into the CTE must be the same as the number of columns being selected in its subquery. This is because the columns get matched to the aliases one by one, one after the other.
For example, in the
barca_players CTE created above, you can decide to refer to the
nationality column as
WITH barca_players (id, player_name, country, role, age) AS ( SELECT id, player_name, nationality, position, TIMESTAMPDIFF (YEAR, player_dob, CURRENT_DATE) age FROM wc_players WHERE club = 'Barcelona' ) SELECT player_name, role FROM barca_players;
Notice that in the CTE subquery, you still use the correct column names. But in the outer
SELECT query, you use the new aliases specified as parameters to the CTE.
Recursive Common Table Expressions
When you reference a Common Table Expression within itself, it becomes a recursive Common Table Expression.
A Recursive Common Table Expression, as the name implies, is a common table expression that can run a subquery multiple times, as long as a condition is met. It iterates continuously until it reaches a break point, when the condition stops being true.
To define a recursive CTE, the
RECURSIVE keyword must be in its name. Without this keyword, MySQL throws an error.
For example, you can write a common table expression that prints numbers 1 to 10 and their squares like this:
WITH RECURSIVE numbers_list (n, square) AS ( SELECT 1, 1 UNION ALL SELECT n + 1, (n + 1) * (n + 1) FROM numbers_list WHERE n < 10 ) SELECT * FROM numbers_list;
Let's examine what is happening here:
In the first two lines, the recursive common table expression is defined with two parameters, one representing the column for the number, and the other representing the column for the square:
WITH RECURSIVE numbers_list (n, square) AS (
Next, the subquery. The subquery is in two parts, joined by the
UNION ALL keyword to form one. You can also join these subqueries by the
UNION keyword if you don't need duplicate records.
The first part of the subquery is a key part of recursive common table expressions. It is the base query, the first result set, the initial iteration. This query is the starting point of all iterations.
In this example, it is static, as no records are being fetched.
SELECT 1, 1
After this first query, the result table has one row, and looks like this:
The second part of the subquery is where the iteration really happens.
In this query, the CTE is referenced within itself, and its columns can be used. When a column name is mentioned, the most recent value of that column is taken.
So at the start of the iteration,
n is 1 and
square is also 1. That means,
n + 1 is 2, and
(n + 1) * (n + 1) is 2 *2 which is 4. 2 and 4 get added to the result table and then become the most recent values in the table.
n becomes 2, and
This continues until the condition in the
WHERE keyword is stops being true.
WHERE keyword in the query specifies the breakpoint of the CTE. Until the condition specified is met, the query keeps getting run. In this case, after every iteration, the query checks if
n is less than 10.
If a condition that will always evaluate to true is set, then this creates an endless loop and you get an error like
Recursive query aborted after 1001 iterations. Try increasing @@cte_max_recursion_depth to a larger value.
SELECT n + 1, (n + 1) * (n + 1) FROM numbers_list WHERE n < 10
Now you might think, "If the condition checks for
n < 10 , how come 10 is still in the final table?".
Well, the reason is because in SQL, the
WHERE keyword part of a query is evaluated first before other parts. So, when
n = 9 is the last row, the query runs once more, and before insertion or anything, it checks if 9 is less than 10. Since 9 is less than 10, it adds
n + 1 which is 10 to the list. Then on the next iteration, 10 is the most recent record and it is not less than itself, so the loop ends.
Keep in mind that a Recursive Common Table Expression consists of a recursive
SELECT query, and a non-recursive
Simple Recursive Common Table Expression Rules
- You can't use the
GROUP BYkeyword. This is because you can only group a collection, but in a recursive common table expression, records are handled and evaluated individually. Other keywords like
DISTINCT, and aggregate functions like
SUMcannot be used either.
- You can't use window functions.
These rules apply to the recursive part of a recursive common table expression.
Use Cases for Recursive CTEs
The Fibonacci sequence is a sequence in which each number is the sum of the two preceding ones. The sequence commonly starts from 0 and 1, although some authors start the sequence from 1 and 1 or sometimes from 1 and 2. (source)
You can easily generate a Fibonacci sequence of any length using a recursive common table expression. For example, here's a query that will get the first 20 numbers of a Fibonacci sequence starting from 0 and 1.
WITH RECURSIVE fibonacci (n, fib_n, next_fib_n) AS ( /* * n - Number of iterations * fib_n - Currennt Fibonnaci number. Starts at 0 * next_fib_n - Next Fibonnaci number. Starts at 1 */ SELECT 1, 0, 1 UNION ALL SELECT n + 1, next_fib_n, fib_n + next_fib_n FROM fibonacci WHERE n < 20 ) SELECT * FROM fibonacci;
Hierarchical Data Traversal
In many application databases, you will find that hierarchical data is stored in the same table.
For example, a
categories table will usually contain main categories and sub-categories referencing their parent category. An
employees table will contain regular employees with their
manager_id, as well as their managers or supervisors, because they are also employees.
If you had a
categories table like this, with 4 records, 1 main category, and a chain of sub-categories:
CREATE TABLE categories ( id int, cat_name varchar(100), parent_category_id int DEFAULT NULL ); INSERT INTO categories VALUES (1, 'Mens', NULL), (2, 'Tops', 1), (3, 'Jerseys', 2), (4, 'England', 3);
You can fetch each category, with its parent category attached easily like this:
WITH RECURSIVE category_tree AS ( SELECT id, cat_name, parent_category_id, cat_name AS full_name FROM categories WHERE parent_category_id IS NULL UNION ALL SELECT c.id, c.cat_name, c.parent_category_id, CONCAT (ct.full_name, ' > ', c.cat_name) FROM categories c JOIN category_tree ct ON c.parent_category_id = ct.id ) SELECT full_name FROM category_tree;
In this example, the base query selects the root category, where
parent_category_id IS NULL. Then it goes on to look for a category where the
parent_category_id is the
id of the current category by using a
JOIN. It repeats this until it gets to the final category. The result of this query is the following:
I hope you now understand how to use MySQL Common Table Expressions, their variations (regular and recursive), and when to use them so you can write better queries. You can find more about common table expressions in the docs here.
If you have any questions or relevant advice, please get in touch with me to share them.
To read more of my articles or follow my work, you can connect with me on LinkedIn, Twitter, and Github. It’s quick, it’s easy, and it’s free!