In a Relational Database, tables are often related to each other in a way that allows their information to only be written once in the whole database. Then, when you need to analyze the data, you'll need to combine the info from those related tables.

To do this in SQL, you can use JOIN statements. The LEFT JOIN statement is one of the various JOIN statements available. When you use it to join two tables, it keeps all the rows of the first table (the left table), even if there is not a corresponding match on the second table.

You can use JOIN in a SELECT query to join two tables, table_1 and table_2, like this:

SELECT columns
FROM table_1
LEFT OUTER JOIN table_2
ON relation;
SELECT columns
FROM table_1
LEFT JOIN table_2
ON relation;

First you write which columns will be present in the joined table. You can specify to which table the column belongs by prefixing the table name to the column name. This is necessary if some columns have the same name (like table_1.column_1 and table_2.column_1) with SELECT <columns>.

Then you would write the name of the first table as FROM table_1.

After that you'd write the name of the second table as LEFT OUTER JOIN table_2 or LEFT JOIN table_2 (omitting the OUTER keyword).

And at the end you'd write the relation to use to match the rows, for example ON table_1.column_A = table_2.column_B. Often the relation is by id, but it can be with any column.

SQL LEFT JOIN Example

Let's say you have a book database in which you have two tables, one with books, the other with authors. To avoid repeating all the author info for each book, that info is in its own table, and the books have only the author_name column.

book_id title author_name publ_year
1 Uno, nessuno e centomila Luigi Pirandello 1926
2 Il visconte dimezzato Italo Calvino 1952
3 Le tigri di Mompracem Emilio Salgari 1900
4 Il giorno della civetta Leonardo Sciascia 1961
5 A ciascuno il suo Leonardo Sciascia 1966
6 Il fu Mattia Pascial Luigi Pirandello 1904
7 I Malavoglia Giovanni Verga 1881
author_id name year_of_birth place_of_birth trivia
1 Luigi Pirandello 1867 Agrigento Nobel Prize in Literature in 1934
2 Giovanni Verga 1840 Vizzini was Senator of the Kingdom of Italy from 1920 to 1922
3 Italo Svevo 1861 Trieste real name was Aron Hector Schmitz
4 Cesare Pavese 1908 Santo Stefano Belbo NULL
5 Giuseppe Tomasi di Lampedusa 1896 Palermo was prince of Lampedusa from 1934 to 1957

We can join these two tables based on the names of the authors. Using the books table as the left table, you can write the following code to join them:

SELECT books.title AS book_title, books.publ_year, books.author_name, authors.year_of_birth, authors.place_of_birth
   FROM books
   LEFT JOIN authors
   ON books.author_name = authors.name
;

Let's break it down.

In the first line, you choose which columns to show in the final table. It's also the place to decide if some columns will have a different name in the resulting table using AS like with books.title AS book_title.

The second line, FROM books, says which is the first table to consider, also called the left table.

Then the third line, LEFT JOIN authors, says which other table to consider.

ON books.author_name = authors.name says to match the tables using the rows books.author_name and authors.name.

After this query you would get the table as below, where the rows that didn't get info from the authors table just show NULL.

book_title publ_year author_name year_of_birth place_of_birth
Uno, nessuno e centomila 1926 Luigi Pirandello 1867 Agrigento
Il visconte dimezzato 1952 Italo Calvino NULL NULL
Le tigri di Mompracem 1900 Emilio Salgari NULL NULL
Il giorno della civetta 1961 Leonardo Sciascia NULL NULL
A ciascuno il suo 1966 Leonardo Sciascia NULL NULL
Il fu Mattia Pascal 1904 Luigi Pirandello 1867 Agrigento
I Malavoglia 1881 Giovanni Verga 1840 Vizzini

Note that the authors not present in the books table are not in this joined table. This is because, as I said before, only the unrelated rows from the left table (in this case books) are kept, not those from the right/second table.

A more complex LEFT JOIN example

Let's see another way you can use JOIN together with other SQL features to do some data analysis.

You might want to see how many books from each author are present in the database. You could use the below query to do so:

SELECT authors.name AS author_name,
    SUM(
      CASE
        WHEN books.title LIKE '%'
          THEN 1
        ELSE 0
      END
    ) as number_of_books
  FROM authors
  LEFT JOIN books
  ON books.author_name = authors.name
  GROUP BY authors.name
  ORDER BY number_of_books DESC
;

Code breakdown

Line 1: with SELECT you list the columns you want in the resulting table.

Line 2: SUM is an aggregation function used in conjunction with GROUP BY. The values of the rows that are grouped together are then summed.

Line 3-7: you use the CASE statement to get different results depending on a condition. In this case, a row is counted as 1 if it contains a book title, otherwise it is counted as 0. And here we use LIKE to check if the cell contains any characters (learn more in this article about Contains String).

Line 8: this gives a name of number_of_books to the column that is created for the SUM.

Line 9: the left/first table in this case is authors.

Line 10: the right/second table in this case is books.

Line 11: this joins the two tables using the author names.

Line 12: the rows are grouped by author name - all the rows with the same value in that column will be represented by a single row.

Line 13: we use order by to arrange in descending order using the number of books.

The query will give you the below table. Note that you see here only the authors that are present in the authors table. The authors mentioned in the books table without an entry in the authors table are not present here. This is an effect of the fact that the unrelated rows from the books table were not kept.

author_name number_of_books
Luigi Pirandello 2
Giovanni Verga 1
Cesare Pavese 0
Giuseppe Tomasi di Lampedusa 0
Italo Svevo 0

If the authors table is updated to include all the authors mentioned in the books table, like this:

author_id name year_of_birth place_of_birth trivia
1 Luigi Pirandello 1867 Agrigento Nobel Prize in Literature in 1934
2 Giovanni Verga 1840 Vizzini was Senator of the Kingdom of Italy from 1920 to 1922
3 Italo Svevo 1861 Trieste real name was Aron Hector Schmitz
4 Cesare Pavese 1908 Santo Stefano Belbo NULL
5 Giuseppe Tomasi di Lampedusa 1896 Palermo was prince of Lampedusa from 1934 to 1957
6 Italo Calvino 1923 Santiago de las Vegas NULL
7 Emilio Salgari 1862 Verona NULL
8 Leonardo Sciascia 1921 Racalmuto NULL

Then the table from the query above would actually give the number of books for all authors.

author_name number_of_books
Leonardo Sciascia 2
Luigi Pirandello 2
Emilio Salgari 1
Giovanni Verga 1
Giovanni Verga 1
Cesare Pavese 0
Giuseppe Tomasi di Lampedusa 0
Italo Svevo 0

Conclusion

In a Relational Database, data should be written only once, so we often end up with multiple tables related to each other. LEFT JOIN is a really useful ally when we need to analyse data and join information from different tables. Enjoy querying your database using this powerful tool.