by Felipe Hoffa

The 2018 World Cup Visualized: All the Goals So Far

Check out all the goals so far with this Data Studio interactive visualization. Data extracted using BigQuery. keep reading to learn more.

Each dot shows where the player was when they scored.

On the left: Own-goals (when a team accidentally kicks the ball into their own goal) are attributed to the team that auto-inflicted them.

On the right: Interactive controls to visualize only the teams you care about.

Check out Eric Schmidt’s Analyzing the World Cup using Google Cloud series of posts for even more info on this.

And a follow up, also complete with an interactive dashboard to check the most valuable players so far:

World Cup visualized: The most valuable players
Check out the players that have touched the ball most often with this interactive dashboard. Data from Opta Sports…medium.com

Counter fact-checking

Reddit wasn’t sure about these results. So I counter-fact-checked them.

All the Brazilian goals

All the Brazilian goals, first round. Pretty good accuracy, IMHO

How I Built this Visualization

Data acquisition

This chart is powered with the same source we used to predict the 2014 World Cup results. We can’t re-share the raw data, but contact Opta Sports if you’d like to license the feed from them.

Extracting the goals

Note that:

  • Joins to get the team names out of team ids were a little tricky.
  • Own-goals are attributed to the team that dealt the self-damage, not to the ones that the goals count for.

Here’s the query:

#standardSQL
WITH team_names AS (  SELECT team_id, MIN(name) name  FROM (    SELECT away_team_id team_id, away_team_name name    FROM `cloude-sandbox.galacticos.games`    WHERE competition_id = 4    UNION ALL    SELECT home_team_id team_id, home_team_name name    FROM `cloude-sandbox.galacticos.games`    WHERE competition_id = 4  )   GROUP BY 1)
SELECT a.* EXCEPT(game_id, player_id), b.name versus  , (SELECT MAX(name) FROM `cloude-sandbox.galacticos.sqauds` WHERE player_id = a.player_id) playerFROM (SELECT id, DATE(event_timestamp) date, x, y, min minute  , (SELECT name FROM team_names WHERE team_id=a.team_id) team  , game_id, CONCAT('p', CAST(player_id AS STRING)) player_idFROM `cloude-sandbox.galacticos.events` aWHERE competition_id = 4AND type_id IN (16)AND DATE(event_timestamp) > '2018-01-01') aJOIN (  SELECT id, home_team_name name FROM `cloude-sandbox.galacticos.games` UNION ALL SELECT id,  away_team_name name FROM `cloude-sandbox.galacticos.games` ) bON a.game_id = b.idAND a.team != b.nameORDER BY 1, 2

Scheduling result extraction to a Google sheet

I’m using a modified Ani Lopez’s easy BigQuery scheduling sheet. Instead of saving the results to a query, I place them in the same sheet.

Connect Data Studio to the sheet

Instead of having Data Studio read results from BigQuery, I’ll point Data Studio to the previous step sheet.

Visualize

That’s the Data Studio magic :).

And for a more comprehensive Data Studio dashboard covering the cup, check Will Sherman’s.

Refresh

New goals should show up as the dataset gets updated and the query in that sheet runs.

Next steps

Check out how Max Woolf did a heatmap with all the NCAA’s shots (with data in BigQuery too).

Want more stories? Check my Medium, follow me on twitter, and subscribe to reddit.com/r/bigquery. And try BigQuery — every month you get a full terabyte of analysis for free.

World Cup visualized: The most valuable players
Check out the players that have touched the ball most often with this interactive dashboard. Data from Opta Sports…medium.comMaking World Cup Sausage with Cloud Dataflow and BigQuery
The 2018 World Cup is finally here. In the opening match, Saudi Arabia will take the pitch against the host country…medium.com