by Felipe Hoffa
Which US cities are really the rainiest? I crunched the data.
Sam Ramji was surprised: His flight to Seattle was delayed because of rain.
Seattle is famous for being a rainy city, so how could their airport not be prepared to deal with rain?
Well, let’s take a look at Seattle’s historical rainfall data to see whether this day was indeed unusual. And while we’re at it, let’s see which cities in the US are the rainiest.
First check: How much has it rained? NOAA tells us they’ve seen 1.64 inches of rain in the last 24 hours, and most of it in the last 12:
Is that a lot? A quick check with BigQuery can show us:
In the last 17 years, it only rained more than this on 1 to 3 days per year. And years like 2000, 2002, 2014, and 2016, it never rained this much.
So yes, that’s a lot of rain for only 12 hours.
The rainiest US cities
Usually people rank the wettest cities by the amount of rain they get in a year. Not me. I want to know how many days of rain each city had. Was Seattle the worst?
These are the top 60 weather stations, by number of rainy days during the past 17 years:
Alaska and Hawaii show up the top with the most rainy days: The Hilo international aiport in Hawaii saw 4,715 rainy days in the last 17 years, while the Cold Bay airport in Alaska saw 4,515. With an average temperature of 73°F (versus 39°F), you know where I would rather be. 😀
The Seattle-Tacoma airport in this ranking shows up #37, with 2,851 rainy days. So not the rainiest places in the US, but definitely in the top 60. Pittsburgh had it worse, with 2,960 (#40). Portland lives up to its reputation: 3,126 days (#26).
So if you are only happy when it rains, now you know where to go.
How I did all this
Getting the wettest days for Seattle:
#standardSQLSELECT year, MAX(prcp) max_year, COUNTIF(prcp>1.64) days_over_164, name FROM `bigquery-public-data.noaa_gsod.gsod20*` aJOIN ( SELECT name, CAST(usaf AS STRING) usaf, CAST(wban AS STRING) wban FROM `bigquery-public-data.noaa_gsod.stations` WHERE name LIKE 'SEATTLE-TACOMA%' AND state='WA') bON a.wban=b.wban AND b.usaf=a.stn WHERE prcp!=99.99GROUP BY year, nameORDER BY year
- I had to CAST() some keys before doing the JOIN. We’ll fix the underlying datasets! Anyway, BigQuery can handle JOINs over arbitrary keys, so that wasn’t a big problem.
- When NOAA doesn’t have the real data for a point, they replace it with a “99.99”. I filtered those rows out, and you should do the same before running any AVG() or similar operations.
- 4.6 inches of rain in 2003? Surely that’s a lot! In fact, it broke all records for Seattle’s rainiest days. This was after they had the driest summer on record.
Getting the rainiest US cities, by number of rainy days:
#standardSQLSELECT name, country, state, COUNTIF(prcp>0) rainy_days_17_years, COUNT(DISTINCT year) years, lat, lonFROM `bigquery-public-data.noaa_gsod.gsod20*` aJOIN ( SELECT name, CAST(usaf AS STRING) usaf, CAST(wban AS STRING) wban, country, state, lat, lon FROM `bigquery-public-data.noaa_gsod.stations` WHERE country='US') bON a.wban=b.wban AND b.usaf=a.stn WHERE prcp!=99.99 GROUP BY name, country, state, lat, lonHAVING years=17ORDER BY rainy_days_17_years DESCLIMIT 60
Want to learn more?
Check Reto Meier’s post on our GCP big data blog, where he shows off a lot of the open datasets we’ve published for the city of NY. For even more fun, find out how to forecast demand with Google BigQuery, public datasets and TensorFlow, thanks to Lakshmanan V.
Here are some other stories I’ve written that make extensive use of BigQuery:
400,000 GitHub repositories, 1 billion files, 14 terabytes of code: Spaces or Tabs?
Tabs or spaces. We are going to parse a billion files among 14 programming languages to decide which one is on top.medium.comWhat countries have more open source developers per capita than the US?
We are going to use GitHub Archive, GHTorrent, population statistics and BigQuery to find the answer.medium.com