Recently, I worked on a machine learning project related to renewable energy, which required historical weather forecast data from multiple cities. Despite intensive research, I had a hard time finding a good data source. Most websites restrict access to only the past two weeks of historical data; if you need more, you have to pay. In my case, I needed five years of hourly historical forecast data, which can be costly.


My requirements were:

1. Free, at least during the trial period

No need to provide credit card info.

2. Flexible

Easy to change the forecast interval, time period, and locations.

3. Reproducible

Easy to reproduce and implement in the production phase.

In the end, I decided to use data from World Weather Online. It took me less than two minutes to sign up for the free trial of the premium API, without entering any credit card details (500 free requests per key per day for 60 days, as of 30-May-2019).

https://www.worldweatheronline.com/developer/signup.aspx
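
To judge whether a job fits within the free quota quoted above, a rough estimate helps. The sketch below assumes, hypothetically, that one request covers one location for at most one calendar month; how many requests a real download needs depends on how the API and the client split the date range, so treat the result as a ballpark only. The function names and the 10-city workload are illustrative.

```python
import math
from datetime import date

DAILY_QUOTA = 500  # free-trial limit quoted above (as of 30-May-2019)


def months_spanned(start, end):
    """Count the calendar months touched by the inclusive range start..end."""
    if end < start:
        raise ValueError("end must not be before start")
    return (end.year - start.year) * 12 + (end.month - start.month) + 1


def estimate_requests(n_locations, start, end):
    """ASSUMPTION: one request per location per calendar month."""
    return n_locations * months_spanned(start, end)


# Hypothetical workload: five years of data for 10 cities.
needed = estimate_requests(10, date(2014, 6, 1), date(2019, 5, 31))
days_of_quota = math.ceil(needed / DAILY_QUOTA)
print(f"{needed} requests, about {days_of_quota} day(s) of quota")
```

Under this assumption the workload needs 600 requests, so it would have to be spread over two days with a single key.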

You can try out requests in JSON or XML format on the World Weather Online site. The result is nested JSON, which needs a bit of pre-processing before it can be fed into ML models. Therefore, I wrote some scripts to parse it into pandas DataFrames and save them as CSV files for further use.
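
The flattening step can be sketched roughly as follows. The sample payload is made up and heavily trimmed; its layout (a `data.weather` list of daily records, each carrying an `hourly` list, with `time` encoded as an unpadded HHMM string) is an assumption for illustration, not a copy of a real response, and the `flatten` helper is hypothetical rather than the actual wwo-hist code.

```python
import json

# Made-up, trimmed payload; the nesting is an assumption for illustration.
SAMPLE = """
{"data": {"weather": [
  {"date": "2018-12-11", "maxtempC": "30", "mintempC": "25",
   "hourly": [
     {"time": "0", "tempC": "26", "humidity": "88"},
     {"time": "300", "tempC": "25", "humidity": "90"}
   ]}
]}}
"""


def flatten(payload):
    """Return one flat row per hourly record, repeating the daily fields."""
    rows = []
    for day in payload["data"]["weather"]:
        # Daily-level fields, without the nested list and the date itself.
        daily = {k: v for k, v in day.items() if k not in ("hourly", "date")}
        for hour in day.get("hourly", []):
            # "300" -> "0300" -> "03:00" (assumed HHMM encoding).
            hhmm = hour["time"].zfill(4)
            row = {"date_time": f"{day['date']} {hhmm[:2]}:{hhmm[2:]}"}
            row.update(daily)
            row.update({k: v for k, v in hour.items() if k != "time"})
            rows.append(row)
    return rows


rows = flatten(json.loads(SAMPLE))
for row in rows:
    print(row)
```

A list of flat dictionaries like `rows` can be passed to `pandas.DataFrame(rows)` and written out with `DataFrame.to_csv`.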


Introducing the wwo-hist package

The wwo-hist package retrieves historical weather data from World Weather Online and parses it into a pandas DataFrame and CSV files.

Input: api_key, location_list, start_date, end_date, frequency

Output: location_name.csv

Output column names: date_time, maxtempC, mintempC, totalSnow_cm, sunHour, uvIndex, uvIndex, moon_illumination, moonrise, moonset, sunrise, sunset, DewPointC, FeelsLikeC, HeatIndexC, WindChillC, WindGustKmph, cloudcover, humidity, precipMM, pressure, tempC, visibility, winddirDegree, windspeedKmph

Install and import the package:

pip install wwo-hist

# import the package and function
from wwo_hist import retrieve_hist_data

# set working directory to store output csv file(s)
# (forward slash avoids the invalid "\Y" escape sequence and works on all platforms)
import os
os.chdir("./YOUR_PATH")
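
Note that `os.chdir` raises `FileNotFoundError` if the target folder does not exist. A minimal sketch that creates the folder first, assuming a hypothetical folder name `weather_data`:

```python
import os
from pathlib import Path

# Hypothetical folder name; replace it with your own path.
output_dir = Path("weather_data")

# Create the folder (and any missing parents); no error if it already exists.
output_dir.mkdir(parents=True, exist_ok=True)

# Make it the working directory so the CSV files are written there.
os.chdir(output_dir)
print(Path.cwd())
```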

Example code:

Specify the input parameters and call retrieve_hist_data(). Please visit my GitHub repo for more information about parameter setup.

This retrieves historical weather forecast data at 3-hour intervals for Singapore and California from 11-Dec-2018 to 11-Mar-2019, stores the output in the hist_weather_data variable, and saves it to CSV files.

FREQUENCY = 3
START_DATE = '11-DEC-2018'
END_DATE = '11-MAR-2019'
API_KEY = 'YOUR_API_KEY'
LOCATION_LIST = ['singapore','california']

hist_weather_data = retrieve_hist_data(API_KEY,
                                LOCATION_LIST,
                                START_DATE,
                                END_DATE,
                                FREQUENCY,
                                location_label = False,
                                export_csv = True,
                                store_df = True)

After the call, you will see status output in your console, and the result CSV file(s) are exported to your working directory, where you can check the output.
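
One quick sanity check on an exported file is to confirm that consecutive timestamps are spaced by the requested interval. The sketch below uses only the standard library and a made-up two-column excerpt in place of a real file; the timestamp format is an assumption, the values are illustrative, and `check_interval` is a hypothetical helper.

```python
import csv
import io
from datetime import datetime, timedelta

# Made-up excerpt standing in for an exported file; values are illustrative.
SAMPLE_CSV = """date_time,tempC
2018-12-11 00:00:00,26
2018-12-11 03:00:00,25
2018-12-11 06:00:00,25
"""

FREQUENCY = 3  # hours, matching the example call


def check_interval(lines, hours):
    """Return True if consecutive date_time values are exactly `hours` apart."""
    stamps = [
        # ASSUMPTION: timestamps look like "YYYY-MM-DD HH:MM:SS".
        datetime.strptime(row["date_time"], "%Y-%m-%d %H:%M:%S")
        for row in csv.DictReader(lines)
    ]
    step = timedelta(hours=hours)
    return all(later - earlier == step for earlier, later in zip(stamps, stamps[1:]))


print(check_interval(io.StringIO(SAMPLE_CSV), FREQUENCY))
```

With a real export, pass an open file handle instead of the in-memory sample, for example `open("singapore.csv", newline="")` if the file is named after the location as described above.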

There you have it! The script is also documented in detail on GitHub.


Thank you for reading. Please give it a try, and let me know your feedback! If you like what I did, consider following me on GitHub, Medium, and Twitter to get more articles and tutorials on your feed.