Recently, I worked on a machine learning project related to renewable energy, which required historical weather forecast data from multiple cities.
Despite extensive research, I had a hard time finding a good data source. Most websites restrict access to only the past two weeks of historical data; if you need more, you have to pay. In my case, I needed five years of hourly historical forecast data, which can be costly.
My requirements were:
1. Free, at least during the trial period
2. No need to provide credit card info
3. Flexible to change the forecast interval, time periods, and locations
4. Easy to reproduce and implement in the production phase
In the end, I decided to use data from World Weather Online. It took me less than two minutes to subscribe to the free-trial premium API, without entering any credit card info (500 free requests/key/day for 60 days, as of 30-May-2019).
You can try out requests in JSON or XML format here. The result is nested JSON, which needs a bit of pre-processing before it can be fed into ML models. Therefore, I wrote some scripts to parse it into pandas DataFrames and save them as CSV files for further use.
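To illustrate the kind of pre-processing involved, here is a minimal, stdlib-only sketch. It flattens a nested response into one row per hourly record and writes the rows as CSV. The sample payload is hypothetical and heavily trimmed. Its data → weather → hourly nesting and its unpadded HHMM "time" strings are assumptions, so compare them with a real response before relying on these key names. This is not the wwo-hist implementation itself. With pandas, the resulting list of dicts can be passed straight to pd.DataFrame.

```python
import csv
import io
import json

# Hypothetical, trimmed payload. The nesting (data -> weather -> hourly) is an
# assumption about the response shape; real responses carry many more fields.
SAMPLE = json.loads("""
{"data": {"weather": [
  {"date": "2018-12-11", "maxtempC": "31", "mintempC": "25",
   "hourly": [{"time": "0", "tempC": "26", "humidity": "85"},
              {"time": "300", "tempC": "25", "humidity": "88"}]}
]}}
""")


def flatten(payload):
    """Return one flat dict per hourly record, copying daily fields onto each row."""
    rows = []
    for day in payload["data"]["weather"]:
        # Daily-level fields, excluding the nested "hourly" list itself.
        daily = {k: v for k, v in day.items() if k != "hourly"}
        for hour in day["hourly"]:
            # Assumed format: "time" is HHMM without zero padding, e.g. "0", "300".
            hhmm = hour["time"].zfill(4)
            row = {"date_time": f"{day['date']} {hhmm[:2]}:{hhmm[2:]}"}
            row.update({k: v for k, v in daily.items() if k != "date"})
            row.update({k: v for k, v in hour.items() if k != "time"})
            rows.append(row)
    return rows


def to_csv(rows):
    """Serialize the flat rows to CSV text (write to a file in practice)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = flatten(SAMPLE)
print(to_csv(rows))
```

Copying the daily fields onto every hourly row keeps each row self-contained. A model can then use daily aggregates such as maxtempC alongside the hourly readings without a separate join.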
Introducing the wwo-hist package
Input: api_key, location_list, start_date, end_date, frequency
Output column names: date_time, maxtempC, mintempC, totalSnow_cm, sunHour, uvIndex, uvIndex, moon_illumination, moonrise, moonset, sunrise, sunset, DewPointC, FeelsLikeC, HeatIndexC, WindChillC, WindGustKmph, cloudcover, humidity, precipMM, pressure, tempC, visibility, winddirDegree, windspeedKmph
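Values in an exported CSV are read back as text, so a typical first step is converting date_time and the numeric columns before modelling. The sketch below does that and also checks that consecutive rows are spaced by the requested frequency. It assumes a hypothetical excerpt containing only a few of the columns above, with a YYYY-MM-DD HH:MM:SS timestamp format; the actual file layout may differ.

```python
import csv
import io
from datetime import datetime, timedelta

# Hypothetical excerpt of an exported CSV. Only a few of the listed columns are
# shown, and the "YYYY-MM-DD HH:MM:SS" timestamp format is an assumption.
CSV_TEXT = """date_time,tempC,humidity,precipMM
2018-12-11 00:00:00,26,85,0.0
2018-12-11 03:00:00,25,88,0.1
2018-12-11 06:00:00,27,80,0.0
"""


def load_rows(text):
    """Parse CSV text into dicts, converting date_time and the numeric columns."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        rows.append({
            "date_time": datetime.strptime(rec["date_time"], "%Y-%m-%d %H:%M:%S"),
            "tempC": int(rec["tempC"]),
            "humidity": int(rec["humidity"]),
            "precipMM": float(rec["precipMM"]),
        })
    return rows


def check_interval(rows, hours):
    """Return True if consecutive timestamps are exactly `hours` apart."""
    step = timedelta(hours=hours)
    return all(b["date_time"] - a["date_time"] == step for a, b in zip(rows, rows[1:]))


rows = load_rows(CSV_TEXT)
print(check_interval(rows, 3))  # True for the sample above
```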
Install and import the package:
pip install wwo-hist
# import the package and function
from wwo_hist import retrieve_hist_data

# set working directory to store output csv file(s)
import os
os.chdir("./YOUR_PATH")
Specify the input parameters and call retrieve_hist_data(). Please visit my GitHub repo for more information on setting up the parameters.
This retrieves historical weather forecast data at 3-hour intervals for Singapore and California from 11-Dec-2018 to 11-Mar-2019. It stores the output in the hist_weather_data variable and exports it to CSV files.
FREQUENCY = 3
START_DATE = '11-DEC-2018'
END_DATE = '11-MAR-2019'
API_KEY = 'YOUR_API_KEY'
LOCATION_LIST = ['singapore', 'california']

hist_weather_data = retrieve_hist_data(API_KEY,
                                       LOCATION_LIST,
                                       START_DATE,
                                       END_DATE,
                                       FREQUENCY,
                                       location_label=False,
                                       export_csv=True,
                                       store_df=True)
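As a sanity check on the size of the result, this sketch counts the 3-hour slots in the requested window. It assumes that both the start and end dates are inclusive and that each day starts at 00:00; neither is confirmed here. Compare its count against the actual number of rows in each location's CSV. The helper expected_timestamps is hypothetical and not part of wwo-hist.

```python
from datetime import datetime, timedelta

FREQUENCY = 3  # hours between records
START_DATE = '11-DEC-2018'
END_DATE = '11-MAR-2019'


def expected_timestamps(start, end, freq_hours):
    """List every slot from 00:00 on `start` through the last slot on `end`.

    Assumption: the end date is inclusive. %b month names are locale-dependent;
    this assumes an English/C locale (matching is case-insensitive).
    """
    first = datetime.strptime(start, "%d-%b-%Y")
    stop = datetime.strptime(end, "%d-%b-%Y") + timedelta(days=1)  # exclusive bound
    step = timedelta(hours=freq_hours)
    stamps, t = [], first
    while t < stop:
        stamps.append(t)
        t += step
    return stamps


stamps = expected_timestamps(START_DATE, END_DATE, FREQUENCY)
print(len(stamps))  # 91 days x 8 slots per day = 728 rows per location
```

The window spans 21 days in December, 31 in January, 28 in February 2019, and 11 in March, for 91 days in total. At 24 / 3 = 8 slots per day, that gives 728 expected rows per location under these assumptions.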
There you have it! The full script is also documented on GitHub.
Thank you for reading. Please give it a try, and let me know your feedback! If you like what I did, consider following me on GitHub, Medium, and Twitter to get more articles and tutorials on your feed.