Hello fellow campers.
For TL;DR see below.
I’ve spent the past week or so working on an API that scrapes the FCC forum’s topic titles (it’s really more of an API call than a scrape) and analyzes them for trending keywords. Don’t worry, I cleared everything with Quincy before attempting it!
Anyway, the basic premise is that it pulls the data for a day or week timespan, counts each keyword’s frequency (ignoring stop words), and compares that frequency to historical frequencies using a z-score calculation. There are some obvious issues with this at the moment:
- I’ve only been collecting data for roughly a week, so only daily trends are somewhat accurate.
- If the previous day/week span has absolutely 0 mentions of a keyword, this artificially inflates the score to Infinity. For now I’m ignoring those keywords (unless someone can help me figure out how to handle them).
- Similarly, if a previous day/week span has very few mentions (say 1) and the current period has slightly more (say 2), this will inflate the score, even though that keyword is obviously not that popular.
- The stop word list may need to be expanded, as you campers are just too polite! “Please” showed up as #6 on my first run.
- The longest comparison available is one week into the past. I need to keep collecting data for the results to be more accurate.
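For anyone who wants to picture the scoring step, here is a rough Python sketch of the idea. It is not the project's actual source code. The stop word list, the function names, and the `min_std` and `min_count` values are all made up for illustration, and flooring the standard deviation and skipping rarely mentioned keywords is only one possible way to deal with the Infinity and low-count issues listed above, not a settled fix.

```python
import re
from collections import Counter
from statistics import mean, pstdev

# Hypothetical, deliberately tiny stop word list for illustration only.
STOP_WORDS = {"a", "an", "the", "to", "in", "with", "my", "for", "on", "is",
              "please", "help"}


def count_keywords(titles):
    """Count non-stop-word tokens across a list of topic titles."""
    counts = Counter()
    for title in titles:
        for word in re.findall(r"[a-z0-9']+", title.lower()):
            if word not in STOP_WORDS:
                counts[word] += 1
    return counts


def z_score(current, history, min_std=1.0):
    """Z-score of the current count against counts from past periods.

    min_std is an assumed floor on the standard deviation, so a flat
    history (for example all zeros) cannot cause a division by zero.
    """
    if not history:
        return 0.0
    std = max(pstdev(history), min_std)
    return (current - mean(history)) / std


def trending(current_counts, history_counts, min_count=3):
    """Rank keywords by z-score, skipping any mentioned fewer than min_count times."""
    scores = {}
    for word, count in current_counts.items():
        if count < min_count:
            continue  # assumed cutoff so a 1 -> 2 jump is not reported as a trend
        past = [period.get(word, 0) for period in history_counts]
        scores[word] = z_score(count, past)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

With this sketch, a keyword that goes from zero mentions in every past period to five today gets a finite score of 5.0 instead of Infinity, and a keyword with only two mentions is dropped before scoring. Both thresholds would need tuning against real data.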
Hopefully these issues will even out with time, but if anyone knows stats well, I’d love to work on this with you! Anyway, here is the source code (please ignore the mess as it’s still under construction).
Let me know what you think, and please feel free to send suggestions and critiques! I’ll post an update once I make everything live.
TL;DR: I made a program that scrapes the forum and analyzes topic-title keywords for trending topics. Suggestions welcome.