FCC Forum Trending Keywords - Sneak peek

Hello fellow campers.

For TL;DR see below.

I’ve been working for the past week or so on an API that scrapes (more of an API call, really) the FCC Forum’s topic titles and analyzes them for trending keywords. Don’t worry, I cleared everything with Quincy before attempting it!

Anyway, the basic premise is that it pulls the data for a day or week timespan, counts each keyword’s frequency (ignoring stop words), and compares that frequency to historical frequencies using a Z-score calculation. There are some obvious issues with this at the moment:

  • I’ve only been collecting data for roughly a week, so only daily trends are somewhat accurate.
  • If the previous day/week span has absolutely 0 mentions of a keyword, this artificially inflates the score to Infinity. For now I’m ignoring those keywords (unless someone can help me with how to handle them).
  • Similarly, if a previous day/week span has very few mentions (say 1) and the current period has slightly more (say 2), this will inflate the score, even though that keyword is obviously not that popular.
  • The stop word list may need to be extended, as you campers are just too polite! “Please” showed up as #6 on my first run.
  • The longest comparison possible is one week into the past. I need to keep collecting data to be more accurate.
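
For anyone who wants to poke at the zero-mention and low-count problems above, here is a minimal sketch of one possible way to handle them. It is not taken from the project's source: `STOP_WORDS`, `countKeywords`, `trendScore`, `minCount` and `stdFloor` are all made-up names and values for illustration. The idea is to skip keywords whose current count is below a minimum, and to put a floor under the standard deviation so that a flat history (for example, all zeros) cannot push the score to Infinity.

```javascript
// Illustrative stop-word list (an assumption, not the project's real list).
const STOP_WORDS = new Set(['a', 'the', 'to', 'is', 'my', 'with', 'help', 'please']);

// Count how often each non-stop word appears across a list of topic titles.
function countKeywords(titles) {
  const counts = new Map();
  for (const title of titles) {
    const words = title.toLowerCase().match(/[a-z0-9]+/g) || [];
    for (const word of words) {
      if (STOP_WORDS.has(word)) continue;
      counts.set(word, (counts.get(word) || 0) + 1);
    }
  }
  return counts;
}

// Z-score of the current count against past counts, with two guards:
//  - minCount: keywords mentioned fewer times than this score 0 (hypothetical threshold)
//  - stdFloor: lower bound on the standard deviation, so the division never
//    produces Infinity or NaN when the history is flat (hypothetical value)
function trendScore(current, history, { minCount = 5, stdFloor = 1 } = {}) {
  if (history.length === 0 || current < minCount) return 0;
  const mean = history.reduce((sum, x) => sum + x, 0) / history.length;
  const variance =
    history.reduce((sum, x) => sum + (x - mean) ** 2, 0) / history.length;
  const std = Math.sqrt(variance); // population standard deviation
  return (current - mean) / Math.max(std, stdFloor);
}
```

With these defaults, `trendScore(2, [1, 1, 1])` returns 0 because 2 is below `minCount`, and `trendScore(10, [0, 0, 0])` returns 10 rather than Infinity because the floor replaces the zero standard deviation. Both thresholds are guesses and would need tuning against real forum data.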

Hopefully these issues will even out with time, but if anyone knows stats well, I’d love to work on this with you! Anyway, here is the source code (please ignore the mess as it’s still under construction).

Let me know what you think, and please feel free to shoot suggestions and critiques! I’ll be updating once I make everything live.

TL;DR: Made a program that scrapes the forum and analyzes keywords for trending topics. Suggestions welcome.

Current trending daily keywords

Also, does anyone know any good free hosting? My Heroku slots are all filled.

OpenShift looks promising, as it supports Node and has built-in MongoDB support and storage space. Any other suggestions would be appreciated though.

Do you have a credit card to verify your account? I thought you could have unlimited apps once you verified? (With no charge)

Ah, possibly. I didn’t know that you could do that. It would still have the several hours of downtime though, correct?

Well, Heroku apps kinda go to sleep, but they wake up automatically when they are pinged by a user.

If yours relies on frequent automated scraping, that might not work - but you could probably work around it by having the scraping tied to a GET request to the homepage.
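
A rough sketch of that workaround, assuming a plain Node `http` server: a GET to the homepage kicks off a scrape, but only if the last one is older than some interval. `makeHandler`, `scrapeForum` and the one-hour interval are hypothetical names and values, and `scrapeForum` is just a placeholder for whatever the real scraper does.

```javascript
const http = require('http');

// Placeholder for the real scraping logic (hypothetical; must return a Promise).
async function scrapeForum() {
  // fetch topic titles, store keyword counts, etc.
}

// Build a request handler that triggers `scrape` on GET / at most once per interval.
function makeHandler(scrape = scrapeForum, intervalMs = 60 * 60 * 1000) {
  let lastScrape = 0; // timestamp (ms) of the most recent scrape that was started
  return (req, res) => {
    if (req.method === 'GET' && req.url === '/') {
      const now = Date.now();
      if (now - lastScrape >= intervalMs) {
        lastScrape = now;
        // Fire and forget so the visitor is not kept waiting on the scrape.
        scrape().catch((err) => console.error('Scrape failed:', err));
      }
    }
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    res.end('ok');
  };
}

// To run it:
// http.createServer(makeHandler()).listen(process.env.PORT || 3000);
```

One catch with this approach: data only gets collected when someone actually visits, so a quiet day would leave a gap in the history.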