Saving relevant data from external APIs to a database to reduce API calls?

yuh · December 2, 2019, 12:38am

I’m building an app that relies on data from the Spotify API, including artists, albums and tracks. It’s basically a way to get a list of related artists up to n levels deep instead of having to do it manually when I’m streaming.

Because the core of the app builds on that data, I’ve been thinking about storing the relevant pieces I need in my own database (such as the Spotify IDs for faster lookup of resources, or info about songs such as name, length, explicit, etc.). I’m starting with about 10 artists, but as the app continues to be used, more artists will be retrieved. So I’m ending up with potentially hundreds of different artists/albums/tracks that I’d need to fetch data on.

Fetching once and storing the bits I need means I could check my own database to see if I searched for an artist/album/track before, and if so I’d use that data right away. If nothing is in my storage, that’s when I’d make the calls to the Spotify API and process the responses.
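A minimal sketch of that check-the-database-first flow (often called cache-aside), in Python with SQLite. The table name `artist_ids`, the helper names, and the `fetch_from_api` callback are all hypothetical; the callback stands in for whatever code actually calls the Spotify API, which is not shown here.

```python
import sqlite3
from typing import Callable


def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical table: maps a normalized search term to a Spotify ID.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS artist_ids ("
        " name TEXT PRIMARY KEY,"
        " spotify_id TEXT NOT NULL)"
    )


def get_artist_id(
    conn: sqlite3.Connection,
    name: str,
    fetch_from_api: Callable[[str], str],
) -> str:
    """Return the ID for `name`, calling the API only on a cache miss."""
    key = name.strip().lower()
    row = conn.execute(
        "SELECT spotify_id FROM artist_ids WHERE name = ?", (key,)
    ).fetchone()
    if row is not None:
        return row[0]  # cache hit: no API call needed

    # Cache miss: ask the external API, then remember the answer.
    spotify_id = fetch_from_api(name)
    conn.execute(
        "INSERT INTO artist_ids (name, spotify_id) VALUES (?, ?)",
        (key, spotify_id),
    )
    conn.commit()
    return spotify_id
```

Passing the fetcher in as a parameter keeps the lookup logic testable without network access; a real fetcher would also need to handle errors and rate limits.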

My app is non-commercial but I’m worried about violating the terms and conditions. Particularly this:

Do not improperly access, alter or store the Spotify Service or Spotify Content, including (i) using any robot, spider, site search/retrieval application, or other tool to retrieve, duplicate, or index any portion of the Spotify Service or Spotify Content (which includes playlist data) or collect information about Spotify users for any unauthorized purpose; (ii) making excessive service calls that are not strictly required for the proper functioning of your SDA via the Spotify Platform; (iii) storing metadata or cover art or aggregating metadata, cover art, audio, or other Spotify Content to create databases or any other compilation other than as strictly necessary to offer and operate your SDA;

So I just wanted to know how professionals do it in the real world. If your app/service relies on an external API to function, how do you handle that data and possibly needing to fetch the same things many times? What tools or techniques do you suggest I look into?

snigo · December 2, 2019, 1:21am

Correct me if I’m wrong: you will have your own API that accepts the Spotify URI of an artist and, instead of < 10, produces > 100 related artists. I would argue that a URI cannot be considered metadata or content, so if you store a map like

artistURI: artistURI[]

you are not violating anything, as this array was produced by your app and not given to you by Spotify.
Anything beyond that would be content and metadata, and you don’t really want to keep it anyway, since you want a single source of truth.
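
One way such a URI-to-URIs map could be persisted is as a table of directed edges. This is only a sketch: the `related` table, its columns, and the helper names are made up for illustration, and whether storing this data is permitted is a question for Spotify's terms, not something the code settles.

```python
import sqlite3
from typing import Iterable, List


def init_graph(conn: sqlite3.Connection) -> None:
    # One row per directed edge: source artist -> related artist.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS related ("
        " source_uri TEXT NOT NULL,"
        " target_uri TEXT NOT NULL,"
        " PRIMARY KEY (source_uri, target_uri))"
    )


def save_related(
    conn: sqlite3.Connection, source_uri: str, target_uris: Iterable[str]
) -> None:
    # INSERT OR IGNORE makes repeated saves of the same edge harmless.
    conn.executemany(
        "INSERT OR IGNORE INTO related (source_uri, target_uri) VALUES (?, ?)",
        [(source_uri, target) for target in target_uris],
    )
    conn.commit()


def load_related(conn: sqlite3.Connection, source_uri: str) -> List[str]:
    # Returns targets in insertion order; empty list if nothing is stored.
    rows = conn.execute(
        "SELECT target_uri FROM related WHERE source_uri = ? ORDER BY rowid",
        (source_uri,),
    ).fetchall()
    return [row[0] for row in rows]
```

Because each row has a direction, storing A -> B says nothing about B -> A; the reverse edge exists only if it is saved separately.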

yuh · December 2, 2019, 1:51am

I’m using an endpoint to fetch 20 of an artist’s related artists. Once I get that data, I process it further and then build a graph. Spotify doesn’t provide more than 20 artists IIRC, and there’s no way to specify how many levels deep.

So, for example, if my API gets the search term “Lorde”, it would first check to see if Lorde had been processed before. If not, it’ll call Spotify’s API to get Lorde’s Spotify ID (and save it to the DB), then fetch her related artists (and their related artists and their related artists… up to n levels deep) and build the graph. So what I end up with in the database is similar to what you described. This way, later on, if I want to listen to random songs based on Lorde’s related artists, my app can generate a list of songs based on these connections. This is pretty much what I’m doing manually, so I wanted to automate that process and make it less tedious.

I’m thinking that things could slow down very fast if I have to go down many levels every time. If I only did Lorde up to 2 levels the first time for example, and then someone else wanted to go up to 5, all I’d have to do then is get Lorde’s level 2 artists and start from there instead of having to start all the way at the top with her again.
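
A rough sketch of that "resume from the deepest stored level" idea, assuming the stored result is a map from artist ID to the depth at which it was first found (seed at depth 0). The function name `extend_levels` and the `get_related` callback are hypothetical; the callback stands in for a database lookup or an API call.

```python
from typing import Callable, Dict, Iterable


def extend_levels(
    levels: Dict[str, int],
    get_related: Callable[[str], Iterable[str]],
    target_depth: int,
) -> Dict[str, int]:
    """Extend a stored {artist: depth} map in place up to target_depth.

    Only artists at the deepest stored level are expanded again, so
    shallower levels do not trigger new lookups.
    """
    current_depth = max(levels.values())
    frontier = [a for a, d in levels.items() if d == current_depth]

    while frontier and current_depth < target_depth:
        current_depth += 1
        next_frontier = []
        for artist in frontier:
            for neighbor in get_related(artist):
                if neighbor not in levels:  # skip artists already placed
                    levels[neighbor] = current_depth
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return levels
```

For example, starting from `{"lorde": 0}` and extending to depth 2, a later call with `target_depth=5` would only expand the depth-2 artists and whatever they lead to.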

To keep the data in my DB relatively up to date, I was planning on having a background job that pings the API once every x days to check and update my data if needed. But maybe this is a poor solution. Would you recommend simply storing the Spotify ID and name of the resources (artists, albums, songs) and fetching everything else on a request-by-request basis?
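
For the background job, one possible shape is to record when each item was last fetched and refresh only those older than x days. The sketch below covers just the selection step; the `stale_ids` name and the `{id: last_fetched}` map are assumptions, and the scheduling and the refresh call itself are left out.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def stale_ids(
    fetched_at: Dict[str, datetime],
    now: datetime,
    max_age: timedelta,
) -> List[str]:
    """Return the IDs whose last fetch is at least `max_age` old, sorted."""
    return sorted(
        item_id
        for item_id, timestamp in fetched_at.items()
        if now - timestamp >= max_age
    )


# Usage: a periodic job would refresh only what this returns.
now = datetime.now(timezone.utc)
to_refresh = stale_ids({}, now, timedelta(days=7))
```

Passing `now` in explicitly, rather than reading the clock inside the function, makes the check easy to test with fixed timestamps.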

snigo · December 2, 2019, 2:15am

I wouldn’t rely on depth alone: many artists can reference each other in a circular fashion, so a depth cutoff by itself can give very different results. I would rather use a limit and run a breadth-first search until the limit is reached, which looks more consistent. Finally, I wouldn’t suggest cranking up the limit, as there’s a thing called “six degrees of separation”, and beyond 3-4 levels you might end up quite far from the truth.
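
A minimal sketch of a limited breadth-first search, reading "limit" as a cap on the number of artists collected (that reading is an assumption). The `collect_related` name and the `get_related` callback are hypothetical; a `seen` set is what keeps circular references from looping forever.

```python
from collections import deque
from typing import Callable, Iterable, List


def collect_related(
    seed: str,
    get_related: Callable[[str], Iterable[str]],
    limit: int,
) -> List[str]:
    """Collect up to `limit` artists reachable from `seed`, nearest first."""
    seen = {seed}          # artists already queued or collected
    collected: List[str] = []
    queue = deque([seed])  # FIFO queue gives breadth-first order

    while queue and len(collected) < limit:
        current = queue.popleft()
        for neighbor in get_related(current):
            if neighbor in seen:
                continue  # circular reference or duplicate: skip it
            seen.add(neighbor)
            collected.append(neighbor)
            queue.append(neighbor)
            if len(collected) >= limit:
                break
    return collected
```

Since closer artists are always visited before more distant ones, a small limit naturally keeps the results near the seed.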

As for the data structure, a graph sounds about right: an A -> B relation doesn’t imply B -> A, so you need to go 2-D.

yuh · December 2, 2019, 4:28am

Breadth-first search is something I’m actually looking into to help me deal with the circular relationships. A limit is a good idea, with the option of going deeper available if need be. I now have a better idea of how to design the app, thanks for your help @snigo.