recommender-systems - freeCodeCamp.org

What is the Cold Start Problem in Recommender Systems?

Praise James — Tue, 25 Feb 2025 21:09:01 +0000

Recommender systems are used to provide personalized experiences for customers in many industries today, including e-commerce, social media, entertainment, and education. These systems make recommendations based on user preferences and collect user feedback to improve their performance.

Netflix uses your watch history and the preferences of similar viewers to determine what you might like to watch next. That's why Netflix nudges you to watch Prison Break after completing Money Heist.

According to Statista, by the end of 2024, Netflix had over 300 million paid subscribers, so its recommender system has a large amount of user data to work with. That is why its movie recommendations can be so well targeted.

But a platform with a newly implemented recommender system that has too little data to work with will face what is known as the cold start problem. This means that the platform will be unable to efficiently and accurately recommend products or services that meet the needs of its users.

In this article, you'll learn about the cold start problem in recommender systems, its types, why it occurs, and how it can be mitigated.

What is the Cold Start Problem?

The cold start problem in recommender systems occurs when there is little or no historical data to draw inferences from. This means the recommender system cannot accurately provide relevant suggestions to users when initially implemented on a new platform since it takes time to gather data and draw insights from it.

Typically, recommender systems gather data such as product interactions, purchases, reviews, and so on, depending on the business' key data points. This data is called the reference characteristics of the system. The system trains on this data to provide intelligent suggestions that will compel users to continue using the platform.

For example, Spotify's recommender system can analyze your listening history and play frequency to understand your past music preferences and predict what you might like to listen to next.

The cold start problem is a common problem that data scientists and machine learning (ML) engineers face when building recommender systems for a business. The performance of the recommender system drops when there is not enough data to gather from new users or items on the platform. Sadly, this low performance can turn users away and lead to revenue loss.

Types of Cold Start Problems in Recommender Systems

There are two main types of cold start problems in recommender systems: the user cold start and the item cold start. To illustrate these problems, I created a table representing a recommendation system based on user-item ratings. In this table:

Rows represent users
Columns represent items
Cell values represent users' ratings on a scale of 1 to 5.
"--" represents unrated items.

User Cold Start

The user cold start problem occurs when new users have not provided enough basic information or past interactions for the recommender system to make intelligent suggestions. Thus, the system cannot accurately predict the user's possible interests.

As shown in the table above, the NEW USER has not used or evaluated any item. This means that the system cannot accurately predict what item the new user would most likely be interested in.

This is a serious problem because if new users keep getting off-target recommendations initially, they might stop using the platform before the recommender system has enough data from them to perform better.

Item Cold Start

The item cold start problem occurs when a new item, product, or piece of content is added to a platform, but there are not yet enough ratings, purchases, or reviews for it to be recommended.

As shown in the table above, item E is new and does not have any user ratings. Thus, the recommender system will not recommend this item to users.

Note that the item cold start problem does not only affect new items. It can also impact already existing but unpopular items. If an existing item has only had a few interactions, the recommender system does not have enough historical user feedback to understand the item metadata and user preferences. This means that the system will make poor recommendations, giving the item less visibility.
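To make the two cases concrete, here is a minimal Python sketch. The ratings table, the user and item names, and the `min_ratings` threshold are all made up for illustration and are not the table from this article. The sketch flags users and items that have too few ratings to support recommendations:

```python
# Hypothetical ratings: None marks an unrated item ("--" in the table).
ratings = {
    "User 1":   {"A": 5, "B": 3, "C": None, "D": 1, "E": None},
    "User 2":   {"A": 4, "B": None, "C": 2, "D": 1, "E": None},
    "User 3":   {"A": None, "B": 1, "C": 5, "D": 4, "E": None},
    "NEW USER": {"A": None, "B": None, "C": None, "D": None, "E": None},
}


def cold_start_users(ratings, min_ratings=1):
    """Return users with fewer than `min_ratings` rated items."""
    return [
        user
        for user, row in ratings.items()
        if sum(r is not None for r in row.values()) < min_ratings
    ]


def cold_start_items(ratings, min_ratings=1):
    """Return items rated by fewer than `min_ratings` users."""
    items = next(iter(ratings.values())).keys()
    return [
        item
        for item in items
        if sum(row[item] is not None for row in ratings.values()) < min_ratings
    ]


print(cold_start_users(ratings))  # ['NEW USER']
print(cold_start_items(ratings))  # ['E']
```

Raising `min_ratings` (for example to 3) would also flag existing items that have only a handful of ratings, which matches the point that unpopular items can suffer from the same problem.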

Challenges of the Cold Start Problem in Recommender Systems

Stereotypical recommendations

Relying on limited data can lead to stereotypical recommendations in recommender systems. For example, when the system only uses basic user actions, it can end up offering the same type of content or items repeatedly, based on generalized assumptions. This stereotyping can push users away, especially if they start feeling like their interests are not being fully understood.

High churn rate

When a user has to scroll endlessly to find items they want because the recommender system does not surface relevant items, the platform is more prone to experiencing a high churn rate. This means that the platform may lose many of its users if they cannot quickly find relevant products or services.

Loss of customer loyalty

Statista reported that in 2023, 56% of consumers preferred to return to a retailer who provided a personalized shopping experience. This means that a lack of a customized experience through intelligent recommendations will make users mistrust the system's ability to understand their needs. This mistrust can lead to user frustration, loss of customer loyalty, and, ultimately, negative brand perception.

How to Solve the Cold Start Problem in Recommender Systems

Below are some strategies that AI researchers have proposed to help mitigate user cold start and item cold start problems, respectively:

How to Solve User Cold Start

One way to fix the lack of historical data from users is to provide a questionnaire upfront when new users register for a platform. This questionnaire can help businesses obtain some basic preferences so they can build a helpful user profile and make initial recommendations.

Spotify uses this method to avoid user cold start problems. When you sign up, Spotify will ask you to select your favorite artists and music genres from a list of options. Spotify’s recommender system then uses this information to understand the type of songs you might like and build an initial playlist for you.

Still, businesses need to implement this onboarding strategy carefully, because if they ask new users too many questions upon registration, they might skip the questions or abandon the platform.
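As a rough illustration of how questionnaire answers could seed the first recommendations, here is a minimal Python sketch. The catalog, the genre tags, and the `initial_recommendations` function are hypothetical, and this is not a description of how Spotify or any other platform actually works. It simply ranks items by how many of the user's selected genres they match:

```python
# Hypothetical catalog: each track is tagged with one or more genres.
CATALOG = {
    "Track 1": {"afrobeats", "pop"},
    "Track 2": {"jazz"},
    "Track 3": {"pop", "rock"},
    "Track 4": {"afrobeats"},
    "Track 5": {"classical"},
}


def initial_recommendations(selected_genres, catalog, top_n=3):
    """Rank items by how many of the user's selected genres they match."""
    selected = set(selected_genres)
    scored = [(len(tags & selected), name) for name, tags in catalog.items()]
    # Keep only items that match at least one genre; best matches first,
    # ties broken alphabetically so the output is deterministic.
    matches = sorted(
        ((score, name) for score, name in scored if score > 0),
        key=lambda pair: (-pair[0], pair[1]),
    )
    return [name for _, name in matches[:top_n]]


print(initial_recommendations(["afrobeats", "pop"], CATALOG))
# ['Track 1', 'Track 3', 'Track 4']
```

Even one or two answers are enough to produce a non-empty starting list, which is one reason a short questionnaire can work better than a long one.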

Using contextual data

Contextual data focuses on information like user location, demographics, device type, or real-time behavior. Businesses can obtain this data through the user’s sign-up information, IP Address, cookies, and browser settings. These extra insights can help businesses enhance the experience for new users by customizing their content and recommendations.

Travel platforms like Booking.com use this strategy to provide personalized recommendations and display localized content to new users. When first-time visitors access the site, Booking.com obtains contextual data through their IP Address, browser settings, and cookies. Using this information, Booking.com can recommend nearby accommodations, attractions, and travel deals in the user's area. In the screenshot above, I have not signed in or registered on Booking.com, but the site already recommends content for my location.
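One simple way to act on contextual data is to fall back on what is popular near the visitor. The Python sketch below assumes the visitor's country code has already been derived elsewhere (for example from their IP address). The popularity counts, listing names, and function name are hypothetical, and this is not how Booking.com's system is known to work:

```python
# Hypothetical popularity counts per country (e.g. bookings in the last week).
POPULAR_BY_COUNTRY = {
    "NG": {"Lagos hotel": 120, "Abuja apartment": 80, "Calabar resort": 45},
    "FR": {"Paris hotel": 300, "Nice villa": 90},
}
# Hypothetical global counts, used when the country is unknown or has no data.
GLOBAL_POPULAR = {"Paris hotel": 300, "Lagos hotel": 120, "Nice villa": 90}


def recommend_for_visitor(country_code, top_n=2):
    """Recommend locally popular items, falling back to global popularity."""
    counts = POPULAR_BY_COUNTRY.get(country_code, GLOBAL_POPULAR)
    ranked = sorted(counts, key=counts.get, reverse=True)
    return ranked[:top_n]


print(recommend_for_visitor("NG"))  # ['Lagos hotel', 'Abuja apartment']
print(recommend_for_visitor("BR"))  # ['Paris hotel', 'Lagos hotel']
```

The fallback matters: a visitor whose location cannot be determined still gets a sensible list instead of an empty page.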

Businesses can solve the user cold start problem by having new users register with their social logins. With this access, the recommender system can retrieve the user's interests, past interactions, and behavior from their social profiles. This information helps the system to understand the user's preferences and make suggestions accordingly.

How to Solve Item Cold Start

Leveraging content-based filtering

Content-based filtering is a recommendation technique that uses the characteristics or metadata of an item, such as features, genres, categories, or descriptions, to make recommendations.

By analyzing the item's information, the recommender system can still suggest new or unpopular items to users even though the items have few reviews or interactions.

Note that content-based filtering can suffer poor recommendation quality when there are insufficient item characteristics. So, a business should only leverage this method if there is detailed information on the items.
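The idea can be sketched in a few lines of Python. The item names and feature vectors below are made up, and cosine similarity is just one common choice of similarity measure, assumed here for illustration. The point is that a new item with no ratings can still be matched to an item the user already liked through its metadata alone:

```python
import math

# Hypothetical item features: 1 means the item has the attribute, 0 means not.
# Columns: [action, comedy, drama, sci-fi]
ITEM_FEATURES = {
    "Item A": [1, 0, 0, 1],
    "Item B": [0, 1, 1, 0],
    "Item C": [1, 0, 1, 0],
    "Item E (new)": [1, 0, 0, 1],  # no ratings yet, but metadata is known
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_items(liked_item, features, top_n=2):
    """Rank the other items by similarity to an item the user liked."""
    target = features[liked_item]
    scores = {
        name: cosine_similarity(target, vec)
        for name, vec in features.items()
        if name != liked_item
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(similar_items("Item A", ITEM_FEATURES))
# ['Item E (new)', 'Item C']
```

If every item had an all-zero or near-identical feature vector, the similarity scores would carry no signal, which is the situation the note above warns about.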

Using hybrid filtering

Hybrid filtering involves combining the advantages of content-based filtering and collaborative filtering. Collaborative filtering is a recommendation technique that predicts a user's preferences based on the behavior of similar users. It analyzes data such as browsing history, purchase history, and item ratings to identify users with similar interests. Then, the system suggests items those users have liked to new users.

You have likely seen this technique in action through features like "People Also Liked" or "People Also Searched." Beyond the user-based recommendations, collaborative filtering also suggests items similar to those a user has previously engaged with.

The hybrid filtering approach switches between content-based filtering and collaborative filtering, relying on whichever technique has more data to work with. For example, Amazon might first recommend items based on product descriptions and categories (content-based filtering). Then, once a user has built up some purchase history, the recommender system might suggest products based on what users with similar shopping habits bought (collaborative filtering).
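A hybrid can also be implemented as a weighted blend rather than a hard switch. The Python sketch below is one such variant, not a description of any specific platform. The weighting formula and the constant `k` are assumptions chosen for illustration: the more interactions an item has, the more the collaborative score counts.

```python
def hybrid_score(content_score, collab_score, n_interactions, k=10):
    """Blend two scores; trust collaborative filtering more as data accumulates.

    `k` is a hypothetical tuning constant: when n_interactions == k,
    both signals get equal weight.
    """
    weight = n_interactions / (n_interactions + k)
    return (1 - weight) * content_score + weight * collab_score


# Brand-new item: no interactions, so the score is purely content-based.
print(hybrid_score(content_score=0.8, collab_score=0.0, n_interactions=0))  # 0.8

# Established item: the collaborative signal dominates the result.
print(hybrid_score(content_score=0.8, collab_score=0.4, n_interactions=90))
```

A smooth blend like this avoids a sudden change in recommendations at the moment an item crosses a fixed interaction threshold.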

Showing new releases on the homepage

Promoting new items or content on the homepage can provide visibility and encourage users to interact with the item. For maximum impact, it is best to highlight the new item in a visible section on the homepage and clearly label the item as new. This way, customers will not miss the update and are more inclined to try it out.

Conclusion

In this article, you have learned that the cold start problem is one of the key challenges that recommender systems face. Tackling this issue requires a combined approach of data analysis and continuous improvement. By applying the strategies discussed above, businesses can improve their recommender systems and offer more relevant, personalized experiences.


Singular Value Decomposition vs. Matrix Factorization in Recommender Systems

freeCodeCamp — Fri, 26 Apr 2019 21:57:01 +0000

By K. Delphino

Recently, after watching the Recommender Systems class of Prof. Andrew Ng’s Machine Learning course, I found myself very uncomfortable with not understanding how Matrix Factorization works.

I know the math in Machine Learning is sometimes very obscure, and it can be better to think of it as a black box. But that model was too “magical” by my standards.

In such situations, I usually try to search on Google for more references to better grasp the concept. This time I got even more confused. While Prof. Ng called the algorithm (Low Rank) Matrix Factorization, I found a different nomenclature on the internet: Singular Value Decomposition.

What confused me the most was that Singular Value Decomposition was very different from what Prof. Ng had taught, yet people kept suggesting they were the same thing.

In this text, I will summarize my findings and try to clear up some of the confusion those terms can cause.

Recommender Systems

Recommender Systems (RS) are just automated ways to recommend something to someone. Such systems are broadly used by e-commerce companies, streaming services and news websites. They reduce the friction users face when trying to find something they like.

RS are definitely not a new thing: they have been around since at least 1990. In fact, part of the recent Machine Learning hype can be attributed to interest in RS. In 2006, Netflix made a splash when they sponsored a competition to find the best RS for their movies. As we will see soon, that event is related to the nomenclature mess that followed.

The matrix representation

There are a lot of ways a person can think of recommending a movie to someone. One strategy that turned out to be very good is treating movie ratings as a Users x Movies matrix like this:

Created with https://sheetsu.com/

In that matrix, the question marks represent the movies a user has not rated. The strategy then is to predict those ratings somehow and recommend to users the movies they will probably like.

Matrix Factorization

A really smart realization made by the people who entered Netflix’s competition (notably Simon Funk) was that the users’ ratings weren’t just random guesses. Raters probably follow some logic where they weigh the things they like in a movie (a specific actress or a genre) against things they don’t like (long duration or bad jokes) and then come up with a score.

That process can be represented by a linear formula of the following kind:

rᵤₘ = θᵤᵀxₘ

where rᵤₘ is the rating that user u gives to movie m, xₘ is a column vector with the values of the features of the movie m and θᵤ is another column vector with the weights that user u gives to each feature. Each user has a different set of weights and each film has a different set of values for its features.

It turns out that if we arbitrarily fix the number of features and ignore the missing ratings, we can find a set of weights and feature values that create a new matrix with values close to the original rating matrix. This can be done with gradient descent, very much like the one used in linear regression. The difference is that now we are optimizing two sets of parameters (weights and features) at the same time.
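Here is a minimal pure-Python sketch of that optimization. It is not Funk's or Prof. Ng's exact code. It uses stochastic gradient descent (one known rating at a time) with a small L2 regularization term, and the ratings matrix, learning rate, regularization strength, number of epochs, and function names are all assumptions chosen for illustration:

```python
import random


def factorize(ratings, n_features=2, lr=0.01, reg=0.02, epochs=5000, seed=0):
    """Fit user weights (theta) and movie features (x) to the known ratings.

    `ratings` is a list of rows; None marks a missing rating and is skipped.
    """
    rng = random.Random(seed)
    n_users, n_movies = len(ratings), len(ratings[0])
    theta = [[rng.uniform(0.1, 1.0) for _ in range(n_features)]
             for _ in range(n_users)]
    x = [[rng.uniform(0.1, 1.0) for _ in range(n_features)]
         for _ in range(n_movies)]
    known = [
        (u, m, r)
        for u, row in enumerate(ratings)
        for m, r in enumerate(row)
        if r is not None
    ]
    for _ in range(epochs):
        for u, m, r in known:
            pred = sum(t * f for t, f in zip(theta[u], x[m]))
            err = r - pred
            for k in range(n_features):
                t, f = theta[u][k], x[m][k]
                # Gradient step on the squared error, with L2 regularization.
                # Both parameter sets are updated from the same old values.
                theta[u][k] += lr * (err * f - reg * t)
                x[m][k] += lr * (err * t - reg * f)
    return theta, x


def predict(theta, x, u, m):
    """Predicted rating: dot product of user u's weights and movie m's features."""
    return sum(t * f for t, f in zip(theta[u], x[m]))


# Hypothetical 4 users x 4 movies matrix (not the one from this article).
R = [
    [5, 4, None, 1],
    [4, None, 1, 1],
    [1, 1, 5, None],
    [None, 1, 4, 5],
]
theta, x = factorize(R)
print(round(predict(theta, x, 0, 0), 2))  # should land near the known rating of 5
print(round(predict(theta, x, 1, 1), 2))  # a prediction for a missing rating
```

Only the known ratings contribute to the error, which is what "ignore the missing ratings" means in practice. The missing cells get a value only afterwards, when the learned weights and features are multiplied together.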

Using the table I gave as an example above, the result of the optimization problem would generate the following new matrix:

Notice that the resulting matrix can’t be an exact copy of the original one in most real datasets because in real life people are not doing multiplications and summations to rate a movie. In most cases, the rating is just a gut feeling that can also be affected by all kinds of external factors. Still, our hope is that the linear formula is a good way to express the main logic that drives users’ ratings.

OK, now we have an approximate matrix. But how the heck does that help us to predict the missing ratings? Remember that to build the new matrix, we created a formula to fill all the values, including the ones that are missing in the original matrix. So if we want to predict the missing rating of a user on a movie, we just take all the feature values of that movie, multiply by all the weights of that user and sum everything up. So, in my example, if I want to predict User 2’s rating of Movie 1, I can do the following calculation:

To make things clearer, we can separate the θ’s and x’s and put them into their own matrices (say P and Q). That is effectively a Matrix Factorization, hence the name used by Prof. Ng.
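In matrix form, the whole table of predictions is just P multiplied by the transpose of Q. The small Python sketch below uses made-up values for P and Q (they are not the numbers from the example in this article) to show that a single predicted rating is the dot product of one row of P and one row of Q:

```python
# Hypothetical learned factors with 2 features each (values made up).
P = [  # one row of weights (theta) per user
    [1.0, 0.2],
    [0.5, 1.0],
]
Q = [  # one row of feature values (x) per movie
    [4.0, 1.0],
    [1.0, 3.0],
    [2.0, 2.0],
]


def predict_all(P, Q):
    """Multiply P by the transpose of Q.

    Entry [u][m] is the dot product of row u of P and row m of Q.
    """
    return [
        [sum(p * q for p, q in zip(p_row, q_row)) for q_row in Q]
        for p_row in P
    ]


predicted = predict_all(P, Q)
# Second user's predicted rating of the first movie (index [1][0]):
# 0.5 * 4.0 + 1.0 * 1.0 = 3.0
print(predicted[1][0])  # 3.0
```

The result has one row per user and one column per movie, so it has the same shape as the original ratings matrix, with every cell filled in.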

That Matrix Factorization is basically what Funk did. At one point he reached third place on the leaderboard of Netflix’s competition, attracting a lot of attention (which is an interesting case of a third place being more famous than the winners). His approach has been replicated and refined since then and is still in use in many applications.

Singular Value Decomposition

Enter Singular Value Decomposition (SVD). SVD is a fancy way of factorizing a matrix into three other matrices (A = UΣVᵀ). The way SVD is done guarantees those 3 matrices carry some nice mathematical properties.

There are many applications for SVD. One of them is Principal Component Analysis (PCA), which is just reducing a dataset of dimension n to dimension k (k < n).

I won’t give you any further detail on SVD because I don’t know the details myself. The point is that it’s not the same thing as we did with Matrix Factorization. The clearest evidence is that SVD creates 3 matrices while Funk’s Matrix Factorization creates only 2.
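For readers who want to see the three matrices, here is a minimal sketch using NumPy's `numpy.linalg.svd`. The matrix `A` is made up for illustration. Note that it has no missing entries, because the mathematical SVD is defined for a complete matrix:

```python
import numpy as np

# A small, fully observed matrix (every entry is present).
A = np.array([
    [5.0, 4.0, 1.0],
    [4.0, 5.0, 1.0],
    [1.0, 1.0, 5.0],
    [2.0, 1.0, 4.0],
])

# Three factors: U (4x3), the singular values (length 3), and V transposed (3x3).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(U.shape, s.shape, Vt.shape)  # (4, 3) (3,) (3, 3)

# Multiplying the three factors back together recovers A.
assert np.allclose(U @ np.diag(s) @ Vt, A)

# Keeping only the k largest singular values gives a rank-k approximation,
# which is the idea behind using SVD for dimensionality reduction.
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(np.linalg.matrix_rank(A_k))  # 2
```

Compare this with the two-matrix factorization described earlier: there, the factors are learned only from the known ratings, while here the decomposition is computed directly from a complete matrix.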

So why does SVD keep popping up every time I search for Recommender Systems? I had to dig a little bit, but eventually, I found some hidden gems. According to Luis Argerich:

The matrix factorization algorithms used for recommender systems try to find two matrices: P,Q such as P*Q matches the KNOWN values of the utility matrix.

This principle appeared in the famous SVD++ “Factorization meets the neighborhood” paper that unfortunately used the name “SVD++” for an algorithm that has absolutely no relationship with the SVD.

For the record, I think Funk, not the authors of SVD++, first proposed the mentioned matrix factorization for recommender systems. In fact, SVD++, as its name suggests, is an extension of Funk’s work.

Xavier Amatriain gives us a bigger picture:

Let’s start by pointing out that the method usually referred to as “SVD” that is used in the context of recommendations is not strictly speaking the mathematical Singular Value Decomposition of a matrix but rather an approximate way to compute the low-rank approximation of the matrix by minimizing the squared error loss. A more accurate, albeit more generic, way to call this would be Matrix Factorization. The initial version of this approach in the context of the Netflix Prize was presented by Simon Funk in his famous Try This at Home blogpost. It is important to note that the “true SVD” approach had been indeed applied to the same task years before, with not so much practical success.

Wikipedia also has similar information buried in its Matrix factorization (recommender systems) article:

The original algorithm proposed by Simon Funk in his blog post factorized the user-item rating matrix as the product of two lower-dimensional matrices, the first one has a row for each user, while the second has a column for each item. The row or column associated with a specific user or item is referred to as latent factors. Note that, despite its name, in FunkSVD no singular value decomposition is applied.

To summarize:

SVD is a somewhat complex mathematical technique that factorizes matrices into three new matrices and has many applications, including PCA and RS.
Simon Funk applied a very smart strategy in the 2006 Netflix competition, factorizing a matrix into two other ones and using gradient descent to find optimal values of features and weights. It’s not SVD, but he used that term anyway to describe his technique.
The more appropriate term for what Funk did is Matrix Factorization.
Because of the good results and the fame that followed, people still call that technique SVD because, well, that’s how the author named it.

I hope this helps to clarify things a bit.