Introduction to Recommender Systems
We are starting a series of articles on recommender systems, one of the most current and relevant topics in Machine Learning. The aim of this first article is to give a brief introduction to recommender systems. We'll talk about their importance, where they are used and what strategies are commonly used to represent recommendation data.
Introduction
With the advent of Web 2.0, most day-to-day activities can now be carried out online: we can take courses, shop, watch movies and series, play games, and much more. With everything in the palm of our hands, the amount of information available has exploded, and we are constantly bombarded with all kinds of products and services. Choosing a pair of sneakers to buy, a video game to play or a series to watch has become monstrously difficult. What should be a trivial choice can now take hours. After all, who hasn't opened Netflix and spent so long browsing that they gave up without actually watching anything?
It is in this context that recommendation systems emerge. Their aim is simply to help users identify items they might like (in this article, an item is any product, service or movie that can be recommended). Based on the user's viewing or consumption history, a recommender system predicts the user's degree of preference for items they haven't seen yet, and then presents the items they are most likely to enjoy.
The benefits of recommendation systems are numerous, both for users and for sellers/content providers. For users, it's much easier to find what to consume. For sellers/content providers, there is an increase in the number of sales and/or items consumed, because users can find what they are looking for relatively easily.
The importance given to these systems is such that in 2006 Netflix launched a challenge: the team that developed an algorithm that outperformed Netflix's own by 10% would be awarded US$1,000,000. This competition became known as the Netflix Prize, and it only concluded in 2009, when the BellKor's Pragmatic Chaos team finally managed to beat Netflix's algorithm by 10.06%. Investment in this area has only increased since then, and recommendation systems are now used by most major companies, such as Netflix, Amazon, Google, Facebook, Spotify, and many others.
But how can we implement a recommendation system? What data is used and what kind of Machine Learning techniques can we use?
Defining the problem
The main idea behind recommendation systems is to try to predict how much a user will like a particular item. To do this, we use information about what users have consumed in the past. Consider the example of a movie streaming service:
John gave the movie Harry Potter 1 a rating of 5 (excellent)
John gave the movie Mission Impossible a rating of 3 (fair)
John gave Fantastic Beasts a rating of 4 (good)
Maria gave Harry Potter 2 a score of 2 (bad)
Maria gave Fantastic Beasts a score of 1 (bad)
Flávia gave Harry Potter 1 a score of 5 (excellent)
The most traditional way to represent this data computationally is through a matrix of ratings, R, such that Rᵤᵥ represents the rating that user u gave to item v.
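As a minimal sketch, the six example ratings could be stored in such a matrix as follows. It assumes that the trailing number in each example line is the rating, and it uses None for pairs with no rating. The function name build_rating_matrix and the variable names are illustrative choices, not part of any library.

```python
# Example ratings as (user, item, rating) triples.
# Assumption: the trailing number in each example line is the rating (1 to 5).
ratings = [
    ("John", "Harry Potter 1", 5),
    ("John", "Mission Impossible", 3),
    ("John", "Fantastic Beasts", 4),
    ("Maria", "Harry Potter 2", 2),
    ("Maria", "Fantastic Beasts", 1),
    ("Flávia", "Harry Potter 1", 5),
]


def build_rating_matrix(triples):
    """Return (users, items, R), where R[u][v] is the rating user u gave
    to item v, or None if that user has not rated that item."""
    # dict.fromkeys removes duplicates while keeping first-seen order.
    users = list(dict.fromkeys(user for user, _, _ in triples))
    items = list(dict.fromkeys(item for _, item, _ in triples))
    user_index = {user: k for k, user in enumerate(users)}
    item_index = {item: k for k, item in enumerate(items)}

    # Start with every entry unknown, then fill in the observed ratings.
    R = [[None] * len(items) for _ in users]
    for user, item, rating in triples:
        R[user_index[user]][item_index[item]] = rating
    return users, items, R


users, items, R = build_rating_matrix(ratings)
for user, row in zip(users, R):
    print(user, row)
```

Each row of R corresponds to a user and each column to an item. Most entries are None, which is typical: users rate only a small fraction of the available items.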
In this case, the recommendation task would then be to predict a value for the missing entries in the matrix. Some techniques for this will be presented in the second article in this series on recommender systems.
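To make the task concrete, the sketch below lists the (user, item) pairs that have no rating in the example, which are the entries a recommender would need to predict. It does not perform any prediction. The nested dictionary layout and the name missing_entries are assumptions made for illustration.

```python
# Known ratings from the example: known[user][item] = rating.
# Assumption: the trailing number in each example line is the rating.
known = {
    "John": {"Harry Potter 1": 5, "Mission Impossible": 3, "Fantastic Beasts": 4},
    "Maria": {"Harry Potter 2": 2, "Fantastic Beasts": 1},
    "Flávia": {"Harry Potter 1": 5},
}


def missing_entries(known_ratings):
    """Return every (user, item) pair for which no rating is known."""
    # The item catalogue is the union of all items rated by anyone.
    items = sorted({item for rated in known_ratings.values() for item in rated})
    return [
        (user, item)
        for user, rated in known_ratings.items()
        for item in items
        if item not in rated
    ]


targets = missing_entries(known)
for user, item in targets:
    print(f"predict rating of {user} for {item}")
```

With 3 users and 4 items there are 12 possible pairs. Since 6 are already rated, the remaining 6 are the prediction targets.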
There are alternative ways of representing data in recommender systems. One of them is by means of an attribute-value table, as is generally used in Machine Learning.
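One possible reading of this representation is a table with one row per observed rating, where the user and the item are the attributes and the rating is the target value. The sketch below builds such a table from the example data. The column names and the split into X and y are assumptions for illustration, since the text does not specify which attributes the table contains.

```python
# Example ratings as (user, item, rating) triples.
# Assumption: the trailing number in each example line is the rating.
ratings = [
    ("John", "Harry Potter 1", 5),
    ("John", "Mission Impossible", 3),
    ("John", "Fantastic Beasts", 4),
    ("Maria", "Harry Potter 2", 2),
    ("Maria", "Fantastic Beasts", 1),
    ("Flávia", "Harry Potter 1", 5),
]

# Hypothetical column names: two attributes plus the target.
COLUMNS = ("user", "item", "rating")

# One row per observed rating; unrated pairs simply do not appear.
table = [dict(zip(COLUMNS, row)) for row in ratings]

# Split into attributes (X) and target (y), as in a typical supervised setup.
X = [(row["user"], row["item"]) for row in table]
y = [row["rating"] for row in table]

for row in table:
    print(row)
```

Unlike the matrix, this layout stores only the ratings that exist, and further attributes (for example, a movie's genre) could be added as extra columns.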