2008-Collaborative Filtering for Implicit Feedback Datasets
If you are interested in the paper, you can find it here.
Introduction
This paper explores the unique characteristics of implicit feedback datasets, which are central to building effective recommendation systems. Unlike explicit feedback, implicit feedback relies on observed user behaviors such as purchase history and viewing habits. The authors identify key challenges, including the lack of negative feedback, inherent noise, and the need for new evaluation metrics. Guided by these properties, they propose a factor model and a scalable optimization procedure that produce more accurate, personalized recommendations. This approach improves the user experience without requiring any explicit input, making it a valuable tool for modern recommendation systems.
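Concretely, the paper binarizes each raw observation r_ui (e.g., a watch count) into a preference p_ui and weights it by a confidence c_ui, then minimizes a confidence-weighted regularized squared error:

\min_{x_\ast,\, y_\ast} \sum_{u,i} c_{ui}\,\bigl(p_{ui} - x_u^{\top} y_i\bigr)^2 \;+\; \lambda \Bigl(\sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2\Bigr)

where p_{ui} = 1 if r_{ui} > 0 (and 0 otherwise) and c_{ui} = 1 + \alpha\, r_{ui}.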
Code Reproduction
The paper's method is Alternating Least Squares (ALS) for collaborative filtering. Searching GitHub turns up a well-known implementation: https://github.com/benfred/implicit
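Before reimplementing the algorithm from scratch, the library can be tried directly. This is a minimal sketch, not a definitive usage guide: it assumes implicit >= 0.5 (where fit() and recommend() take a user-item CSR matrix; older versions expected item-user), a toy random interaction matrix in place of real data, and illustrative hyperparameters.

import numpy as np
import scipy.sparse as sp
from implicit.als import AlternatingLeastSquares

# Toy user-item interaction counts standing in for a real purchase/view log
rng = np.random.default_rng(0)
counts = rng.integers(0, 3, size=(50, 100)).astype(np.float64)

# Pre-scale raw counts by alpha -- a common idiom for injecting the paper's
# confidence weighting c_ui = 1 + alpha * r_ui; the exact internal mapping
# may vary by library version, so treat this as an assumption.
alpha = 40.0
user_items = sp.csr_matrix(alpha * counts)

model = AlternatingLeastSquares(factors=16, regularization=0.05, iterations=15)
model.fit(user_items)

# Top-5 items for user 0; ids and scores come back as parallel arrays
ids, scores = model.recommend(0, user_items[0], N=5)
print(ids, scores)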
I could not find a publicly available version of the paper's data source, so I use MovieLens 100K as the dataset instead.
Data Source
https://grouplens.org/datasets/movielens/100k/
""" 2008-Collaborative Filtering for Implicit Feedback Datasets Code Reproduction """ import os import sys sys.path.append(os.path.join(os.path.dirname(__file__), "..")) import copy import numpy as np import pandas as pd from tqdm import tqdm from util import sprint data_path = "../DATASETS/MovieLens100K/u.data" raw = pd.read_csv( data_path, sep="\t", header=None, names=["user_id", "item_id", "rating", "timestamp"], ) sprint(raw.head(), raw.shape) # Create a user-item matrix n_users = raw.user_id.unique().shape[0] n_items = raw.item_id.unique().shape[0] sprint("Number of users =", n_users, "; Number of items =", n_items) data = copy.deepcopy(raw) data = data.sort_values(by=["user_id", "item_id"]) ratings = np.zeros((n_users, n_items)) for row in raw.itertuples(): ratings[row[1] - 1, row[2] - 1] = row[3] sprint(ratings, ratings.shape) latent_factors = 5 # random init user and item matrix X = np.random.rand(n_users, latent_factors) Y = np.random.rand(n_items, latent_factors) aplha = 40 c_ui = 1 + aplha * ratings lambda_reg = 0.1 max_iterations = 20 for iteration in range(max_iterations): # optimize X for u in range(n_users): A = np.dot((c_ui[u, :].reshape(-1, 1) * Y).T, Y) + lambda_reg * np.eye(latent_factors) b = np.dot((c_ui[u, :] * ratings[u, :]).reshape(1, -1), Y).flatten() X[u, :] = np.linalg.solve(A, b) # optimize Y for i in range(n_items): A = np.dot((c_ui[:, i].reshape(-1, 1) * X).T, X) + lambda_reg * np.eye(latent_factors) b = np.dot((c_ui[:, i] * ratings[:, i]).reshape(1, -1), X).flatten() Y[i, :] = np.linalg.solve(A, b) # calculate current loss predictions = np.dot(X, Y.T) weighted_squared_error = np.sum(c_ui * (ratings - predictions) ** 2) regularization = lambda_reg * (np.sum(X ** 2) + np.sum(Y ** 2)) loss = weighted_squared_error + regularization print(f"Iteration {iteration+1}, Loss: {loss}") # select user 0 and item 30 print(f"Actual p_{0,3}: {ratings[0, 30]}") print(f"Predicted p_{0,3}: {np.dot(X[0, :], Y[30, :])}")