2008-Collaborative Filtering for Implicit Feedback Datasets
If you are interested in the paper, you can find it here.
Introduction
This paper explores the unique characteristics of implicit feedback datasets, which are central to building effective recommendation systems. Unlike explicit feedback, implicit feedback relies on observed user behaviors such as purchase history and viewing habits. The authors identify key challenges, including the lack of negative feedback, inherent noise, and the need for new evaluation metrics. Guided by these properties, they propose a factor model and a scalable optimization procedure that produce more accurate, personalized recommendations. This approach improves the user experience without requiring any explicit input, making it a valuable tool for modern recommendation systems.
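Concretely, the paper binarizes each raw observation r_ui (e.g., a watch count) into a preference p_ui and weights it by a confidence c_ui, then minimizes a confidence-weighted regularized squared error:

\min_{x_\ast,\, y_\ast} \sum_{u,i} c_{ui}\,\bigl(p_{ui} - x_u^{\top} y_i\bigr)^2 \;+\; \lambda \Bigl(\sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2\Bigr)

where p_{ui} = 1 if r_{ui} > 0 (and 0 otherwise) and c_{ui} = 1 + \alpha\, r_{ui}.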
Code Reproduction
The paper's method is Alternating Least Squares (ALS) for collaborative filtering. Searching GitHub turns up a well-known implementation: https://github.com/benfred/implicit
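Before reimplementing the algorithm from scratch, the library can be tried directly. This is a minimal sketch, not a definitive usage guide: it assumes implicit >= 0.5 (where fit() and recommend() take a user-item CSR matrix; older versions expected item-user), a toy random interaction matrix in place of real data, and illustrative hyperparameters.

import numpy as np
import scipy.sparse as sp
from implicit.als import AlternatingLeastSquares

# Toy user-item interaction counts standing in for a real purchase/view log
rng = np.random.default_rng(0)
counts = rng.integers(0, 3, size=(50, 100)).astype(np.float64)

# Pre-scale raw counts by alpha -- a common idiom for injecting the paper's
# confidence weighting c_ui = 1 + alpha * r_ui; the exact internal mapping
# may vary by library version, so treat this as an assumption.
alpha = 40.0
user_items = sp.csr_matrix(alpha * counts)

model = AlternatingLeastSquares(factors=16, regularization=0.05, iterations=15)
model.fit(user_items)

# Top-5 items for user 0; ids and scores come back as parallel arrays
ids, scores = model.recommend(0, user_items[0], N=5)
print(ids, scores)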
I could not find a publicly available version of the paper's data source, so I use MovieLens 100K as the dataset instead.
Data Source
https://grouplens.org/datasets/movielens/100k/
""" 2008-Collaborative Filtering for Implicit Feedback Datasets Code Reproduction """ import os import sys sys.path.append(os.path.join(os.path.dirname(__file__), "..")) import copy import numpy as np import pandas as pd from tqdm import tqdm from util import sprint data_path = "../DATASETS/MovieLens100K/u.data" raw = pd.read_csv( data_path, sep="\t", header=None, names=["user_id", "item_id", "rating", "timestamp"], ) sprint(raw.head(), raw.shape) # Create a user-item matrix n_users = raw.user_id.unique().shape[0] n_items = raw.item_id.unique().shape[0] sprint("Number of users =", n_users, "; Number of items =", n_items) data = copy.deepcopy(raw) data = data.sort_values(by=["user_id", "item_id"]) ratings = np.zeros((n_users, n_items)) for row in raw.itertuples(): ratings[row[1] - 1, row[2] - 1] = row[3] sprint(ratings, ratings.shape) latent_factors = 5 # random init user and item matrix X = np.random.rand(n_users, latent_factors) Y = np.random.rand(n_items, latent_factors) aplha = 40 c_ui = 1 + aplha * ratings lambda_reg = 0.1 max_iterations = 20 for iteration in range(max_iterations): # optimize X for u in range(n_users): A = np.dot((c_ui[u, :].reshape(-1, 1) * Y).T, Y) + lambda_reg * np.eye(latent_factors) b = np.dot((c_ui[u, :] * ratings[u, :]).reshape(1, -1), Y).flatten() X[u, :] = np.linalg.solve(A, b) # optimize Y for i in range(n_items): A = np.dot((c_ui[:, i].reshape(-1, 1) * X).T, X) + lambda_reg * np.eye(latent_factors) b = np.dot((c_ui[:, i] * ratings[:, i]).reshape(1, -1), X).flatten() Y[i, :] = np.linalg.solve(A, b) # calculate current loss predictions = np.dot(X, Y.T) weighted_squared_error = np.sum(c_ui * (ratings - predictions) ** 2) regularization = lambda_reg * (np.sum(X ** 2) + np.sum(Y ** 2)) loss = weighted_squared_error + regularization print(f"Iteration {iteration+1}, Loss: {loss}") # select user 0 and item 30 print(f"Actual p_{0,3}: {ratings[0, 30]}") print(f"Predicted p_{0,3}: {np.dot(X[0, :], Y[30, :])}")