2001 - Item-Based Collaborative Filtering Recommendation Algorithms
If you are interested in the paper, you can find it here.
Introduction
Back in 2001, in the rapidly evolving landscape of recommender systems, item-based collaborative filtering emerged as a game-changer. The paper shows that item-based techniques outperform traditional user-based methods in both scalability and recommendation quality, making them well suited to large-scale applications. By precomputing item-item relationships, these algorithms can deliver high-quality recommendations in real time, even with sparse data and millions of users. As the demand for personalized experiences grows, item-based collaborative filtering stands out as a scalable and efficient solution for improving user satisfaction and engagement across online platforms.
Code Reproduction
Data Source
https://grouplens.org/datasets/movielens/100k/
```python
"""2001 - Item-Based Collaborative Filtering Recommendation Algorithms
Code Reproduction
"""
import os
import sys

sys.path.append(os.path.join(os.path.dirname(__file__), ".."))

import numpy as np
import pandas as pd

from util import sprint

# Load data
data_path = "../DATASETS/MovieLens100K/u.data"
raw = pd.read_csv(
    data_path,
    sep="\t",
    header=None,
    names=["user_id", "item_id", "rating", "timestamp"],
)
sprint(raw.head(), raw.shape)

# Create the user-item rating matrix
# (ids in u.data are 1-based and contiguous, so id - 1 is a valid index)
n_users = raw.user_id.unique().shape[0]
n_items = raw.item_id.unique().shape[0]
sprint("Number of users =", n_users)
sprint("Number of items =", n_items)

ratings = np.zeros((n_users, n_items))
for row in raw.itertuples():
    ratings[row.user_id - 1, row.item_id - 1] = row.rating
sprint(ratings, ratings.shape)

# Cosine Similarity
def cosine_similarity(ratings):
    sim = ratings.dot(ratings.T)
    norms = np.array([np.sqrt(np.diagonal(sim))])
    # norms * norms.T == np.outer(norms, norms)
    return sim / norms / norms.T

# ratings is a user-item matrix, so transpose it to get item-item similarity
cosine_similarity_res = cosine_similarity(ratings.T)
sprint(cosine_similarity_res, cosine_similarity_res.shape)

# Correlation-based Similarity
def pearson_similarity_item(ratings):
    mean_ratings = np.mean(ratings, axis=0)  # average rating of each item
    ratings_diff = ratings - mean_ratings
    sim = ratings_diff.T.dot(ratings_diff)
    norms = np.array([np.sqrt(np.sum(ratings_diff**2, axis=0))])
    return sim / norms / norms.T

pearson_similarity_item_res = pearson_similarity_item(ratings)
sprint(pearson_similarity_item_res, pearson_similarity_item_res.shape)

# Adjusted Cosine Similarity
def pearson_similarity_item_improve(ratings):
    mean_ratings = np.mean(ratings, axis=1)  # average rating of each user
    ratings_diff = ratings - mean_ratings[:, np.newaxis]
    sim = ratings_diff.T.dot(ratings_diff)
    norms = np.array([np.sqrt(np.sum(ratings_diff**2, axis=0))])
    return sim / norms / norms.T

pearson_similarity_item_improve_res = pearson_similarity_item_improve(ratings)
sprint(pearson_similarity_item_improve_res, pearson_similarity_item_improve_res.shape)

# Prediction: weighted sum of the user's ratings,
# normalized by the total absolute similarity
def predict(ratings, similarity):
    dot = ratings.dot(similarity)
    abs_sim = np.sum(np.abs(similarity), axis=0)
    sprint(dot.shape, abs_sim.shape)
    return dot / abs_sim

cosine_prediction_item = predict(ratings, cosine_similarity_res)
sprint(cosine_prediction_item, cosine_prediction_item.shape)

pearson_prediction_item = predict(ratings, pearson_similarity_item_res)
sprint(pearson_prediction_item, pearson_prediction_item.shape)

pearson_prediction_item_improve = predict(ratings, pearson_similarity_item_improve_res)
sprint(pearson_prediction_item_improve, pearson_prediction_item_improve.shape)
```
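The paper evaluates prediction quality with MAE (mean absolute error) on held-out ratings, a step the reproduction above does not include. Below is a minimal, self-contained sketch of that evaluation on a tiny hand-made matrix; the matrix, the single held-out entry, and the `adjusted_cosine` / `mae` helpers are illustrative assumptions, not part of the original script.

```python
import numpy as np

def adjusted_cosine(ratings):
    # Center each user's ratings by that user's mean, then cosine over item columns
    diff = ratings - ratings.mean(axis=1, keepdims=True)
    sim = diff.T @ diff
    norms = np.sqrt((diff**2).sum(axis=0))
    return sim / norms[None, :] / norms[:, None]

def predict(ratings, similarity):
    # Weighted-sum prediction normalized by total absolute similarity
    return ratings @ similarity / np.abs(similarity).sum(axis=0)

# Tiny synthetic user-item matrix (0 = unrated), for illustration only
train = np.array([
    [5.0, 3.0, 0.0, 1.0],
    [4.0, 0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 5.0],
    [1.0, 0.0, 5.0, 4.0],
])

# Hold out user 0's rating of item 1 and try to predict it back
held_out = [(0, 1, 3.0)]
train_masked = train.copy()
for u, i, _ in held_out:
    train_masked[u, i] = 0.0

sim = adjusted_cosine(train_masked)
pred = predict(train_masked, sim)

# MAE over the held-out entries
mae = np.mean([abs(pred[u, i] - r) for u, i, r in held_out])
print(f"MAE on held-out ratings: {mae:.3f}")
```

With the real MovieLens data, the same idea applies: mask a random subset of the nonzero entries of `ratings`, fit similarities on the remainder, and average the absolute errors on the masked entries.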