2005 - Slope one CF

November 14, 2024

2005-Slope One Predictors for Online Rating-Based Collaborative Filtering

If you are interested in the paper, you can find it here.

Introduction

In collaborative filtering, the Slope One algorithm is a simple yet effective approach to rating prediction. It predicts a user's rating for an item from the average rating differences between that item and the items the user has already rated, which makes the scheme easy to implement and maintain. This simplicity does not come at the expense of accuracy: the paper reports results on the EachMovie and MovieLens benchmark datasets that are competitive with more complex methods, which are often slower and harder to scale.
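To make the "average rating difference" idea concrete, here is a minimal sketch using the two-user example from the paper: user A rates two items 1 and 1.5, and user B rates the first item 2. The array layout and variable names are illustrative assumptions, not part of the reproduction code.

```python
import numpy as np

# Rows are users, columns are items; 0 marks "not rated" (an assumed convention).
ratings = np.array([
    [1.0, 1.5],  # user A rated both items
    [2.0, 0.0],  # user B rated only item 0
])

# Average difference (item 1 minus item 0) over users who rated both items.
both = (ratings[:, 0] > 0) & (ratings[:, 1] > 0)
dev_1_0 = np.mean(ratings[both, 1] - ratings[both, 0])

# Predict user B's rating for item 1: known rating plus the average difference.
prediction = ratings[1, 0] + dev_1_0
print(prediction)  # 2.5
```

With more users, `dev_1_0` becomes an average over everyone who rated both items, and predictions from several known items are averaged in turn.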

Key Takeaways:

  1. Simplicity and Efficiency: The Slope One algorithm is designed to be intuitive and easy to implement, making it accessible for developers and maintainers alike. Its straightforward nature ensures that the aggregated data is easily understandable, facilitating quick troubleshooting and updates.

  2. Real-Time Updates: One of the standout features of Slope One is its ability to handle dynamic updates seamlessly. New ratings are immediately incorporated into the prediction model, ensuring that recommendations remain current and relevant.

  3. Scalability and Query Speed: Slope One trades storage for speed. The pairwise item differences are precomputed and stored, so a query only needs to look them up and combine them. This matters for real-time applications where quick responses are essential.

  4. Inclusivity for New Users: The algorithm excels in providing effective recommendations even for users with limited rating history. This inclusivity is vital for systems that need to cater to first-time visitors or users with sparse data.

  5. Balanced Accuracy: Despite its simplicity, Slope One achieves a reasonable level of accuracy that is on par with more sophisticated methods. This balance ensures that the algorithm remains competitive without compromising on its core strengths of simplicity and efficiency.
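The real-time update and query-speed points above can be sketched as follows. The idea, under the assumption that running sums and counts of pairwise differences are stored (rather than the averages), is that a new rating only touches the rows and columns of the items that user has already rated. The function name `add_rating` and the array names are hypothetical.

```python
import numpy as np


def add_rating(ratings, diff_sum, freq, user, item, value):
    """Record a new rating and update pairwise sums and counts in place."""
    if ratings[user, item] != 0:
        raise ValueError("this sketch only handles ratings for unrated items")
    others = np.where(ratings[user] > 0)[0]  # items this user already rated
    # diff_sum[j, i] accumulates r_j - r_i over users who rated both j and i.
    diff_sum[item, others] += value - ratings[user, others]
    diff_sum[others, item] += ratings[user, others] - value
    freq[item, others] += 1
    freq[others, item] += 1
    ratings[user, item] = value


n_users, n_items = 2, 3
ratings = np.zeros((n_users, n_items))
diff_sum = np.zeros((n_items, n_items))
freq = np.zeros((n_items, n_items))

for user, item, value in [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 1, 1.0)]:
    add_rating(ratings, diff_sum, freq, user, item, value)

# Average deviations are recovered on demand from the sums and counts.
dev = np.divide(diff_sum, freq, out=np.zeros_like(diff_sum), where=freq > 0)
print(dev[0, 1])  # 2.5
```

Each update costs time proportional to the number of items the user has rated, and no other user's data needs to be revisited.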

In summary, Slope One meets all five objectives set out in the paper: ease of implementation, instant updates, efficient query time, low expectations of first-time visitors, and reasonable accuracy. This makes it a strong candidate for real-world applications and a useful baseline when a recommendation system must be scalable and efficient as well as accurate.

Code Reproduction

Data Source

https://grouplens.org/datasets/movielens/100k/

""" 2005-Slope One Predictors for Online Rating-Based Collaborative Filtering Code Reproduction """ import os import sys sys.path.append(os.path.join(os.path.dirname(__file__), "..")) import copy import pandas as pd import numpy as np from util import sprint from tqdm import tqdm data_path = "../DATASETS/MovieLens100K/u.data" raw = pd.read_csv( data_path, sep="\t", header=None, names=["user_id", "item_id", "rating", "timestamp"], ) sprint(raw.head(), raw.shape) # Create user-item matrix n_users = raw.user_id.unique().shape[0] n_items = raw.item_id.unique().shape[0] sprint("Number of users =", n_users) sprint("Number of items =", n_items) data = copy.deepcopy(raw) data = data.sort_values(by=["user_id", "item_id"]) ratings = np.zeros((n_users, n_items)) for row in raw.itertuples(): ratings[row[1] - 1, row[2] - 1] = row[3] sprint(ratings, ratings.shape) # deviations def deviations(ratings): n_users, n_items = ratings.shape dev = np.zeros((n_items, n_items)) freq = np.zeros((n_items, n_items)) for u in tqdm(range(n_users)): rated_items = np.where(ratings[u] > 0)[0] for i in rated_items: for j in rated_items: if i != j: dev[i, j] += ratings[u, i] - ratings[u, j] freq[i, j] += 1 for i in tqdm(range(n_items)): for j in range(n_items): if freq[i, j] > 0: dev[i, j] /= freq[i, j] return dev, freq # slope one def slope_one_predict(ratings, deviations, frequencies): n_users, n_items = ratings.shape predictions = np.zeros((n_users, n_items)) for u in tqdm(range(n_users)): rated_items = np.where(ratings[u] > 0)[0] u_bar = np.mean(ratings[u, rated_items]) for j in range(n_items): if j not in rated_items: sum_card = np.sum(frequencies[rated_items, j]) if sum_card != 0: predictions[u, j] = ( u_bar + np.sum(deviations[rated_items, j]) / sum_card ) return predictions # weighted slope one def weighted_slope_one_predict(ratings, deviations, frequencies): n_users, n_items = ratings.shape predictions = np.zeros((n_users, n_items)) for u in tqdm(range(n_users)): rated_items = 
np.where(ratings[u] > 0)[0] for j in range(n_items): if j not in rated_items: sum_card = np.sum(frequencies[rated_items, j]) if sum_card != 0: predictions[u, j] = ( np.sum( (deviations[rated_items, j] + ratings[u, rated_items]) * frequencies[rated_items, j] ) / sum_card ) return predictions # bi-polar slope one # if rating is greater than 3, it is positive # if rating is less than 3 and greater than 0, it is negative like_dislike_threshold = 3 like_ratings = ratings.copy() like_ratings = np.where(like_ratings < like_dislike_threshold, 0, like_ratings) dislike_ratings = ratings.copy() dislike_ratings = np.where( dislike_ratings >= like_dislike_threshold, 0, dislike_ratings ) sprint(like_ratings, like_ratings.shape) sprint(dislike_ratings, dislike_ratings.shape) like_dev_matrix, like_freq_matrix = deviations(like_ratings) dislike_dev_matrix, dislike_freq_matrix = deviations(dislike_ratings) def bi_polar_slope_one_predict( like_ratings, dislike_ratings, like_dev, like_freq, dislike_dev, dislike_freq ): n_users, n_items = like_ratings.shape predictions = np.zeros((n_users, n_items)) for u in tqdm(range(n_users)): rated_items = np.where(like_ratings[u] > 0)[0] for j in range(n_items): if j not in rated_items: sum_card = np.sum( like_freq[rated_items, j] + dislike_freq[rated_items, j] ) if sum_card != 0: predictions[u, j] = ( np.sum( (like_dev[rated_items, j] + like_ratings[u, rated_items]) * like_freq[rated_items, j] + ( dislike_dev[rated_items, j] + dislike_ratings[u, rated_items] ) * dislike_freq[rated_items, j] ) / sum_card ) return predictions deviations_matrix, frequencies_matrix = deviations(ratings) sprint(deviations_matrix, deviations_matrix.shape) sprint(frequencies_matrix, frequencies_matrix.shape) slope_one_predict_res = slope_one_predict( ratings, deviations_matrix, frequencies_matrix ) sprint(slope_one_predict_res, slope_one_predict_res.shape) weighted_slope_one_predict_res = weighted_slope_one_predict( ratings, deviations_matrix, frequencies_matrix ) 
sprint(weighted_slope_one_predict_res, weighted_slope_one_predict_res.shape) bi_polar_slope_one_predict_res = bi_polar_slope_one_predict( like_ratings, dislike_ratings, like_dev_matrix, like_freq_matrix, dislike_dev_matrix, dislike_freq_matrix ) sprint(bi_polar_slope_one_predict_res, bi_polar_slope_one_predict_res.shape)
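The script above stops at producing prediction matrices. To compare the three schemes, the paper reports a mean absolute error (MAE) on ratings hidden from the predictor. The helper below is a minimal sketch of that measurement, not the paper's exact All But One protocol. It assumes a hypothetical `held_out` matrix of ratings that were removed from `ratings` before calling `deviations`, and it treats 0 as "no value" in both matrices, matching the convention used in the code above.

```python
import numpy as np


def mean_absolute_error(predictions, held_out):
    """MAE over entries that were held out and also received a prediction."""
    mask = (held_out > 0) & (predictions > 0)
    if not mask.any():
        return float("nan")  # nothing to evaluate
    return float(np.mean(np.abs(predictions[mask] - held_out[mask])))


# Toy data: two hidden ratings, both of which received a prediction.
held_out = np.array([[0.0, 4.0], [3.0, 0.0]])
predictions = np.array([[0.0, 3.5], [4.0, 2.0]])
print(mean_absolute_error(predictions, held_out))  # 0.75
```

Entries without a prediction are skipped rather than counted as errors, so coverage should be reported alongside MAE when schemes are compared.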