2005-Slope One Predictors for Online Rating-Based Collaborative Filtering

If you are interested in the paper, you can find it here.

Introduction

In the realm of collaborative filtering, the Slope One algorithm stands out as a powerful yet straightforward solution that addresses the core challenges of recommendation systems. By leveraging the average rating differences between items, Slope One schemes offer a robust framework that is both easy to implement and maintain. This simplicity does not come at the expense of accuracy; rather, it delivers competitive results on benchmark datasets like EachMovie and Movielens, rivaling more complex methods that often sacrifice efficiency and scalability.

Key Takeaways:

Simplicity and Efficiency: The Slope One algorithm is designed to be intuitive and easy to implement, making it accessible for developers and maintainers alike. Its straightforward nature ensures that the aggregated data is easily understandable, facilitating quick troubleshooting and updates.
Real-Time Updates: One of the standout features of Slope One is its ability to handle dynamic updates seamlessly. New ratings are immediately incorporated into the prediction model, ensuring that recommendations remain current and relevant.
Scalability and Query Speed: While Slope One may require more storage for precomputed data, the trade-off results in lightning-fast query times. This efficiency is crucial for real-time applications where quick responses are essential.
Inclusivity for New Users: The algorithm excels in providing effective recommendations even for users with limited rating history. This inclusivity is vital for systems that need to cater to first-time visitors or users with sparse data.
Balanced Accuracy: Despite its simplicity, Slope One achieves a reasonable level of accuracy that is on par with more sophisticated methods. This balance ensures that the algorithm remains competitive without compromising on its core strengths of simplicity and efficiency.

In conclusion, the Slope One algorithm represents a significant advancement in collaborative filtering. Its ability to meet all five key objectives—ease of implementation, instant update capability, efficient query time, low requirement for first-time visitors, and reasonable accuracy—makes it a strong candidate for real-world applications. As the demand for scalable, efficient, and accurate recommendation systems continues to grow, Slope One emerges as a benchmark solution that bridges the gap between simplicity and performance.

Code Reproduction

Data Source

https://grouplens.org/datasets/movielens/100k/

"""
2005-Slope One Predictors for Online Rating-Based Collaborative Filtering
Code Reproduction
"""

import os
import sys

sys.path.append(os.path.join(os.path.dirname(__file__), ".."))

import copy
import pandas as pd
import numpy as np
from util import sprint
from tqdm import tqdm

data_path = "../DATASETS/MovieLens100K/u.data"
raw = pd.read_csv(
    data_path,
    sep="\t",
    header=None,
    names=["user_id", "item_id", "rating", "timestamp"],
)
sprint(raw.head(), raw.shape)

# Create user-item matrix
n_users = raw.user_id.unique().shape[0]
n_items = raw.item_id.unique().shape[0]
sprint("Number of users =", n_users)
sprint("Number of items =", n_items)

data = copy.deepcopy(raw)
data = data.sort_values(by=["user_id", "item_id"])

ratings = np.zeros((n_users, n_items))
for row in raw.itertuples():
    ratings[row[1] - 1, row[2] - 1] = row[3]
sprint(ratings, ratings.shape)


# deviations
def deviations(ratings):
    n_users, n_items = ratings.shape
    dev = np.zeros((n_items, n_items))
    freq = np.zeros((n_items, n_items))

    for u in tqdm(range(n_users)):
        rated_items = np.where(ratings[u] > 0)[0]
        for i in rated_items:
            for j in rated_items:
                if i != j:
                    dev[i, j] += ratings[u, i] - ratings[u, j]
                    freq[i, j] += 1

    for i in tqdm(range(n_items)):
        for j in range(n_items):
            if freq[i, j] > 0:
                dev[i, j] /= freq[i, j]

    return dev, freq


# slope one
def slope_one_predict(ratings, deviations, frequencies):
    n_users, n_items = ratings.shape
    predictions = np.zeros((n_users, n_items))

    for u in tqdm(range(n_users)):
        rated_items = np.where(ratings[u] > 0)[0]
        u_bar = np.mean(ratings[u, rated_items])
        for j in range(n_items):
            if j not in rated_items:
                sum_card = np.sum(frequencies[rated_items, j])
                if sum_card != 0:
                    predictions[u, j] = (
                        u_bar + np.sum(deviations[rated_items, j]) / sum_card
                    )

    return predictions


# weighted slope one
def weighted_slope_one_predict(ratings, deviations, frequencies):
    n_users, n_items = ratings.shape
    predictions = np.zeros((n_users, n_items))

    for u in tqdm(range(n_users)):
        rated_items = np.where(ratings[u] > 0)[0]
        for j in range(n_items):
            if j not in rated_items:
                sum_card = np.sum(frequencies[rated_items, j])
                if sum_card != 0:
                    predictions[u, j] = (
                        np.sum(
                            (deviations[rated_items, j] + ratings[u, rated_items])
                            * frequencies[rated_items, j]
                        )
                        / sum_card
                    )
    return predictions


# bi-polar slope one
# if rating is greater than 3, it is positive
# if rating is less than 3 and greater than 0, it is negative
like_dislike_threshold = 3

like_ratings = ratings.copy()
like_ratings = np.where(like_ratings < like_dislike_threshold, 0, like_ratings)

dislike_ratings = ratings.copy()
dislike_ratings = np.where(
    dislike_ratings >= like_dislike_threshold, 0, dislike_ratings
)
sprint(like_ratings, like_ratings.shape)
sprint(dislike_ratings, dislike_ratings.shape)

like_dev_matrix, like_freq_matrix = deviations(like_ratings)
dislike_dev_matrix, dislike_freq_matrix = deviations(dislike_ratings)


def bi_polar_slope_one_predict(
    like_ratings, dislike_ratings, like_dev, like_freq, dislike_dev, dislike_freq
):
    n_users, n_items = like_ratings.shape
    predictions = np.zeros((n_users, n_items))

    for u in tqdm(range(n_users)):
        rated_items = np.where(like_ratings[u] > 0)[0]
        for j in range(n_items):
            if j not in rated_items:
                sum_card = np.sum(
                    like_freq[rated_items, j] + dislike_freq[rated_items, j]
                )
                if sum_card != 0:
                    predictions[u, j] = (
                        np.sum(
                            (like_dev[rated_items, j] + like_ratings[u, rated_items])
                            * like_freq[rated_items, j]
                            + (
                                dislike_dev[rated_items, j]
                                + dislike_ratings[u, rated_items]
                            )
                            * dislike_freq[rated_items, j]
                        )
                        / sum_card
                    )
    return predictions


deviations_matrix, frequencies_matrix = deviations(ratings)
sprint(deviations_matrix, deviations_matrix.shape)
sprint(frequencies_matrix, frequencies_matrix.shape)
slope_one_predict_res = slope_one_predict(
    ratings, deviations_matrix, frequencies_matrix
)
sprint(slope_one_predict_res, slope_one_predict_res.shape)

weighted_slope_one_predict_res = weighted_slope_one_predict(
    ratings, deviations_matrix, frequencies_matrix
)
sprint(weighted_slope_one_predict_res, weighted_slope_one_predict_res.shape)

bi_polar_slope_one_predict_res = bi_polar_slope_one_predict(
    like_ratings, dislike_ratings, like_dev_matrix, like_freq_matrix, dislike_dev_matrix, dislike_freq_matrix
)
sprint(bi_polar_slope_one_predict_res, bi_polar_slope_one_predict_res.shape)

2005 - Slope one CF

2005-Slope One Predictors for Online Rating-Based Collaborative Filtering

Introduction

Key Takeaways:

Code Reproduction

Data Source