2003 - Amazon Item CF

November 13, 2024

2003 - Amazon.com Recommendations: Item-to-Item Collaborative Filtering

If you are interested in the paper, it is Greg Linden, Brent Smith, and Jeremy York, "Amazon.com Recommendations: Item-to-Item Collaborative Filtering," IEEE Internet Computing, 2003.

Introduction

In the fast-paced world of e-commerce, where customer expectations are higher than ever, the ability to deliver personalized, real-time recommendations is no longer a luxury—it's a necessity. Traditional recommendation algorithms, while effective, often struggle with scalability and real-time responsiveness, especially in environments with massive datasets and dynamic customer interactions. This paper introduces a groundbreaking approach: item-to-item collaborative filtering, which addresses these challenges head-on.

Key Advantages of Item-to-Item Collaborative Filtering:

  1. Real-Time Recommendations: The expensive similar-items computation is performed offline, so the online step reduces to fast table lookups. Recommendations can therefore be generated instantaneously as a customer browses, reflecting their latest purchases and ratings (a minimal sketch of this online step follows the list).

  2. Scalability to Large Datasets: The algorithm handles tens of millions of customers and millions of catalog items, with online computation that scales independently of both, making it practical for large retailers that need high-quality recommendations without compromising performance.

  3. High-Quality Recommendations: By leveraging item-to-item relationships rather than customer-to-customer comparisons, the algorithm delivers more accurate and relevant recommendations, leading to higher click-through and conversion rates.

  4. Adaptability to New and Long-Time Customers: The algorithm performs well with sparse data, producing useful recommendations from as little as a handful of purchases or ratings, while still making full use of a long interaction history when one exists.
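
To make the first point concrete, here is a minimal sketch of the online step, assuming a precomputed similar-items table shaped like the product_similarity matrix built in the reproduction below. The function name recommend_for_user and the top-k parameter are illustrative, not from the paper; the paper describes the online step as aggregating items similar to the customer's purchases and ratings in essentially this way.

import numpy as np

def recommend_for_user(purchased_item_ids, product_similarity, k=10):
    # Sum each purchased item's row of similarities to every other item,
    # then return the k highest-scoring items not already purchased.
    # Item ids are 1-based (MovieLens style); the matrix is 0-based.
    scores = np.zeros(product_similarity.shape[0])
    for item_id in purchased_item_ids:
        scores += product_similarity[item_id - 1]
    for item_id in purchased_item_ids:
        scores[item_id - 1] = -np.inf  # never re-recommend a purchase
    top = np.argsort(scores)[::-1][:k]
    return [int(i) + 1 for i in top]  # back to 1-based item ids

The offline/online split is visible here: the expensive work of building the table happens ahead of time, and serving a request is a handful of vector additions and one sort, independent of the number of customers.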

In conclusion, item-to-item collaborative filtering represents a significant advancement in e-commerce recommendation systems. Its ability to deliver real-time, scalable, and high-quality recommendations makes it a game-changer for online retailers looking to elevate their customer experience and drive sales. As e-commerce continues to evolve, this innovative approach will undoubtedly play a pivotal role in shaping the future of personalized shopping.

Code Reproduction

Data Source

https://grouplens.org/datasets/movielens/100k/

from collections import defaultdict
import os
import sys

sys.path.append(os.path.join(os.path.dirname(__file__), ".."))

import copy

import numpy as np
import pandas as pd
from tqdm import tqdm

from util import sprint  # local helper from this repo that pretty-prints its arguments

"""
The paper's offline algorithm:

For each item in product catalog, I1
  For each customer C who purchased I1
    For each item I2 purchased by customer C
      Record that a customer purchased I1 and I2
  For each item I2
    Compute the similarity between I1 and I2
"""

# Load data: use MovieLens data and treat a rating as a purchase.
# u.data is tab-separated: user id, item id, rating (1-5), timestamp.
data_path = "../DATASETS/MovieLens100K/u.data"
raw = pd.read_csv(
    data_path,
    sep="\t",
    header=None,
    names=["user_id", "item_id", "rating", "timestamp"],
)
sprint(raw.head(), raw.shape)

data = copy.deepcopy(raw)

# All products. MovieLens item ids are contiguous from 1, so id - 1 indexes the matrix.
products = data.item_id.unique()
product = np.sort(products)
product_vector = defaultdict(set)
product_similarity = np.zeros((len(product), len(product)))

# Pass 1: record purchases.
# For each item in product catalog, I1
for product_1 in tqdm(product):
    # For each customer C who purchased I1
    users_c = data[data.item_id == product_1].user_id.unique()
    users = np.sort(users_c)
    # For each item I2 purchased by customer C
    for user in users:
        items = data[
            (data.user_id == user) & (data.item_id != product_1)
        ].item_id.unique()
        # Record that a customer purchased I1 and I2
        product_vector[product_1].add(user)
        for product_2 in items:
            product_vector[product_2].add(user)

# Pass 2: compute similarities. This must run after all purchase sets are
# complete; computing similarities inside the loop above would use half-built
# sets for items not yet visited as I1 and leave stale values in the matrix.
for product_1 in tqdm(product):
    # For each item I2
    for product_2 in product_vector:
        if product_1 == product_2:
            continue
        # Compute the similarity between I1 and I2 as the Jaccard overlap of
        # their purchaser sets (the similarity measure could be swapped out).
        similarity = len(product_vector[product_1] & product_vector[product_2]) / len(
            product_vector[product_1] | product_vector[product_2]
        )
        # print(f"Similarity between {product_1} and {product_2} is {similarity}")
        product_similarity[product_1 - 1, product_2 - 1] = similarity

sprint(product_similarity, product_similarity.shape)

# ---------------------------------------------------
# Attempted acceleration with raw NumPy indexing.
# Conclusion: this is slower than the pandas version, because every boolean
# mask still scans the full ratings array.
products = data.item_id.unique()
product = np.sort(products)
product_vector = defaultdict(set)
product_similarity = np.zeros((len(product), len(product)))
np_data = data.to_numpy()

for product_1 in tqdm(product):
    users_c = np_data[np_data[:, 1] == product_1, 0]
    users = np.sort(users_c)
    for user in users:
        items = np_data[
            (np_data[:, 0] == user) & (np_data[:, 1] != product_1), 1
        ]
        product_vector[product_1].add(user)
        for product_2 in items:
            product_vector[product_2].add(user)

for product_1 in tqdm(product):
    for product_2 in product_vector:
        if product_1 == product_2:
            continue
        similarity = len(product_vector[product_1] & product_vector[product_2]) / len(
            product_vector[product_1] | product_vector[product_2]
        )
        product_similarity[product_1 - 1, product_2 - 1] = similarity

sprint(product_similarity, product_similarity.shape)
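
Two follow-ups on the reproduction. First, the similarity measure: the loops above use Jaccard overlap, but the paper itself measures the cosine of the angle between item vectors, where each item is a binary vector over customers. For purchaser sets that reduces to a simple formula; a minimal sketch (the helper name cosine_similarity is mine):

import math

def cosine_similarity(users_a, users_b):
    # For binary vectors represented as sets, the dot product is the size of
    # the intersection and each norm is the square root of the set size:
    # cos(A, B) = |A ∩ B| / (sqrt(|A|) * sqrt(|B|))
    if not users_a or not users_b:
        return 0.0
    return len(users_a & users_b) / (math.sqrt(len(users_a)) * math.sqrt(len(users_b)))

Second, speed. The NumPy rewrite above is slower than pandas because each boolean mask rescans all 100,000 ratings. A way to vectorize the whole computation, as my own speedup sketch rather than anything from the paper, is to build a sparse user-item matrix and read all pairwise intersection counts off a single matrix product (assumes scipy is installed; data is the DataFrame loaded above):

import numpy as np
import scipy.sparse as sp

# Binary user-item matrix; subtract 1 because MovieLens ids are 1-based.
rows = data.user_id.to_numpy() - 1
cols = data.item_id.to_numpy() - 1
m = sp.csr_matrix(
    (np.ones(len(rows)), (rows, cols)),
    shape=(rows.max() + 1, cols.max() + 1),
)

# intersection[i, j] = number of users who purchased both item i and item j
intersection = (m.T @ m).toarray()
item_counts = intersection.diagonal().copy()

# Jaccard: |A ∩ B| / (|A| + |B| - |A ∩ B|); zero the diagonal to match the
# loop versions, which skip self-similarity.
union = item_counts[:, None] + item_counts[None, :] - intersection
jaccard = np.where(union > 0, intersection / union, 0.0)
np.fill_diagonal(jaccard, 0.0)

sprint(jaccard, jaccard.shape)

This replaces the nested Python loops with one sparse multiplication over a 1682x1682 result, which is dramatically faster than either loop-based version.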