2003 - Amazon Item CF

November 13, 2024

2003 - Amazon.com Recommendations: Item-to-Item Collaborative Filtering

If you are interested in the paper, it is Greg Linden, Brent Smith, and Jeremy York, "Amazon.com Recommendations: Item-to-Item Collaborative Filtering," IEEE Internet Computing, 2003.

Introduction

In the fast-paced world of e-commerce, where customer expectations are higher than ever, the ability to deliver personalized, real-time recommendations is no longer a luxury—it's a necessity. Traditional recommendation algorithms, while effective, often struggle with scalability and real-time responsiveness, especially in environments with massive datasets and dynamic customer interactions. This paper introduces a groundbreaking approach: item-to-item collaborative filtering, which addresses these challenges head-on.

Key Advantages of Item-to-Item Collaborative Filtering:

  1. Real-Time Recommendations: The expensive similar-items computation is performed offline, so the online step reduces to fast table lookups. Recommendations can therefore be generated instantaneously as a customer browses, reflecting their latest purchases and ratings (a minimal sketch of this online step follows the list).

  2. Scalability to Large Datasets: The algorithm handles tens of millions of customers and millions of catalog items, with online computation that scales independently of both, making it practical for large retailers that need high-quality recommendations without compromising performance.

  3. High-Quality Recommendations: By leveraging item-to-item relationships rather than customer-to-customer comparisons, the algorithm delivers more accurate and relevant recommendations, leading to higher click-through and conversion rates.

  4. Adaptability to New and Long-Time Customers: The algorithm performs well with sparse data, producing useful recommendations from as little as a handful of purchases or ratings, while still making full use of a long interaction history when one exists.
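
To make the first point concrete, here is a minimal sketch of the online step, assuming a precomputed similar-items table shaped like the product_similarity matrix built in the reproduction below. The function name recommend_for_user and the top-k parameter are illustrative, not from the paper; the paper describes the online step as aggregating items similar to the customer's purchases and ratings in essentially this way.

import numpy as np

def recommend_for_user(purchased_item_ids, product_similarity, k=10):
    # Sum each purchased item's row of similarities to every other item,
    # then return the k highest-scoring items not already purchased.
    # Item ids are 1-based (MovieLens style); the matrix is 0-based.
    scores = np.zeros(product_similarity.shape[0])
    for item_id in purchased_item_ids:
        scores += product_similarity[item_id - 1]
    for item_id in purchased_item_ids:
        scores[item_id - 1] = -np.inf  # never re-recommend a purchase
    top = np.argsort(scores)[::-1][:k]
    return [int(i) + 1 for i in top]  # back to 1-based item ids

The offline/online split is visible here: the expensive work of building the table happens ahead of time, and serving a request is a handful of vector additions and one sort, independent of the number of customers.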

In conclusion, item-to-item collaborative filtering represents a significant advancement in e-commerce recommendation systems. Its ability to deliver real-time, scalable, and high-quality recommendations makes it a game-changer for online retailers looking to elevate their customer experience and drive sales. As e-commerce continues to evolve, this innovative approach will undoubtedly play a pivotal role in shaping the future of personalized shopping.

Code Reproduction

Data Source

https://grouplens.org/datasets/movielens/100k/

from collections import defaultdict
import os
import sys

sys.path.append(os.path.join(os.path.dirname(__file__), ".."))

import copy

import numpy as np
import pandas as pd
from tqdm import tqdm

from util import sprint  # local helper from this repo that pretty-prints its arguments

"""
The paper's offline algorithm:

For each item in product catalog, I1
  For each customer C who purchased I1
    For each item I2 purchased by customer C
      Record that a customer purchased I1 and I2
  For each item I2
    Compute the similarity between I1 and I2
"""

# Load data: use MovieLens data and treat a rating as a purchase.
# u.data is tab-separated: user id, item id, rating (1-5), timestamp.
data_path = "../DATASETS/MovieLens100K/u.data"
raw = pd.read_csv(
    data_path,
    sep="\t",
    header=None,
    names=["user_id", "item_id", "rating", "timestamp"],
)
sprint(raw.head(), raw.shape)

data = copy.deepcopy(raw)

# All products. MovieLens item ids are contiguous from 1, so id - 1 indexes the matrix.
products = data.item_id.unique()
product = np.sort(products)
product_vector = defaultdict(set)
product_similarity = np.zeros((len(product), len(product)))

# Pass 1: record purchases.
# For each item in product catalog, I1
for product_1 in tqdm(product):
    # For each customer C who purchased I1
    users_c = data[data.item_id == product_1].user_id.unique()
    users = np.sort(users_c)
    # For each item I2 purchased by customer C
    for user in users:
        items = data[
            (data.user_id == user) & (data.item_id != product_1)
        ].item_id.unique()
        # Record that a customer purchased I1 and I2
        product_vector[product_1].add(user)
        for product_2 in items:
            product_vector[product_2].add(user)

# Pass 2: compute similarities. This must run after all purchase sets are
# complete; computing similarities inside the loop above would use half-built
# sets for items not yet visited as I1 and leave stale values in the matrix.
for product_1 in tqdm(product):
    # For each item I2
    for product_2 in product_vector:
        if product_1 == product_2:
            continue
        # Compute the similarity between I1 and I2 as the Jaccard overlap of
        # their purchaser sets (the similarity measure could be swapped out).
        similarity = len(product_vector[product_1] & product_vector[product_2]) / len(
            product_vector[product_1] | product_vector[product_2]
        )
        # print(f"Similarity between {product_1} and {product_2} is {similarity}")
        product_similarity[product_1 - 1, product_2 - 1] = similarity

sprint(product_similarity, product_similarity.shape)

# ---------------------------------------------------
# Attempted acceleration with raw NumPy indexing.
# Conclusion: this is slower than the pandas version, because every boolean
# mask still scans the full ratings array.
products = data.item_id.unique()
product = np.sort(products)
product_vector = defaultdict(set)
product_similarity = np.zeros((len(product), len(product)))
np_data = data.to_numpy()

for product_1 in tqdm(product):
    users_c = np_data[np_data[:, 1] == product_1, 0]
    users = np.sort(users_c)
    for user in users:
        items = np_data[
            (np_data[:, 0] == user) & (np_data[:, 1] != product_1), 1
        ]
        product_vector[product_1].add(user)
        for product_2 in items:
            product_vector[product_2].add(user)

for product_1 in tqdm(product):
    for product_2 in product_vector:
        if product_1 == product_2:
            continue
        similarity = len(product_vector[product_1] & product_vector[product_2]) / len(
            product_vector[product_1] | product_vector[product_2]
        )
        product_similarity[product_1 - 1, product_2 - 1] = similarity

sprint(product_similarity, product_similarity.shape)
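
Two follow-ups on the reproduction. First, the similarity measure: the loops above use Jaccard overlap, but the paper itself measures the cosine of the angle between item vectors, where each item is a binary vector over customers. For purchaser sets that reduces to a simple formula; a minimal sketch (the helper name cosine_similarity is mine):

import math

def cosine_similarity(users_a, users_b):
    # For binary vectors represented as sets, the dot product is the size of
    # the intersection and each norm is the square root of the set size:
    # cos(A, B) = |A ∩ B| / (sqrt(|A|) * sqrt(|B|))
    if not users_a or not users_b:
        return 0.0
    return len(users_a & users_b) / (math.sqrt(len(users_a)) * math.sqrt(len(users_b)))

Second, speed. The NumPy rewrite above is slower than pandas because each boolean mask rescans all 100,000 ratings. A way to vectorize the whole computation, as my own speedup sketch rather than anything from the paper, is to build a sparse user-item matrix and read all pairwise intersection counts off a single matrix product (assumes scipy is installed; data is the DataFrame loaded above):

import numpy as np
import scipy.sparse as sp

# Binary user-item matrix; subtract 1 because MovieLens ids are 1-based.
rows = data.user_id.to_numpy() - 1
cols = data.item_id.to_numpy() - 1
m = sp.csr_matrix(
    (np.ones(len(rows)), (rows, cols)),
    shape=(rows.max() + 1, cols.max() + 1),
)

# intersection[i, j] = number of users who purchased both item i and item j
intersection = (m.T @ m).toarray()
item_counts = intersection.diagonal().copy()

# Jaccard: |A ∩ B| / (|A| + |B| - |A ∩ B|); zero the diagonal to match the
# loop versions, which skip self-similarity.
union = item_counts[:, None] + item_counts[None, :] - intersection
jaccard = np.where(union > 0, intersection / union, 0.0)
np.fill_diagonal(jaccard, 0.0)

sprint(jaccard, jaccard.shape)

This replaces the nested Python loops with one sparse multiplication over a 1682x1682 result, which is dramatically faster than either loop-based version.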