Models
Common interfaces
Recommenders.AbstractRecommender — Type
`AbstractRecommender`
Abstract type for all recommendation models.
Recommenders.evaluate_u2i — Method
`evaluate_u2i(model, train_table, test_table, metric, n; kwargs...)`
Perform `fit!` for `model` on `train_table`, predict for each user in `test_table`, and evaluate by `metric`.
Arguments
- `model::AbstractRecommender`: model to evaluate.
- `train_table`: any Tables.jl-compatible data for training.
- `test_table`: any Tables.jl-compatible data for testing.
- `metric`: evaluation metric(s), a `MeanMetric` or a collection of `MeanMetric`s.
- `n::Int64`: number of retrieved items.
Keyword arguments
- `drop_history::Bool`: whether to drop already-consumed items from the predictions.
- `col_user`: name of the user column in the tables.
- `col_item`: name of the item column in the tables.
- any other model-dependent arguments.
Return
Evaluated metric(s) on `test_table`.
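For illustration, a minimal evaluation run could look like the sketch below. The toy tables are made up, `MostPopular` is the baseline model documented later on this page, and `MeanPrecision(3)` is a hypothetical stand-in for whichever `MeanMetric` constructor the metrics documentation actually provides.

```julia
using Recommenders

# Toy implicit-feedback data: any Tables.jl-compatible table works,
# here plain named tuples of column vectors.
train = (userid = [1, 1, 2, 2, 3], itemid = [10, 20, 10, 30, 20])
test = (userid = [1, 2, 3], itemid = [30, 20, 10])

model = MostPopular()

# Hypothetical metric constructor; substitute the MeanMetric (or collection
# of MeanMetrics) you actually want to evaluate.
metric = MeanPrecision(3)

result = evaluate_u2i(model, train, test, metric, 3, drop_history = true)
```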
Recommenders.fit! — Method
`fit!(model::AbstractRecommender, table; kwargs...)`
Train `model` on `table`.
Arguments
- `model`: concrete model type under `AbstractRecommender`.
- `table`: any Tables.jl-compatible data for training.
Keyword arguments
- `col_user`: name of the user column in `table`.
- `col_item`: name of the item column in `table`.
- any other model-dependent arguments.
Recommenders.load_model — Method
`load_model(model::AbstractRecommender, filepath)`
Load the model with JLD2.
Arguments
- `filepath`: path from which to load the model. If the model saves multiple files, this argument points to a directory.
Recommenders.predict_i2i — Method
`predict_i2i(model, itemid, n; kwargs...)`
Make recommendations given an item. When `itemid` is a collection of raw item ids, this function performs parallel predictions with `Threads.@threads`.
Arguments
- `model::AbstractRecommender`: trained model.
- `itemid`: item id to get predictions for; its type is `AbstractString`, `Int`, or a collection of them.
- `n::Int64`: number of retrieved items.
Keyword arguments
- other model-dependent arguments.
Return
Vector of predicted items, ordered by descending score.
Recommenders.predict_u2i — Method
`predict_u2i(model, userid, n; kwargs...)`
Make recommendations to a user (or users). When `userid` is a collection of raw user ids, this function performs parallel predictions with `Threads.@threads`.
Arguments
- `model::AbstractRecommender`: trained model.
- `userid`: user id to get predictions for; its type is `AbstractString`, `Int`, or a collection of them.
- `n::Int64`: number of retrieved items.
Keyword arguments
- `drop_history::Bool`: whether to drop already-consumed items from the predictions.
- other model-dependent arguments.
Return
Vector of predicted items, ordered by descending score.
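Putting `fit!`, `predict_u2i`, and `predict_i2i` together, a typical workflow is sketched below with the `MostPopular` baseline documented in the next section; the table and the ids are purely illustrative.

```julia
using Recommenders

table = (userid = [1, 1, 2, 2, 3], itemid = [10, 20, 10, 30, 20])

model = MostPopular()
fit!(model, table; col_user = :userid, col_item = :itemid)

# Top-2 recommendations for user 1, excluding items the user already consumed.
recs = predict_u2i(model, 1, 2; drop_history = true)

# Top-2 item-to-item recommendations for item 10.
similar_items = predict_i2i(model, 10, 2)
```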
Recommenders.save_model — Function
`save_model(model::AbstractRecommender, filepath, overwrite = false)`
Save the model with JLD2.
Arguments
- `model::AbstractRecommender`: model to save.
- `filepath`: path to save the model. If the model saves multiple files, this argument points to a directory.
- `overwrite`: whether to overwrite if `filepath` already exists.
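A save/load round trip then looks like the sketch below. The path is arbitrary, `model` is assumed to be a fitted recommender, and whether the state is restored in place or returned is left to the docstrings above.

```julia
path = joinpath(tempdir(), "recommender.jld2")   # arbitrary location

save_model(model, path, true)    # `true`: overwrite an existing file

# Restore the saved state into a fresh instance of the same model type.
restored = MostPopular()
load_model(restored, path)
```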
Most Popular
Recommenders.MostPopular — Type
`MostPopular()`
Non-personalized baseline model that recommends the n most popular items in the corpus.
Recommenders.fit! — Method
`fit!(model::MostPopular, table; col_user = :userid, col_item = :itemid)`
Fit the most popular model.
Recommenders.predict_i2i — Method
`predict_i2i(model::MostPopular, itemid::Union{AbstractString,Int}, n::Int64)`
Make `n` predictions for a given item with the most popular model.
Recommenders.predict_u2i — Method
`predict_u2i(model::MostPopular, userid::Union{AbstractString,Int}, n::Int64; drop_history::Bool = false)`
Make `n` predictions for a user with the most popular model.
Item kNN
Recommenders.ItemkNN — Type
`ItemkNN(k::Int64, shrink::Float64, weighting::Union{Nothing,Symbol}, weighting_at_inference::Bool, normalize::Bool, normalize_similarity::Bool)`
Item-based k-nearest neighborhood algorithm with cosine similarity. The model first computes the item-to-item similarity matrix
\[s_{ij} = \frac{\bm r_i \cdot \bm r_j}{\|\bm r_i\|\|\bm r_j\| + h}\,,\]
where $r_{i,u}$ is the rating for item $i$ by user $u$ and $h$ is the shrink parameter, which suppresses the contributions from items with only a few ratings.
Constructor arguments
- `k`: size of the nearest neighborhood. Only the k most similar items to each item are stored, which reduces the size of the sparse similarity matrix and also improves predictions.
- `shrink`: shrink parameter explained above.
- `weighting`: if set to `:tfidf` or `:bm25`, the raw rating matrix is weighted by TF-IDF or BM25, respectively, before computing the similarity. If not needed, just set `nothing`.
- `weighting_at_inference`: whether to apply the above weighting at inference time; only relevant for BM25.
- `normalize_similarity`: if set to `true`, normalize each column of the similarity matrix. See the reference for details.
References
M. Deshpande and G. Karypis (2004), Item-based top-N recommendation algorithms.
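As a standalone illustration of the similarity above (not the package's internal implementation), the shrunk cosine similarity can be computed densely as follows; `R` is a small items-by-users rating matrix made up for the example.

```julia
using LinearAlgebra

# Dense sketch of s_ij = (r_i ⋅ r_j) / (‖r_i‖ ‖r_j‖ + h), where the rows of R
# are the item rating vectors over users and h is the shrink parameter.
function shrunk_cosine_similarity(R::AbstractMatrix, h::Real)
    norms = [norm(@view R[i, :]) for i in 1:size(R, 1)]
    return (R * R') ./ (norms .* norms' .+ h)
end

R = [1.0 0.0 1.0;   # item 1 rated by users 1 and 3
     1.0 1.0 0.0;   # item 2 rated by users 1 and 2
     0.0 1.0 1.0]   # item 3 rated by users 2 and 3
S = shrunk_cosine_similarity(R, 0.1)
```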
Recommenders.fit! — Method
`fit!(model::ItemkNN, table; col_user = :userid, col_item = :itemid, col_rating = :rating)`
Fit the `ItemkNN` model. `col_rating` specifies the rating column in `table`; it is treated as all ones if implicit feedback data is given.
Recommenders.predict_i2i — Method
`predict_i2i(model::ItemkNN, itemid::Union{AbstractString,Int}, n::Int64)`
Make `n` predictions for a given item with the ItemkNN model.
Recommenders.predict_u2i — Method
`predict_u2i(model::ItemkNN, userid::Union{AbstractString,Int}, n::Int64; drop_history = false)`
Recommend the top-`n` items for a user by `ItemkNN`. The predicted rating of item $i$ by user $u$ is computed by
\[ \hat{r}_{i, u} = \sum_{j} s_{ij} r_{j, u}\,,\]
where $r_{j, u}$ is the actual user rating while $\hat{r}_{i, u}$ is the model prediction.
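A minimal end-to-end use of `ItemkNN` is sketched below; the data and hyperparameters are illustrative, and the constructor arguments are passed positionally as in the signature above.

```julia
using Recommenders

table = (
    userid = [1, 1, 2, 2, 3, 3],
    itemid = [10, 20, 10, 30, 20, 30],
    rating = ones(6),   # implicit feedback: all ratings set to one
)

# k = 2 neighbors, shrink = 0.1, TF-IDF weighting, no weighting at inference,
# normalize the ratings and the similarity columns.
model = ItemkNN(2, 0.1, :tfidf, false, true, true)
fit!(model, table; col_user = :userid, col_item = :itemid, col_rating = :rating)

recs = predict_u2i(model, 1, 2; drop_history = true)
similar_items = predict_i2i(model, 10, 2)
```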
Matrix Factorization
Recommenders.ImplicitMF — Type
`ImplicitMF(dim::Int64, use_bias::Bool, reg_coeff::Float64)`
Matrix factorization model for implicit feedback. The predicted rating for item $i$ by user $u$ is expressed as
\[\hat r_{ui} = \mu + b_i + b_u + \bm u_u \cdot \bm v_i\,,\]
Unlike models for explicit feedback, this model treats all the (user, item) pairs in the training dataset as positive interactions with label 1 and samples negative (user, item) pairs from the corpus. Currently only uniform item sampling is implemented. The fitting criterion is the ordinary logloss
\[ L = -r_{ui}\log(\hat r_{ui}) - (1 - r_{ui})\log(1 - \hat r_{ui}).\]
Constructor arguments
- `dim`: dimension of the user/item vectors.
- `use_bias`: if set to false, the bias terms ($\mu$, $b_i$, $b_u$) are set to zero.
- `reg_coeff`: $L_2$ regularization coefficient for the model parameters.
References
For instance, Rendle et al. (2020), Neural Collaborative Filtering vs. Matrix Factorization Revisited.
Recommenders.fit! — Method
`fit!(model::ImplicitMF, table; callbacks = Any[], col_user = :userid, col_item = :item_id, n_epochs = 2, learning_rate = 0.01, n_negatives = 1, verbose = -1)`
Fit the `ImplicitMF` model by stochastic gradient descent (with no batching).
Model-specific arguments
- `n_epochs`: number of epochs. During one epoch, every row in `table` is read once.
- `learning_rate`: learning rate of SGD.
- `n_negatives`: number of negative item samples per positive (user, item) pair.
- `verbose`: if set to a positive integer, training progress is logged at this interval.
- `callbacks`: additional callback functions invoked during SGD. One can implement, for instance, monitoring of validation metrics and early stopping. See Callbacks.
Recommenders.predict_u2i — Method
`predict_u2i(model::ImplicitMF, userid::Union{AbstractString,Int}, n::Int64; drop_history = false)`
Make predictions by using $\hat r_{ui}$.
Bayesian Personalized Ranking
Recommenders.BPR — Type
`BPR(dim::Int64, reg_coeff::Float64)`
Bayesian personalized ranking model. The model evaluates the user-item triplet $(u, i, j)$, which expresses "the user $u$ prefers item $i$ to item $j$". The following matrix factorization model is adopted to model this relation:
\[p_{uij} = \bm u_u \cdot \bm v_i - \bm u_u \cdot \bm v_j\]
Constructor arguments
- `dim`: dimension of the user/item vectors.
- `reg_coeff`: $L_2$ regularization coefficient for the model parameters.
Recommenders.fit! — Method
`fit!(model::BPR, table; callbacks = Any[], col_user = :userid, col_item = :item_id, n_epochs = 2, learning_rate = 0.01, n_negatives = 1, verbose = -1)`
Fit the `BPR` model by stochastic gradient descent. Instead of the LearnBPR algorithm proposed in the original paper, simple SGD with negative sampling is implemented.
Model-specific arguments
- `n_epochs`: number of epochs. During one epoch, every row in `table` is read once.
- `learning_rate`: learning rate of SGD.
- `n_negatives`: number of negative item samples per positive (user, item) pair.
- `verbose`: if set to a positive integer, training progress is logged at this interval.
- `callbacks`: additional callback functions invoked during SGD. One can implement, for instance, monitoring of validation metrics and early stopping. See Callbacks.
References
Rendle et al. (2012), BPR: Bayesian Personalized Ranking from Implicit Feedback
Recommenders.predict_u2i — Method
`predict_u2i(model::BPR, userid::Union{AbstractString,Int}, n::Int64; drop_history = false)`
Make predictions by using $\bm u_u \cdot \bm v_i$.
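For intuition, the triplet score and the pairwise loss that SGD minimizes can be written as below; this is a standalone sketch of the BPR criterion with $L_2$ regularization, not the package's training loop.

```julia
using LinearAlgebra

σ(x) = 1 / (1 + exp(-x))

# Score of the triplet (u, i, j): how strongly user u prefers item i over item j.
bpr_score(U, V, u, i, j) = dot(@view(U[:, u]), V[:, i] .- V[:, j])

# Per-triplet loss: -log σ(p_uij) plus L2 penalties on the involved vectors.
bpr_loss(U, V, u, i, j; reg = 0.01) =
    -log(σ(bpr_score(U, V, u, i, j))) +
    reg * (sum(abs2, @view(U[:, u])) + sum(abs2, @view(V[:, i])) + sum(abs2, @view(V[:, j])))
```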
Sparse Linear Machine
Recommenders.SLIM — Type
`SLIM(l1_ratio::Float64 = 0.5, λminratio::Float64 = 1e-4, k::Int = -1)`
Sparse linear machine for recommendation, modified with the Elastic Net loss. The prediction is made by
\[\hat r_{ui} = \sum_{j\neq i} w_{ij} r_{uj}\]
where $r_{ui}$ is the actual rating of item $i$ by user $u$, and $\hat r_{ui}$ is the predicted value. $w_{ij}$ is the model weight matrix. See the references for algorithm details. SLIM uses Lasso.jl for optimization.
Constructor arguments
- `l1_ratio`: ratio of the coefficients between the $L_1$ and $L_2$ penalties. `l1_ratio` $\to 0$ corresponds to Ridge regularization, while `l1_ratio` $\to \infty$ corresponds to the Lasso.
- `λminratio`: parameter governing the strength of regularization. See the docs of Lasso.jl.
- `k`: the nearest-neighborhood size, similar to `ItemkNN`. If `k` < 1, the neighborhood size is infinite.
References
- X. Ning and G. Karypis (2011), SLIM: Sparse Linear Methods for Top-N Recommender Systems
- M. Levy (2013), Efficient Top-N Recommendation by Linear Regression
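Once the sparse weight matrix is learned, the prediction is just a sparse matrix-vector product. The sketch below illustrates the formula above with a made-up weight matrix; the actual fitting is delegated to Lasso.jl as noted.

```julia
using SparseArrays

# Made-up item-item weights w_ij with zero diagonal (only j ≠ i contributes).
W = sparse([0.0 0.3 0.0;
            0.3 0.0 0.5;
            0.0 0.5 0.0])

r_u = [1.0, 0.0, 1.0]   # user u's ratings over the three items

# r̂_ui = Σ_{j≠i} w_ij r_uj, computed for all items at once.
r̂_u = W * r_u
```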
Recommenders.fit! — Method
`fit!(model::SLIM, table; col_user = :userid, col_item = :itemid, col_rating = :rating, cd_tol = 1e-7, nλ = 100)`
Fit the SLIM model.
Model-specific arguments
- `cd_tol`: tolerance parameter for convergence; see Lasso.jl.
- `nλ`: length of the regularization path; see Lasso.jl.
Recommenders.predict_i2i — Method
`predict_i2i(model::SLIM, itemid::Union{AbstractString,Int}, n::Int64)`
Make `n` predictions for a given item with the SLIM model.
Recommenders.predict_u2i — Method
`predict_u2i(model::SLIM, userid::Union{AbstractString,Int}, n::Int64; drop_history = false)`
Make predictions with the SLIM model.
Random Walk
Recommenders.Randomwalk — Type
`Randomwalk()`
Recommendation model using random walk with restart on the user-item bipartite graph. The implemented algorithm is based on the Pixie random walk.
References
C. Eksombatchai et al. (2018), Pixie: A System for Recommending 3+ Billion Items to 200+ Million Users in Real-Time
Recommenders.fit! — Method
`fit!(model::Randomwalk, table; col_user = :userid, col_item = :itemid)`
Build a bipartite graph from `table`. One side of the graph collects the user nodes, the other the item nodes. If a user has interacted with an item, an edge is added between them. The graph is undirected and has no extra weights.
Recommenders.predict_i2i — Method
`predict_i2i(model::Randomwalk, itemid::Union{AbstractString,Int}, n::Int64; drop_history = false, terminate_prob = 0.1, total_walk_length = 10000, min_high_visited_candidates = Inf, high_visited_count_threshold = Inf, pixie_walk_length_scaling = false, aggregate_function = sum)`
Make recommendations by random walk with restart. The basic algorithm is as follows:
1. Get the users that are connected to the query item by one step. We denote them by $q \in Q$.
2. Starting from each node $q \in Q$, perform multiple random walks with a certain stop probability. Record the visited counts of the items on the walk. We denote the count of item $p$ on the walks from $q$ by $V_q[p]$.
3. Finally, aggregate $V_q[p]$ into $V[p]$ and recommend the top-scored items. Two methods for aggregation are provided:
- Simple aggregation: take the sum, $V[p] = \sum_{q\in Q} V_q[p]$. You can also replace `sum` by, for instance, `maximum`.
- Pixie boosting: $V[p] = (\sum_{q\in Q} \sqrt{V_q[p]})^2$, putting more importance on the nodes visited by multiple $q$'s.
Model-specific arguments
- `terminate_prob`: stop probability of a single random walk.
- `total_walk_length`: total walk length over the multiple walks from the $q$'s.
- `high_visited_count_threshold`: early stopping parameter. `high_visited_count` is incremented when the visited count of a node reaches this threshold.
- `min_high_visited_candidates`: early stopping parameter. The walk from a node $q$ terminates once `high_visited_count` reaches `min_high_visited_candidates`.
- `pixie_walk_length_scaling`: if set to true, start nodes $q$ with higher degree are given longer walk lengths. If false, the walk length is the same for all nodes $q \in Q$.
- `pixie_multi_hit_boosting`: if true, Pixie boosting is used for aggregation. If false, simple aggregation is used.
- `aggregate_function`: function used by simple aggregation.
Recommenders.predict_u2i — Method
`predict_u2i(model::Randomwalk, userid::Union{AbstractString,Int}, n::Int64; drop_history = false, terminate_prob = 0.1, total_walk_length = 10000, min_high_visited_candidates = Inf, high_visited_count_threshold = Inf, pixie_walk_length_scaling = false, pixie_multi_hit_boosting = false, aggregate_function = sum)`
Make recommendations by random walk with restart. The basic algorithm is as follows:
1. Get the items already consumed by the user (on the graph, they are connected to the user by one step). We denote them by $q \in Q$.
2. Starting from each node $q \in Q$, perform multiple random walks with a certain stop probability. Record the visited counts of the items on the walk. We denote the count of item $p$ on the walks from $q$ by $V_q[p]$.
3. Finally, aggregate $V_q[p]$ into $V[p]$ and recommend the top-scored items. Two methods for aggregation are provided:
- Simple aggregation: take the sum, $V[p] = \sum_{q\in Q} V_q[p]$. You can also replace `sum` by, for instance, `maximum`.
- Pixie boosting: $V[p] = (\sum_{q\in Q} \sqrt{V_q[p]})^2$, putting more importance on the nodes visited by multiple $q$'s.
Model-specific arguments
- `terminate_prob`: stop probability of a single random walk.
- `total_walk_length`: total walk length over the multiple walks from the $q$'s.
- `high_visited_count_threshold`: early stopping parameter. `high_visited_count` is incremented when the visited count of a node reaches this threshold.
- `min_high_visited_candidates`: early stopping parameter. The walk from a node $q$ terminates once `high_visited_count` reaches `min_high_visited_candidates`.
- `pixie_walk_length_scaling`: if set to true, start nodes $q$ with higher degree are given longer walk lengths. If false, the walk length is the same for all nodes $q \in Q$.
- `pixie_multi_hit_boosting`: if true, Pixie boosting is used for aggregation. If false, simple aggregation is used.
- `aggregate_function`: function used by simple aggregation.
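The standalone sketch below illustrates the simple-aggregation variant of this walk on an adjacency-list bipartite graph. It is not the package's implementation; the graph, the restart rule, and all names are illustrative only.

```julia
# Random walk with restart from the items already consumed by `query_user`.
# `user_items[u]` lists the items of user u, `item_users[p]` the users of item p.
function walk_counts(user_items, item_users, query_user;
                     terminate_prob = 0.1, total_walk_length = 10_000)
    Q = user_items[query_user]                  # start nodes q ∈ Q
    counts = Dict{Int,Int}()                    # V[p], simple (sum) aggregation
    walk_length_per_q = cld(total_walk_length, length(Q))
    for q in Q
        steps, p = 0, q
        while steps < walk_length_per_q
            u = rand(item_users[p])             # hop item -> user
            p = rand(user_items[u])             # hop user -> item
            counts[p] = get(counts, p, 0) + 1   # record the visit V_q[p]
            steps += 2
            rand() < terminate_prob && (p = q)  # restart the walk at q
        end
    end
    return counts
end

user_items = Dict(1 => [10, 20], 2 => [10, 30], 3 => [20, 30])
item_users = Dict(10 => [1, 2], 20 => [1, 3], 30 => [2, 3])
V = walk_counts(user_items, item_users, 1)
top_items = sort!(collect(keys(V)); by = p -> -V[p])   # highest visit counts first
```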