Models
Common interfaces
Recommenders.AbstractRecommender — Type
`AbstractRecommender`
Abstract type for all recommendation models.
Recommenders.evaluate_u2i — Method
`evaluate_u2i(model, train_table, test_table, metric, n; kwargs...)`
Perform `fit!` for `model` on `train_table`, predict for each user in `test_table`, and evaluate by `metric`.
Arguments
- `model::AbstractRecommender`: model to evaluate.
- `train_table`: any Tables.jl-compatible data for training.
- `test_table`: any Tables.jl-compatible data for testing.
- `metric`: evaluation metric(s), a `MeanMetric` or a collection of `MeanMetric`s.
- `n::Int64`: number of retrieved items.
Keyword arguments
- `drop_history::Bool`: whether to drop already-consumed items from the predictions.
- `col_user`: name of the user column in the tables.
- `col_item`: name of the item column in the tables.
- any other model-dependent arguments.
Return
Evaluated metric(s) on `test_table`.
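For illustration, a minimal evaluation run could look like the sketch below. The toy tables are made up, `MostPopular` is the baseline model documented later on this page, and `MeanPrecision(3)` is a hypothetical stand-in for whichever `MeanMetric` constructor the metrics documentation actually provides.

```julia
using Recommenders

# Toy implicit-feedback data: any Tables.jl-compatible table works,
# here plain named tuples of column vectors.
train = (userid = [1, 1, 2, 2, 3], itemid = [10, 20, 10, 30, 20])
test = (userid = [1, 2, 3], itemid = [30, 20, 10])

model = MostPopular()

# Hypothetical metric constructor; substitute the MeanMetric (or collection
# of MeanMetrics) you actually want to evaluate.
metric = MeanPrecision(3)

result = evaluate_u2i(model, train, test, metric, 3, drop_history = true)
```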
Recommenders.fit! — Method
`fit!(model::AbstractRecommender, table; kwargs...)`
Train `model` on `table`.
Arguments
- `model`: concrete model type under `AbstractRecommender`.
- `table`: any Tables.jl-compatible data for training.
Keyword arguments
- `col_user`: name of the user column in `table`.
- `col_item`: name of the item column in `table`.
- any other model-dependent arguments.
Recommenders.load_model — Method
`load_model(model::AbstractRecommender, filepath)`
Load the model with JLD2.
Arguments
- `filepath`: path from which to load the model. If the model saves multiple files, this argument points to a directory.
Recommenders.predict_i2i — Method
`predict_i2i(model, itemid, n; kwargs...)`
Make recommendations given an item. When `itemid` is a collection of raw item ids, this function performs parallel predictions with `Threads.@threads`.
Arguments
- `model::AbstractRecommender`: trained model.
- `itemid`: item id to get predictions for; its type is `AbstractString`, `Int`, or a collection of them.
- `n::Int64`: number of retrieved items.
Keyword arguments
- other model-dependent arguments.
Return
Vector of predicted items, ordered by descending score.
Recommenders.predict_u2i — Method
`predict_u2i(model, userid, n; kwargs...)`
Make recommendations to a user (or users). When `userid` is a collection of raw user ids, this function performs parallel predictions with `Threads.@threads`.
Arguments
- `model::AbstractRecommender`: trained model.
- `userid`: user id to get predictions for; its type is `AbstractString`, `Int`, or a collection of them.
- `n::Int64`: number of retrieved items.
Keyword arguments
- `drop_history::Bool`: whether to drop already-consumed items from the predictions.
- other model-dependent arguments.
Return
Vector of predicted items, ordered by descending score.
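Putting `fit!`, `predict_u2i`, and `predict_i2i` together, a typical workflow is sketched below with the `MostPopular` baseline documented in the next section; the table and the ids are purely illustrative.

```julia
using Recommenders

table = (userid = [1, 1, 2, 2, 3], itemid = [10, 20, 10, 30, 20])

model = MostPopular()
fit!(model, table; col_user = :userid, col_item = :itemid)

# Top-2 recommendations for user 1, excluding items the user already consumed.
recs = predict_u2i(model, 1, 2; drop_history = true)

# Top-2 item-to-item recommendations for item 10.
similar_items = predict_i2i(model, 10, 2)
```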
Recommenders.save_model — Function
`save_model(model::AbstractRecommender, filepath, overwrite = false)`
Save the model with JLD2.
Arguments
- `model::AbstractRecommender`: model to save.
- `filepath`: path to save the model. If the model saves multiple files, this argument points to a directory.
- `overwrite`: whether to overwrite if `filepath` already exists.
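A save/load round trip then looks like the sketch below. The path is arbitrary, `model` is assumed to be a fitted recommender, and whether the state is restored in place or returned is left to the docstrings above.

```julia
path = joinpath(tempdir(), "recommender.jld2")   # arbitrary location

save_model(model, path, true)    # `true`: overwrite an existing file

# Restore the saved state into a fresh instance of the same model type.
restored = MostPopular()
load_model(restored, path)
```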
Most Popular
Recommenders.MostPopular — Type
`MostPopular()`
Non-personalized baseline model that recommends the n most popular items in the corpus.
Recommenders.fit! — Method
`fit!(model::MostPopular, table; col_user = :userid, col_item = :itemid)`
Fit the most popular model.
Recommenders.predict_i2i — Method
`predict_i2i(model::MostPopular, itemid::Union{AbstractString,Int}, n::Int64)`
Make `n` predictions for a given item with the most popular model.
Recommenders.predict_u2i — Method
`predict_u2i(model::MostPopular, userid::Union{AbstractString,Int}, n::Int64; drop_history::Bool = false)`
Make `n` predictions for a user with the most popular model.
Item kNN
Recommenders.ItemkNN — Type
`ItemkNN(k::Int64, shrink::Float64, weighting::Union{Nothing,Symbol}, weighting_at_inference::Bool, normalize::Bool, normalize_similarity::Bool)`
Item-based k-nearest neighborhood algorithm with cosine similarity. The model first computes the item-to-item similarity matrix
\[s_{ij} = \frac{\bm r_i \cdot \bm r_j}{\|\bm r_i\|\|\bm r_j\| + h}\,,\]
where $r_{i,u}$ is the rating for item $i$ by user $u$ and $h$ is the shrink parameter, which suppresses the contributions from items with only a few ratings.
Constructor arguments
- `k`: size of the nearest neighborhood. Only the k most similar items to each item are stored, which reduces the size of the sparse similarity matrix and also improves predictions.
- `shrink`: shrink parameter explained above.
- `weighting`: if set to `:tfidf` or `:bm25`, the raw rating matrix is weighted by TF-IDF or BM25, respectively, before computing the similarity. If not needed, just set `nothing`.
- `weighting_at_inference`: whether to apply the above weighting at inference time; only relevant for BM25.
- `normalize_similarity`: if set to `true`, normalize each column of the similarity matrix. See the reference for details.
References
M. Deshpande and G. Karypis (2004), Item-based top-N recommendation algorithms.
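As a standalone illustration of the similarity above (not the package's internal implementation), the shrunk cosine similarity can be computed densely as follows; `R` is a small items-by-users rating matrix made up for the example.

```julia
using LinearAlgebra

# Dense sketch of s_ij = (r_i ⋅ r_j) / (‖r_i‖ ‖r_j‖ + h), where the rows of R
# are the item rating vectors over users and h is the shrink parameter.
function shrunk_cosine_similarity(R::AbstractMatrix, h::Real)
    norms = [norm(@view R[i, :]) for i in 1:size(R, 1)]
    return (R * R') ./ (norms .* norms' .+ h)
end

R = [1.0 0.0 1.0;   # item 1 rated by users 1 and 3
     1.0 1.0 0.0;   # item 2 rated by users 1 and 2
     0.0 1.0 1.0]   # item 3 rated by users 2 and 3
S = shrunk_cosine_similarity(R, 0.1)
```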
Recommenders.fit! — Method
`fit!(model::ItemkNN, table; col_user = :userid, col_item = :itemid, col_rating = :rating)`
Fit the `ItemkNN` model. `col_rating` specifies the rating column in `table`; it is treated as all ones if implicit feedback data is given.
Recommenders.predict_i2i — Method
`predict_i2i(model::ItemkNN, itemid::Union{AbstractString,Int}, n::Int64)`
Make `n` predictions for a given item with the ItemkNN model.
Recommenders.predict_u2i — Method
`predict_u2i(model::ItemkNN, userid::Union{AbstractString,Int}, n::Int64; drop_history = false)`
Recommend the top-`n` items for a user by `ItemkNN`. The predicted rating of item $i$ by user $u$ is computed by
\[ \hat{r}_{i, u} = \sum_{j} s_{ij} r_{j, u}\,,\]
where $r_{j, u}$ is the actual user rating while $\hat{r}_{i, u}$ is the model prediction.
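A minimal end-to-end use of `ItemkNN` is sketched below; the data and hyperparameters are illustrative, and the constructor arguments are passed positionally as in the signature above.

```julia
using Recommenders

table = (
    userid = [1, 1, 2, 2, 3, 3],
    itemid = [10, 20, 10, 30, 20, 30],
    rating = ones(6),   # implicit feedback: all ratings set to one
)

# k = 2 neighbors, shrink = 0.1, TF-IDF weighting, no weighting at inference,
# normalize the ratings and the similarity columns.
model = ItemkNN(2, 0.1, :tfidf, false, true, true)
fit!(model, table; col_user = :userid, col_item = :itemid, col_rating = :rating)

recs = predict_u2i(model, 1, 2; drop_history = true)
similar_items = predict_i2i(model, 10, 2)
```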
Matrix Factorization
Recommenders.ImplicitMF — Type
`ImplicitMF(dim::Int64, use_bias::Bool, reg_coeff::Float64)`
Matrix factorization model for implicit feedback. The predicted rating for item $i$ by user $u$ is expressed as
\[\hat r_{ui} = \mu + b_i + b_u + \bm u_u \cdot \bm v_i\,,\]
Unlike models for explicit feedback, this model treats all the (user, item) pairs in the training dataset as positive interactions with label 1 and samples negative (user, item) pairs from the corpus. Currently only uniform item sampling is implemented. The fitting criterion is the ordinary logloss
\[ L = -r_{ui}\log(\hat r_{ui}) - (1 - r_{ui})\log(1 - \hat r_{ui}).\]
Constructor arguments
- `dim`: dimension of the user/item vectors.
- `use_bias`: if set to false, the bias terms ($\mu$, $b_i$, $b_u$) are set to zero.
- `reg_coeff`: $L_2$ regularization coefficient for the model parameters.
References
For instance, Rendle et al. (2020), Neural Collaborative Filtering vs. Matrix Factorization Revisited.
Recommenders.fit! — Method
`fit!(model::ImplicitMF, table; callbacks = Any[], col_user = :userid, col_item = :item_id, n_epochs = 2, learning_rate = 0.01, n_negatives = 1, verbose = -1)`
Fit the `ImplicitMF` model by stochastic gradient descent (with no batching).
Model-specific arguments
- `n_epochs`: number of epochs. During one epoch, every row in `table` is read once.
- `learning_rate`: learning rate of SGD.
- `n_negatives`: number of negative item samples per positive (user, item) pair.
- `verbose`: if set to a positive integer, training progress is logged at this interval.
- `callbacks`: additional callback functions invoked during SGD. One can implement, for instance, monitoring of validation metrics and early stopping. See Callbacks.
Recommenders.predict_u2i — Method
`predict_u2i(model::ImplicitMF, userid::Union{AbstractString,Int}, n::Int64; drop_history = false)`
Make predictions by using $\hat r_{ui}$.
Bayesian Personalized Ranking
Recommenders.BPR — Type
`BPR(dim::Int64, reg_coeff::Float64)`
Bayesian personalized ranking model. The model evaluates the user-item triplet $(u, i, j)$, which expresses "the user $u$ prefers item $i$ to item $j$". The following matrix factorization model is adopted to model this relation:
\[p_{uij} = \bm u_u \cdot \bm v_i - \bm u_u \cdot \bm v_j\]
Constructor arguments
- `dim`: dimension of the user/item vectors.
- `reg_coeff`: $L_2$ regularization coefficient for the model parameters.
Recommenders.fit! — Method
`fit!(model::BPR, table; callbacks = Any[], col_user = :userid, col_item = :item_id, n_epochs = 2, learning_rate = 0.01, n_negatives = 1, verbose = -1)`
Fit the `BPR` model by stochastic gradient descent. Instead of the LearnBPR algorithm proposed in the original paper, simple SGD with negative sampling is implemented.
Model-specific arguments
- `n_epochs`: number of epochs. During one epoch, every row in `table` is read once.
- `learning_rate`: learning rate of SGD.
- `n_negatives`: number of negative item samples per positive (user, item) pair.
- `verbose`: if set to a positive integer, training progress is logged at this interval.
- `callbacks`: additional callback functions invoked during SGD. One can implement, for instance, monitoring of validation metrics and early stopping. See Callbacks.
References
Rendle et al. (2012), BPR: Bayesian Personalized Ranking from Implicit Feedback
Recommenders.predict_u2i — Method
`predict_u2i(model::BPR, userid::Union{AbstractString,Int}, n::Int64; drop_history = false)`
Make predictions by using $\bm u_u \cdot \bm v_i$.
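For intuition, the triplet score and the pairwise loss that SGD minimizes can be written as below; this is a standalone sketch of the BPR criterion with $L_2$ regularization, not the package's training loop.

```julia
using LinearAlgebra

σ(x) = 1 / (1 + exp(-x))

# Score of the triplet (u, i, j): how strongly user u prefers item i over item j.
bpr_score(U, V, u, i, j) = dot(@view(U[:, u]), V[:, i] .- V[:, j])

# Per-triplet loss: -log σ(p_uij) plus L2 penalties on the involved vectors.
bpr_loss(U, V, u, i, j; reg = 0.01) =
    -log(σ(bpr_score(U, V, u, i, j))) +
    reg * (sum(abs2, @view(U[:, u])) + sum(abs2, @view(V[:, i])) + sum(abs2, @view(V[:, j])))
```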
Sparse Linear Machine
Recommenders.SLIM — Type
`SLIM(l1_ratio::Float64 = 0.5, λminratio::Float64 = 1e-4, k::Int = -1)`
Sparse linear machine for recommendation, modified with the Elastic Net loss. The prediction is made by
\[\hat r_{ui} = \sum_{j\neq i} w_{ij} r_{uj}\]
where $r_{ui}$ is the actual rating of item $i$ by user $u$, and $\hat r_{ui}$ is the predicted value. $w_{ij}$ is the model weight matrix. See the references for algorithm details. SLIM uses Lasso.jl for optimization.
Constructor arguments
- `l1_ratio`: ratio of the coefficients between the $L_1$ and $L_2$ penalties. `l1_ratio` $\to 0$ corresponds to Ridge regularization, while `l1_ratio` $\to \infty$ corresponds to the Lasso.
- `λminratio`: parameter governing the strength of regularization. See the docs of Lasso.jl.
- `k`: the nearest-neighborhood size, similar to `ItemkNN`. If `k` < 1, the neighborhood size is infinite.
References
- X. Ning and G. Karypis (2011), SLIM: Sparse Linear Methods for Top-N Recommender Systems
- M. Levy (2013), Efficient Top-N Recommendation by Linear Regression
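Once the sparse weight matrix is learned, the prediction is just a sparse matrix-vector product. The sketch below illustrates the formula above with a made-up weight matrix; the actual fitting is delegated to Lasso.jl as noted.

```julia
using SparseArrays

# Made-up item-item weights w_ij with zero diagonal (only j ≠ i contributes).
W = sparse([0.0 0.3 0.0;
            0.3 0.0 0.5;
            0.0 0.5 0.0])

r_u = [1.0, 0.0, 1.0]   # user u's ratings over the three items

# r̂_ui = Σ_{j≠i} w_ij r_uj, computed for all items at once.
r̂_u = W * r_u
```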
Recommenders.fit! — Method
`fit!(model::SLIM, table; col_user = :userid, col_item = :itemid, col_rating = :rating, cd_tol = 1e-7, nλ = 100)`
Fit the SLIM model.
Model-specific arguments
- `cd_tol`: tolerance parameter for convergence; see Lasso.jl.
- `nλ`: length of the regularization path; see Lasso.jl.
Recommenders.predict_i2i — Method
`predict_i2i(model::SLIM, itemid::Union{AbstractString,Int}, n::Int64)`
Make `n` predictions for a given item with the SLIM model.
Recommenders.predict_u2i — Method
`predict_u2i(model::SLIM, userid::Union{AbstractString,Int}, n::Int64; drop_history = false)`
Make predictions with the SLIM model.
Random Walk
Recommenders.Randomwalk — Type
`Randomwalk()`
Recommendation model using random walk with restart on the user-item bipartite graph. The implemented algorithm is based on the Pixie random walk.
References
C. Eksombatchai et al. (2018), Pixie: A System for Recommending 3+ Billion Items to 200+ Million Users in Real-Time
Recommenders.fit! — Method
`fit!(model::Randomwalk, table; col_user = :userid, col_item = :itemid)`
Build a bipartite graph from `table`. One side of the graph collects the user nodes, the other the item nodes. If a user has interacted with an item, an edge is added between them. The graph is undirected and has no extra weights.
Recommenders.predict_i2i — Method
`predict_i2i(model::Randomwalk, itemid::Union{AbstractString,Int}, n::Int64; drop_history = false, terminate_prob = 0.1, total_walk_length = 10000, min_high_visited_candidates = Inf, high_visited_count_threshold = Inf, pixie_walk_length_scaling = false, aggregate_function = sum)`
Make recommendations by random walk with restart. The basic algorithm is as follows:
1. Get the users that are connected to the query item by one step. We denote them by $q \in Q$.
2. Starting from each node $q \in Q$, perform multiple random walks with a certain stop probability. Record the visited counts of the items on the walk. We denote the count of item $p$ on the walks from $q$ by $V_q[p]$.
3. Finally, aggregate $V_q[p]$ into $V[p]$ and recommend the top-scored items. Two methods for aggregation are provided:
- Simple aggregation: take the sum, $V[p] = \sum_{q\in Q} V_q[p]$. You can also replace `sum` by, for instance, `maximum`.
- Pixie boosting: $V[p] = (\sum_{q\in Q} \sqrt{V_q[p]})^2$, putting more importance on the nodes visited by multiple $q$'s.
Model-specific arguments
- `terminate_prob`: stop probability of a single random walk.
- `total_walk_length`: total walk length over the multiple walks from the $q$'s.
- `high_visited_count_threshold`: early stopping parameter. `high_visited_count` is incremented when the visited count of a node reaches this threshold.
- `min_high_visited_candidates`: early stopping parameter. The walk from a node $q$ terminates once `high_visited_count` reaches `min_high_visited_candidates`.
- `pixie_walk_length_scaling`: if set to true, start nodes $q$ with higher degree are given longer walk lengths. If false, the walk length is the same for all nodes $q \in Q$.
- `pixie_multi_hit_boosting`: if true, Pixie boosting is used for aggregation. If false, simple aggregation is used.
- `aggregate_function`: function used by simple aggregation.
Recommenders.predict_u2i — Method
`predict_u2i(model::Randomwalk, userid::Union{AbstractString,Int}, n::Int64; drop_history = false, terminate_prob = 0.1, total_walk_length = 10000, min_high_visited_candidates = Inf, high_visited_count_threshold = Inf, pixie_walk_length_scaling = false, pixie_multi_hit_boosting = false, aggregate_function = sum)`
Make recommendations by random walk with restart. The basic algorithm is as follows:
1. Get the items already consumed by the user (on the graph, they are connected to the user by one step). We denote them by $q \in Q$.
2. Starting from each node $q \in Q$, perform multiple random walks with a certain stop probability. Record the visited counts of the items on the walk. We denote the count of item $p$ on the walks from $q$ by $V_q[p]$.
3. Finally, aggregate $V_q[p]$ into $V[p]$ and recommend the top-scored items. Two methods for aggregation are provided:
- Simple aggregation: take the sum, $V[p] = \sum_{q\in Q} V_q[p]$. You can also replace `sum` by, for instance, `maximum`.
- Pixie boosting: $V[p] = (\sum_{q\in Q} \sqrt{V_q[p]})^2$, putting more importance on the nodes visited by multiple $q$'s.
Model-specific arguments
- `terminate_prob`: stop probability of a single random walk.
- `total_walk_length`: total walk length over the multiple walks from the $q$'s.
- `high_visited_count_threshold`: early stopping parameter. `high_visited_count` is incremented when the visited count of a node reaches this threshold.
- `min_high_visited_candidates`: early stopping parameter. The walk from a node $q$ terminates once `high_visited_count` reaches `min_high_visited_candidates`.
- `pixie_walk_length_scaling`: if set to true, start nodes $q$ with higher degree are given longer walk lengths. If false, the walk length is the same for all nodes $q \in Q$.
- `pixie_multi_hit_boosting`: if true, Pixie boosting is used for aggregation. If false, simple aggregation is used.
- `aggregate_function`: function used by simple aggregation.
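The standalone sketch below illustrates the simple-aggregation variant of this walk on an adjacency-list bipartite graph. It is not the package's implementation; the graph, the restart rule, and all names are illustrative only.

```julia
# Random walk with restart from the items already consumed by `query_user`.
# `user_items[u]` lists the items of user u, `item_users[p]` the users of item p.
function walk_counts(user_items, item_users, query_user;
                     terminate_prob = 0.1, total_walk_length = 10_000)
    Q = user_items[query_user]                  # start nodes q ∈ Q
    counts = Dict{Int,Int}()                    # V[p], simple (sum) aggregation
    walk_length_per_q = cld(total_walk_length, length(Q))
    for q in Q
        steps, p = 0, q
        while steps < walk_length_per_q
            u = rand(item_users[p])             # hop item -> user
            p = rand(user_items[u])             # hop user -> item
            counts[p] = get(counts, p, 0) + 1   # record the visit V_q[p]
            steps += 2
            rand() < terminate_prob && (p = q)  # restart the walk at q
        end
    end
    return counts
end

user_items = Dict(1 => [10, 20], 2 => [10, 30], 3 => [20, 30])
item_users = Dict(10 => [1, 2], 20 => [1, 3], 30 => [2, 3])
V = walk_counts(user_items, item_users, 1)
top_items = sort!(collect(keys(V)); by = p -> -V[p])   # highest visit counts first
```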