Models

Common interfaces

Recommenders.evaluate_u2iMethod
evaluate_u2i(model, train_table, test_table, metric, n; kwargs...)

Perform fit! for model on train_table, predict for each user in test_table, and evaluate by metric.

Arguments

  • model::AbstractRecommender: model to evaluate.
  • train_table: any Tables.jl-compatible data for train.
  • test_table: any Tables.jl-compatible data for test.
  • metric: evaluation metric(s), either a single MeanMetric or a collection of them.
  • n::Int64: number of retrieved items.

Keyword arguments

  • drop_history::Bool: whether to drop already consumed items from predictions.
  • col_user: name of the user column in table.
  • col_item: name of the item column in table.
  • any model-dependent arguments.

Return

Evaluated metrics for test_table.
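
A minimal usage sketch: train_table, test_table, and my_metric are hypothetical placeholders (my_metric stands for any MeanMetric instance), and MostPopular() is assumed to take no constructor arguments.

    using Recommenders

    model = MostPopular()  # any AbstractRecommender works here
    result = evaluate_u2i(
        model, train_table, test_table, my_metric, 10;
        drop_history = true, col_user = :userid, col_item = :itemid,
    )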

source
Recommenders.fit!Method
fit!(model::AbstractRecommender, table; kwargs...)

Train the model on table.

Arguments

  • model: instance of a concrete subtype of AbstractRecommender.
  • table: any Tables.jl-compatible data for train.

Keyword arguments

  • col_user: name of the user column in table.
  • col_item: name of the item column in table.
  • and other model-dependent arguments.
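
For instance, with a DataFrame as the Tables.jl source (a sketch; the data is illustrative, and MostPopular() is assumed to take no constructor arguments):

    using DataFrames, Recommenders

    table = DataFrame(userid = [1, 1, 2], itemid = [10, 20, 10])
    model = MostPopular()
    fit!(model, table; col_user = :userid, col_item = :itemid)
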
source
Recommenders.predict_u2iMethod
predict_u2i(model, userid, n; kwargs...)

Make recommendations for a user (or users). When userid is a collection of raw user ids, predictions run in parallel via Threads.@threads.

Arguments

  • model::AbstractRecommender: trained model.
  • userid: user id(s) to get predictions for. The type is AbstractString, Int, or a collection of either.
  • n::Int64: number of retrieved items.

Keyword arguments

  • drop_history::Bool: whether to drop already consumed items from predictions.
  • and other model-dependent arguments.

Return

Vector of predicted items, ordered by descending score.
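
A sketch of both call styles, assuming model has been fitted as above:

    # single user: one Vector of item ids, ordered by descending score
    items = predict_u2i(model, 1, 5; drop_history = true)

    # collection of users: predicted in parallel via Threads.@threads,
    # yielding one Vector of item ids per user
    batches = predict_u2i(model, [1, 2], 5; drop_history = true)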

source
Recommenders.fit!Method
fit!(model::MostPopular, table; col_user = :userid, col_item = :itemid)

Fit the most-popular model.

source
Recommenders.predict_u2iMethod
predict_u2i(model::MostPopular, userid::Union{AbstractString,Int}, n::Int64; drop_history::Bool = false)

Make top-n predictions for user with the most-popular model.

source

Item kNN

Recommenders.ItemkNNType
ItemkNN(k::Int64, shrink::Float64, weighting::Union{Nothing,Symbol}, weighting_at_inference::Bool, normalize::Bool, normalize_similarity::Bool)

Item-based k-nearest neighborhood algorithm with cosine similarity. The model first computes the item-to-item similarity matrix

\[s_{ij} = \frac{\bm r_i \cdot \bm r_j}{\|\bm r_i\|\|\bm r_j\| + h}\,,\]

where $r_{i,u}$ is the rating for item $i$ by user $u$, and $h$ is the shrink parameter that suppresses the contributions from items with few ratings.

Constructor arguments

  • k: size of the nearest neighborhood. Only the k most similar items to each item are stored, which reduces the size of the sparse similarity matrix and also improves predictions.
  • shrink: shrink parameter $h$ explained above.
  • weighting: if set to :tfidf or :bm25, the raw rating matrix is weighted by TF-IDF or BM25, respectively, before computing similarity. If no weighting is needed, set it to nothing.
  • weighting_at_inference: whether to apply the above weighting at inference time; only relevant for :bm25.
  • normalize_similarity: if set to true, normalize each column of the similarity matrix. See the reference for details.
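
A construction sketch matching the signature above; the hyperparameter values are illustrative only.

    model = ItemkNN(
        100,    # k: neighborhood size
        10.0,   # shrink: h in the similarity formula
        :bm25,  # weighting (:tfidf, :bm25, or nothing)
        false,  # weighting_at_inference
        true,   # normalize
        true,   # normalize_similarity
    )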

References

M. Deshpande and G. Karypis (2004), Item-based top-N recommendation algorithms.

source
Recommenders.fit!Method
fit!(model::ItemkNN, table; col_user = :userid, col_item = :itemid, col_rating = :rating)

Fit the ItemkNN model. col_rating specifies the rating column in the table, which is all unity when implicit feedback data is given.

source
Recommenders.predict_u2iMethod
predict_u2i(model::ItemkNN, userid::Union{AbstractString,Int}, n::Int64; drop_history = false)

Recommend top-n items for user by ItemkNN. The predicted rating of item $i$ by user $u$ is computed by

\[ \hat{r}_{i, u} = \sum_{j} s_{ij} r_{j, u}\,,\]

where $r_{j, u}$ is the actual user rating while $\hat{r}_{i, u}$ is the model prediction.
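
As a worked sketch of this formula (not the library's internals), the scores for all items reduce to a similarity-matrix-vector product:

    using SparseArrays

    S = sparse([0.0 0.8 0.1; 0.8 0.0 0.5; 0.1 0.5 0.0])  # toy similarity s_ij
    r_u = [1.0, 0.0, 1.0]                   # toy ratings r_ju of user u
    scores = S * r_u                        # scores[i] = Σ_j s_ij * r_ju
    ranking = sortperm(scores, rev = true)  # item indices by descending score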

source

Matrix Factorization

Recommenders.ImplicitMFType
ImplicitMF(dim::Int64, use_bias::Bool, reg_coeff::Float64)

Matrix factorization model for implicit feedback. The predicted rating for item $i$ by user $u$ is expressed as

\[\hat r_{ui} = \mu + b_i + b_u + \bm u_u \cdot \bm v_i\,.\]

Unlike models for explicit feedback, this model treats all the (user, item) pairs in the train dataset as positive interactions with label 1, and samples negative (user, item) pairs from the corpus. Currently only uniform item sampling is implemented. The fitting criterion is the ordinary logloss function

\[ L = -r_{ui}\log(\hat r_{ui}) - (1 - r_{ui})\log(1 - \hat r_{ui}).\]
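
A worked sketch of the prediction and the per-pair loss. Squashing the raw score into (0, 1) with a sigmoid is an assumption here; the docstring leaves the link function implicit.

    using LinearAlgebra

    pred(μ, b_i, b_u, u, v) = μ + b_i + b_u + dot(u, v)  # r̂_ui
    σ(x) = 1 / (1 + exp(-x))                             # assumed link function
    logloss(r, x) = -r * log(σ(x)) - (1 - r) * log(1 - σ(x))

    u, v = randn(32), randn(32)
    loss = logloss(1.0, pred(0.0, 0.1, -0.05, u, v))     # a positive pair, label 1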

Constructor arguments

  • dim: dimension of user/item vectors.
  • use_bias: if set to false, the bias terms ($\mu$, $b_i$, $b_u$) are set to zero.
  • reg_coeff: $L_2$ regularization coefficients for model parameters.

References

For instance, Rendle et al. (2020), Neural Collaborative Filtering vs. Matrix Factorization Revisited.

source
Recommenders.fit!Method
fit!(model::ImplicitMF, table; callbacks = Any[], col_user = :userid, col_item = :item_id, n_epochs = 2, learning_rate = 0.01, n_negatives = 1, verbose = -1)

Fit the ImplicitMF model by stochastic gradient descent (with no batching).

Model-specific arguments

  • n_epochs: number of epochs. During one epoch, every row in table is read once.
  • learning_rate: learning rate of SGD.
  • n_negatives: number of negative item samples per positive (user, item) pair.
  • verbose: if set to a positive integer, training info is printed once every verbose epochs.
  • callbacks: additional callback functions during SGD. One can, for instance, monitor validation metrics or implement early stopping. See Callbacks.
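
A minimal training sketch; table and the hyperparameter values are illustrative.

    model = ImplicitMF(32, true, 0.01)  # dim, use_bias, reg_coeff
    fit!(
        model, table;
        col_user = :userid, col_item = :item_id,
        n_epochs = 10, learning_rate = 0.01, n_negatives = 4,
    )
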
source
Recommenders.predict_u2iMethod
predict_u2i(model::ImplicitMF, userid::Union{AbstractString,Int}, n::Int64; drop_history = false)

Make predictions by using $\hat r_{ui}$.

source

Bayesian Personalized Ranking

Recommenders.BPRType
BPR(dim::Int64, reg_coeff::Float64)

Bayesian personalized ranking model. The model evaluates user-item triplets $(u, i, j)$, where each triplet expresses "the user $u$ prefers item $i$ to item $j$". The following matrix factorization model is adopted for this relation:

\[p_{uij} = \bm u_u \cdot \bm v_i - \bm u_u \cdot \bm v_j\]

Constructor arguments

  • dim: dimension of user/item vectors.
  • reg_coeff: $L_2$ regularization coefficients for model parameters.
source
Recommenders.fit!Method
fit!(model::BPR, table; callbacks = Any[], col_user = :userid, col_item = :item_id, n_epochs = 2, learning_rate = 0.01, n_negatives = 1, verbose = -1)

Fit the BPR model by stochastic gradient descent. Instead of the LearnBPR algorithm proposed in the original paper, simple SGD with negative sampling is implemented.

Model-specific arguments

  • n_epochs: number of epochs. During one epoch, every row in table is read once.
  • learning_rate: learning rate of SGD.
  • n_negatives: number of negative item samples per positive (user, item) pair.
  • verbose: if set to a positive integer, training info is printed once every verbose epochs.
  • callbacks: additional callback functions during SGD. One can, for instance, monitor validation metrics or implement early stopping. See Callbacks.
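
A minimal training and prediction sketch mirroring the signature above (values are illustrative):

    model = BPR(32, 0.01)  # dim, reg_coeff
    fit!(model, table; n_epochs = 10, learning_rate = 0.01, n_negatives = 1)
    items = predict_u2i(model, 1, 10; drop_history = true)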

References

Rendle et al. (2012), BPR: Bayesian Personalized Ranking from Implicit Feedback.

source
Recommenders.predict_u2iMethod
predict_u2i(model::BPR, userid::Union{AbstractString,Int}, n::Int64; drop_history = false)

Make predictions by using $\bm u_u \cdot \bm v_i$.

source

Sparse Linear Machine

Recommenders.SLIMType
SLIM(l1_ratio::Float64 = 0.5, λminratio::Float64 = 1e-4, k::Int = -1)

Sparse linear machine for recommendation, modified with Elastic Net loss. The prediction is made by

\[\hat r_{ui} = \sum_{j\neq i} w_{ij} r_{uj}\]

where $r_{ui}$ is the actual rating for item $i$ by user $u$, and $\hat r_{ui}$ is the predicted value. $w_{ij}$ is the model weight matrix. See References for algorithm details. SLIM uses Lasso.jl for optimization.

Constructor arguments

  • l1_ratio: mixing ratio between the $L_1$ and $L_2$ penalties. l1_ratio $\to 0$ corresponds to Ridge regularization, while l1_ratio $\to 1$ to the Lasso.
  • λminratio: parameter which governs the strength of regularization. See the docs of Lasso.jl.
  • k: the nearest-neighborhood size, similar to ItemkNN. If k < 1, the neighborhood size is unbounded.

References

Ning and Karypis (2011), SLIM: Sparse Linear Methods for Top-N Recommender Systems.
source
Recommenders.fit!Method
fit!(model::SLIM, table; col_user = :userid, col_item = :itemid, col_rating = :rating, cd_tol = 1e-7, nλ = 100)

Fit the SLIM model.

Model-specific arguments

  • cd_tol: tolerance parameter for convergence; see Lasso.jl.
  • nλ: length of the regularization path; see Lasso.jl.
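
A minimal end-to-end sketch; the data table and hyperparameter values are illustrative.

    model = SLIM(0.8, 1e-4, 100)  # l1_ratio, λminratio, k
    fit!(model, table; col_rating = :rating, cd_tol = 1e-7, nλ = 100)
    items = predict_u2i(model, 1, 10; drop_history = true)
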
source
Recommenders.predict_u2iMethod
predict_u2i(model::SLIM, userid::Union{AbstractString,Int}, n::Int64; drop_history = false)

Make predictions by SLIM model.

source

Random Walk

Recommenders.fit!Method
fit!(model::Randomwalk, table; col_user = :userid, col_item = :itemid)

Build a bipartite graph from table: one side of the graph collects user nodes, the other item nodes. If a user has interacted with an item, an edge is added between them. The graph is undirected and has no extra weights.

source
Recommenders.predict_u2iMethod
predict_u2i(model::Randomwalk, userid::Union{AbstractString,Int}, n::Int64; drop_history = false, terminate_prob = 0.1, total_walk_length = 10000, min_high_visited_candidates = Inf, high_visited_count_threshold = Inf, pixie_walk_length_scaling = false, pixie_multi_hit_boosting = false, aggregate_function = sum)

Make recommendations by random walk with restart. The basic algorithm is as follows:

  1. Get the items already consumed by the user (on the graph, they are one step away from the user). Denote them by $q \in Q$.
  2. Starting from each node $q \in Q$, perform multiple random walks with a certain stop probability, recording the visit counts of the items encountered. Denote the count of item $p$ on the walks from $q$ by $V_q[p]$.
  3. Finally, aggregate $V_q[p]$ into $V[p]$ and recommend the top-scored items. Two aggregation methods are provided (see the sketch after this list):
  • Simple aggregation: take the sum, $V[p] = \sum_{q\in Q} V_q[p]$. One can also replace sum by, for instance, maximum.
  • Pixie boosting: $V[p] = (\sum_{q\in Q} \sqrt{V_q[p]})^2$, putting more weight on items visited from multiple $q$'s.
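
A worked sketch of the two aggregation rules on toy visit counts (not the library's internals):

    # V_q[p]: visit counts from walks started at two query items
    Vq = [Dict("a" => 4, "b" => 1), Dict("a" => 1, "c" => 9)]
    items = union(keys.(Vq)...)

    # simple aggregation: V[p] = Σ_q V_q[p]
    V_sum = Dict(p => sum(get(v, p, 0) for v in Vq) for p in items)

    # pixie boosting: V[p] = (Σ_q √V_q[p])², favoring items reached from many q's
    V_pixie = Dict(p => sum(sqrt(get(v, p, 0)) for v in Vq)^2 for p in items)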

Model-specific arguments

  • terminate_prob: stop probability of a single random walk.
  • total_walk_length: total walk length over the multiple walks from the $q$'s.
  • high_visited_count_threshold: early-stopping parameter. high_visited_count is incremented when the visit count of a node reaches this threshold.
  • min_high_visited_candidates: early-stopping parameter. The walk from a node $q$ is terminated once high_visited_count reaches min_high_visited_candidates.
  • pixie_walk_length_scaling: if set to true, start nodes $q$ with higher degree are allotted longer walks. If false, the walk length is the same for all nodes $q \in Q$.
  • pixie_multi_hit_boosting: if true, pixie boosting is adopted for aggregation. If false, simple aggregation is used.
  • aggregate_function: the function used by simple aggregation (e.g. sum or maximum).
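
A prediction sketch with several keyword arguments spelled out; the values are illustrative, and model is assumed to be a fitted Randomwalk.

    items = predict_u2i(
        model, 1, 10;
        drop_history = true,
        terminate_prob = 0.1,
        total_walk_length = 50_000,
        pixie_walk_length_scaling = true,
        pixie_multi_hit_boosting = true,
    )
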
source