
ALTERNATING LEAST SQUARES (ALS) MATRIX FACTORIZATION

The Premise

The ALS model is another form of collaborative filtering, so we again used the sparse playlist-by-track matrix. We used a version of the ALS algorithm designed for implicit feedback, since inferring a user's preferences from the tracks in their playlists, and then recommending songs, is an implicit-data problem.
The ALS model works by factoring this matrix (playlist x track) into two rectangular matrices, playlist x f and f x track, where f is the number of latent factors that capture the characteristics of each track. These latent factors make comparisons between songs easier, especially because f is small compared to the total number of songs or playlists. However, like PCA components, the latent factors are difficult to interpret. Since all we care about is predicting songs the user will enjoy, accuracy trumps interpretability here.
The number of latent factors, f, is a hyperparameter that we select. The ALS algorithm initializes the two rectangular matrices with random values, then iteratively minimizes the loss function by solving for one matrix while holding the other fixed, alternating between the two. The loss function (below) drives the product of the two matrices toward our initial sparse matrix.
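The alternating scheme above can be sketched in a few lines of NumPy. This is a minimal, dense illustration of implicit-feedback ALS, not our actual implementation (which would use sparse matrices for the real playlist data); the function name and defaults are ours for illustration:

```python
import numpy as np

def als_implicit(R, factors=7, alpha=40.0, lam=0.1, iterations=10, seed=0):
    """Minimal implicit-feedback ALS sketch.

    R: (n_playlists x n_tracks) array of raw occurrence counts.
    Returns X (playlists x f) and Y (tracks x f) latent-factor matrices.
    """
    n_users, n_items = R.shape
    C = 1.0 + alpha * R              # confidence c_ui = 1 + alpha * r_ui
    P = (R > 0).astype(float)        # preference p_ui: 1 if track i is in playlist u
    rng = np.random.default_rng(seed)
    X = rng.normal(scale=0.1, size=(n_users, factors))
    Y = rng.normal(scale=0.1, size=(n_items, factors))
    I = lam * np.eye(factors)
    for _ in range(iterations):
        # Fix Y, solve the regularized weighted least squares for each x_u.
        for u in range(n_users):
            Cu = np.diag(C[u])
            X[u] = np.linalg.solve(Y.T @ Cu @ Y + I, Y.T @ Cu @ P[u])
        # Fix X, solve for each track vector y_i the same way.
        for i in range(n_items):
            Ci = np.diag(C[:, i])
            Y[i] = np.linalg.solve(X.T @ Ci @ X + I, X.T @ Ci @ P[:, i])
    return X, Y
```

Each half-step has a closed-form solution because, with one matrix fixed, the loss is an ordinary regularized least-squares problem in the other.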

min_{x,y} \sum_{u,i} c_ui (p_ui - x_u^T y_i)^2 + \lambda ( \sum_u ||x_u||^2 + \sum_i ||y_i||^2 )

The indices u and i refer to playlists and songs, respectively. The first term, c_ui = 1 + \alpha*r_ui, is the confidence in the user's preference for a song: \alpha is a constant, and r_ui is the feedback value, the number of times song i appears in playlist u. The next term, p_ui, is the user preference, which is simply 1 when song i appears in playlist u and 0 otherwise. Finally, x_u and y_i are rows of the two calculated rectangular matrices. Subtracting the prediction (x_u^T y_i) from the actual value (p_ui) gives an error term, which is squared and weighted by the confidence.
The remaining terms are regularization terms, scaled by the regularization parameter \lambda to prevent overfitting. We chose the values of \alpha and \lambda based on values reported in sources on using ALS to recommend songs, and tested several values of f, the number of factors.
Once we have the two rectangular matrices, the model can easily find the songs most similar to a given song, or take a playlist and return a list of songs the model predicts could belong in it. It does this by converting a playlist or song into its latent factors, generating a list of songs ordered by similarity of latent factors, and returning the top n songs from that list as the recommendation.
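These two lookups can be sketched directly from the factor matrices. Below is a hedged illustration: the factor matrices here are random stand-ins for the learned ALS output, and the function names are ours:

```python
import numpy as np

# Stand-ins for the learned factors (in practice these come from ALS).
rng = np.random.default_rng(1)
Y = rng.normal(size=(5, 3))    # tracks x f latent factors
x_u = rng.normal(size=3)       # one playlist's latent-factor vector

def similar_tracks(i, Y, n=2):
    """Rank other tracks by cosine similarity of their latent-factor rows."""
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    sims = Yn @ Yn[i]
    order = np.argsort(-sims)
    return [j for j in order if j != i][:n]

def recommend(x_u, Y, exclude, n=2):
    """Score every track against the playlist vector; return the top n
    tracks that are not already in the playlist."""
    scores = Y @ x_u
    order = np.argsort(-scores)
    return [j for j in order if j not in exclude][:n]
```

The key point is that both operations reduce to dot products in the f-dimensional latent space, which is why they are cheap even when the catalog is large.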


Model Performance

The number of factors with the highest combined NDCG and R-precision scores was 7, giving an NDCG score of 0.157 and an R-precision score of 0.0043. Both scores roughly decrease as the number of factors increases.
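For reference, here is a small sketch of how these two metrics are commonly computed for binary relevance (held-out tracks are relevant, everything else is not). This reflects the standard definitions, not necessarily the exact variant used in our evaluation:

```python
import numpy as np

def r_precision(recommended, relevant):
    """Fraction of the |relevant| held-out tracks found in the first
    |relevant| recommendations."""
    R = len(relevant)
    return len(set(recommended[:R]) & set(relevant)) / R

def ndcg(recommended, relevant, k=None):
    """Normalized discounted cumulative gain with binary relevance:
    hits earlier in the ranked list count more."""
    k = k or len(recommended)
    rel = [1.0 if t in relevant else 0.0 for t in recommended[:k]]
    dcg = sum(r / np.log2(i + 2) for i, r in enumerate(rel))
    ideal = sum(1.0 / np.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal else 0.0
```

A perfect ranking scores 1.0 on both metrics; a ranking with no held-out tracks in it scores 0.0.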

[Figures: NDCG and R-precision scores for varying numbers of latent factors]

EXAMPLE PLAYLIST RECOMMENDATIONS

STARTING PLAYLIST

'1-800-273-8255', 'Feeling Whitney', 'I Fall Apart', 'My Story'

PREDICTED SONGS

'Anziety', "Buggin' Out", 'Cot Damn', 'Someone Tonight - Live Studio Demo'

ACTUAL FULL PLAYLIST

'Anziety', 'Falling In Love', 'LOVE. FEAT. ZACARI.', "Let's Get Lost", 'Perfect World'
