This week I’ve been working on using Doc2Vec with CritiqueBrainz reviews, to try to get a good measure of semantic similarity from which to serve recommendations for a given track. So far it seems to be working, sort of. For a good introduction to the workings of Doc2Vec see this post.
I’ve been working with the CritiqueBrainz JSON dumps from late 2016. This is a large corpus of reviews on which I can train my Doc2Vec model, and then get the cosine similarity between two reviews and thus between two releases / pieces of music.
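To make the similarity step concrete, here is a minimal sketch of cosine similarity in plain Python. The three vectors are made-up stand-ins for what a trained Doc2Vec model would give back for each review (real ones would have many more dimensions), and the function name is just one I picked for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Cosine is undefined for an all-zero vector; returning 0.0 is
        # a convention chosen for this sketch.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical review vectors, NOT real Doc2Vec output.
review_a = [0.2, -0.1, 0.7, 0.4]
review_b = [0.4, -0.2, 1.4, 0.8]   # points the same way as review_a
review_c = [-0.7, 0.4, 0.2, 0.1]   # at right angles to review_a

print(cosine_similarity(review_a, review_b))  # close to 1: very similar
print(cosine_similarity(review_a, review_c))  # close to 0: unrelated
```

Because cosine similarity only looks at direction, a long review and a short one can still score as near-identical if their vectors point the same way.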
Mick has suggested using text summarisation to get rid of superfluous information, and thus have a more focused and possibly more successful/efficient vector representation for each review.
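As a rough idea of what that pre-processing could look like, here is a tiny frequency-based extractive summariser. This is not necessarily the approach Mick has in mind, just a simple stand-in: it keeps the sentences whose words occur most often across the review. The stopword list, the function name and the sample review are all made up for illustration.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "on", "this", "that", "is", "i", "it",
             "with", "and", "of", "to", "in"}


def summarise(text, max_sentences=2):
    """Keep the highest-scoring sentences, in their original order."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text)
                 if s.strip()]
    tokenised = [[w for w in re.findall(r"[a-z']+", s.lower())
                  if w not in STOPWORDS] for s in sentences]
    # Word frequencies over the whole review.
    freq = Counter(w for words in tokenised for w in words)

    def score(i):
        # Average frequency of the sentence's non-stopword tokens.
        words = tokenised[i]
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    top = sorted(range(len(sentences)), key=score,
                 reverse=True)[:max_sentences]
    return " ".join(sentences[i] for i in sorted(top))


# Made-up review text, not taken from CritiqueBrainz.
REVIEW = ("The guitar tone on this album is warm. "
          "I bought it on a rainy Tuesday. "
          "The album pairs that guitar tone with sparse drums.")

print(summarise(REVIEW, max_sentences=2))
```

The summarised text would then be what gets tokenised and fed to Doc2Vec, in place of the full review.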