Major Project Chapter 4 – Reaching the Docker

Last week I had fun learning about Docker, the containerisation platform. Whilst trying to access the CritiqueBrainz database in order to get reviews for my Doc2Vec model, I hacked about for two days unsuccessfully, only to then be sent the JSON dumps for the reviews, so now on to those. I realise that I was probably not the only one finding the documentation a bit cryptic and hard to understand, hence the pull request which the developers opened a day prior to my work.

This week I’ll be focusing on training and testing my Doc2Vec model on the JSON review data for the first iteration of the MRS.




Major Project Chapter 3 – Igor’s Sister: 256

This week I have been following the tutorial which Mick recommended to set up my own server on the University’s network. I have been successful in setting up my own stack on top of Ubuntu 16, serving HTML via Python with MongoDB behind it, which is all very exciting. When I run the server via the terminal, it can be found here.

The next step is to fill a Mongo database with music releases from CritiqueBrainz. This way I can utilise the cosine similarity of the reviews using a pre-trained model (with rich semantic knowledge), and thus create a good one-dimensional semantic data space, then begin to serve recommendations based on distance in this dimension (a line of similarity). In the future, if this method proves successful, I can enrich the amount of data available by utilising Beautiful Soup to scrape Allmusic reviews.
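To make the ‘line of similarity’ concrete, here is a minimal pure-Python sketch of ranking releases by cosine similarity to a seed release. The vectors and release names are made-up stand-ins for the Doc2Vec embeddings the real system would keep in Mongo.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two document vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(seed_vec, catalogue):
    """Order releases by similarity to a seed release: the 'line of similarity'."""
    return sorted(catalogue.items(),
                  key=lambda kv: cosine_similarity(seed_vec, kv[1]),
                  reverse=True)

# Toy vectors standing in for pre-trained Doc2Vec review embeddings
catalogue = {
    "release_a": [0.9, 0.1, 0.2],
    "release_b": [0.1, 0.9, 0.3],
    "release_c": [0.8, 0.2, 0.1],
}
ranking = rank_by_similarity([1.0, 0.0, 0.1], catalogue)
print([name for name, _ in ranking])
```

The most similar release comes first; serving recommendations is then just walking down this ranking from the seed item.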

And so concludes Chapter 2 of our story. We will now travel to other worlds and dimensions on our Python.





Major Project Chapter 1 – The Adventures of Cron and Igor

Meeting with supervisor Mick #1: Get python processes running on Igor automatically. This will be the goal for this week.

For my 2nd blog post I’ll be telling the tale of Cron from Unix, and Igor from Goldsmiths, and the mystery of how they will work together communicating in the language of the Python.

Cron and crontab

The software utility cron is a time-based job scheduler in Unix-like operating systems [1]. I’ll be using it for batch scheduling in the back-end of my MRS. Essentially, what I need it to do is make GET requests to an API back-end like Rap Genius, so that I can process the data and fill a database with cosine similarities from Doc2Vec, and start getting meaningful distances in a one-dimensional semantic data space.

I have managed to get cron scheduling python scripts on my Igor user space like so:

[jkirt001@igor ~]$ crontab -l

*/10 * * * * /usr/bin/python ~/public_html/MajorProj/
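The script that such a crontab line would invoke might look something like the Python 3 sketch below. The endpoint, query parameters and release ID are hypothetical placeholders, not the real Genius or CritiqueBrainz API.

```python
#!/usr/bin/env python
"""Sketch of a batch job for cron: fetch review/lyric data as JSON.
The endpoint below is a hypothetical placeholder, not a real API."""
import json
import urllib.request

API_BASE = "https://api.example.com/reviews"  # placeholder endpoint

def build_url(release_id):
    """Construct the GET request URL for one release."""
    return "%s?release=%s&format=json" % (API_BASE, release_id)

def fetch(release_id):
    """Perform the GET request and decode the JSON payload."""
    with urllib.request.urlopen(build_url(release_id)) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # In the real job this would loop over pending releases and
    # write the processed results into the database.
    print(build_url("mbid-1234"))
```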

For more information on Cron see the useful quick reference guide here [2].

Mick meeting #3: Create own web server on Igor and start serving HTML with Python.




BSc Major Project in Music Computing – Plan of Action

Welcome, this is my 1st blog post for my BSc Major Project. I’ll be outlining my aims and plan of action for the project, entitled: Data & AI in Music Recommendation, Discovery and Exploration.

I have been researching popular applications such as Spotify, YouTube and Discogs to consider the state-of-the-art approaches to AI/Machine Learning based Music Recommendation Systems (MRS) and assisted music exploration. I have also been researching new and – in my opinion – underused methods in this field, such as Swarm Intelligence, to consider how I can offer a competitive alternative to the current state-of-the-art systems and add to the current research [1,2,3].

There has been a great deal of prior research in MRS; Schedl et al., in their paper “Current Challenges and Visions in Music Recommender Systems Research”, state that: “such systems are still far from being perfect and frequently produce unsatisfactory recommendations. This is partly because of the fact that users’ tastes and musical needs are highly dependent on a multitude of factors” [1]. I plan to address the issue of personalisation by using algorithms and methods that have a good social and psychological basis. I am well aware that the variance in users’ tastes is of large statistical significance, so I will further evaluate my approach by iterative user testing, after implementations and alterations of the core recommendation system. One of my core questions is: what defines a ‘good’ recommendation? For my intents and purposes, I will define and thus evaluate the success of the algorithm by my users’ ratings of the generated recommendations, using a 5-point ‘star’ rating system.

I propose to create an MRS and, furthermore, an explorative web application that can be used by music enthusiasts, DJs and producers to find new, personalised recommendations. There is also an opportunity to create a UI where the user can further explore the resultant data space from the MRS.

“What do you mean ‘resultant data space’?”

Dimensionality reduction needs to be applied in order to successfully create a meaningful distance, whereby the closest items are the best recommendations from the current item. This is also very useful for the latter part of my project (if I have time), whereby I will find the shortest paths between two tracks and visualise the resultant path to the user.
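As a sketch of the shortest-path idea, here is Dijkstra’s algorithm over a toy graph of tracks. The track names and edge weights are invented, standing in for distances in the reduced data space.

```python
import heapq

def shortest_path(graph, start, goal):
    """Dijkstra's algorithm over a weighted graph of tracks,
    where edge weights are distances in the reduced data space."""
    queue = [(0.0, start, [start])]
    seen = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == goal:
            return dist, path
        if node in seen:
            continue
        seen.add(node)
        for neighbour, weight in graph.get(node, {}).items():
            if neighbour not in seen:
                heapq.heappush(queue, (dist + weight, neighbour, path + [neighbour]))
    return float("inf"), []

# Toy graph: nodes are tracks, weights are semantic distances (made up)
graph = {
    "track_a": {"track_b": 0.4, "track_c": 1.2},
    "track_b": {"track_c": 0.3},
    "track_c": {},
}
dist, path = shortest_path(graph, "track_a", "track_c")
print(path, round(dist, 2))
```

Here the direct hop a→c costs 1.2, but routing via b costs 0.7, so the visualised path would ‘join the dots’ through the intermediate track.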

In the multi-dimensional hybrid data space (weighted compound features based on metadata, user data and possibly acoustic data), MDS, t-SNE and UMAP will be tested out to see which performs best based on the quality of the recommendations. See Leon’s blog post for more details (comparing UMAP, t-SNE and PCA).

One of the most important reasons for reducing a large number of dimensions is the so-called ‘Curse of Dimensionality’. In summary: when the dimensionality increases, the volume of the space increases such that the available data become sparse, and the Euclidean distances that I will be relying on become harder to deal with. A large portion of the work that needs to be done here is also experimenting with feature weighting (which is largely dependent on user preferences) such that distances between tracks are most meaningful to that user.
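A quick standalone experiment (not project code) illustrates the curse: sample random points in a unit hypercube and compare the contrast between the nearest and farthest distances as the dimensionality grows. In high dimensions the contrast collapses, i.e. everything becomes nearly equidistant.

```python
import random
import math

def distance_contrast(dim, n_points=200, seed=1):
    """Ratio (max - min) / min over Euclidean distances from one query
    point to n_points random points in the unit hypercube [0,1]^dim.
    A small ratio means distances have become nearly indistinguishable."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        p = [rng.random() for _ in range(dim)]
        dists.append(math.sqrt(sum((a - b) ** 2 for a, b in zip(query, p))))
    return (max(dists) - min(dists)) / min(dists)

low = distance_contrast(2)       # large contrast in 2 dimensions
high = distance_contrast(1000)   # contrast collapses in 1000 dimensions
print(round(low, 2), round(high, 2))
```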

Semantic Analysis using Doc2Vec/Word2Vec

I plan to create a small set of features based on semantics; for this I will be using Doc2Vec/Word2Vec to extract a similarity measure between two tracks’ qualitative, semantic information. For this, I will use lyric websites like Rap Genius and a review website such as Allmusic/MusicBrainz, for which I can scrape the text using Beautiful Soup, or obtain it through the APIs. This will create a great way of describing where songs fit into a low-dimensional, semantic data space – most probably an efficient feature engineering method, as opposed to using a multidimensional acoustic data space. With enough time, however, I will experiment with high-level acoustic features (e.g. danceability, tonality) to include in the overarching compound feature/data space.

Running the System on a Server

To execute these processes I will be running Python scripts on a server, and serving the results as HTML to be analysed and visualised. I will be working for the duration of this project on my University’s server, Igor. This saves time and money over setting up AWS and related services.


[1] M. Schedl, H. Zamani, C. Chen, Y. Deldjoo, M. Elahi, “Current Challenges and Visions in Music Recommender Systems Research”, 2017.

[2] P. Covington, J. Adams, E. Sargin, “Deep Neural Networks for YouTube Recommendations”, in Proceedings of the 10th ACM Conference on Recommender Systems – RecSys ’16, ACM Press, New York, NY, USA, 2016, pp. 191–198.


Ant Colony Optimisation in Music Exploration and Natural Computing Jokes

In this post I will be exploring ACO in relation to a particular application of the Travelling Salesman Problem: music exploration between items in an existing collection and items that have been recommended by a system.

The Task:

A user is presented with items in their collection, and can immediately see the recommendations at relevant distances to items in their collection. Then, upon browsing to a new item, a new graph/map is visualised, showing the most important features relevant to the user’s preferences (the heavily weighted features being graphed) and displaying a network of music releases as nodes, with edges displaying the name (and possibly the value) of the features which connect the recommendations and collection items in-between.

The starting node would be an existing item in a user’s collection, and the final destination is a recommended item of the user’s choice.

Thus, a visual data space is presented, which a user can explore to learn about new music, ‘joining the dots’ between music releases. This, in essence, takes part of the Machine Learning algorithm out of the ‘black box’ and presents it alongside the results to the user. There could also be an opportunity to implement an evaluation system for the RecSys, where users can say if they like or dislike a recommendation, or further, if they like or dislike how the recommendation has been made, thus allowing the Machine Learning algorithm to improve and adapt accordingly. A large part of the work for this task will be finding the best way of visualising the multidimensional feature space in 2 or 3 dimensions. Dimensionality reduction techniques such as t-SNE, MDS and UMAP will be very useful for this. I would like to incorporate release artworks for each item in the visual space, which I believe will add to the immersiveness and attractiveness of the application.

Thus, we can see the paths that the ants have taken, via the deposition of their pheromones, and parameterise their exploration based on a user’s input!
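As a sketch of the underlying mechanics, here is a minimal Ant System in Python, finding a path through a tiny made-up graph of four releases. The distances and parameter values are illustrative only, and this is plain Ant System rather than any tuned ACO variant.

```python
import random

def ant_colony(dist, n_ants=20, n_iters=50, evaporation=0.5,
               alpha=1.0, beta=2.0, seed=42):
    """Minimal Ant System sketch for a small path-finding problem.
    dist[i][j] holds the distance between releases i and j."""
    rng = random.Random(seed)
    n = len(dist)
    pheromone = [[1.0] * n for _ in range(n)]
    best_tour, best_len = None, float("inf")
    for _ in range(n_iters):
        tours = []
        for _ in range(n_ants):
            tour = [0]                       # start at the collection item
            unvisited = set(range(1, n))
            while unvisited:
                i = tour[-1]
                cand = list(unvisited)
                # Probabilistic choice: pheromone ** alpha * (1/dist) ** beta
                weights = [pheromone[i][j] ** alpha * (1.0 / dist[i][j]) ** beta
                           for j in cand]
                j = rng.choices(cand, weights=weights)[0]
                tour.append(j)
                unvisited.remove(j)
            length = sum(dist[tour[k]][tour[k + 1]] for k in range(n - 1))
            tours.append((length, tour))
            if length < best_len:
                best_len, best_tour = length, tour
        # Evaporate, then deposit pheromone inversely proportional to length
        for i in range(n):
            for j in range(n):
                pheromone[i][j] *= (1.0 - evaporation)
        for length, tour in tours:
            for k in range(n - 1):
                a, b = tour[k], tour[k + 1]
                pheromone[a][b] += 1.0 / length
                pheromone[b][a] += 1.0 / length
    return best_tour, best_len

# Toy symmetric distances between four releases
dist = [[0, 1, 4, 3],
        [1, 0, 1, 4],
        [4, 1, 0, 1],
        [3, 4, 1, 0]]
tour, length = ant_colony(dist)
print(tour, length)
```

With these distances the best route visiting everything from node 0 is 0→1→2→3, and the pheromone trail concentrates on exactly those edges.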

The Final Blog Post

As this is my final blog post for the Natural Computing module, I will be evaluating what I’ve learned and what I would still like to learn in the topic which was not covered in class. The best things I’ve learned would likely be the wonders of dimensionality, optimisation and simulated collective intelligence. I am a scientist at the core, and the content has been very intellectually stimulating to me.

I would like to learn more about Natural Computing techniques as applied in the field of Machine Learning, how they interact with Data Science and compare with widely popular models like Neural Networks. It has also come to my attention that Neural Networks are an attempt at simulating a natural model of cognition, and therefore could fit under the umbrella of Natural Computing. As humanity delves further into the depths of Artificial Intelligence, we will with no doubt draw from nature as a bountiful, seemingly endless and wonderful source of inspiration.

And now on to some Natural Computing jokes:

  1. A fly walks into a bar. It checks if its best neighbour also walked into that bar. If so, all the flies drink at that bar. All the humans leave.
  2. What kind of ants are very learned? Pedants!
  3. What do you call an ant who can’t speak? A mutant (mute ant).
  4. How can you better understand genetics in cold weather? Put your codon!
  5. Why did the chicken cross the road? Darwin: ‘It was the logical next step after coming down from the trees. He was the fittest chicken.’

Thanks for reading! This is the ninth and final part in a series of blogs for the Natural Computing module at Goldsmiths, University of London.


An Exercise in Intuition with Genetic Algorithms

In this blog post I will be further exploring GAs, evaluating the evolutionary approach to computational problem solving; in particular the crossover and mutation operators and their resultant effects on succeeding generations (no pun intended)!

The Scenario:

Assume that a GA has chromosomes in the following structure:

ch = (x0, x1, x2, x3, x4, x5, x6, x7)

x0–x7 can each be any digit from zero to nine.
The fitness of each chromosome is calculated using the following formula:

f(ch) = (x0 + x1) − (x2 + x3) + (x4 + x5 ) − (x6 + x7)

This problem is a maximisation problem.

In this scenario we initially have a population of four chromosomes as shown below:

  • ch1 = (6, 5, 4, 1, 3, 5, 3, 2)

  • ch2 = (8, 7, 1, 2, 6, 6, 0, 1)

  • ch3 = (2, 3, 9, 2, 1, 2, 8, 5)

  • ch4 = (4, 1, 8, 5, 2, 0, 9, 4)

Individual fitnesses, in order of best to worst:

  1. ch2 = (8, 7, 1, 2, 6, 6, 0, 1) | fitness = 23
  2. ch1 = (6, 5, 4, 1, 3, 5, 3, 2) | fitness = 9
  3. ch3 = (2, 3, 9, 2, 1, 2, 8, 5) | fitness = -16
  4. ch4 = (4, 1, 8, 5, 2, 0, 9, 4) | fitness = -19


Chromosomal crossover is when two DNA helices break, swap a section and then rejoin.[1]

For this exercise, the crossovers for the subsequent generations are as follows:

1st gen. Use one-point crossover (at the middle) to cross the best two chromosomes. [This creates the first two children.]

2nd gen. Use two-point crossover (after x1 and after x5) to cross the 2nd and 3rd best individuals. [This results in two extra children.]

3rd gen. Use uniform crossover (for a single gene) to cross the 1st and 3rd best chromosomes. [Now you should have two more children; six in total.]
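My reading of these three operators, sketched in Python (the helper names and index conventions are my own):

```python
import random

def one_point(p1, p2, point):
    """One-point crossover: swap the tails after `point`."""
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def two_point(p1, p2, a, b):
    """Two-point crossover: swap the middle segment of genes a..b-1."""
    return (p1[:a] + p2[a:b] + p1[b:],
            p2[:a] + p1[a:b] + p2[b:])

def uniform_single_gene(p1, p2, rng=random):
    """Uniform crossover restricted to one randomly chosen gene."""
    i = rng.randrange(len(p1))
    c1, c2 = list(p1), list(p2)
    c1[i], c2[i] = c2[i], c1[i]
    return c1, c2

ch1 = [6, 5, 4, 1, 3, 5, 3, 2]
ch2 = [8, 7, 1, 2, 6, 6, 0, 1]
print(one_point(ch1, ch2, 4))     # one-point at the middle
print(two_point(ch1, ch2, 2, 6))  # after x1 and after x5
```

Note that one-point crossover of ch1 and ch2 at the middle yields [6, 5, 4, 1, 6, 6, 0, 1] and [8, 7, 1, 2, 3, 5, 3, 2], the first two children listed below.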

New child chromosomes over all generations:

c1 = [6 5 4 1 6 6 0 1] | fitness = 17
c2 = [8 7 1 2 3 5 3 2] | fitness = 15
c3 = [8 5 4 1 6 5 3 2] | fitness = 14
c4 = [6 7 1 2 3 6 0 1] | fitness = 18

Our optimal chromosome in this case is: (9, 9, 0, 0, 9, 9, 0, 0) | optimal fitness = 36
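A few lines of Python can compute the fitnesses directly from the formula, which is a handy sanity check:

```python
def fitness(x):
    """f(ch) = (x0 + x1) - (x2 + x3) + (x4 + x5) - (x6 + x7)"""
    return (x[0] + x[1]) - (x[2] + x[3]) + (x[4] + x[5]) - (x[6] + x[7])

population = {
    "ch1": (6, 5, 4, 1, 3, 5, 3, 2),
    "ch2": (8, 7, 1, 2, 6, 6, 0, 1),
    "ch3": (2, 3, 9, 2, 1, 2, 8, 5),
    "ch4": (4, 1, 8, 5, 2, 0, 9, 4),
}
for name, ch in population.items():
    print(name, fitness(ch))
print("optimal", fitness((9, 9, 0, 0, 9, 9, 0, 0)))  # 36
```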

Is it possible to reach the optimal state without the mutation operator?

Yes, it is possible with a particular uniform crossover implementation which takes genes and places them in a random location for each splice. However, this method has only a small probability of succeeding, depending on the number of single-gene uniform crossovers used for each generation. It may also take a much longer time to converge to the global optimum than if we were to use a mutation operator, as the genetic diversity is somewhat limited. It would be easier if the initial chromosome population had some optimal segments, so that we converge on the optimum in less time.

See the experiments:

Thanks for reading! This is the eighth part in a series of blogs for the Natural Computing module at Goldsmiths, University of London.

Using Dispersive Flies Optimisation to solve the Knapsack problem

The Knapsack problem is a classical problem in combinatorial optimisation. It has many applications, including finding the least wasteful way to cut raw materials, and other economic problems.

An example of the knapsack problem could be the following (substituting a knapsack for a car boot). Suppose we are planning a trip to a car boot sale; and we are, therefore, interested in filling the car with items that are considered valuable to sell. There are N different items; these could include a bunch of old toys, a games console, an antique chair, a CD player, and so forth. Each item type has a given set of two attributes, namely a weight (or volume) and an estimated value associated with each item. Since the boot has a limited weight (or volume) capacity, the problem of interest is to figure out how to load the boot with a combination of units of the specified types of items that yields the greatest total value.
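For a single boot, this is the classic 0/1 knapsack, which can be solved exactly with dynamic programming. Here is a small sketch; the items, values and weights are invented for illustration.

```python
def knapsack(values, weights, capacity):
    """Classic 0/1 knapsack dynamic programme:
    best[c] = best total value achievable with capacity c."""
    best = [0] * (capacity + 1)
    for v, w in zip(values, weights):
        # Iterate capacity downwards so each item is used at most once
        for c in range(capacity, w - 1, -1):
            best[c] = max(best[c], best[c - w] + v)
    return best[capacity]

# Made-up car-boot items: (name, value in pounds, weight in kg)
items = [("old toys", 15, 5), ("games console", 60, 4),
         ("antique chair", 90, 12), ("CD player", 30, 3)]
values = [v for _, v, _ in items]
weights = [w for _, _, w in items]
print(knapsack(values, weights, capacity=15))  # best value within 15 kg
```

With a 15 kg limit the best load is the antique chair plus the CD player, worth 120 – exact dynamic programming is feasible here, but becomes expensive as capacities and item counts grow, which is where heuristics like DFO come in.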


Here I will be reporting my progress on using DFO to solve the n-knapsack problem, that is, where there are multiple bags to fill and each bag has unique constraints, though the items have constant values.

So far, I have solved the problem for 2 knapsacks and 28 items with the same capacity, but different weight constraints. See the ‘example’ problem within the problem dataset I will be using here. This example problem hence has 28 dimensions, though a more complex solution surface than just 1 knapsack. It was solved in 500 iterations, using 100 flies and a disturbance threshold of 0.165.

I am now trying to solve the problem for 30 knapsacks and having some trouble getting close to the global optimum (always getting stuck in a local optimum), even when using 8000 iterations! This now seems like a good opportunity to optimise my approach. I will now try to improve the efficiency of the algorithm by reducing the dimensionality of the update phase in DFO.
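For reference, my understanding of the basic DFO update is sketched below on a stand-in continuous problem (the sphere function) rather than the knapsack encoding. The ring neighbourhood, elitism and disturbance follow the canonical description, but treat this as an illustrative reading rather than my actual experiment code.

```python
import random

def dfo(fitness, dim, lower, upper, n_flies=50, n_iters=200,
        delta=0.165, seed=0):
    """Sketch of Dispersive Flies Optimisation (minimisation).
    Each fly moves from its best ring neighbour towards the swarm best;
    with probability `delta` a component is reinitialised (disturbance)."""
    rng = random.Random(seed)
    flies = [[rng.uniform(lower, upper) for _ in range(dim)]
             for _ in range(n_flies)]
    for _ in range(n_iters):
        fits = [fitness(f) for f in flies]
        s = fits.index(min(fits))                    # swarm best fly
        for i in range(n_flies):
            if i == s:
                continue                             # elitism: keep swarm best
            left, right = (i - 1) % n_flies, (i + 1) % n_flies
            nb = left if fits[left] < fits[right] else right  # best neighbour
            for d in range(dim):
                if rng.random() < delta:
                    flies[i][d] = rng.uniform(lower, upper)   # disturbance
                else:
                    flies[i][d] = (flies[nb][d] +
                                   rng.random() * (flies[s][d] - flies[i][d]))
    best = min(flies, key=fitness)
    return best, fitness(best)

# Stand-in continuous problem: minimise the sphere function on [-5, 5]^5
best, val = dfo(lambda x: sum(v * v for v in x), dim=5,
                lower=-5.0, upper=5.0)
print(round(val, 4))
```

The disturbance threshold (0.165 in my experiments above) is what keeps the swarm from collapsing entirely onto the current best fly.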

Dimensionality Reduction

As the solution vector is a binary string, we could convert it to a decimal value, update the flies in one dimension rather than nItems dimensions, and then convert it back to binary within the fitness function in order to evaluate whether it meets the constraints and how much the solution is worth. Upon implementing this, I discovered that although it speeds up convergence initially, it gets stuck in a local optimum.
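The encode/decode step can be as simple as the following sketch (function names are my own):

```python
def bits_to_int(bits):
    """Collapse a binary item-selection vector to one decimal value."""
    return int("".join(str(b) for b in bits), 2)

def int_to_bits(value, n_items):
    """Expand the decimal value back to an n_items-long binary vector
    (used inside the fitness function to check the constraints)."""
    return [int(b) for b in format(value, "0%db" % n_items)]

selection = [1, 0, 1, 1, 0]
encoded = bits_to_int(selection)
print(encoded)                  # 22
print(int_to_bits(encoded, 5))  # [1, 0, 1, 1, 0]
```

One likely reason this encoding hurts the search: a small step in the decimal dimension can flip many bits at once (e.g. 15 → 16 flips five), so the smooth one-dimensional moves DFO makes do not correspond to small changes in the actual item selection.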

Fitness Penalisation Oscillation

In my most recent experiments, I have found that penalising the fitness – multiplying it by a value between 0 and 1 if the solution goes over the constraint threshold – is much more successful. If we then begin to slowly decrease/increase this penalisation value over time, we see greatly improved results, in both function evaluations and the best combination/value obtained. My next step is to see if I can model this in terms of an oscillator, adapting the frequency of this oscillator dynamically according to the dimensionality/variance in constraints of the particular problem set.
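A sketch of the penalisation, plus one possible sinusoidal schedule for sweeping the penalty factor; the period and bounds here are placeholder values, not my tuned settings.

```python
import math

def penalised_fitness(value, total_weight, capacity, penalty):
    """Multiply the raw value by `penalty` (0 < penalty < 1)
    whenever the selection breaks the weight constraint."""
    if total_weight > capacity:
        return value * penalty
    return value

def oscillating_penalty(iteration, period=200, low=0.1, high=0.9):
    """Penalty factor swept between `low` and `high` by a sinusoid,
    so the constraint pressure alternately tightens and relaxes."""
    phase = math.sin(2.0 * math.pi * iteration / period)
    return low + (high - low) * (phase + 1.0) / 2.0

# A feasible selection keeps its value; an infeasible one is damped
print(penalised_fitness(120, total_weight=14, capacity=15, penalty=0.5))
print(penalised_fitness(120, total_weight=18, capacity=15, penalty=0.5))
```

Tying `period` to the problem's dimensionality or constraint variance is the adaptive part I still need to experiment with.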


I’m now working on this problem in a group with a couple of my super-smart peers, Francesco Perticarari and Bashar Saade. See our code on GitHub here. We’ll hopefully be presenting a research paper with our work in the new year at the ANTS and/or AISB conferences!

Thanks for reading! This is the seventh part in a series of blogs for the Natural Computing module at Goldsmiths, University of London.