In this blog post I will be further exploring GA’s, evaluating the evolutionary approach to computational problem solving; in particular the crossover and mutation operators and their resultant effects on succeeding generations (no pun intended)!
The Scenario:
Assume that a GA has chromosomes in the following structure:
In this scenario we initially have a population of four chromosomes as shown below:
ch_{1} = (6, 5, 4, 1, 3, 5, 3, 2)
ch_{2} = (8, 7, 1, 2, 6, 6, 0, 1)
ch_{3} = (2, 3, 9, 2, 1, 2, 8, 5)
ch_{4} = (4, 1, 8, 5, 2, 0, 9, 4)
Individual fitnesses, in order of best to worst:
ch_{2} = (8, 7, 1, 2, 6, 6, 0, 1) | fitness = 29
ch_{1} = (6, 5, 4, 1, 3, 5, 3, 2) | fitness = 15
ch_{4} = (4, 1, 8, 5, 2, 0, 9, 4) | fitness = -1
ch_{3} = (2, 3, 9, 2, 1, 2, 8, 5) | fitness = -2
Crossover
Chromosomal crossover is when two DNA helices break, swap a section and then rejoin.[1]
For this exercise our crossovers are as follows for subsequent generations
1st gen. Use one-point crossover (at the middle) to cross the best two chromosomes. [This would lead in creating the first two children]
2nd gen. Use two-point crossover (after x_{1} and after x_{5}) to cross the 2^{nd} and 3^{rd} best individual. [This would results in creating two extra children]
3rd gen. Use uniform crossover (for a single gene) to cross the 1^{st} and 3^{rd} best chromosomes. [Now you should have two more children; six in total]
Our optimal chromosome in this case is: (9, 9, 0, 0, 9, 9, 0, 0) | optimal fitness = 32
Is it possible to reach the optimal state without the mutation operator?
Yes, it is possible with a particular uniform crossover implementation which takes genes and places them in a random location for each splice. Albeit, this method has a small probability to succeed, depending on the amount of single gene uniform crossovers that are used for each generation. Alas, it may also take a much longer time to converge to the global optima than if we were to use a mutation operator, as the genetic diversity is somewhat limited. It would be easier if the initial chromosome population had some optimal segments, so that we converge to the optima in less time.
An example of the knapsack problem could be the following (substituting a knapsack for a car boot). Suppose we are planning a trip to a car boot sale; and we are, therefore, interested in filling the car with items that are considered valuable to sell. There are N different items; these could include a bunch of old toys, a games console, an antique chair, a CD player, and so forth. Each item type has a given set of two attributes, namely a weight (or volume) and an estimated value associated with each item. Since the boot has a limited weight (or volume) capacity, the problem of interest is to figure out how to load the boot with a combination of units of the specified types of items that yields the greatest total value.
Here I will be reporting my progress on using DFO to solve the n-knapsack problem, that is, where there are multiple bags to fill and each bag has unique constraints, though the items have constant values.
So far, i have solved the problem for 2 knapsacks and 28 items with the same capacity, but different weight constraints. See the ‘example’ problem, within the problem dataset i will be using here. This example problem hence has 28 dimensions, though a more complex solution surface than just 1 knapsack. It was solved in 500 iterations, using 100 flies and a disturbance threshold of 0.165.
I am now trying to solve the problem for 30 knapsacks and having some trouble getting close to the global optima (always getting stuck in the local optima), even when using 8000 iterations! This now seems like a good opportunity to optimise my approach. I will now try and improve the efficiency of the algorithm by reducing the dimensionality of the update phase in DFO.
Dimensionality Reduction
As the solution vector is a binary string, we could convert this to a decimal value and update the flies in one dimension rather than nItems dimensions, and then convert it back to a binary within the fitness function in order to evaluate whether it meets the constraints, and how much the solution is worth. Upon implementing this I discovered that although it speeds up convergence initially, it gets stuck in a local optima.
Fitness Penalisation Oscillation
In my most recent experiments, I have found that penalising the fitness by multiplying it by a value between 0 and 1 if it goes over the constraint threshold is much more successful. If we then begin to slowly decrease/increase this penalisation value over time, we see greatly improved results, in both function evals and best combination/value obtained. My next step is to see if I can model this in terms of an oscillator, adapting the frequency of this oscillator dynamically according to the dimensionality/variance in constraints of the particular problem set.
Update:
I’m now working on this problem in a group with a couple of my super smart peers; Francesco Perticarari and Bashar Saade. See our code on github here. We’ll hopefully be presenting a research paper with our work in the new year at ANTS and/or AISB conferences!
Thanks for reading! This is the seventh part in a series of blogs for the Natural Computing module at Goldsmiths, University of London.
Generating recommendations that a user is most likely to appreciate is a well researched problem. In Music Recommendation, a big part of the problem is adjusting the weights of a number of features which will attempt to define a listener’s profile. Recently, Music Information Retrieval techniques (acoustic features) have further increased the dimensionality of the solution space, offering the opportunity for more personalised recommendations, whilst simultaneously increasing the need for better algorithms to ‘tune’ the feature weights to model a user’s musical personality.
Applying DFO to the optimisation problem
The Fitness Function for this task could be the how well the acoustic features map to the user access data. I.e if the generated recommendation has a high average rating by other members of the community, or if it exists in another playlist by similar users. The dataset I will be implementing the algorithm for, and evaluating the results on, will be subsets of the well researched Million Song Dataset; for example, The Echo Nest Taste Profile Subset and have a neighbourhood set of item-based similarity to draw the fitness from. Partly inspired by Inspired by the Particle swarm optimization recommender system (for movies and user profiles).
Evaluation: I will do my evaluation based on results of other item-based similarity methods for the Million Song Dataset from the Million Song Dataset Challenge.
The Goal: To utilise the collective intelligence of virtual flies, and thus allow users to effortlessly discover new music that they are very likely to enjoy!
The Combinatorial Card Game
The premise of this combinatorial problem is as follows; there are 10 cards, each labelled in ascending and unique order 1-10. The aim of the game is to get 1 unique subset of the cards to sum to 36 and the remaining subset’s product to = 360. We can treat this as a 10 dimensional problem.
The generated solution:
Params: 20 flies, 100 iterations, dt = 0.01
Subset 1: [ 9 10 2 7 8 ]
Subset 2: [ 3 6 5 4 1 ]
Function evaluations (calls to get fitness): 50
The solution can of course be any permutation of these 2 subsets.
My curiosity in permutations and concern for how efficient the algorithm is (measured in function evaluations) compared to an approach of simply iterating through the solution vector (a brute force approach) led me to evaluating solving the problem with DFO; the worst case (simply iterating through the vector and finding that answer at the very end) is approx. 10! (3628800 FE’s) and I solved it in 50 FEs which is 72576 x better than the brute force approach!
See my code on Github.
Thanks for reading! This is the sixth part in a series of blogs for the Natural Computing module at Goldsmiths, University of London.
The Travelling Salesman Problem is a classic NP-hard problem in Computer Science, for which there are many viable solutions. The task is; given a number of cities, we must find the shortest route (in time and/or distance) that visits each city exactly once. Traditionally, the best problems have the quickest worst-case running time; in TSP, any algorithm to solve the problem increases superpolynomially (but no more than exponentially), as the number of cities increases. [1]
Let’s start by describing our evolution inspired heuristic for this particular TSP, where we will be finding the shortest route between a number of cities in the UK — in alphabetical order; Brighton, Bristol, Cambridge, Glasgow, Liverpool, London, Manchester and Oxford. We will be attempting to find the shortest distance with these approximate distances (I presume in miles);
Brighton
Bristol
Cambridge
Glasgow
Liverpool
London
Manchester
Oxford
Brighton
0
172
145
607
329
72
312
120
Bristol
0
192
494
209
158
216
92
Cambridge
0
490
237
75
205
100
Glasgow
0
286
545
296
489
Liverpool
0
421
49
208
London
0
249
75
Manchester
0
194
Oxford
0
Now on to the evolution inspired Genetic Algorithm (or GA), for which we will be using to solve the problem. The heuristic goes like this;
Initialisation – Create an initial population of solutions, usually created randomly and with an arbitrary size, from just a few to thousands. In our case this would be a population of tours; routes to each and every city.
Evaluation – Each member of the population is evaluated; the fitness of the individual is calculated by how well it fits with the desired requirements. In our case this would be the total distance, which we are trying to minimise.
Selection – Our goal is to constantly improve our populations fitness. Selection helps us to do this by discarding the bad solutions and keeping the best ones (similar to natural selection in Darwinian theory). There are a few different methods, but we aim to make it more likely that fitter individuals will be selected for the next generation. Here we will use a roulette approach. (Explained below)
Crossover – Make new individuals by combining genes of our selected individuals, (Similar to how sex works in nature). The hope is that by combining certain genomes from two or more individuals we will create an offspring with better fitness for our problem, which will inherit the best traits from each of it’s parents.
Mutation – Make new individuals by slightly altering, typically by random, the existing genome of an individual.
Repeat! Now we have the next generation, we can start again from step two until we find the fittest individual. [2]
In this particular problem we need to employ a roulette wheel to bias the algorithm to generate fitter solutions for each generation. (The selection process):
This is essentially a normalisation of the fitnesses of the current generation, so that we can select the best solutions for our next generation.
for all members of population
overallF += fitness of this individual
end for
for all members of population
probability = sum of probabilities + (fitness / overallF)
sum of probabilities += probability
end for
loop until new population is full
do this twice
number = Random between 0 and 1
for all members of population
if number > probability but less than next probability
then you have been selected
end for
end
create offspring
end loop
Solution and results from the GA (see the code on github)
Thus, the best is distance is 890 miles, and one of the best genotypes is (Glasgow, Liverpool, Manchester, Bristol, Oxford, Cambridge, London, Brighton), which makes sense!
On the topic of Natural Computing, Ant Colony Optimisation has also been used to solve this problem. Whereby their behaviour in finding short paths between food sources and their nest (an emergent behaviour, resulting from each ant’s preference to follow trail pheromones deposited by other ants) is modelled algorithmically. Simulated annealing has also been used and compared to GA’s here.
Lots of interesting applications for all this. In particular, I was envisioning a system for music exploration; whereby a collection of music releases could represent nodes in a graph, and the edges represent the most similar features between the items, solved bythe Travelling Salesman Problem. Moreover, the Genetic Algorithm approach may be very useful. There is very likely to be another post including this topic due to it’s potential relevance to my final project on Music Exploration and Recommendation.
Thanks for reading! This is the fifth part in a series of blogs for the Natural Computing module at Goldsmiths, University of London.
In this post i’ll be exploring and evaluating DFO in finding points and lines of n-fold symmetry to aid Computer Vision. See my previous blog post explaining DFO here.
To define our problem to find points of n-fold symmetry, we are looking for specific areas (in this case a single pixel) where the fitness function of symmetry is at its optimum (in this case a minima). Let’s define our fitness function; we need 2 blocks of comparison, each with the same radius (the distance from the centre to any four of the blocks vertices), the sums are then evaluated around our proposed axis of symmetry, such that our symmetry metric = absolute(sum1 – sum2). In the case of a vertical axis for example:
We can see this in action here on b/w images:
Then, to improve, we change our communication so that the flies navigate around the best fly in the swarm (as opposed to our previously democratic approach),
for (int d = 0; d < Global.dim; d++) {
temp[d] = Global.fly[chosen].getPos(d) +
//random(1) * (Global.fly[chosen].getPos(d) - Global.fly[i].getPos(d));// local nei
random(1) * (Global.fly[Global.bestIndex].getPos(d) - Global.fly[i].getPos(d));
and re-initialise our agents when loading a new image:
Here we see a more stable point of symmetry, and most importantly, a solid observation; the more symmetrical the image by human perception, the more stable the position of the best fly in the swarm.
The next problem which would lead to evolving our algorithm and making it more useful is to find symmetry in nature i.e for detecting animals. Let’s see how our most recent algorithm behaves for an image of a zebra (with an updated radius that is proportional to our new image size):
Here we see some interesting behaviour straight away, around 0:13 and 0:52 there is a clear indication of the central, most obvious line of symmetry, albeit a little left of centre! This may be because the right side of our image is a little brighter than the left side, causing some confusion with our fitness function evaluation. Here might be a good place to start evolving our algorithm, then. It is wort noting that because our image is black and white, we don’t have to worry too much about our colour space, and utilising brightness values is sufficient for the next step in developing our algorithm, however, when we start to analyse more colourful animals, we’ll need to consider the best colour space to use, i.e L*a*b* (where euclidean distance is relative to perceptive difference in colour), but more on this later.
For now let’s investigate how to deal with the varying brightness issue. Whereas before we simply had all black pixels and all white pixels, we now need to account for pixels that lie on the grayscale. One way to solve this could do be to use a more balanced approach, i.e to use 4 summation boxes to compare instead of two, and use a reversed comparison for our second set. More to come!
Stochastic Diffusion Search is one of the first Swarm Intelligence algorithms, characterised as a ‘population based best-fit pattern matching algorithm’[1], first proposed by John Bishop in 1989 [2]. It reminds me most of ants foraging for food, and how they communicate distantly via pheromones to reveal promising locations, this results in a truly incredible ‘collective intelligence’ that allows the ants to efficiently locate and utilise food sources.
As a heuristic, the basic idea is that a swarm of agents are randomly initialised in a search space. They then have some kind of fitness function that determines whether they are satisfied or not satisfied, determined by their position in the search space — this is the Test phase. Then follows the Diffusion phase, which is essentially a phase of communication between the agents; in active diffusion, the successful, active agents will let an unsuccessful, inactive agent (chosen at random) know where in the search space they were successful, and then the notified agent will go there to look for neighbouring success locations. The Test and Diffusion phases are then repeated until the system has converged to an optimum location or set of locations (dependent on the problem).
When we start to solve problems based on this heuristic, an important characteristic of the algorithm arises, this is the ‘Recruitment Mode’ contained within the Diffusion, or communication phase, and it has to do with the direction and condition of the communication. Namely, there is active recruitment(explained above), passive recruitment and dual recruitment. There are also context sensitive/ context free mechanisms that can be used in the recruitment modes.
In the dual recruitment strategy, both successful, active and non-successful, inactive agents randomly select other agents. When an agent is successful, and the randomly selected agent is not, then the hypothesis of the successful agent is shared with the unsuccessful one and the agent is flagged as active. Also, if there is an agent which is inactive and disengaged in communication; and the newly selected agent is active, there will be a flow of information from the successful agent to the unsuccessful one and the unsuccessful agent is then flagged as active. Nevertheless, if there remains an agent that is neither active or engaged, a new random hypothesis is chosen for it. [3]
I have explored two separate implementations of SDS, one is a mining simulation (image in header) that is very similar to the heuristic i have described above (just replace agents with miners and success with gold!). The other is to find a word in a text document, here is an example of finding all the ‘Denmarks’ in a short passage, making use of the Context Sensitive recruitment mechanism, whereby if an active agent randomly chooses an- other active agent that maintains the same hypothesis, the selecting agent is set inactive and adopts a new random hypothesis (therefore achieving a wider exploration):
In the Context Free Mechanism, the performance is similar to context sensitive mechanism, where each active agent randomly chooses another agent. However, if the selected agent is active (irrespective of having the same hypothesis or not), the selecting agent becomes inactive and picks a new random hypothesis. ‘By the same token, this mechanism ensures that even if one or more good solutions exist, about half of the agents explore the problem space and investigate other possible solutions.’ [3]
On the subject of improving education, SDS has been utilised to predict whether students enrolled on MOOC courses will pass or fail [4]. One problem which I thought SDS would be useful is the Cold Start Problem in Recommender Systems/Collaborative Filtering; when we want to find suitable recommendations for user with little user information, we could use populate a feature space with popular items and then based on the users feature characteristics, we could engineer the swarm so that they choose the best possible items (using the fitness function e.g. rating). Evidence shows that Swarm Intelligence is ‘likely to be particularly useful when applied to problems with huge dimensionality’ [4] which is usually the case for recommendation systems!
It could also be useful to achieve the main goal of CF; to use the mining analogy, we represent each user as a ‘hill’, they may have ‘gold’ (i.e good items based on rating, or ‘goodness’ rating based on users feature weights) within their collections/purchase history. Employing SDS could be a good way of finding similar users as an alternative to using metrics like euclidean distance.
There is much more to say on SDS and it’s applications. One of the things that fascinates me is the algorithms conception, proposed as an alternative to more traditional Artificial Neural Networks to recognise patterns [1]; Neural Networks are by design, non stochastic, deterministic models, for which an alternative (i.e swarm intelligence) is often more accurate and efficient. It is worth noting that ANN’s are also nature inspired models; emulations of a simplified model, or heuristic, of the brain. Let’s keep looking to nature, then, for more amazing problem solving heuristics!
Thanks for reading! This is the third part in a series of blogs for the Natural Computing module at Goldsmiths, University of London.
DFO is a swarm intelligence based ‘meta-heuristic’ (problem independent computational method), similar to particle swarm optimisation, but particularly inspired by and based upon flies behaviour, and their tendency to swarm around a marker or point of interest. What differentiates this particular heuristic from it’s swarm intelligence parents is it’s characteristic to be disturbed; and consequently disperse at certain points in time, before re-swarming either to the same marker, or if a better one is found upon their displacement, a new point of interest (or optimum value).
As a result of this, it’s a more specific heuristic, I would estimate that the set of search/optimisation problems this could be useful for is fewer in comparison to a more general swarm optimisation (see wiki), but on this set of problems the performance will noticeably increase most of the time, a la No Free Lunch theorem (see my previous blog post here). A particular example where this may be useful is in Neural Network weight optimisation, (see my peer/friend Leon’s excellent post here for a practical example), when we use multi-layer perceptrons the search space becomes complex (resulting in local and global optima).
In computational ‘search’ problems, this is particularly useful as it allows complex search spaces with many ‘optimal’ solutions to be fully explored for both local optima and global optima. To clarify, let’s investigate the term search space.
The most simple in 3-dimensions of search (or feature/data) space would be our parabola, or sphere function (search spaces can exist in any number of dimensions). Suppose in this instance x1 and x2 are our inputs, and the surface of the shape represents our output (or the set of all possible solutions), here the local minimum (optima) is the same as our global minimum. It can also be represented quite simply mathematically:
A nice example of a more complex search space is the ‘Hölder Table’ (which looks like waffle shaped table):
For more great looking (and also very practical ‘benchmarks’) test functions see the wiki here and more info with Matlab implementations here. (It is worth noting that it is very difficult to visualise anything above 3 dimensions!)
In DFO, first the population number (NP) of flies are initialised in random points in the feature space, according to a normal distribution. Then, the positions are updated taking three things into account, the current position, the position of the best neighbouring fly (which has the best fitness, to the left or right), and the position of the best fly (overall best fitness in the swarm). To determine the fitness, we have a ‘fitness function’ which is run once every iteration of the algorithm; the position of each fly in the feature space is evaluated against the other positions and then the ‘best’ fly is found according to its proximity to the optima (whether it is the minimum or maximum value, as defined by the problem space). The fitness function could be any mathematical function. For 2 dimensions we could simply say f(x1,x2) = (0,0) if we wanted to create a simple visualisation around a central point.
Now, to what makes this algorithm special, the dispersion threshold. To begin with we specify a value for the threshold, usually a small one! A random number (again using normal real distribution) between 0 and 1 is then generated and if the number falls below the dispersion threshold, the selected fly is assigned another random position, thus influencing it’s neighbours. This is a stochastic approach to making sure the swarm doesn’t focus on a local optima, and is forced to keep constantly searching around the space until the algorithm terminates (when the total number of evaluations is met), hopefully by then the global optima will have been found!* This heuristic hopefully provides a balance between the key exploration and exploitation characteristics of swarm intelligence algorithms; our search is more diverse due to the dispersion and therefore we have more exploration, and we are stopping the swarm from exploiting specific areas too much.
I was wondering whether it would be an interesting approach for a recommendation system, due to the larger diversity of ‘best’ solutions (FF being highest rating) which would be found within a feature space. Particle Swarm Optimisation which is similar to DFO (minus the dispersion factor) has certainly been applied in this area successfully; notably here. Another interesting application is it’s use in medical imaging – utilising the intelligent agents (flies) to extend our cognitive abilities, in particular enhancing the visual perception of radiologists when detecting possible precancerous tissue.
To conclude my initial investigation, it seems there are many things to learn from nature and bringing an element of stochasticity can help us solve complex problems effectively. Swarm intelligence is a fascinating subset of AI; many minds are better than one when we are exploring a new field, and it’s always useful to visit new areas to avoid over exploiting just one solution to a complex problem.
I will now attempt to apply DFO to a well defined problem and something which has been of great interest to me recently; Recommendation Systems and in particular, Collaborative Filtering for music. Collaborative filtering is a very popular technique used within RS, the task is to find similar users to the current user based on their profiles, and make predictions which are appealing to the current user; ‘In the more general sense, collaborative filtering is the process of filtering for information or patterns using techniques involving collaboration among multiple agents, viewpoints, data sources, etc..’ [1]
I have been greatly inspired by work which applies Particle Swarm Optimisation to the task, most notably these papers on using PSO within CF; PSO RS, PSO for CF based on Fuzzy Features (for which my algorithm is mainly derived from) and for Swarm Intelligence in general applied to RS in Web Usage Mining.
Let’s define our model:
There are three phases needed to accomplish the recommendation task by using the DFO.
• User profile formation
• Neighbourhood set generation
• Predictions and recommendations
The user profile could be formed by a number of features: for this music application, I shall use the example of Discogs to inform the best features. For each release, we have the following; how many people have the item, how many people want the item, the average rating for the album (and each rating per user!) and Genres/Styles. There are numerous other features but I would argue that these are the most useful. We can develop new useful features like a Genre and Style ‘Interestingness Measure’ which can be represented more naturally by fuzzy sets (to fit better with average human perception), e.g very bad (VB), bad (B), average (AV), good (G), very good (VG), and excellent (E). When we then compute the similarities between users, we supposedly have more accurate results.
For the neighbourhood set generation; we need to find the optimum weights for every feature of our user profile — this takes into account the fact that different users place different priorities on features, for example user 1 might be mostly concerned with a genre while user 2 is mostly concerned with how highly an album is rated. for example which leads to the optimum feature space, where we have the most similar users. We could determine this by the RMSE of all the users euclidean distance (a quantitive measure of feature differences) to our current user, and perform DFO until we have the optimum solution. This would then lead to the most accurate recommendations (depending also on our process for selecting items from the users, e.g most commonly owned, highest rated etc.)
Now, for the Dispersive Flies optimisation! We have 4 main areas of concern; fly representation and initial population, the fitness function, fly behaviour and the termination condition.
Fly representation and initial population: we are mostly concerned with creating a vector (size dependent on population) of flies with parameters that represent their position, and thus the feature weights, in our n dimensional feature space.
The fitness function: This is very important. We need to evaluate the fitness score of each fly according to how well the weights map to our ideal ‘user space’ (the least RMSE distance of all users to our current user).
Fly behaviour: Each fly updates its position based on the best fly in the swarm, and it’s best neighbour. The dispersion threshold will be set to an initially small value, but this could be crucial to finding the best possible outcome in a finite amount of time when our data space is big (we could have thousands of users and millions of releases in those collections, and a high number of dimensions we will need to explore!)
The success of this approach will be down to a number of performance measures, that compares the accuracy of the algorithm and the time it takes to reach the best user space. This is likely something that will become clearer to evaluate once we have run the algorithm a number of times, allowing adaptation of the parameters (dispersion threshold, number of flies). In our particular application, a little variance between different user spaces after each run of the algorithm could be useful in finding different recommendations.
This is not a completely detailed application and I will return to this after I have explored the problem further. There are a few other approaches within the field of recommendation that swarm intelligence could be particularly useful, particularly to solve the cold start problem; when we have a new user without many releases in their collection. Can we take more advantage of swarm optimisation for feature based recommendation?
Exploring new music can be a lot like finding new, unfamiliar territory, where the reward of finding something you like in a cluster of songs you don’t know can be extremely gratifying! Perhaps we can employ more inspiration from nature; ants are the some of the best explorers on our planet, could their meta-heuristics aid our search for new music? Tune in next time to find out.
Thanks for reading!
This is the second part in a series of blogs for the Natural Computing module at Goldsmiths, University of London.