The No Free Lunch (NFL) theorem was first proposed in Computer Science by David H. Wolpert and W. Macready in a series of papers that they published between 1995 and 1997. I found it to be most easily understood in the context of Supervised Machine Learning: *“no learning algorithm has an inherent superiority over other learning algorithms for all problems.*” [1]

Or to put it simply:* **“An algorithmicist looks at no free lunch.” [2]*

Supervised Machine Learning is the task of inferring a function from labelled training data [3]. A trained model is then used to perform classical machine learning tasks on new input data such as classification or regression.

An example of an appropriate situation in which supervised learning would be useful is a simple medical diagnostic system or ‘virtual doctor’; given a set of previously diagnosed (classified) instances where patients have had said either ‘yes’ or ‘no’ to various symptoms, we could then train a simple model to learn from the examples and diagnose/classify any new patients from their existing symptoms. A simplistic and logical approach would be to use the popular ‘k nearest neighbour’ algorithm; where we quantify the difference between the new patient’s symptoms and existing examples, find our ‘nearest neighbour’ patients, then select the most frequently occurring diagnosis for our nearest neighbours, this then becomes our most likely diagnosis for our new patient. This approach however could run into some issues, what if we have a more than one most frequently occurring diagnosis? How would we select between them? Our algorithmic approach to this problem could now be seen as too simplistic, considering also we have no more training data available. Therefore we must refine our algorithm to the problem at hand… there goes our free lunch!

Some history of the premise, and then a more formal definition may help us to understand why there is no free lunch;

The Scottish philosopher Hume (1739–1740) pointed out that ‘even after the observation of the frequent or constant conjunction of objects, we have no reason to draw any inference concerning any object beyond those of which we have had experience’. [4]

Wolpert (1996) shows that in a noise-free scenario where the loss function is the misclassification rate, if one is interested in off-training-set error, then there are no a priori (known by deduction) distinctions between learning algorithms.” [5}

More formally, where

d= training set;

m= number of elements in training set;

f= ‘target’ input-output relationships;

h= hypothesis (the algorithm’s guess forfmade in response tod); and

C= off-training-set ‘loss’ associated withfandh(‘generalization error’)

all algorithms are equivalent, on average, by any of the following measures of risk: E(C|d), E(C|m), E(C|f,d), or E(C|f,m).How well you do is determined by how ‘aligned’ your learning algorithm P(

h|d) is with the actual posterior, P(f|d).

“Wolpert’s result, in essence, formalises Hume, extends him and calls the whole of science into question.” [5]

Personally (and others seem to agree), I think Wolpert and Macready work through colossal mathematical proof for something which seems like a common sense, something that every computer scientist should possess; the ability to investigate problems thoroughly and develop specialised, analytically derived algorithms based on a subset of existing ‘tried and proved’ methods. We must also accept that well defined problems are a simplification of reality, and we must not make assumptions that our algorithms can be generalised to solve them; in fact, the no free lunch theorem states (informally) that unless we know detail about the problem, no heuristic or algorithm performs any better than the other. [6]

An interesting outcome of the complex proof that these two masterminds lay out, for something that seems quite intuitive, is the notion that two (or more) algorithms that search for an extremum of a cost function perform exactly the same, when averaged over all possible cost functions. “In particular, If algorithm A outperforms algorithm B on some cost functions, then loosely speaking there must exist exactly as many other functions where B outperforms A.” [7] This suggests the need for hybridised approaches to search and optimisation.

Towards the end of ‘No Free Lunch Theorems for Search’, Wolpert & Macready begin to discuss biologically inspired algorithms in relevance to the NFL theorem which they have previously proposed. In the case of genetic algorithms and evolutionary computation in particular, where “natural selection is viewed as optimisation over a cost or ‘fitness’ function” (albeit an extremely simplified view) [7], a performance measure of such subsequent generations will not comply with the NFL theorem, as the same initial algorithm will see improved performance on the same data over time. This perhaps predicts the surge in nature inspired computational techniques applied to various fields of art and engineering, the adaptive ‘nature’ of the algorithms themselves provides much flexibility when applied to specific problems.

*“Nature Inspired Algorithms is a very active research area. This is because problems with which we are normally familiar are getting more and more complex due to size and other aspects, but also to new problems cropping up all the time on which existing methods are not effective. Nature seems to have been there and done it, so to speak.” – Abdel Salhi *

**Bibliography**

Images (by order of appearance)

http://dazuo.blogspot.co.uk/2006_02_01_archive.html

https://www.ericpetersautos.com/2013/10/24/todays-thoughts-oct-19-2013/free-lunch-1/

https://daily.jstor.org/how-do-fish-schools-work/

Text

[1] The lack of a priori distinctions between learning algorithms

[2] “On the Futility of Blind Search” J C. Culberson, 1996

[3] https://en.wikipedia.org/wiki/Supervised_learning

[4] A Treatise of Human Nature Book I, Part III:Section XII (cont.), Hume, 1738

[5] http://www.no-free-lunch.org/

[6] https://www.youtube.com/watch?v=sCePNVXSMD0&t=315s

[7] No free lunch theorems for search