LECTURE

Information Theory and Entropy: their Relevance to Philosophy

Chaired by Victor Suchar

Dr. Donald Cameron

11 February 2004

Entropy is the name given to a quantitative measure of information and also to a quantity defining the ability of gases to yield mechanical work in heat engines. The analogy is valid and useful, but must not be taken outside its limits. The concepts of information, order, redundancy, noise, selection, Maxwell’s Demon and the evolution of memes can be of great service to philosophy.

Information is a word that we use a great deal and we all have a pretty good concept of what we mean by it. It comes in quantities – we sometimes speak of a lot of information or very little, although most people have not gone so far as to ask in what units it might be measured. It has the strange property that you can give it away, but still keep it for yourself.

It is generally thought that information has some value. I remember one young graduate seeking to prove that greater knowledge was not sufficiently recognised in his rate of pay. He reasoned as follows:

Equation 1: Knowledge is power

Equation 2: Time is money

Equation 3: power = work/time

Substituting from equations 1 and 2 into equation 3, he obtained

Knowledge = work/money, which can be rearranged as

Money = work/knowledge.

Thus money is directly proportional to work (as you might expect) but, surprisingly, is inversely proportional to knowledge.

Now this conclusion, like much published philosophy, is complete rubbish, because the reasoning lacks precision, but at least in this case its author was happy to admit that it was rubbish. In the 1950s C. P. Snow pointed out to members of the arts faculties that knowledge of the second law of thermodynamics was a prerequisite to any claim to a complete education. Sadly, much of the erudite-sounding talk about information theory and entropy that has been heard since is no better than the reasoning of the young graduate about his pay.

Let us begin with a short course in information theory – I promise to use only the simplest of mathematics. Think of information passing in a channel from a transmitter to a receiver. If a binary digit, a 0 or a 1, is sent, then the receiver, who previously thought either result equally probable, will now know which one of the two possibilities it is. If a letter of the alphabet is sent, the receiver will know which out of 26 possibilities it is. Clearly an alphabetic letter gives more information than a binary digit. Perhaps we could say that the measure is 26 for one and 2 for the other.

But now suppose that we receive a second letter. The number of possible two-letter sequences is 26x26 or 676. This doesn’t seem to be working, because common sense tells us that reception of two letters should be twice as much information as one. Our measure should add, not multiply. Logarithms have the property that log(x)+log(y)=log(xy) – in effect they transform a multiplication operation into an addition. As far as I can remember, logs provide the only function that will do that. Using the logarithm of the number of possibilities, we now have a measure that describes the information received, and it is additive. There must be few of us left who can remember seriously using log tables to perform multiplications before the age of the electronic calculator, or using a slide rule, which is based on the same principle – its graduations are to a log scale and its distances can be added to give multiplication.

Using this measure, the information content of a binary digit is log(2) and that of an alphabetic letter is log(26) and for any other signal it is log(1/p) where p is the probability at the receiver of receiving that signal before it arrives. It turns out that this measure of information is very useful and it is used by communications and software engineers to good effect. This is the quantity that has been called ‘entropy’ of which more later.

Usually, in information theory, the logarithms are taken to the base 2 – then, since log2(2) will equal 1, a binary digit will carry one unit of information. This is called one ‘bit’ and, using our formula, every other kind of information can be measured in bits too. For example, for an alphabetic letter, log2(26) = 4.7. This tells us that four binary digits would not be enough to code a letter, but it could be done with five with room to spare. Let’s check this out: four digits give 2x2x2x2=16 possibilities – not enough for a 26-letter alphabet, whereas another digit multiplies by two again, giving 32 possibilities, enough for the 26 letters and a few other symbols.
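For anyone who would like to check these figures, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the function names are assumptions made for the example and are not part of any standard library.

```python
import math

def information_bits(num_possibilities: int) -> float:
    """Information, in bits, of one symbol drawn from equally likely possibilities."""
    return math.log2(num_possibilities)

def binary_digits_needed(num_possibilities: int) -> int:
    """Smallest whole number of binary digits able to label every possibility."""
    return math.ceil(math.log2(num_possibilities))

print(information_bits(2))             # one binary digit: 1.0 bit
print(round(information_bits(26), 1))  # one alphabetic letter: 4.7 bits
print(binary_digits_needed(26))        # 5 binary digits, since 2**4 = 16 < 26 <= 32 = 2**5

# Two letters give 26 x 26 = 676 possibilities, but exactly twice the bits of one letter.
print(math.isclose(information_bits(26 * 26), 2 * information_bits(26)))
```

The last line is the additive property at work: multiplying the possibilities adds the logarithms.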

This very basic introduction to information theory does not give its full scope. In fact every kind of information can be measured in bits by extending the same basic principle – continuous wave forms, pictures and whatever, but you will be relieved that it is beyond the scope of my talk this evening to go through the mathematics of that in detail. But there is one further bit of number work I would like to explore before abandoning the mathematical approach.

We have noted that the quantity of information transferred is dependent on the prior probability of each symbol at the receiver. If we have a system that sends 0s and 1s, but with a probability of 0.8 for 0 and 0.2 for 1, the information conveyed by each symbol is different. Receipt of a 1 gives log2(1/0.2) = 2.322 bits of information, because 1 is a more unusual event, whereas 0 works out to be log2(1/0.8) = 0.322 bits, because it is relatively common. This accords well with common sense. We would feel that we had received a larger amount of information, if we were told that there was a crocodile on the lawn than if we learned that there was a cat.

For this system, the average information transmitted per symbol is

0.8 log2(1/0.8)+0.2 log2(1/0.2) = 0.722 bits

instead of 1 bit per symbol when the 0s and 1s are equally probable. This is an important result of information theory. When the symbols are not equally probable and the recipient can guess something about the incoming signal, the quantity of information received is reduced. Maximum information transmission is possible, only if the symbols of the alphabet are equally probable and independent of each other.
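The calculation for the unequal source can be reproduced with a short Python sketch. The function names are illustrative assumptions; the probabilities 0.8 and 0.2 are the ones used in the example above.

```python
import math

def surprisal_bits(p: float) -> float:
    """Information in bits conveyed by a symbol whose prior probability is p."""
    return math.log2(1.0 / p)

def average_information_bits(probabilities) -> float:
    """Average information per symbol: each symbol's information weighted by its probability."""
    return sum(p * surprisal_bits(p) for p in probabilities if p > 0)

print(round(surprisal_bits(0.2), 3))                    # the rarer symbol: 2.322 bits
print(round(surprisal_bits(0.8), 3))                    # the commoner symbol: 0.322 bits
print(round(average_information_bits([0.8, 0.2]), 3))   # 0.722 bits per symbol
print(average_information_bits([0.5, 0.5]))             # equally probable symbols: 1.0 bit
```

Trying other pairs of probabilities shows that the average never exceeds the 1 bit obtained when the two symbols are equally probable.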

For English text, the symbols are, by no means, equally probable and independent. There are far more Es than Zs, words have to be recognisable English words with particular spellings and arranged according to grammatical rules. Sequences that describe very improbable events are themselves very improbable. This greatly reduces the number of letter sequences that are possible. These constraints reduce the information capacity of written English from 4.7 bits per letter to around 1 bit per letter. In past times, when information transmission was more expensive than today, telegraph companies would have codes for commonly used phrases like ‘happy birthday’ or ‘best wishes for mothers’ day’. They could be transmitted more cheaply using a short code, because they conveyed very little information.

This inefficiency is described as redundancy – it becomes necessary to have a longer message than would be needed with a maximally efficient code. But redundancy is not always a bad thing:

It makes it possible to understand the message even in a very noisy transmission.

Without redundancy, our communications would have no defence against noise, but in a noise-free environment, we can code information so that it can be transmitted more efficiently. When we buy software, the installation process may involve converting it from a compressed format. This has been done to save space by sending the information in a form that has less redundancy, where the programmer can be confident that the transmission is reliable.

Noise is the part of the received data that is not carrying the required information to the receiver. It can also be measured numerically and communications engineers talk about the signal/noise ratio. If noise is high, it requires more redundancy in the transmitted message to overcome it. Of course, noise is relative to the receiver’s point of view. People at a party talking loudly in an unfamiliar foreign language might make it more difficult for me to hear the person I am talking with. Their signal is our noise, and our signal is their noise.
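The way redundancy defends a message against noise can be illustrated with the simplest scheme imaginable: send every binary digit three times and let the receiver take a majority vote. The Python sketch below is a toy under stated assumptions. The triple-repetition scheme and the pattern of noise (one corrupted copy in each group of three) are chosen purely for illustration and are not how practical systems are designed.

```python
def encode(bits, copies=3):
    """Add redundancy by repeating every bit `copies` times."""
    return [b for b in bits for _ in range(copies)]

def decode(received, copies=3):
    """Recover each bit by majority vote over its group of copies."""
    groups = [received[i:i + copies] for i in range(0, len(received), copies)]
    return [1 if sum(group) * 2 > copies else 0 for group in groups]

message = [1, 0, 1, 1, 0]
sent = encode(message)

# Assumed noise for the illustration: the first copy of every bit is flipped.
noisy = list(sent)
for i in range(0, len(noisy), 3):
    noisy[i] ^= 1

print(decode(noisy) == message)   # True: the message survives the noise
print(len(sent) / len(message))   # 3.0: the price is a message three times as long
```

If two copies in the same group were corrupted, the vote would go the wrong way, which is why a noisier channel calls for more redundancy.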

So much for information, but what about entropy? I am proud that I passed my undergraduate years in the very laboratories in Glasgow University where Lord Kelvin and others were making their discoveries about the Second Law of Thermodynamics. Of course, just being there did not guarantee that I absorbed the information – quantified or not! Yet Lord Kelvin is one of those people, like Churchill, whose quoted words have survived to become part of our culture. Examples of his sayings are ‘the steam engine has given more to science than science has given to the steam engine’ and ‘when you cannot express it in numbers, your understanding of a subject is of a meagre and unsatisfactory kind; it may be the beginning of knowledge, but you have scarcely in your thoughts advanced to the stage of science’. These offer good advice to scientists, engineers and especially philosophers even today. He also made some mistakes, such as declaring both flying machines and evolution impossible. He was one of the few people to have a good reason to hold the latter view, because his calculations of the heat escaping from the earth implied that it must have been very hot a relatively short time ago. The problem was solved only after both Kelvin and Darwin had died, when it was realised that radioactivity was replenishing the earth’s internal heat.

But what has thermodynamics got to do with information? At first sight, we are dealing with the behaviour of gases, engines and fuels. Carnot in France, Kelvin in Britain and Clausius in Germany were all primarily concerned with the efficiency of steam or other piston engines and the behaviour of the gases that drive them.

Imagine a cylinder containing gas in a small volume under pressure. The piston can be driven out to do work, but as the crank goes round and the gas is recompressed the same amount of work must be done on the gas to get it back to its starting point – rather more, in fact, when friction losses are considered. To make a useful engine we need to use heat. If heat is applied to the compressed gas, its pressure will become even higher and the piston can be driven out with greater force. If the gas is now cooled, the pressure will fall and the return stroke can be done with less force. We have an engine that will deliver a net work output over its cycle.

But we have had to give up some of the heat energy in cooling the gas. This is ‘waste’ heat. It, no doubt, cost us fuel to create and now it is degraded to a lower temperature where it cannot be turned into work.

The First Law of Thermodynamics says that energy can neither be destroyed nor created and heat and work are equivalent and can be measured in the same units. The Second Law says that while work can always be fully converted into heat, heat cannot so readily be converted into work. The task of the nineteenth century researchers was to understand and quantify this apparent law of experience. Their result was to discover that a quantity named "entropy" could increase, but for any self-contained system, it could never decrease. This, in itself, had considerable philosophical impact. Here was a proof that there is an arrow to time. The processes of the universe are irreversible in a much more fundamental sense than the difficulty of putting the toothpaste back in the tube.

Professor Hammond, last October, gave us an excellent introduction to entropy and he rightly said that, while we have little difficulty in visualising quantities of heat and of work, it is much less easy to have a feel for the quantity called "entropy". We can define the change in entropy as the amount of heat transferred divided by the temperature:

dS = dQ/T

Suppose we take a quantity of heat Qhot from an infinite hot source and use it to drive an engine. We need an infinite, or at least very large, source so that the temperature doesn’t change much as we draw heat out of it, to make our mathematics easier. The amount of work generated is W and the amount of heat liberated to an infinite cold sink is Qcold.

The First Law states that energy is conserved; the heat from the hot source must equal the sum of the work done and the waste heat released to the cold sink:

Qhot = W + Qcold

The Second Law states that the entropy added to the cold sink must be greater than or equal to the entropy taken away from the hot source:

Qcold/Tcold >= Qhot/Thot

If our engine is maximally efficient, the entropy does not increase, but just manages to stay the same and this relationship becomes an equality. The efficiency of our perfect engine is the ratio of the work output to the heat input:

Efficiency = W/Qhot = (Qhot – Qcold)/ Qhot = 1 – Tcold/Thot

This is an important and interesting result. It means that there is a fundamental limit to efficiency. No matter how carefully we reduce friction and polish the exhaust valves, we cannot do better than this. Suppose we could devise a machine to use the heat from the sun at 5000 Kelvin and use space at 3 Kelvin as the cold sink. The efficiency would be 0.9994. But if we have a more terrestrial engine using heat at 500 Kelvin and discharging at atmospheric temperature of 288 K, the maximum attainable efficiency would be 0.424. Real practical engines suffer from friction, which turns some of their work into heat and increases entropy, making the efficiency even lower.
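The two efficiencies just quoted follow directly from the formula, as this small Python sketch shows. The function name is an assumption made for the illustration; the temperatures are those given above.

```python
def carnot_efficiency(t_hot_kelvin: float, t_cold_kelvin: float) -> float:
    """Maximum efficiency of a heat engine working between two absolute temperatures."""
    if t_cold_kelvin <= 0 or t_hot_kelvin <= t_cold_kelvin:
        raise ValueError("need 0 < t_cold_kelvin < t_hot_kelvin")
    return 1.0 - t_cold_kelvin / t_hot_kelvin

print(round(carnot_efficiency(5000.0, 3.0), 4))    # sun to deep space: 0.9994
print(round(carnot_efficiency(500.0, 288.0), 3))   # terrestrial engine: 0.424
```

The temperatures must be absolute (Kelvin), because the formula depends on their ratio.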

You will be wondering what this could possibly have to do with the measures of information that we talked about earlier. A clue comes if we examine the entropy of a unit mass of a substance. If we add a small amount of heat dQ, the temperature will increase by dT so that dQ=CdT, where C is the specific heat. This small increment of heat will cause an increment of entropy of dS=dQ/T, or dS=(C/T)dT. Assuming that the specific heat is constant, integration gives us S=C log(T) plus a constant, where log is the natural logarithm. A logarithm has appeared once again.
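The integration step can be checked numerically by adding up many small increments dS=(C/T)dT and comparing the total with the logarithmic formula. The Python sketch below is a rough check under assumed values: a specific heat of 1 and a temperature rise from 288 K to 500 K are hypothetical figures chosen only for the illustration, and the function names are likewise assumptions.

```python
import math

def entropy_change_by_summation(c, t_start, t_end, steps=10000):
    """Add up dS = (C/T) dT over many small temperature steps (midpoint rule)."""
    d_t = (t_end - t_start) / steps
    total = 0.0
    for i in range(steps):
        t_mid = t_start + (i + 0.5) * d_t   # temperature at the middle of this step
        total += (c / t_mid) * d_t
    return total

def entropy_change_closed_form(c, t_start, t_end):
    """The integrated result: S(t_end) - S(t_start) = C * ln(t_end / t_start)."""
    return c * math.log(t_end / t_start)

# Hypothetical values: unit specific heat, warming from 288 K to 500 K.
numeric = entropy_change_by_summation(1.0, 288.0, 500.0)
exact = entropy_change_closed_form(1.0, 288.0, 500.0)
print(math.isclose(numeric, exact, rel_tol=1e-6))   # True: the two agree closely
```

Only the difference in entropy between two temperatures is computed, so the constant of integration drops out.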

Ludwig Boltzmann, who died in 1906, had the expression S = k log W carved on his tombstone. His contribution was to formulate the second law of thermodynamics in terms of the probable arrangements of atoms and their energies. In this context, W is the number of arrangements that would give the same observed state at the macroscopic scale. In effect W is proportional to a probability and Boltzmann is saying that the world is tending to move to a more probable state. By deriving the second law mathematically he arrived at a formulation similar to that later found for information transmission. So is the measure of information and that of entropy the same thing, as some would have us believe?
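A toy model may make the meaning of W more concrete. Suppose, purely as an assumption for illustration, that a box holds ten gas molecules and that the only thing we observe is how many are in the left half. The Python sketch below counts the arrangements W for each observed state and the corresponding value of log W (that is, S divided by Boltzmann's constant k). The model and the function names are hypothetical simplifications, not Boltzmann's own calculation.

```python
import math

def arrangements(n_molecules: int, n_in_left_half: int) -> int:
    """W: the number of ways of choosing which molecules sit in the left half."""
    return math.comb(n_molecules, n_in_left_half)

def entropy_over_k(w: int) -> float:
    """S / k = log W, using the natural logarithm."""
    return math.log(w)

n = 10
for left in range(n + 1):
    w = arrangements(n, left)
    print(left, w, round(entropy_over_k(w), 3))

# The even split can be realised in the most ways, so it is the most probable state.
most_probable = max(range(n + 1), key=lambda left: arrangements(n, left))
print(most_probable)   # 5
```

All ten molecules in one half can happen in only one way, whereas the even split can happen in 252 ways, which is why the gas is overwhelmingly likely to be found spread out.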

The answer is clearly not. Like all analogies, it is useful where it corresponds, but we must be careful not to deceive ourselves where it does not. For example, I have five plastic tiles, each bearing a single letter, here on the table. They spell the message "BRLSI" and I will now sweep them off the table. The message on the tiles is destroyed as they fall to the floor. They also convert their potential (height) energy into heat with an increase in entropy, but that is clearly not the same thing. Their thermodynamic result would have been the same even if I had mixed them to destroy the message before sweeping them off the table.

One of the big differences is that information can be replicated. The information carried by these tiles has not been destroyed because several copies of it still exist in my mind and yours. This is an essential difference to which we will return in a few moments.

Of course, the measure of the quantity of information that we have been discussing is useful, but it is not a full measure of its value to the receiver. That value would be the improvement, in terms of the receiver’s own objectives, that could be achieved as a result of decisions made in the light of the new information. To know the position of 99% of the lions in Africa would be a very much greater quantity of information than that of just the one lion, concealed near the path where you are intending to walk. It is easy to understand which parcel of information might be most useful.

We, as evolved animals, have innate capacities to process information, so as to spot correlation and to make decisions that will promote our survival and reproduction. Often we do this rather well, but quite unconsciously of the mechanisms taking place in our heads. We also process information about information. Knowledge of the provenance of information is very important to establishing its credibility. A little analysis may be able to give us more understanding of what we are doing. There are a number of different concepts here, which require some effort to disentangle, but it is well worth while to try to clarify them.

Information only exists when it is carried by physical entities of some kind that give a coded representation of a source message from another person or ultimately some aspect of the real world or our own internal world. These entities may be electrochemical impulses in a nerve cell, sound waves, electrical pulses in a wire, letters on a page, flag movements or whatever. A code is obviously necessary for information transmitted between people, but it is also true about the world that we suppose we directly observe. In fact we do not observe it directly. The three-dimensional world that we see out there is coming to us on rays of light that have bounced off its features and formed a two-dimensional, upside-down image on our retinas; it is then converted into electrochemical nerve impulses and sent to different areas of the brain that identify straight lines, movement and other characteristics and then bring them together somehow to give an understanding of a 3D physical reality. Similar physical events lie behind our other senses. Direct observation of the real world is both noisy and full of redundancy. We are good at using the redundancy to overcome the noise.

We are almost unconscious of the astonishing amount of code conversion that is happening in normal life. Suppose a friend tells me a telephone number and I email it to someone. The number is stored in some kind of code in my friend’s long-term memory. It must be turned into a sequence of nerve impulses controlling my friend’s mouth shape and vocal cords to make it appear as sound waves in spoken English. It is then converted into movements in my ears and then coded as nerve impulses creating a record in my short-term memory, probably in a different code. To write it down, a complex series of codes is emitted by my brain to do the necessary hand movements with visual feedback. I then forget it, but re-install it from my note by looking at it, causing images on my retina that must be identified as shapes and then recognised as letters and numbers, then processed as language and stored in my short-term memory. More code conversion is needed before I type it into the keyboard of my computer where mechanical strokes are converted to electrical impulses and then into a binary code, recorded by a magnetic medium and transmitted as pulses in the telephone line. Lots more decoding and recoding will occur before the receiver of my email will have used the phone number to make a call.

Our whole feeling of existence, our consciousness, is the not-yet-fully-understood effect of these myriads of nerve impulses. We could paraphrase Descartes to say ‘I process electrochemical impulses in nerve fibres, therefore I am’. Neuroscience has only begun to scratch the surface and we can wonder whether it will ever be possible to understand the whole of the wonderful complexity of the human brain with no more than a human brain to investigate it. But certainly it is worth trying – let us not join those who have made the mistake of declaring something impossible (like Lord Kelvin about flying machines). Although much remains to be explored, a great deal is already known about how the brain works, and valuable insights into its mechanism have been gained by observing patients who have suffered accidents or disease to the brain. It is astonishing that, even with the amount of evidence available today, there are still some desire-driven people who hope that the mind is something more than this – something supernatural. We should not waste our time with them, because a theory that is not sourced in observed information has no value.

When the information coming into our senses contains redundancy, it is possible for us to reduce it to a more compact form. That is what is known as forming a theory. For example, we can see and record the movement of the planets with their mysterious wanderings and retrograde movements in the night sky. It seems like a lot of data, yet it can be compressed. When Kepler observed that, in three dimensions, their movements are almost, but not quite, ellipses, he achieved considerable data compression. But when Newton proposed that each body is attracted to another by a force proportional to the product of their masses and inversely proportional to the square of the distance between them, and found that their paths fitted this as perfectly as observation could tell, it was a great step forward. We may not know what causes this force, but it is still a massive removal of redundancy. That is what we mean by an elegant and successful theory identifying a natural law.

A theory has predictive power. Why is this? Going back to our data stream of letters, let us suppose that we receive the sequence ‘BZUQFAIUGBEWLAJ’, can you guess what the next letter is going to be? Perhaps not, I certainly can’t, but if we receive the stream ‘AAAAAAAAAAAAAA’ then it is not too hard to predict that the next letter is probably ‘A’. And because we had a good idea that it would be ‘A’, the confirmation that it is, in fact, ‘A’ gives us very little information. This is a highly redundant data stream. The discovery of a theory is thus the discovery of redundancy in observed data and the reduction of its information into a smaller amount of data. Its predictive power is no more, in principle, than supposing another "A" is likely after a long stream of "As". Having discovered a deductive method of reducing a complex stream of data to something simpler that seems to occur every time, we have what we call a natural law.
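The link between redundancy and compression can be tried out on any computer. The Python sketch below uses the standard zlib module as a stand-in for a "theory-finder": it compresses a thousand As and a thousand letters drawn at random. The stream length, the random seed and the choice of zlib are assumptions made for the illustration, and the exact compressed sizes will depend on the zlib version, so none are quoted here.

```python
import random
import zlib

LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# A highly redundant stream: the next letter is always predictable.
redundant = ("A" * 1000).encode("ascii")

# An unpredictable stream: letters drawn independently with equal probability
# (fixed seed so that the illustration is repeatable).
rng = random.Random(0)
unpredictable = "".join(rng.choice(LETTERS) for _ in range(1000)).encode("ascii")

redundant_size = len(zlib.compress(redundant))
unpredictable_size = len(zlib.compress(unpredictable))

print(redundant_size, unpredictable_size)
print(redundant_size < unpredictable_size)   # True: the predictable stream shrinks far more
```

The compressor has, in effect, discovered the "law" that the stream consists of As, and needs only a short description to reproduce all thousand letters.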

It should be said that the predictive power of a theory refers not only to the future, but also to any observations that we have not yet made. Evolutionary and geological theories, for example, give us information about what has happened in the past, although it sounds a little odd to speak of predicting the past. It follows also, from this view of theory-making, that the theory can be no more than the evidence on which it is built and it is useless to build elements of it that are neither confirmed nor disproven by the underlying observations. Here we are approaching a restatement of Occam’s Razor!

Part of the predictive power of a theory lies in the concept of cause. A friend of mine, an amateur meteorologist, discovered an important natural law about wind. He noticed, after many observations, that every time he saw the trees moving their leaves, it was windy. It was clear to him that the wind was caused by the trees waving their leaves and pushing the air along. Wrong of course, but it is not clear why. Only when we gather more data, or perhaps use data we already know, can we prove that. The direction of causation is often not quite so clear. Suppose we discover that consumption of a certain vitamin is correlated with symptoms of clinical depression. Does this mean that the vitamin causes depression, or does it mean that depressed people tend to consume more vitamins? The direction of causation is not, however, something that complicates our idea of theory making. It is just another detail of reducing a complex mass of data to find a sequence that seems likely to repeat.

It is not just in the pinnacles of science, but in ordinary life too, we are forming theories. By spotting the patterns in the incoming data we achieve that predictive power. We have absorbed information about how people move in general and we observe how a particular person moves on the pavement towards us. We are usually able to predict that person’s movements enough to avoid a collision. A very large part of our mental daily life is the forming and use of these small theories. Whatever the level of importance, predicting the results of alternative decisions we might make, then evaluating these decisions against a value scale and choosing the decision that gives the best result, we have the essence of how our mental apparatus works.

To use any information, we have to know the code. A code has much in common with a theory. The ability to decode the evidence from the ‘directly’ observed real world requires a theory of what the observations tell us. In principle the interpretation of streams of impulses in thousands of nerve fibres could be studied to produce a theory of the three-dimensional world that we all believe in. This theory or code has probably been largely installed in us by evolution acting on our ancestors’ genes, although some parts of it may be learned through a capacity for learning that is, in turn, coded in our genes. Obviously a capacity for learning requires a complex mechanism before it can begin. We are mostly unconscious of that mechanism.

Languages are learned by our brains correlating the spoken words with the decoded images coming from direct observation. We do not have any particular language genetically programmed within us, but we certainly have apparatus in our brains specifically adapted to learn and use language. Experiments with even our close relatives, the chimpanzees, show that, although they can acquire some rudimentary language ability, they lack the brain hardware to progress very far. A second language is usually learned by correlating it with the first language. When we become more fluent, we no longer code our thought into English and then recode in the second language; instead we go directly from thought to word. When we eavesdrop on a conversation in a language that has no common features with one that we have learned, it sounds like random noise – we can extract no meaning from it.

To clarify these ideas, let us try a thought experiment. Let us suppose that we are members of a team of data-processing experts in the future and we are charged with doing the job of a human brain. And suppose that we are equipped with computers whose power is far greater than today’s, but we are not allowed to assume any information other than the idea of theory making just discussed. All other information must come from the incoming nerve impulses. These impulses are called "action potentials" and involve an electro-chemical process that travels along the "axons", or main nerve fibres, of the nerve cells. Normally the axons terminate on the ‘dendrites’ or secondary fibres of other cells or on the cell bodies themselves. In the real brain, these receiving cells then process the incoming signal in some way and decide whether to fire an impulse themselves, sending information to further cells.

For our experiment, we are going to remove the brain and take its place. We have incoming fibres giving us signals, but we have no knowledge of any code, or even which sense organ they come from. Remember, we cannot ‘see’ what is going on outside; we don’t even know that there is an outside. We have only the nerve impulses. If this is all we have, what can we do? We can find out whether there are any correlations within the incoming data itself. The data stream from one fibre may provide some clue as to what pattern to expect in another fibre at the same time, and perhaps a clue what to expect at a later time. The brain has outputs which we can call decisions. By sending certain sequences of impulses down these outgoing fibres, we find that the incoming data stream alters in a predictable way, at least in part.

Assuming that we are data processing experts of extraordinary skill, we may well have formed theories that allow us to predict sequences of incoming data and to learn how to alter them with our decision outputs. But we are still a long way from being a human brain. If the blood-sugar level of our body drops, for example, that will only cause a change in the pattern of incoming impulses. It is possible that we may have been clever enough to discover that a complex pattern of output signals can cause a search for certain circumstances that will result in input signals that can be compared with a database and found to be in a certain category. Perhaps this category has been given the quite arbitrary name "food". Further output signals will carry out the complex control task of picking it up and eating it, causing blood sugar levels to rise again.

We might do that once or twice out of intellectual curiosity, but why continue? After all, there are many other correlations between output and input to explore. What our team lacks is any information on what we are trying to achieve. There is no information in our hands to say what our goals are (and we are not allowed to smuggle our own goals in). We have been discovering correlations allowing us to control our incoming signals to some extent, using our decision output signals. But we have no information to specify that one set of incoming signals is preferable to another. Where can we get this information? Clearly not from the incoming signals themselves, because these are what we are trying to compare. To work as a substitute brain, our team has something missing that it cannot create. The objective, the goal, in terms of detectable inputs, would have to have been provided in advance.

Now this is not such a surprising result. In fact we have just re-discovered Hume’s law that you cannot derive value information from fact information, you cannot derive an ‘ought’ from an ‘is’. Or, to put it another way, fact information is about how the world is, value information is about how we would like it to be, and these are two different things. They cannot be derived one from the other.

From this it follows that, to work as a decision-making device, a brain must have some starting set of values or preferences programmed into it genetically. It cannot get that from its nerve-signal inputs and we can see why the widespread belief in the blank slate or "tabula rasa" was complete nonsense (in fact usually nonsense motivated by political desire). So we can see that without some genetically specified values, a brain simply cannot begin to function. This clarification of Hume’s Law is a fact with great relevance to philosophy. Values cannot be deduced from facts alone – that is the naturalistic fallacy. Equally, of course, facts cannot be deduced from values. Those who say, "That would be too unacceptable, we cannot possibly believe that!" are also wrong.

Although value information transmitted by inheritance must be present in all brains, human and animal, it is likely that a large amount of fact information is too. Although a great deal of learning does take place, particularly in the early years, it would be surprising if some of the theories that we might make were not hard-wired genetically also. What a massive task it would be to crack the code of incoming nerve impulses to produce a theory of a three-dimensional world of solid objects and all the other concepts of the "real world" that we take for granted. Our theory of other people’s minds is almost certainly a built-in mechanism, which can be seen to go wrong in autism. It seems more likely that our evolution will have programmed some fact theories into our brains, but it is absolutely necessary that it has programmed rules of logic and inference together with a basic value function.

For example, although we can consider four-dimensional space, and deal easily with it in mathematical terms, we do not really believe in it. Yet the neural circuitry could probably be constructed to handle it just as easily as it does for three dimensions. But it has not evolved to do so, probably because the world really is three-dimensional.

William Paley, in his Natural Theology of 1802 gave his classic argument for the existence of God: "In crossing a heath, suppose I pitched my foot against a stone, and were asked how the stone came to be there, I might possibly answer, that … it had lain there forever … but suppose I had found a watch upon the ground, and it should be inquired how the watch happened to be in that place, I should hardly think of the answer which I had before given, that, for anything I knew, the watch might have always been there. Yet why should not this answer serve for the watch as well as for the stone? For this reason, that, when we come to inspect the watch, we perceive (what we could not discover in the stone) that its several parts are framed and put together for a purpose. …the inference we think is inevitable, that the watch must have had a maker – that there must have existed, at some time, and at some place or other, an artificer or artificers who formed it for the purpose which we find it actually to answer; who comprehended its construction, and designed its use."

Paley did not exactly succeed in proving God’s existence, but we must agree that he had a powerful point. The evident non-randomness of the watch makes it something different from the stone – something that needs explanation just as he claims. Living things show the same evidence of a property that seems to imply purpose. We need a precise understanding of what that is.

The terms order, information and non-randomness are often used loosely, as if they were interchangeable, and I must confess to doing it sometimes myself, despite my aim to stick to what is mathematically definable. Order and information are, in fact, very different concepts. For example, the sequence ‘ABABABABABABAB…’ shows great order, but carries little information. If we are told that the next letter turns out to be ‘A’ and the next ‘B’, we would not be surprised; our probability of that event would not have changed much. It is not difficult in this case to form the theory that the alternation of ‘A’ and ‘B’ is a property of the sequence and a more succinct descriptor of it. This is the theory-building process that we have discussed. A data stream has the property of non-randomness when there is identifiable redundancy, so that theories can be made. It would never be possible to build a theory or crack a code if every bit of data available to us gave a brand new piece of information unrelated to anything else.
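The difference between order and information in the ‘ABAB…’ example can be put into numbers. The following is a minimal sketch, not part of the original lecture: the function names, the sequence length of 1000 letters and the fixed random seed are all assumptions made for illustration. It estimates how many bits of surprise each new letter carries once the previous letter is known.

```python
import math
import random
from collections import Counter

def entropy_bits(seq):
    """Shannon entropy per symbol, from symbol frequencies alone."""
    counts = Counter(seq)
    total = len(seq)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def conditional_entropy_bits(seq):
    """Entropy of the next symbol given the previous one (adjacent pairs)."""
    pairs = Counter(zip(seq, seq[1:]))
    previous = Counter(seq[:-1])
    total = sum(pairs.values())
    h = 0.0
    for (a, _b), c in pairs.items():
        h -= (c / total) * math.log2(c / previous[a])
    return h

# Illustrative data (assumed): a perfectly ordered stream and a coin-toss stream.
ordered = "AB" * 500
rng = random.Random(0)
noisy = "".join(rng.choice("AB") for _ in range(1000))

for name, seq in (("ordered", ordered), ("noisy", noisy)):
    print(name, entropy_bits(seq), conditional_entropy_bits(seq))
```

Counted letter by letter, both streams look alike, since ‘A’ and ‘B’ are equally frequent in each. Once the theory "the letters alternate" is allowed for, the ordered stream carries no further surprise, while the coin-toss stream still carries close to one bit per letter. That gap is the redundancy on which theory-building depends.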

When we think of the problem of information and order in living things that Paley has posed, it is clear that the study of biology gives us a great deal of information. We have discovered things that our untutored brains could never have dreamt of. Evolution is an extraordinary machine that has generated vast amounts of information. Yet the data that comes from biology also contains a great deal of order in the sense of the word defined above.

In particular, as we explore many different biological phenomena, from the body chemistry of dung beetles to human marriage customs, from the migration behaviour of birds to the action of organelles within the cell that control the transfer of information from DNA to RNA and to proteins, we observe that they all seem to be serving a similar apparent purpose. That purpose is to give properties that William Paley would have called "useful", but which we can now describe more accurately as the optimisation of features that will tend to increase the number of the organism’s descendants. It is no coincidence that Paley’s idea of usefulness was aligned with biological fitness because, like the rest of us, Paley was a product of natural selection!

Out of the many, many possible arrangements of atoms that might exist, this is a very small subset and it is a quite distinct signature of living things. Biological theory has distilled it into a succinct message, which is not far from a simple statement that natural selection exists and has acted. The almost infinite complexity of the structure and behaviour of living things has been generated by this information source. Of course we find some features of living things that are not yet optimised by natural selection, but these things (such as sickle cell anaemia and addiction to drugs) exist only where natural selection has not had enough time to act. They seem to be the noise, not the signal.

Creationists are people who regret, as I do myself, the damage that scientific evidence has done to religion. But instead of bracing themselves to accept the truth, whatever it may be, they have made denial into an art form. They ignore the great bulk of scientific evidence, but cling gratefully to any scrap of science that looks like supporting their desires.

Creationists often claim that evolution contradicts the second law of thermodynamics, because it brings order out of chaos; it creates non-randomness. In its direct application to the distribution of heat energy, this is clearly false. The sun has supplied enough high-grade energy throughout evolutionary history. But, in the analogy between information and entropy, the idea might make some sense. Just as the entropy of the universe, or any closed part of it, can only increase, information flowing in a closed channel can only degrade. If the signal cannot be added to, it will ultimately decay into noise; the universe will ultimately collapse into chaos and improbable arrangements of atoms will seldom be found.

But the second law of thermodynamics is not absolute. As Boltzmann showed, it is a statement of probability, but with such a large number of atoms in the universe, the statistical average inevitably prevails. Yet, in the 19th century, James Clerk Maxwell presented a counterexample. He imagined that two chambers of gas at the same temperature might be connected by a small aperture. At the aperture sits a tiny demon with a table-tennis bat. As molecules approach from chamber A, he holds it up against all of the fast-moving molecules so that they bounce back, but allows the slow ones to pass. Molecules approaching from chamber B are bounced back if they are slow, but allowed to pass into chamber A if they are fast. In time the gas in chamber A will be at a higher temperature than that in chamber B without any external source of energy and the second law will be broken. The demon has added no energy – only order. We confidently use the second law today because Maxwell’s demon does not exist. But if some unlikely future nano-technology could ever produce a machine to work as a Maxwell’s demon, we would have a solution to the world’s energy problems at a stroke! (The world of course has no energy problem, because, by the First Law, energy can neither be created nor destroyed. What it has is an entropy problem.)
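The demon’s sorting rule is simple enough to sketch as a toy program. This is only an illustration under stated assumptions, not a physical simulation: the molecules are bare speed values that never collide, the fast/slow threshold of 1.0, the chamber sizes and the random seed are all invented for the example, and the demon is assumed to work at no cost, which is precisely what no real device is known to do.

```python
import random

def mean_energy(speeds):
    """Mean of v squared, standing in for temperature (mass and constants dropped)."""
    return sum(v * v for v in speeds) / len(speeds)

def run_demon(n=2000, steps=20000, threshold=1.0, seed=1):
    """Apply the demon's rule to molecules arriving at the aperture at random."""
    rng = random.Random(seed)
    # Both chambers start with the same speed distribution: same temperature.
    a = [abs(rng.gauss(0, 1)) for _ in range(n)]
    b = [abs(rng.gauss(0, 1)) for _ in range(n)]
    before = (mean_energy(a), mean_energy(b))
    for _ in range(steps):
        if rng.random() < 0.5 and len(a) > 1:
            i = rng.randrange(len(a))
            if a[i] < threshold:        # slow molecule from A: allowed into B
                b.append(a.pop(i))
        elif len(b) > 1:
            i = rng.randrange(len(b))
            if b[i] >= threshold:       # fast molecule from B: allowed into A
                a.append(b.pop(i))
    after = (mean_energy(a), mean_energy(b))
    return before, after, len(a) + len(b)

before, after, molecule_count = run_demon()
print("before:", before)
print("after: ", after)
```

No speed is ever changed, so the total energy is the same at the end as at the start. Only the arrangement differs: chamber A ends up hot and chamber B cold, purely through selection at the aperture.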

But when we come to the ordered content of living things, the situation is different. The analogue of Maxwell’s demon is alive and well and it is called natural selection. This force has been present for as long as replicators have existed, steadily producing non-randomness by selecting some and discarding others. Evolution does not reduce thermodynamic entropy, but it does reduce random disorder because these are not the same thing. We have passed outside the areas of correspondence in the analogy. This is why the creationist appeal to the second law is as false in its information-theory analogy as it is in thermodynamics. The process of natural selection really can create order out of chaos and can generate an enormous quantity of information.
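How selecting some replicators and discarding others produces non-randomness can be shown with a toy model. The sketch below is an assumption-laden illustration rather than a model of real biology: the replicators are strings of bits, "fitness" is simply the number of 1s (a made-up criterion), half the population is discarded each generation, and the population size, mutation rate and seed are arbitrary.

```python
import math
import random

def bit_entropy(population):
    """Mean Shannon entropy (bits) per position across the population."""
    n = len(population)
    length = len(population[0])
    total = 0.0
    for j in range(length):
        p = sum(individual[j] for individual in population) / n
        if 0.0 < p < 1.0:
            total -= p * math.log2(p) + (1 - p) * math.log2(1 - p)
    return total / length

def evolve(pop_size=200, length=40, generations=60, mutation=0.01, seed=2):
    """Random variation sifted by selection; returns entropy before and after."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    start = bit_entropy(pop)
    for _ in range(generations):
        pop.sort(key=sum, reverse=True)      # hypothetical fitness: count of 1s
        survivors = pop[: pop_size // 2]     # selection discards the rest
        children = [[bit ^ (rng.random() < mutation) for bit in parent]
                    for parent in survivors] # copies with rare random flips
        pop = survivors + children
    return start, bit_entropy(pop)

start_entropy, end_entropy = evolve()
print(start_entropy, end_entropy)
```

The starting population is close to a random jumble, with nearly one bit of uncertainty at each position. After repeated rounds of copying, mutation and sifting, the population is far more uniform, and the uniformity it has acquired reflects the selection criterion rather than chance alone.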

All living things are highly ordered or non-random. They contain information. Our minds are the function of our nerve cells; our nerve cells have the same source as our muscle cells, bone cells and blood cells. They are all the products of evolution. All of these have been formed by many random events that have then been sifted by natural selection. Natural selection is the only source of information that defines our beings. We are made of very ordinary atoms and many of these atoms are not even the same ones that were in our bodies ten years ago. But the essence of a person is the information content embodied in the arrangement of the atoms. This information has come from natural selection combined with the random contingencies of our individual and ancestral histories.

It is common today to stress that evolution works to no purpose. This is certainly true in the sense that it works to no purpose outside itself, but it works to a purpose of its own – it is the blind watchmaker. It is the only source of directed behaviour, other than accidental contingency. It is the only source of purpose that has been discovered in the universe. This is not a conclusion that appeals to human vanity, yet that vanity too is a product of evolution.

But against this picture we have the very large body of information with which we are most familiar – that of human culture. Over the last few centuries this fund of information has been causing much more dramatic changes than the slow working of natural selection. A new improvement in technology, or even a new false belief, can spread through the population at a far faster speed than it could ever do by inheritance. It is not only fact information, correct or otherwise, that can spread – values can too. We all have an instinct to adopt some values from others in our communities and a fear of breaching the accepted taboo. The change in many socially accepted values over the last hundred years has been extraordinary. In 1900 it was perfectly acceptable to refuse a person entry to a restaurant or bus because of his race, yet it was a serious scandal to confess oneself an atheist and homosexuals were sent to prison. If it were found out that two unmarried people of opposite sex had even spent a night unaccompanied in the same house, quite serious condemnation could result.

This very substantial effect that culture has on our values seems to rule out my earlier assertion that their origin must ultimately be due to the information-generating property of natural selection. Perhaps we can throw some light on this by examining a model of information flow in the development of culture. Clearly people are acting as transmitters and receivers of information and there are various channels of communication. These include the spoken word and the written word; even facial expression can pass information. Nowadays, the transmission of information is greatly changed by television, Internet and email. Yet it must be beyond dispute that the information cannot originate in the communication channels. It must come from the transmitters, so the conclusion happily accepted by many, that our values are an emergent property of our cultural development, must be wrong. ("Emergent" is, in any case, a word that we should treat with great suspicion. It is not an explanation, but rather a pious hope that coherent information can come together out of a random jumble somehow. This cannot be true – information theory tells us that information passing in a channel can degrade, but it will not increase in quantity.)
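The claim that a channel can degrade information but never add to it can be checked in the simplest textbook case. The sketch below assumes a chain of identical "binary symmetric" links, each of which flips a bit with a fixed probability, and a source that sends 0 and 1 equally often; the function names and the 5% error rate are illustrative choices, not anything from the lecture.

```python
import math

def h2(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def relay(flip, hops):
    """Chance that a bit arrives inverted after `hops` identical noisy links."""
    e = 0.0
    for _ in range(hops):
        e = e * (1 - flip) + (1 - e) * flip
    return e

def surviving_bits(flip, hops):
    """Information per transmitted bit that the final receiver still shares
    with the source, for a source sending 0 and 1 equally often."""
    return 1.0 - h2(relay(flip, hops))

for hops in (0, 1, 2, 5, 10, 50):
    print(hops, surviving_bits(0.05, hops))
```

Each additional link can only lower the figure; passing a message along never restores what an earlier link has lost. Any increase would have to be supplied by a transmitter, not by the channel.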

But there is a complex effect happening here. The value information that the transmitters put out is conditional on what they receive. The expansion in the means of transmission changes the way in which the instinctive values are expressed. In simple terms, the existence of instant, long-distance communication makes distant people seem more like members of our home community. Much of what passes for moral rules is actually the result of a multi-person, negotiated cooperation. This cooperative structure can grow at a much faster speed than could inherited information, but it is nevertheless the product of people who are serving their inherited drives. We support social rules because it is good for us to do so.

In 1976 Richard Dawkins launched the metaphor of the meme. A meme is an item of human culture – information that is passed between one brain and another. It can be of any length from a new word to an entire religion. It may concern a factual belief or an expression of values and it may be right or wrong. Information packages with all these qualities inhabit an environment of human brains. When one person communicates to another, they reproduce, some more successfully than others. When they are forgotten, or the brain carrying them dies, then one of their copies dies.

The memes resemble living things by surviving and reproducing to different extents. Those that catch on will replicate more; others will die out. It is clear that natural selection will operate. Because the speed of mutation and reproduction of the memes is so much greater than that of the genes, memetic evolution seems to be much more rapid. In recent times, the speed of cultural evolution has been so great that some commentators have said (wrongly) that genetic evolution has reached an end point and that memetic evolution has taken over. Many accept the idea that our identity must be considered as a blend of the genes and memes that inhabit us, but that is a rather sterile model, which prohibits further analysis until the blending process is better understood.

But having shown that information, including value information, can be created by natural selection acting on genes, must it not be true that the same kind of information could be created by the natural selection of memes? In fact, the analogy between memes and genes is not complete enough to allow this to happen. Memes may evolve in the same way that plants or fungi evolve, but values only evolve as part of a replicating information processor. Although memes are information, they do not process it themselves. The only information processors on the scene are the human brains that provide their environment.

Nevertheless memes and genetic people are species that co-evolve, adapting to each other. Memes will adapt so that they replicate best in the population of human brains that they find around them and the genes will adapt to replicate best in the environment of memes that they find themselves in. Neither will be evolving for the benefit of the other, however. Different species never do, but each will try to exploit the other for its own benefit.

My own metaphor is to regard people as being farmers of memes. Most of the memes that our genetic selves cultivate are of benefit to us and we seek them from others, just as a farmer needs to obtain seeds. Their evolution in the community of our brains is perhaps more akin to artificial selection than natural selection. Even the mutations of the memes are not always random, but are sometimes made to a purpose defined by genetic evolution coded within the brain – a process less like random mutation, which might be called ‘memetic engineering’.

But not everything in our meme garden is lovely. There are memes that do not serve our genetic purpose – the analogues of weeds or even viruses. Since memetic evolution is so much faster than genetic evolution, perhaps there is a danger that it will outrun its genetic hosts and take over? Certainly some infectious memes, like drug taking, seem to work against the interest of the genes. Maybe we will be able to domesticate some other memes to help us control the memes, as a sheepdog controls the farmer’s sheep. Perhaps philosophy, still a semi-wild animal today, could be tamed to do this job for us?

My talk this evening has touched on a number of subjects, and I hope I have convinced you of their relevance to philosophy: the differing concepts of information and order; redundancy and noise; natural selection as a creator of information; Maxwell’s demon, which might have done so but does not exist; the communication-network view of human culture; the idea that tracing information to its source is essential to its verification; the natural selection of memes; the human project of meme farming; and the human misfortune of meme infection. Much work remains to be done but, since philosophy is about information, I am sure that these concepts must be useful to its future progress.

Donald Cameron

A Quick Summary

1. Information can be measured by using the logarithm of the probability of the received message; its quantity is sometimes called "entropy". "Redundancy" occurs when received signals are not independent. Redundancy can be good, as it gives a defence against noise.

2. Entropy is also a quantity in thermodynamics, where it has the units energy/temperature. The Second Law of Thermodynamics states that the entropy of the universe, or any closed part of it, can only increase. It defines an upper limit to the efficiency of heat engines.

3. Boltzmann’s epitaph, S = k log W, commemorates his demonstration that the second law is equivalent to saying that the world will move towards a more probable state. That is the reason for the use of the word entropy in both fields.

4. The relation between information and entropy is an analogy and like all analogies it must not be stretched too far. They are not identical. In particular, information can be created and replicated. The value of information can be very different from its quantity.

5. We have evolved an extraordinary capacity for coding information and tracing its provenance. The formation of a theory is equivalent to the identification of redundancy in the information stream available to us.

6. By considering the source of information, we can deduce that the brain must have a description of basic values and basic rules of logic installed genetically. There are probably some theories about fact (such as the world being three-dimensional) that are genetically installed also. Natural selection is the source of this information. (Compare Paley’s example of the watch on the ground.) Natural Selection is like a successful Maxwell’s Demon.

7. The Creationist appeal to the Second Law is as incorrect in its information-theory analogy as it is in its thermodynamic meaning.

8. Neither core values nor basic logic can be an "emergent" property of culture. Information originates in the transmitters, not the communication channels. But an interesting complication arises from the idea that elements of culture called memes can spread and evolve like living things inhabiting an environment of human brains.
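
For reference, the quantities named in items 1 and 3 of this summary are conventionally written as follows. This is the standard textbook notation rather than anything specific to the lecture; the symbols p(x), H and W are the usual ones, and the last line holds only for the special case of W equally probable states, which is where the analogy in item 4 is at its closest.

```latex
\begin{aligned}
I(x) &= -\log_2 p(x)
  && \text{information in a message of probability } p(x)\text{, in bits} \\
H(X) &= -\sum_x p(x)\,\log_2 p(x)
  && \text{average information per message (Shannon entropy)} \\
S &= k \ln W
  && \text{Boltzmann's formula, } W \text{ equally probable microstates} \\
S &= (k \ln 2)\,H
  && \text{when } p(x) = 1/W \text{ for every state, so that } H = \log_2 W
\end{aligned}
```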

Further Reading:

Peter W Atkins, The Second Law (Scientific American, 1984)

Michael Ruse, Taking Darwin Seriously (Blackwell, 1986)

Daniel Dennett, Darwin’s Dangerous Idea (Simon & Schuster, 1995)

Matt Ridley, The Origins of Virtue (Viking, 1996)

Alan Lightman, Great Ideas in Physics (McGraw-Hill, 2000)

Donald Cameron, The Purpose of Life (Woodhill, 2001)

Matt Ridley, Nature via Nurture (Fourth Estate, 2003)
