Csaba Szepesvari (1996) Comparing Yardsticks for Cognition. Psycoloquy: 7(26) Optimal Cognition (4)

Commentary on Worden on Optimal-Cognition

Bolyai Institute of Mathematics

Jozsef Attila University

Szeged, 6720 Aradi vrt tere 1.

and

Department of Adaptive Systems,

Institute of Isotopes

Hungarian Academy of Sciences and

Jozsef Attila University

Budapest 1525, Pf. 77. HUNGARY

szepes@math.u-szeged.hu

Worden (1996) has suggested that cognitive science could be built around a model for optimal cognition. He has also proposed an equation, called the Requirement Equation (RE), which should describe the biological requirement brains must meet. Here I analyse the limitations of his equation in detail. The analysis is based on a more general class of models, namely, models of optimal sequential decisions, of which Worden's equation is a special case. It turns out that the RE can describe optimal behaviour if there is no perceptual aliasing. If the RE is able to capture all aspects of cognition then animal cognition is probably modular and thus it is more likely to be suboptimal than optimal. I also show that -- contrary to Worden's suggestion -- the optimisation problem faced by evolution may have many local minima, depending on the genetic encoding. Nevertheless, one can show, purely on the basis of the theory of optimal sequential decisions, that brains of animals probably use internal representations and that cognition has a universal limit when one considers its biological function.

1. Worden (1996) has proposed a model that could be used to measure the performance of brains. He argues that the model is capable of describing all aspects of cognition. Nonetheless, only two rather simple examples are presented and it is left to the reader's intuition why and how the model could be extended to more complicated cases. The model is embedded into a simple probabilistic framework and prescribes optimal behaviour as the maximisation of the immediate expected reward (MIER). For later reference I abbreviate this model as the MIER model.

2. Worden's first claim suggests that cognition has a species dependent limit. This claim has two parts: the first part says that there is a best possible brain for any species. This is not necessarily the case if the values that enter the model are unbounded or if there are infinitely many actions. The second part is weaker: it says that the performance of a brain for a certain species is limited. This just requires that the values be bounded. I think that the boundedness of values is acceptable, but not the finiteness of actions. Thus, one can not expect that optimal performance will even be achieved for this model; nonetheless, optimal performance may still exist. Moreover, I do not feel that the existence of a limit for cognition would be "striking" (see paragraph 14 of the target article) provided that fitness, by which the performance of cognition is measured, has an upper bound. Even strengthening this to say that there is a species independent limit of cognition, would not be surprising provided that the values of outcomes are uniformly bounded. We will see later that this assumption, when transferred back to an evolutionary framework, results in the requirements that the lifetime of animals as well as the fitness values of situations should have a species independent limit.

3. A more intricate question is how cognition is related to the evolution of species. Some authors have argued that ideas themselves can be the subject of a meta-level evolutionary process (e.g., see Dawkins, 1976), and this process reacts on the evolution of brains and thus on cognition. This means that brains are not necessarily the mere means of the evolution of a species. Let us now disregard this point (this simplification may be reasonable for low-level forms of cognition).

4. What aspects of cognition can be captured by the MIER model? Is it not too confining to restrict our attention to these aspects? Are we not loosing important details in doing so? We will consider a model, borrowed from operations research, that concerns the behaviour of an agent (animal) who faces a dynamical decision problem. We hope that, based on this model, it becomes possible to interpret phrases like "situated agent", "internal representation", "symbolic cognition", etc., and we can thus characterise some cases when the MIER model is sufficient as well as others when it is insufficient.

5. Assume that the possible states of the World are collected in some set, W. Assume further that any element of W (I will denote such an element by w) can be decomposed as a triple (e,x,a): this decomposition represents the agent's point of view in as much as the agent's states (x) and actions (a) are separated from the states of the environment (e). For simplicity, time is assumed to be discrete (note that it is not immediate that by assuming time to be discrete we do not lose generality) and at every time instant the agent perceives the state of the world through some limited capability transducers: y=P(w). Here y represents the perception of the agent.

6. The world's state evolves according to the equation

(1) w_{t+1}= E(w_{t}),

and more specifically, I assume that this can be decomposed as

(2) x_{t+1}= L( x_{t}, y_{t}), (3) a_{t+1}= D( x_{t}, y_{t}),

where

(4) y_{t}= P( w_{t}).

7. Equations (2) and (3) tell us that the dependence of the evolution of the agent's state and action on the state of the rest of the world goes through the perception function P. The above equations are commonly used in control theory and control theorists would say that an agent whose decision process is described by equations (2) and (3) realises "dynamic state feedback" as opposed to "static state feedback" when equations (2) and (3) were replaced by

(2)' x_{t+1}= x_{t}, (3)' a_{t+1}= D( y_{t}).

8. As mentioned above, variable x describes the "state" of the agent, that is, it represents the agent's "memory traces" or "internal representation" and in this sense equations (2) and (3) describe "internal representation" based control, while equation (2)' means that the agent does not have memory (the experiences of the agent do not have long term effects), and equation (3)' describes memoryless, or -- according to Brooks -- reactive control. If "fitness" values are associated with the states of the world (such a value would represent the fitness of the agent given the state of the world) and we accept some fitness maximisation criterion (such as maximisation of the average one-step fitness or maximisation of the cumulated fitness) then the model becomes an optimisation problem and several questions arise: Do we need an internal representation (equations (2) and (3)) to realise (near) optimal control? How do we choose the decision function D? How do we choose the learning function L and the set of possible internal representations provided that we decided to use internal representations? Due to the discreteness of time we have to put limits on what can be computed in one step. Are there any functions that satisfy these limits and result in near optimal behaviour?

9. The problem becomes exceptionally demanding if perception is limited; the evaluation function E is not known, or is just partially known, or it is very complicated and the solution must meet limited computational capabilities, etc. Researchers of operations research, artificial intelligence, reinforcement learning, optimal control, adaptive control, non-linear control, etc., have been studying these questions for decades. Some of these questions were answered in particular cases, but none of them has been completely solved yet.

10. To provide an example of how questions of cognitive science may be embedded into this framework let us consider for example the debate on "symbolic" vs. "non-symbolic" knowledge representations. It is really hard to differentiate between these two types of representations, but at a rough scale one could qualify a knowledge representation scheme as symbolic if it uses some prespecified knowledge organisation principle. The greater the role of the organisation of memory is the more symbolic is the approach. Organisation principles must be realised in terms of the functions L and D: any extra structuring of the memory demands computational power. Taking into account that the available computational power is limited yields upper bounds for the complexity of structured knowledge representations. On the other hand, structured knowledge means compressed representation and thus it can be useful if memory capacity is limited and may also decrease reaction time. Thus the debate about the use of "symbolic" vs. "non-symbolic" knowledge, when embedded into this framework, reduces to the question of memory and computational efficiency.

11. How is Worden's model related to the above model? Did Worden prove that brains use internal representation and sometimes symbols? Is it possible to prove this at all? Is it possible to calculate how close animal brains are to the optimal performance? The rest of the article tries to find the answers to these questions.

12. Evolution has three potentials to increase the expected fitness of any species: increase the action efficiency, increase the sensation efficiency, or introduce more complex decision strategies. All of these have natural physical limits. Thus without an advanced decision strategy the fitness of animals would be very low. Specifically, we will focus here on the role of memory in cognition. It is known that even in moderately complex decision problems the performance of agents without memory can not pass a certain limit that is easily passed by agents that make use of memory (Drake, 1962; Astrom, 1965; Littman, 1994). The more complex the agent-environment interaction is, the larger the gap can be between the expected fitness of agents with and without memory. In particular, if the agent-environment interaction is simple then memory will not likely help. This can justify the "situated action" approach in some cases. Interestingly, it is only the limited perception capabilities (also known as perceptual aliasing) that make agents with memory more efficient than simple reactive agents. This also means that in theory the only way to efficiently use memory is to collect information about the world that can help the agent to estimate more precisely the actual state of the world. Thus at a first glance the only role of memory is to compensate for the deficiencies of perception! The situation changes substantially if we take into account the limited computational resources of agents.

13. Consider for example an ideal agent with perfect state perception. Although, the optimal strategy of this agent is memoryless this strategy can be too complex to be exactly represented in the "genes" of the agent. It is possible that some memory based strategies have equally good performance and provide more compact representation. Another common rationale that supports the utilisation of memory is that it provides greater flexibility for the agent in as much as memory helps it to adapt to particular situations (different initial states, e.g. as a consequence of physical defects). These explanations represent the viewpoint that resources of a memoryless biological agent are unlikely sufficient to represent good strategies. In particular, the limits of available resources seem to be so grave that we all feel that more flexibility, that is, more memory would indisputably result in an increased performance. However, increasing flexibility has its pitfalls when it comes with a decrease in the amount of inherited knowledge: such flexible agents are severely injurable until they learn enough specific features of the world. Thus evolution would not support such a choice unless it goes with another change such as the appearance of nursing behaviour. This example shows clearly that evolution can not be modelled on the level of individuals: parallel changes in the behaviour of other agents produce a "different" world that means a changed optimality criterion. Such effects of simultaneous changes are usually investigated in the framework of dynamic games (e.g., see Shapley, 1953; and Nowak, 1989).

14. The first difference between the two models is the emergence of probabilistic features in the MIER model. However, these are not important from our point of view as these are just intended to model our uncertain knowledge of the perception function P and the function E. The main difference between the two models is that the MIER model suggests the form of an "optimal" strategy (it determines exactly how to compute values of actions and that the action with the highest value represents the optimal choice -- i.e. it prescribes the use of myopic or greedy strategies), while in the above model we start from an optimisation criterion and the form of optimal cognition is left unspecified. What criteria and decision model (E,P) results in optimal strategies that are myopic as well? The trick is that the values used in Worden's model must be chosen in a clever way (in general they should not be equal to the fitness of situations!) and then myopic strategies with respect to these values can be optimal for a wide family of decision problems.

15. For simplicity, let us adopt the optimality criterion that the animal should maximise its cumulated fitness. To ensure boundedness we assume that the "lifetime" of the agent is finite independent of the decision strategy used by the agent, that is, the decision model is such that the time during which positive fitness values may income is bounded (fitness is assumed to be non-negative). Then it is possible to define new values for states (which will now represent optimal cumulated fitness) such that at every stage it is optimal to choose the action that results in the state w whose (new) value is maximal. This is what the celebrated Principle of Optimality, discovered by Bellman, tells us (Bellman, 1957). If there were no perceptual aliasing, for example if sensation were perfect (P were injective) then the MIER model, when outcomes are identified with world states and their values are defined as the optimal cumulated fitness, is able to grasp the essence of optimality. On the other hand, if sensation were imperfect then optimal values may become inaccessible for the agent and optimal behaviour can no longer be captured as the maximisation of some well-defined immediate reward (Drake, 1962; Astrom, 1965; Littman, 1994).

16. Thus we see that the MIER model is limited in some sense: without the use of optimal cumulated fitness values only static aspects of cognition can be captured and even if optimal values were used then "perceptual aliasing" should be excluded. The computation of optimal values requires knowledge of the whole decision problem and perceptual aliasing usually can not be precluded. Thus it seems to be futuristic to measure the performance of animal brains based on these mathematical models. There is still no better way to compare an artificial mechanism with animal cognition than to examine them in similar situations (Turing's test). Of course, when we focus on certain restricted aspects of cognition, then simplified models become available provided that we developed some insight into those aspects a priori.

COGNITION

17. The two examples presented by Worden are illusory: in both cases "optimal" behaviour is trivially embedded into the values used in the model and unfortunately it remains unexplained how these values are related to the fitness of the animal. In the desert ant's example if the proposed ad-hoc values represented the fitness then this would mean that the ant's fitness is inversely proportional to the length of the path found by the ant back to its burrow and does not depend on anything else. In reality these values would depend on some other quantities of the world's state, such as the hunger and thirstiness of the ant just to mention the two simplest. Implicitly, Worden supposes that these dependencies can be neglected if one concentrates only on the navigation problem. Contrary to Worden's hypothesis the "Requirement Equation" (RE) is not an optimality criterion, but, as suggested in the previous section, it is also a special simplifying assumption concerning the cognition of animals. If individual aspects of cognition can be closely approximated by the solution of appropriately constructed REs separately, then this can be taken as an indicator that cognition is modular. Although, our macroscopic world shows certain modularity (e.g., the lumpy nature of matter), it is not clear if for the cumulated fitness maximisation criterion, it is possible to isolate subtasks such that the myopic solutions of these subtasks when combined appropriately result in an overall (near) optimal behaviour. Nonetheless, biology shows that reasonably good solutions may be achieved at least in our world.

18. Another problem with the navigation example is that it contains only those variables about which we would like to reason: the only data that enters and leaves the model are displacement vectors. This presupposes that ants are able to sense and have actions to follow displacement vectors. Little wonder that the solution of the RE is given in terms of displacement vectors! If other perception data and other operators, such as sense data and actions concerning landmarks, were also included in the model and we found that it is still optimal to navigate purely on the basis of displacement vectors then we could conclude with good reason that "if ants were able to use displacement vectors as their inputs and outputs, then their optimal behaviour requires the maintenance of an internal representation of their overall displacement". It is the biological experiment that justifies that ants maintain an internal representation of its own position relative to its burrow and, consequently, the RE can be used to model the navigation of ants and not the reverse! Nonetheless, the RE (or a more advanced continuous time version) could be used to predict how precise the internal representation and perception of the ant should be to reproduce the (independently measured) accuracy of their navigational capabilities. The problem with Worden's construction is that it uses ad-hoc data and the solution is in no way related to the navigation of ants at all but it tells us something about the nice behaviour of peaks of Gaussians. I felt that Worden wanted to model the navigation problem qualitatively only. However, one can not reason about qualitative properties without justifying the underlying quantitative assumptions unless one ends up with tautologies!

19. The same comments apply to the second example, that is, to associative conditioning. This example shows that animal brains have the flexibility to represent causal combinations of (almost) any pair of events. Here the RE suggests only that animal brains should be modular: causal relations are memorised by rules that are independent of the value of the participating events.

20. It is the interaction of subtasks that makes cognition of animals non-trivial: the first lesson we could learn from the story of artificial intelligence is that the solution of simple problems can not be easily turned into the solution of complex ones. Neural nets and reinforcement learning (RL) tackle hard problems, much harder then the decision tasks considered by Worden. In general, RL suggests on-line learning algorithms whose aim is to solve the subtask-interaction problem, that is, if the problem is given in the form of subtasks then how should the individual subtask solving behaviours be initiated to provide the best possible behaviour. If there is no temporal interference between the subtasks, some RL techniques (e.g., the ARTDP algorithm of Barto et al., Barto et al., 1991) reduce to Bayesian learning. Neural net research, on the other hand, is characterised by the idea that the collective behaviour of a large number of identical processing elements (neurons) can lead to a qualitatively new behaviour. The learning speed of both techniques depends very much on the problem they should solve. Since researchers in both fields want to solve the hardest learning problems, their results are often disappointing. However, this does not mean that these techniques (or other advanced problem solving techniques) are incapable of solving simpler problems! (A "toy" problem, such as a 2D navigation problem, can be very hard from the point of view of learning from tabula rasa. These toy problems are just used for demonstration purposes and no one thinks (I hope at least) that these problems should be solved by complex learning methods from tabula rasa!). We have seen in Section III that complex animals need learning mechanisms and thus both RL and neural nets may prove useful to model some cognitive functions. Another common misconception about both RL and neural nets is that since these are "universal" problem solving methods they never use domain knowledge. Contrarily, the architecture of a neural net and the building blocks of a RL algorithm do include domain knowledge and substantial improvement in performance can be achieved by fine tuning these (see, e.g. Lewis et al., 1995, for problem oriented neural network design and Singh, 1992, for scaling RL by a modular design).

21. If cognition were modular, as it is predicted by the successful use of RE and also the working mechanisms behind evolution, then it may well mean that evolution does not necessarily reach optimal solutions. Worden's "proof" that propounds that evolution is unlikely to get trapped in a local maxima relies on the fact that the optimal decision function is given as the linear combination of certain mysterious "basis functions" and the optimisation problem has only one maximum, since the underlying optimisation problem turns out to be of the simplest type, that is, it is quadratic! The first problem with this is that we can not hope that the basis of expansion would be finite. This is because the function space we want to explore is probably of infinite dimension (this holds if either the potential action, data, or state (memory) space is infinite). However, in finite time, evolution can only explore a finite basis if genes encode finite information. This means that evolution must find an appropriate finite basis among the elements of an infinite set such that using that basis the optimal form of cognition can be closely approximated.

22. On the other hand, although the optimisation problem is quadratic
in the parameter space, it is not necessarily quadratic in the space of
genes, where evolution works! To see this assume that a gene g encodes
the coefficient vector a(g). The task of evolution is to find the
optimal coefficient vector a*. Let the (average) fitness of an animal
whose decision function is given by the coefficient vector a be denoted
by f(a). By assumption f(a) is quadratic and thus has a unique
maximum. However, no matter how trivial the optimisation problem
A=<f(a)->max, a=?> is, the optimisation problem G=<f(a(g))->max, g=?>
can be arbitrarily complicated (it may even hold that max_{g}
f(a(g))<max_{a} f(a)). As an example consider the optimisation problems
A=<-a^{2}->max, a is real> together with the transformed optimisation
problem G=<- (gsin(g))^{2}->max, g is real>. These problems are related
by the transformation g->a(g)=g sin(g). The first problem (problem A)
has only one maximum while the second problem clearly has infinitely
many local maxima at the points, where sin(g)=0. The reverse statement
is illuminating: any optimisation problem can be transformed into a
quadratic optimisation problem by choosing a clever transformation that
maps the domain of the search problem into an appropriately chosen new
domain. This is just what happened in Worden's argument. It is very
doubtful if the mapping g->a(g) is "monotone", which could ensure that
the optimisation problem G still has no local maxima. As it is well
known from the theory of genetic algorithms, the efficiency of genetic
search relies on the details of genetic encoding (Rawlins, 1991).

23. Worden suggests that cognition could be understood from its most important biological function, that is, to ensure as high a fitness to the owner as is possible. Here I also adopted this view, but considered another optimisation criterion that turned out to include Worden's model as a special case. This model shows that the role of memory or internal representations is to compensate for perceptual deficiencies and for the limits of the computation power (speed and size limits). However, there is no hope that this model can be used to estimate the efficiency of natural or artificial brains. In fact, the success of Worden's much simpler model suggests that animal cognition is likely modular and is sub-optimal. It seems to be promising to identify different "modules" of cognition that satisfy Worden's Requirement Equation and then find the rules that govern the interaction of these modules.

[Astrom, 1965] K.J. Astrom. Optimal control of Markov decision processes with incomplete state estimation. Journal of Mathematical Analysis and Applications, 1:174-205, 1965.

[Barto et al., 1991] A.G. Barto, S.J. Bradtke and S.P. Singh. Real-time Learning and Control using Asynchronous Dynamic Programming. Computer Science Department, University of Massachusetts, Technical Report 59:91-57, 1991.

[Bellman, 1957] R. Bellman. Dynamic Programming, Princeton University Press, Princeton, New Jersey, 1957.

[Dawkins, 1976] R. Dawkins. The Selfish Gene, Oxford University Press, 1976.

[Drake, 1962] A.W. Drake. Observation of Markov Process Through a Noisy Channel, PhD thesis, Massachusetts Institute of Technology, Cambridge, Massachusetts, 1962.

[Kumar, 1985] P.R. Kumar. A survey of some results in stochastic adaptive control. SIAM J. Control and Optimization, 23:329-380, 1985.

[Lewis et al., 1995] F.L. Lewis, K. Liu and A. Yesildirek. Neural Net Robot Controller with Guaranteed Tracking Performance. IEEE Trans. on Neural Networks, 6:703-715, 1995.

[Littman 1994] M.L. Littman. Memoryless policies: Theoretical limitations and practical results. In "From Animals to Animats 3: Proc. of the Third International Conf. on Simulation of Adaptive Behaviour", eds D. Cliff, P. Husbands, J.A. Meyer, S.W. Wilson, Cambridge, Massachusetts, 1994, MIT Press.

[Nowak, 1989] A.S Nowak. Existence of optimal strategies in zero-sum nonstationary stochastic games with lack of information on both sides. SIAM J. Control and Optimization, 27:289-295, 1989.

[Rawlins, 1991] Foundations of Genetic Algorithms. Edited by Gregory J.E. Rawlins. Morgan Kaufmann Publishers, San Mateo, California, 1991.

[Shapley, 1953] L.S. Shapley. Stochastic games. Proceedings of the National Academy of Sciences of the United States of America, 39:1095- 1100, 1953.

[Singh, 1992] S.P. Singh. Transfer of learning by composing solutions for elemental sequential tasks. Machine Learning, 8:323--339, 1992.

[Worden, 1996] R.P. Worden An Optimal Yardstick for Cognition. PSYCOLOQUY 7(1) optimal-cognition.1.worden, 1996.