Bruce Bridgeman (1998) Models and Theories of Cognition are Algorithms. Psycoloquy: 9(22) Connectionist Explanation (19)

Commentary on Green on Connectionist-Explanation

Department of Psychology

University of California

Santa Cruz, CA 95064

USA

bruceb@cats.ucsc.edu

PDP models (sometimes misnamed "connectionist") solve computational problems with a family of algorithms, but changeable weights between their connections mean that the details of their algorithms are subject to change. Thus they do not fulfill the requirement that a model must specify its algorithm for solving a computational problem, or that it must model real data and fail to model false data. Other models use distributed coding but retain homeomorphism and explicit algorithms. An example uses a lateral inhibitory network with fixed weights to model visual masking and sensory memory.

1. Parallel distributed processing (PDP), when it appeared in the 1980s, promised to solve many of the problems with earlier models of brain function. Most earlier models required literal anatomical connections from one node to another to carry information around in the brain. Such models were successors to Pavlov's connectionistic models, so named because the Pavlovian models consisted of connections from one brain center to another. In Pavlov's case the connections originated in sensory centers and carried conditioned stimuli to motor centers to drive behavior. It turned out that such models were inadequate when expanded to meet human capabilities: too much structure was required, in too many connections. The algorithms that these models instantiated were clear, but the execution became untenable because the models could not become sufficiently flexible. Parallel distributed processing appeared as an alternative to connectionism, avoiding the exponential multiplication of connections that the connectionist models required. The solution was neo-connectionistic models, with distributed processing substituting for the myriad of specific connections that connectionism required. (Unfortunately, those who did not know this history soon dropped the prefix "neo," so that the name came to be used for exactly the opposite of its historical meaning and the Pavlovian models were left with no name at all.)

2. The success of PDP models was bought at a price. A useful brain model should be homeomorphic, i.e. for every component in the model there should be an equivalent component in the brain. This assures that the model fits the physical constraints of the real system. Traditional models, including Pavlovian connectionistic models and the box models of psychologists, are at their best homeomorphic and also algorithmic -- they specify the processing that must take place to get from the model's input to its output. This is the case, of course, only if the modeler writes a mathematical function to fill each box. Grainger and Jacobs (1998) differentiate a weaker test as well, a "localist connectionist" requirement that each processing unit must be assigned a meaningful interpretation.

3. Rosenblatt's (1962) perceptrons were the beginning of distributed processing. In a single-layer perceptron, the algorithm is instantiated in a transparent architecture and an array of weights linking input to output. The algorithm is in both the architecture and the weights. As long as the modeler specifies what the weights are, the algorithm is fixed. A device that recognizes a particular pattern, for example, might have the algorithm "connect every pixel that overlaps the target with a positive weight, and every pixel that does not with a negative weight." In modern PDP models, however, the programs are often able to change their own connections, so that the modeler no longer knows the algorithm by which his net solves a problem. This leads to two difficulties: the algorithms are unknown, and the models are too powerful. At the extreme, studies such as that of Hanson & Burr (1990) examine the nodes of a PDP net as though they were single neurons in an intact brain, examining their receptive fields and analyzing the data statistically. At this point the advantage of a model is lost, for we must now develop a theory that explains the meanings of the activities. We could just as well probe the real system and analyze the activity of its nodes directly. PDP programs at this level become practical machines for solving computational problems, but not models in the scientific sense. In one of the most telling critiques of PDP models, Massaro (1988) shows that popular PDP architectures will converge on empirically derived data sets, but will converge equally well on contrived data sets constructed to violate known properties of human cognition. Any model that behaves like this is of course useless for explaining how human cognition really works.
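The fixed-weight pattern recognizer described in this paragraph can be sketched in a few lines. This is only an illustration, assuming binary pixels and a single output unit; the function names and the threshold choice (the number of target pixels) are hypothetical and are not taken from Rosenblatt.

```python
def template_weights(target):
    """Fixed weights: +1 for pixels that overlap the target, -1 for all others."""
    return [1 if pixel else -1 for pixel in target]


def recognizes(weights, image, threshold):
    """Single-layer unit: weighted sum of the input pixels against a threshold."""
    activation = sum(w * x for w, x in zip(weights, image))
    return activation >= threshold


# Assumed example: a 4-pixel "image" whose target occupies the middle pixels.
target = [0, 1, 1, 0]
weights = template_weights(target)
# Assumption: the threshold equals the number of target pixels, so that only
# an exact match among binary images reaches it.
threshold = sum(target)
```

Because the weights are written down by the modeler rather than learned, the algorithm can be read directly from the weight list, which is the point the paragraph makes about specified weights.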

4. A few models combine distributed processing with instantiation of known algorithms in a homeomorphic architecture. An example is a lateral-inhibition model that has been used to simulate metacontrast masking (Bridgeman, 1971). The approach begins with known features of neural architecture, and investigates the architecture's processing capabilities. Lateral inhibition is ubiquitous in sensory systems. The model is based on the Hartline-Ratliff equation for lateral inhibition, originally developed for the Limulus eye. If r(p) is the firing rate of a neuron p, and r(j) is the rate of a nearby neuron j, the effect of r(j) on r(p) is modeled as

r(p) = e(p) - k(p,j)[r(j) - min(p,j)] (1)

where e(p) is the excitatory sensory input to neuron p and k(p,j) is a physiologically derived inhibitory coefficient with a value between 0 and 1 that describes the effect of j on p. The variable min(p,j) is the corresponding threshold of inhibition, and is also empirically derived. This equation is applied iteratively for the influence of each neuron in the net on every other neuron. In practice, inhibitory coefficients k are set to 0 for neurons that are more than 3 neurons distant from neuron p in a 1-dimensional network. Thus each neuron is inhibited by its 3 nearest neighbors on either side.
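One iteration of equation (1) over a 1-dimensional net might be sketched as follows. This is a minimal sketch under stated assumptions, not the published implementation: a single coefficient k and a single threshold stand in for the empirically derived k(p,j) and min(p,j), the contributions of the neighbors are summed, inhibition and firing rates are clipped at zero, and all neurons are updated synchronously from the previous state. The function and parameter names are hypothetical.

```python
def inhibition_step(rates, inputs, k=0.1, threshold=0.0, reach=3):
    """Apply equation (1) once to every neuron in a 1-dimensional net.

    rates  -- firing rates r(j) from the previous iteration
    inputs -- excitatory sensory inputs e(p) for this iteration
    k, threshold -- assumed uniform stand-ins for k(p,j) and min(p,j)
    reach  -- neighbors farther away than this have k = 0
    """
    n = len(rates)
    new_rates = []
    for p in range(n):
        inhibition = 0.0
        # Only the `reach` nearest neighbors on either side contribute;
        # the range is truncated at the ends of the array.
        for j in range(max(0, p - reach), min(n, p + reach + 1)):
            if j == p:
                continue
            # Assumption: a neighbor inhibits only when it fires above threshold.
            inhibition += k * max(0.0, rates[j] - threshold)
        # Assumption: a firing rate cannot fall below zero.
        new_rates.append(max(0.0, inputs[p] - inhibition))
    return new_rates


# Assumed example: uniform input to a 7-neuron net.
uniform = [10.0] * 7
after_one_step = inhibition_step(uniform, uniform)
```

With these assumed values the central neuron, which has six active neighbors, drops to 4.0, while each end neuron, which has only three, stays at 7.0: the familiar edge enhancement produced by lateral inhibition.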

5. The output of the modeled network is an array of activations of simulated neurons. The network undergoes another iteration of equation (1) for every 30 msec of simulated time, resulting in a series of array states. To see how accurately the net detects a given state, the series of array states for the simulated run is compared by correlation to a stored series of array states from a previous run. To simulate masking, for example, a target and a mask are given as inputs e(p) to the array at the corresponding iterations of inhibition, and the resulting series of array states is compared to that for the target or the mask given alone. Comparison is also a distributed process, with a correlation measure

r(t) = sum(p=1,n) [r1(p) - sum(r1)/n] [r2(p) - sum(r2)/n] / [n s(r1) s(r2)] (2)

where r(t) is the correlation at time t, r1 and r2 are the array states from the first and second runs of the simulation, n is the number of neurons in the array, and s is the respective standard deviation. Thus the model has distributed coding without a parameter count that approaches the number of connections, because the weights are unchanging, and its algorithms are explicit. The architecture is homeomorphic, and therefore physiologically plausible. The model has been successful at simulating a number of conditions of metacontrast masking and sensory memory (Bridgeman, 1978) and can also handle unusual cases such as masking with simultaneous target and mask onset but asynchronous offset (Bischof & Di Lollo, 1995).
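The comparison in equation (2) can be sketched as below. This is an illustration only, assuming that s is the population standard deviation (dividing by n), which makes the measure the ordinary Pearson correlation between two array states. The function names are hypothetical, and returning 0.0 for an array with no variance is a convention chosen here, not something specified in the text.

```python
import math


def array_correlation(r1, r2):
    """Equation (2): correlation between two array states of equal length."""
    n = len(r1)
    if n == 0 or n != len(r2):
        raise ValueError("array states must be non-empty and of equal length")
    mean1 = sum(r1) / n
    mean2 = sum(r2) / n
    # Numerator: sum over neurons of the products of deviations from the means.
    cross = sum((a - mean1) * (b - mean2) for a, b in zip(r1, r2))
    # Assumption: s is the population standard deviation (divide by n).
    s1 = math.sqrt(sum((a - mean1) ** 2 for a in r1) / n)
    s2 = math.sqrt(sum((b - mean2) ** 2 for b in r2) / n)
    if s1 == 0.0 or s2 == 0.0:
        return 0.0  # assumed convention for a flat array state
    return cross / (n * s1 * s2)


def correlation_series(run1, run2):
    """r(t) for each iteration t of two simulated runs (lists of array states)."""
    return [array_correlation(a, b) for a, b in zip(run1, run2)]
```

Under these assumptions, two array states with the same shape give a value near 1 regardless of overall activation level, so the measure reflects the pattern across the array rather than the activity of any single neuron.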

6. This model illustrates a family of models and also a philosophy of modeling. Models such as this one are more limited in their capabilities than PDP models, but have greater scientific value because they test the properties of specified algorithms and because they instantiate known neuroanatomy. They overcome many of the objections of Green (1998) about PDP models. In fact one could view the family of PDP models as instantiating a family of related algorithms, but it is the algorithms that define the models and not the instantiations.

Bischof, W. F. & Di Lollo, V. (1995) Motion and metacontrast with simultaneous onset of stimuli. Journal of the Optical Society of America A 12: 1623-1636.

Bridgeman, B. (1971) Metacontrast and lateral inhibition. Psychological Review 78: 528-539.

Bridgeman, B. (1978) Distributed coding applied to simulations of iconic storage and metacontrast. Bulletin of Mathematical Biology 40: 605-623.

Grainger, J., & Jacobs, A. M. (1998) Localist connectionism fits the bill. ftp://ftp.princeton.edu/pub/harnad/Psycoloquy/1998.volume.9/psycoloquy.98.9.10.connectionist-explanation.7.grainger

Green, C.D. (1998) Are Connectionist Models Theories of Cognition? PSYCOLOQUY 9 (4) ftp://ftp.princeton.edu/pub/harnad/Psycoloquy/1998.volume.9/psyc.98.9.04.connectionist-explanation.1.green

Hanson, S.J., & Burr, D.J. (1990) What connectionist models learn: Learning and representation in connectionist networks. Behavioral and Brain Sciences 13: 511-518.

Massaro, D.W. (1988) Some criticisms of connectionist models of human performance. Journal of Memory and Language 27: 213-234.

Rosenblatt, F. (1962) Principles of neurodynamics. NY: Spartan.