Sung-Bae Cho (1994) Exploiting Modularity in Neural Networks. Psycoloquy: 5(61) Categorization (12)

PSYCOLOQUY (ISSN 1055-0143) is sponsored by the American Psychological Association (APA).
Psycoloquy 5(61): Exploiting Modularity in Neural Networks

EXPLOITING MODULARITY IN NEURAL NETWORKS
Book Review of Murre on Categorization

Sung-Bae Cho
ATR Human Information Processing Research Laboratories
2-2 Hikaridai, Seika-cho, Soraku-gun,
Kyoto 619-02, Japan
and
KAIST Center for Artificial Intelligence Research
373-1 Koosung-dong, Yoosung-ku,
Taejeon 305-701, Korea

sbcho@hip.atr.co.jp sbcho@gorai.kaist.ac.kr

Abstract

In his intriguing book, Jacob Murre (1992) presents brilliant work on modular neural networks for learning and categorization. Beyond simply proposing the model, he also considers hardware implementation issues and some practical applications. Unfortunately, however, the applications he suggests as practical turn out to be uncompetitive with state-of-the-art models. In this review I demur on two points, scalability and practicality, and introduce some conventional techniques for overcoming these problems.

Keywords

Neural networks, neurobiology, psychology, engineering, CALM, Categorizing And Learning Module, neurocomputers, catastrophic interference, genetic algorithms.

I. INTRODUCTION

1. The question of how biological neural networks came to achieve their extraordinary power is of great interest to sciences such as psychology, biology, philosophy, neuroscience, and computer science. Nevertheless, fields like artificial neural networks have made only modest progress toward illuminating this question. As a computer scientist, I turned to Jacob Murre's (1992) book hoping and expecting to gain the major insights that contributors in my own field had failed to provide. My disappointment, then, may stem from the level of my expectations. In summary, the book lacks the mechanisms needed to construct structures with a large number of modules, and it does not provide enough examples of practical applications.

II. SCALABILITY

2. The proposed network is called CALM (Categorizing And Learning Module); it consists of three types of elements with mutually exclusive functions. The internal structure of the module is inspired by the neocortical minicolumn, and the author suggests, by analogy with cortical columns, that 30 or fewer nodes per module are desirable. The author thus offers us a basic internal mechanism for the module but nothing, unfortunately, about how to construct a large-scale neural network from such modules. Previous studies indicate that the brain consists of a vast number of repeating modules, yet in CALM the initial architecture, which determines what can and cannot be learned, must simply be given by the human designer; this restricts the approach to small-scale networks.
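To make the scalability concern concrete, here is a toy Python sketch of a designer-wired modular network: each module is a small, self-contained unit (capped at 30 nodes, following the minicolumn analogy), and the inter-module wiring is fixed by hand. The update rule and wiring below are illustrative assumptions of mine, not Murre's actual CALM dynamics, which involve competing representation nodes, veto nodes, and an arousal-driven learning rate.

    import numpy as np

    class Module:
        """A small, self-contained module; sizes stay at or below 30
        nodes, following the minicolumn analogy (illustrative only)."""
        def __init__(self, size):
            assert size <= 30, "keep modules small, per the minicolumn analogy"
            self.size = size
            self.activation = np.zeros(size)

        def step(self, net_input):
            # Placeholder squashing update; CALM's real dynamics use
            # competing representation and veto nodes, omitted here.
            self.activation = np.tanh(net_input)
            return self.activation

    class ModularNetwork:
        """Modules plus designer-specified connections between them."""
        def __init__(self, module_sizes, connections, seed=0):
            rng = np.random.default_rng(seed)
            self.modules = [Module(s) for s in module_sizes]
            # Each connection (src, dst) gets a random weight matrix;
            # in a trained network these weights would be learned.
            self.connections = [
                (src, dst, rng.normal(scale=0.1,
                     size=(self.modules[dst].size, self.modules[src].size)))
                for src, dst in connections]

        def step(self, external_inputs):
            # One synchronous update: every module sums its external input
            # and the weighted activations of the modules feeding it.
            totals = [np.array(x, dtype=float) for x in external_inputs]
            for src, dst, w in self.connections:
                totals[dst] = totals[dst] + w @ self.modules[src].activation
            return [m.step(t) for m, t in zip(self.modules, totals)]

    # Two 8-node input modules feeding one 4-node output module:
    net = ModularNetwork([8, 8, 4], connections=[(0, 2), (1, 2)])
    outputs = net.step([np.ones(8), np.zeros(8), np.zeros(4)])

The point of the sketch is the limitation itself: the connections list is written by the human designer, and that is precisely what stands in the way of scaling to thousands of modules.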

3. Since the author determined the basic structure of the module by reverse engineering from the brain, I believe the overall modular structure should also be determined by reverse engineering, by something like evolution. We know that evolution, over millions of years, gradually increased the complexity of the brain. The author has tried to select the optimal structure using a genetic algorithm, but a more fundamental approach is needed to accommodate the concept of evolution in the design of the modular structure. For instance, the brain builder group at ATR is working to build an artificial brain containing thousands of interconnected artificial neural network modules by simulating evolution (de Garis, 1994).
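For concreteness, the following is a minimal genetic-algorithm sketch of the kind of structure search this paragraph has in mind: genomes encode a binary inter-module connection matrix, and the population evolves by selection, crossover, and mutation. The fitness function is a stand-in of my own (it merely rewards moderately sparse wiring); in a real experiment it would train and evaluate the decoded network on the target task.

    import random

    N_MODULES, POP_SIZE, GENERATIONS = 6, 20, 50

    def random_genome():
        # Genome: flattened binary matrix; gene (i*N_MODULES + j) = 1
        # means module i feeds module j.
        return [random.randint(0, 1) for _ in range(N_MODULES * N_MODULES)]

    def fitness(genome):
        # Stand-in objective: prefer moderately sparse connectivity. A real
        # fitness would measure task performance of the decoded network.
        density = sum(genome) / len(genome)
        return -abs(density - 0.25)

    def crossover(a, b):
        cut = random.randrange(1, len(a))          # one-point crossover
        return a[:cut] + b[cut:]

    def mutate(genome, rate=0.02):
        return [1 - g if random.random() < rate else g for g in genome]

    population = [random_genome() for _ in range(POP_SIZE)]
    for _ in range(GENERATIONS):
        population.sort(key=fitness, reverse=True)
        parents = population[:POP_SIZE // 2]       # truncation selection
        children = [mutate(crossover(random.choice(parents),
                                     random.choice(parents)))
                    for _ in range(POP_SIZE - len(parents))]
        population = parents + children

    best_wiring = max(population, key=fitness)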

III. PRACTICALITY

4. Murre argues throughout the book that modularity is important in overcoming many of the problems and limitations of current neural networks. Having illustrated the basic ideas underlying CALM, the author presents handwritten digit recognition as a practical application. Unfortunately, what Murre describes as a practical application turns out to be uncompetitive with state-of-the-art models, and thus fails to show that modularity, when used to construct artificial neural networks, will result in better performance.

5. Meanwhile, many practical models that exploit modularity by combining multiple networks have already been proposed (Hansen & Salamon, 1990; Scofield et al., 1991; Jacobs et al., 1991; Wolpert, 1992; Cho & Kim, 1995). A key issue in this approach is how to combine the results of the various networks to give the best estimate of the optimal result. There are a number of possible schemes for automatically optimizing the choice of individual networks and for combining architectures.

6. A straightforward approach is to decompose a complex problem into several manageable subproblems and to use different subnetworks combined via a gating network that decides which of the subnetworks should be used for each case. Waibel (1988) has described a system of this kind that can be used when the decomposition into subtasks is known prior to training, and Jacobs et al. (1991) have proposed a supervised learning procedure for systems composed of several networks, each of which learns to handle a subset of the complete set of training instances. The subnetworks are local in the sense that the weights in one network are decoupled from the weights in the other subnetworks. There is still some indirect coupling, however: if some other network changes its weights, it may cause the gating network to alter the responsibilities assigned to the subnetworks.
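A minimal numpy sketch of the combination step of such a mixture (after Jacobs et al., 1991) appears below. The linear experts and gating weights are random placeholders of mine; in the actual procedure both are trained jointly, so the gate learns which expert to trust for each region of the input space.

    import numpy as np

    rng = np.random.default_rng(0)
    n_experts, d_in, d_out = 3, 4, 2

    # One linear expert per subtask, plus a linear gating network.
    W_experts = rng.normal(size=(n_experts, d_out, d_in))
    W_gate = rng.normal(size=(n_experts, d_in))

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def mixture_output(x):
        expert_outputs = np.array([W @ x for W in W_experts])  # (n_experts, d_out)
        gate = softmax(W_gate @ x)     # responsibilities, one per expert
        # The final output is the gate-weighted blend of the experts;
        # retraining any expert can shift these responsibilities, which is
        # the indirect coupling mentioned above.
        return gate @ expert_outputs, gate

    y, responsibilities = mixture_output(rng.normal(size=d_in))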

7. An alternative method is to generate a number of networks independently, as candidate generalizers, and to use all of them to obtain a robust output. Whereas the usual scheme chooses the single best network from the candidate set on a winner-takes-all basis, this approach keeps multiple networks and runs them all, combining their outputs with an appropriate collective decision strategy. It differs from the aforementioned adaptive mixtures of local experts (Jacobs et al., 1991) in that the networks do not decompose the task but learn the same task globally, from different points of view. A general result from previous work is that averaging several networks improves generalization performance as measured by mean squared error.
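The following sketch illustrates this ensemble alternative under stated assumptions of mine: several least-squares fits on bootstrap resamples stand in for independently trained networks, and a simple average serves as the collective decision strategy.

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=100)

    def fit_member():
        # Bootstrap resampling gives each member a different view of the
        # same global task (no task decomposition, unlike the gated mixture).
        idx = rng.integers(0, len(X), size=len(X))
        w, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        return w

    members = [fit_member() for _ in range(10)]

    x_new = rng.normal(size=3)
    member_predictions = [w @ x_new for w in members]
    ensemble_prediction = np.mean(member_predictions)  # simple averaging

This is the averaging result the paragraph refers to: because the members' errors are partly uncorrelated, they tend to cancel in the mean, reducing mean squared error relative to a typical single member.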

IV. CONCLUSION

8. The contribution of Murre's book to neural networks may lie in its proposing an architecture that gives us definite milestones to work toward: milestones for modeling large-scale neural systems. It certainly places a great deal of responsibility on those working in the artificial neural network field, as they may be shaping the future evolution of the artificial brain. [Software called CALM-Shell, which runs on a Macintosh computer with the Thread Manager, is now available from Adriaan Tijsseling: adriaan@phil.ruu.nl]

REFERENCES

Cho, S.-B. & Kim, J.H. (1995) Multiple Network Fusion Using Fuzzy Logic. IEEE Trans. Neural Networks, 6 (in press).

de Garis, H. (1994) Project Report: An Artificial Brain (ATR's CAM-Brain Project Aims to Build/Evolve an Artificial Brain with a Million Neural Net Modules Inside a Trillion Cell Cellular Automata Machine). New Generation Computing, 12: 215-221. Ohmsha Ltd. and Springer Verlag.

Hansen, L.K. & Salamon, P. (1990) Neural Network Ensembles. IEEE Trans. Pattern Analysis and Machine Intelligence. 12: 993-1001.

Jacobs, R.A., Jordan, M.I., Nowlan S.J. & Hinton, G.E. (1991) Adaptive Mixtures of Local Experts. Neural Computation. 3: 79-87.

Murre, J.M.J. (1992) Learning and Categorization in Modular Neural Networks. UK: Harvester/Wheatsheaf; US: Erlbaum.

Murre, J.M.J. (1992) Precis of: Learning and Categorization in Modular Neural Networks. PSYCOLOQUY 3(68) categorization.1.murre.

Scofield, C., Kenton, L. & Chang, J. (1991) Multiple Neural Net Architectures for Character Recognition. In: Proc. Compcon, 487-491. San Francisco, CA: IEEE Computer Society Press.

Waibel, A. (1988) Connectionist Glue: Modular Design of Neural Speech Systems. In: Proc. 1988 Connectionist Models Summer School, 417-425.

Wolpert, D. (1992) Stacked Generalization. Neural Networks. 5: 241-259.

