Juergen Schmidhuber (1999) Extracting Predictable Hyperstructure. Psycoloquy: 10(034) Hyperstructure (2)

PSYCOLOQUY (ISSN 1055-0143) is sponsored by the American Psychological Association (APA).
Psycoloquy 10(034): Extracting Predictable Hyperstructure

Commentary on Richardson on Hyperstructure

Juergen Schmidhuber
Corso Elvezia 36
6900 Lugano



Richardson's project is partially realized in previous work on discovery of predictable classifications.


Keywords: complexity, covariation, features, hypernetwork, hyperstructure, object concepts, receptive field, representation

1. Richardson (1999) correctly states that feature space is inappropriate for identifying what is predictable. In typical environments, perfect prediction of input features tends to be impossible because the inputs are never quite the same. This is precisely the motivation of a previous article (Schmidhuber & Prelinger, 1993), which describes an unsupervised system that learns to classify patterns so that the classifications are predictable while still being as specific as possible. The approach can be related to Becker and Hinton's IMAX (1992).

2. Consider the following example: Hearing the first two words of a sentence, "Henrietta eats...", allows you to infer that the third word probably denotes something to eat, but you cannot tell what. The class of the third word corresponds to "hyperstructure" (to use the term coined by Richardson) that is predictable from the previous words -- the particular instance of the class is not. The class "food" is not only predictable but also nontrivial and specific in the sense that it does not include everything: "John," for instance, is not an instance of "food."

3. Unsupervised discovery of predictable classifications can be achieved through two neural networks T1 and T2. With a given pair of input patterns, T1 sees the first, T2 the second. In the example above, T1 may see a representation of the words "Henrietta eats," while T2 may see a representation of the word "vegetables." T2's task is to classify its input. T1's task is to predict T2's output. There are two conflicting goals which in general are not simultaneously satisfiable: (A) Predictions should match the corresponding classifications. (B) The classifications should be discriminative. The trade-off between (A) and (B) is expressed by means of two opposing costs. This can be done in more than one reasonable way (Schmidhuber & Prelinger, 1993). The total cost is minimized by gradient descent. This forces the predictions and classifications to be more like each other, while at the same time forcing the classifications not to be too general but to tell something about the current input. The procedure is unsupervised in the sense that no teacher is required to tell T2 how to classify its inputs.
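
The two-network scheme of paragraph 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of Schmidhuber & Prelinger (1993): T1 and T2 are assumed to be single-layer sigmoid networks, goal (A) is expressed as a squared-error match cost, and goal (B) as the negative variance of the classifications across patterns, which is only one of several reasonable choices. The names sigmoid, costs, total_cost, gradient_step and the weighting factor eps are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def costs(p, c):
    """Return (match, discrimination) for predictions p and classifications c.

    p, c: arrays of shape (n_patterns, n_classes).
    """
    n = c.shape[0]
    # (A) Predictions should match the corresponding classifications.
    match = 0.5 * np.sum((p - c) ** 2) / n
    # (B) Classifications should be discriminative. Assumed form: negative
    # variance across patterns, so minimizing it spreads the outputs apart.
    discrimination = -0.5 * np.sum((c - c.mean(axis=0)) ** 2) / n
    return match, discrimination


def total_cost(W1, W2, X1, X2, eps):
    """Total cost for T1 (weights W1, input X1) and T2 (weights W2, input X2)."""
    p = sigmoid(X1 @ W1)  # T1 predicts T2's output from the first pattern
    c = sigmoid(X2 @ W2)  # T2 classifies the second pattern
    match, discrimination = costs(p, c)
    return match + eps * discrimination


def gradient_step(W1, W2, X1, X2, eps, lr):
    """One gradient-descent step on the total cost; returns updated weights."""
    n = X1.shape[0]
    p = sigmoid(X1 @ W1)
    c = sigmoid(X2 @ W2)
    # Derivatives of the total cost with respect to the network outputs.
    d_p = (p - c) / n
    d_c = -(p - c) / n - eps * (c - c.mean(axis=0)) / n
    # Backpropagate through the sigmoid and the linear layer.
    g1 = X1.T @ (d_p * p * (1.0 - p))
    g2 = X2.T @ (d_c * c * (1.0 - c))
    return W1 - lr * g1, W2 - lr * g2
```

Repeated calls to gradient_step with a small lr lower the total cost. With eps = 0 nothing prevents both networks from collapsing to a constant output, which satisfies (A) trivially; the second term is what keeps the classifications informative about the current input. No teacher signal appears anywhere: T2's targets are never supplied from outside.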

4. Several experiments show that the system indeed automatically develops internal representations of predictable "hyperstructure" as opposed to traditional feature detectors. Hence, at least this part of Richardson's project has already found an exemplary neural implementation.


Becker, S. and Hinton, G.E. (1992). A self-organizing neural network that discovers surfaces in random-dot stereograms. Nature 355:161-163.

Richardson, K. (1999). Hyperstructure in Brain and Cognition. PSYCOLOQUY 10(031). ftp://ftp.princeton.edu/pub/harnad/Psycoloquy/1999.volume.10/psyc.99.10.031.hyperstructure.1.richardson http://www.cogsci.soton.ac.uk/cgi/psyc/newpsy?10.031

Schmidhuber, J. and Prelinger, D. (1993). Discovering predictable classifications. Neural Computation, 5(4):625-635. ftp://ftp.idsia.ch/pub/juergen/predmax.ps.gz
