Harnad, S. (1992) Connecting Object to Symbol in Modeling Cognition. In: A. Clark and R. Lutz (Eds) Connectionism in Context Springer Verlag, pp. 75-90.

Connecting Object to Symbol in Modeling Cognition

Stevan Harnad
Department of Psychology
Princeton University
Princeton NJ 08544


Connectionism and computationalism are currently vying for hegemony in cognitive modeling. At first glance the opposition seems incoherent, because connectionism is itself computational, but the form of computationalism that has been the prime candidate for encoding the "language of thought" has been symbolic computationalism (Dietrich 1990; Fodor 1975; Harnad 1990c; Newell 1980; Pylyshyn 1984), whereas connectionism is nonsymbolic (Fodor & Pylyshyn 1988) or, as some have hopefully dubbed it, "subsymbolic" (Smolensky 1988). This paper will examine what is and is not a symbol system. A hybrid nonsymbolic/symbolic system will be sketched in which the meanings of the symbols are grounded bottom-up in the system's capacity to discriminate and identify the objects they refer to. Neural nets are one possible mechanism for learning the invariants in the analog sensory projection on which successful categorization is based. "Categorical perception" (Harnad 1987a), in which similarity space is "warped" in the service of categorization, turns out to be exhibited by both people and nets, and may mediate the constraints exerted by the analog world of objects on the formal world of symbols.

Symbol Systems

A symbol system is an abstract, formal object that is capable of being implemented as a real physical object. Formally, it is a set of arbitrary symbol "tokens" (e.g., marks on paper) together with rules (notational conventions and algorithms) for manipulating them purely on the basis of their shapes, i.e., purely syntactically. The crucial property that makes symbol systems interesting is that the symbols can be given a systematic semantic interpretation; they can be consistently and coherently taken to mean something.

For example, the words of a natural language, together with the syntactic rules for combining them into grammatically correct utterances, constitute a symbol system, and the words and utterances of a language can be interpreted as meaning something (e.g., what this very sentence -- a mere string of symbols -- means); they can be interpreted as referring to and describing the objects, events and states of affairs that people talk about. It is important to note that the "shape" of the words is arbitrary in relation to what they mean. The acoustic or visual shapes of the words "cat," "mat," and "the cat is on the mat" are arbitrary in relation to the objects and states of affairs that they can be systematically interpreted as referring to. Similarly, in the formal notational system for axiomatized arithmetic, the shape of the symbols "0" and its successor "0'" (or "1") and the shape of "+" are all arbitrary in relation to the quantities and properties that they can be systematically interpreted as denoting. The same is true of the symbols in a C or Lisp computer program.
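The arithmetic case can be made concrete with a minimal Python sketch. The two rewrite rules (x + 0 -> x; x + y' -> (x + y)') and the function names are illustrative assumptions, not a particular published axiomatization. The rules inspect only the shapes of the tokens "0" and "'"; counting successor marks is one systematic interpretation that the manipulations happen to preserve.

```python
def add(x: str, y: str) -> str:
    """Rewrite the sum x + y by token shape alone (hypothetical Peano-style rules)."""
    if y == "0":
        return x                      # rule 1: x + 0  ->  x
    if y.endswith("'"):
        return add(x, y[:-1]) + "'"   # rule 2: x + y' ->  (x + y)'
    raise ValueError(f"not a numeral: {y!r}")


def interpret(numeral: str) -> int:
    """One systematic interpretation: the number of successor marks after "0"."""
    if numeral[:1] != "0" or set(numeral[1:]) - {"'"}:
        raise ValueError(f"not a numeral: {numeral!r}")
    return len(numeral) - 1


two, one = "0''", "0'"
print(add(two, one), interpret(add(two, one)))  # prints: 0''' 3
```

Nothing in `add` refers to quantity; that `interpret(add(x, y))` always equals `interpret(x) + interpret(y)` is a small instance of the systematicity taken up in the next paragraph.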

As Fodor & Pylyshyn (1988) have pointed out, it is the property of systematicity that we are really interested in when we devise and use symbol systems. We are not interested in uninterpretable symbol systems, or in symbol systems whose interpretation is arbitrary or merely fanciful, like the astrological interpretation of the configurations of the heavenly bodies and their relation to our personal fortunes. A symbol system, including all its parts, and all their ruleful combinations, must be able to bear the weight of a systematic interpretation. This is why it is so hard to decipher unknown languages or codes: For semantic interpretability is a very exacting constraint; few are the systems that can bear the weight of a systematic interpretation, and even then they do not wear their interpretations on their sleeves.

Consider formal "duals" in mathematics. It is known that in propositional logic, for example, there is a way of formulating all the true propositions and deductions using only the connectives "and" and "not." This can also be done using only "or" and "not." The and/not and or/not systems are called duals: there is a systematic way in which each can be interpreted in terms of the other. As long as the syntactic rules are obeyed, their interpretations can be systematically "swapped." This is not true, however, for an arbitrary pair of symbols (e.g., or/if and then/not); in fact, the number of viable formal duals is extremely small. This means that if anything else is swapped or cross-interpreted in a symbol system, a coherent interpretation is unlikely to exist.
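The swap can be checked mechanically. The Python sketch below assumes the standard textbook notion of duality (De Morgan's laws), which may be narrower than what the paragraph has in mind: reading every truth value as its opposite turns the table for "and" into the table for "or" and vice versa, whereas the same swap applied to material implication ("if ... then") yields neither.

```python
from itertools import product

BOOLS = (False, True)

def AND(p, q): return p and q
def OR(p, q): return p or q
def IMPLIES(p, q): return (not p) or q

def dual(f):
    """Reinterpret every truth value as its negation: f*(p, q) = not f(not p, not q).

    dual(AND) is also De Morgan's definition of "or" using only "and" and "not"."""
    return lambda p, q: not f(not p, not q)

def same_table(f, g):
    """True if f and g agree on all four truth assignments."""
    return all(f(p, q) == g(p, q) for p, q in product(BOOLS, repeat=2))

print(same_table(dual(AND), OR), same_table(dual(OR), AND), same_table(dual(IMPLIES), AND))
# prints: True True False
```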

Systematicity confers many benefits. If we have a ruleful symbol system that can be systematically interpreted in a certain way, then ruleful syntactic manipulations of the symbols will preserve that systematic relation to what they can be interpreted as meaning. This is why numerical calculators, complicated computer programs, mathematical models, and natural language itself are so useful.

The Symbolic Theory of Mind

As described so far, symbol systems are only a tool. How did they become cognitive models, candidates for what might really be going on in our heads to produce mental states? The first factor was clearly the success of artificial intelligence (AI): Here were symbol systems doing intelligent things -- playing chess, solving problems, describing scenes, engaging in conversation -- in short, doing the kinds of things people can do. And they were doing these things in a way that we could explain and understand, which was something this century's psychologists and neuroscientists had not yet been able to do in the case of real people's cognitive capacities.

So it was natural to consider symbol systems as possible explanations of how the mind worked. Beyond that, studies of the foundations of mathematics had shown that formal symbol systems were very powerful indeed, perhaps all-powerful: Just about every possible physical system and process could be seen as formally equivalent to a system of symbols and syntactic rules for manipulating them (Kleene 1969).

And finally, there was the implementation-independence of the formal, syntactic level of description of a system: All implementations of a symbol system are formally equivalent, no matter how radically they may differ in their physical realizations. So if mental states were really just certain symbolic states, implemented physically, then that persistent difficulty we all have in equating the mental with the physical (otherwise known as the mind/body problem) would be explained: What made a state mental would not be any of the specifics of its physical realization (the body), but only that it was the implementation of the right symbol system. Then all further questions about the mind would really just be questions about the formal properties of that symbol system.

Unfortunately, this symbolic theory of mind fell upon hard times. Symbolic AI did not go much beyond producing "toy" models that could only do a tiny fraction of what people could do, with no principled way of "scaling up" to the rest, to exhibit our full cognitive capacity. The symbols and rules also seemed too ad hoc; they did not feel as if they were capturing what our minds were made of, even though no one really had a better candidate. Purely negative critiques began to appear. Searle (1980a), for example, argued that symbols and systematicity couldn't be enough to capture the mind because even if a symbol system could pass the Turing Test (Turing 1964) in Chinese (i.e., if symbolic interactions with it were systematically interpretable as -- and indistinguishable from -- a lifelong correspondence with a real Chinese pen-pal) it could be shown that the system did not really understand Chinese (because Searle himself could, by memorizing and following all the symbol-manipulation rules, become an implementation of the very same system without understanding any Chinese). Penrose (1989, 1990) argued that the mind could not be just a symbol system because it could demonstrably do things that rule-governed symbol systems could not do (see also Davis 1958, Lucas 1961).

What could it be that these all-powerful symbol systems, so systematically amenable to a mentalistic interpretation, lacked, and what kind of system might have it? Searle (1980b) suggested that what they lacked was "intrinsic meaning": the meanings of the symbols in a symbol system, be they ever so systematically interpretable, are merely being projected onto them by their interpreters, just as in the case of the meanings of the words in a book. The symbols of a book or computer do not mean anything to the book or computer, because books and computers are not the kinds of things that anything means anything to. There's nobody home in there. Searle urged instead that we study the physical properties of the brain, because we know that brains are the kinds of systems in which somebody is home for symbols to mean something to, hence they are the systems with the right "causal powers" for implementing mental states with intrinsic meanings. Penrose, on the other hand, recommended turning to basic physics rather than the brain, suggesting that quantum computers might provide the requisite nonsymbolic, nonalgorithmic power for mental states.

The Symbol Grounding Problem

There is another way of looking at the shortcomings of symbol systems, however, that does not send us quite so far afield. The failure of AI can also be understood as the impossibility of grounding symbol meanings within a pure symbol system. The "symbol grounding problem" (Harnad 1990a) is analogous to the problem of trying to arrive at the meaning of Chinese symbols on the basis of a Chinese/Chinese dictionary alone, without knowing any Chinese: All one could do would be to pass systematically from a meaningless definiendum to its meaningless definiens, and then on to each equally meaningless definiendum within that definiens. Such a search would be systematic, and it would even be systematically interpretable -- to someone who already understood Chinese -- but it could never come to a halt on the intrinsic meaning of a symbol. The searcher would be trapped in an endless circle of meaningless symbols (Harnad 1990b) systematically manipulated on the basis of their arbitrary shapes. Without the projected interpretation, the system, be it ever so systematically interpretable, is hanging from a skyhook; it is ungrounded. Hence, on pain of infinite regress, the system in our heads that is capable of projecting meanings onto symbol systems cannot itself be merely a symbol system. Our symbol meanings must be grounded nonsymbolically.
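The dictionary-go-round can be sketched as a graph search. In this minimal Python illustration the placeholder tokens "A", "B" and "C" stand in for Chinese characters, and the three-entry dictionary is hypothetical. Every lookup yields only more symbols; the search stops only because it starts revisiting symbols it has already seen, never because it reaches a meaning.

```python
from collections import deque

def dictionary_go_round(start, dictionary):
    """Breadth-first lookup of every symbol reachable from `start` through definitions.

    Returns the symbols visited, in order, and the number of lookups that led back
    to an already-visited symbol (how often the search went in a circle)."""
    visited, order, revisits = set(), [], 0
    queue = deque([start])
    while queue:
        symbol = queue.popleft()
        if symbol in visited:
            revisits += 1
            continue
        visited.add(symbol)
        order.append(symbol)
        queue.extend(dictionary.get(symbol, []))  # the definiens: nothing but more symbols
    return order, revisits

chinese_chinese = {"A": ["B", "C"], "B": ["C", "A"], "C": ["A"]}
print(dictionary_go_round("A", chinese_chinese))  # prints: (['A', 'B', 'C'], 3)
```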

What nonsymbolic means exist for grounding the meanings of symbols? In recommending the causal powers of the brain, Searle unfortunately did not point out which of the brain's causal properties was relevant to symbol grounding; its specific gravity, for example, is not likely to be relevant. Penrose's suggestion to turn to quantum properties is not helpful either, since the brain does not differ significantly from the heart or the kidneys in its quantum properties. Is there any other promising nonsymbolic candidate to help solve the symbol grounding problem?

Neural Nets

Connectionist systems, hopefully dubbed "neural nets," have lately been put forward as nonsymbolic candidates for a much more ambitious task than just helping to ground symbol systems: Perhaps they could do all the heavy work of cognition, replacing symbol systems altogether. Mental states, instead of being symbolic states in a physically implemented symbol system, would instead be dynamic states of activity in parallel, distributed systems of interconnected units (McClelland & Rumelhart 1986). Like AI's symbol systems, connectionist systems have been shown to be capable of generating a good deal of intelligent performance capacity, especially learning and pattern recognition. Other properties recommending them are that they are more brainlike than computers and that, not being symbolic, they do not suffer from the symbol grounding problem.

Unfortunately, connectionism's strengths are also its weaknesses (Minsky 1969; Harnad 1990d): It is not yet obvious that "neural nets" are brainlike in the right way. The model of a set of interconnected units dynamically adjusting their connection strengths is based more on our ignorance of how the brain works than on any real functional understanding. It is quite possible, for example, that local field effects from graded postsynaptic potentials are where the real cognitive action is, rather than the action potentials that jump from neuron to neuron; or the real functional level might be biochemical rather than neuronal. Or the symbolists could conceivably be right that neuronal networks are merely one of many possible architectures for implementing symbol systems. Or connectionism could turn out to be just a family of learning algorithms that are realizable in many different ways, parallel/distributed implementations not being essential to them (this would leave connectionism just as vulnerable to Searle's argument as symbol systems). Nor is the power of these algorithms to generate our full human cognitive capacity any more firmly established than symbolic AI's. Moreover, if neural nets do turn out to be essentially nonsymbolic, then that may still turn out to be more of a liability than an asset. As Fodor & Pylyshyn have pointed out, systematic semantic interpretability looks like a desirable property for a cognitive system to have; so if neural nets don't have it, they must somehow acquire it -- and then they too, having become symbol systems, would have to face the symbol grounding problem.

Transducers and Analog Transformations

If we remain agnostic about connectionism's symbolic/nonsymbolic status, are there other candidate structures and processes, unequivocally nonsymbolic ones, for grounding symbol systems? A careful reading of Searle and Penrose already suggests that some kinds of system are immune to their criticisms: Searle's argument depends critically on the implementation-independence of the symbolic level of function, for Searle himself can always implement a symbol system and show that it lacks whatever mental property its more ambiguous computer incarnation had allowed us to systematically project onto it. I have called this "Searle's Periscope" on the other-minds problem (Harnad 1991). But an optical transducer is already a kind of system that is impenetrable to Searle's Periscope, for Searle cannot implement a transducer without being one, and the transducer that he already is does see (Harnad 1989). Similarly, Penrose's arguments about what can or cannot be accomplished by algorithms are already moot for analog systems (and perfectly Newtonian ones at that).

So perhaps transducer/effector and other analog structures and processes can help ground symbols -- but first we must lay to rest one trivialized version of this proposal: Proponents of symbolic AI had always maintained that, even if the symbolic theory of mind was correct, mental states had to be implemented, and that required a physical, hence analog, device to implement the symbol system. Moreover, in order to exhibit our full performance capacity, the symbol system would require transducers and effectors to interact with the real world. Nevertheless, apart from the physical implementation of the symbol system and the transduction of its sensorimotor input and output, the symbol system itself would be doing the real cognitive work. The transduction would be trivial and the particulars of the hardware implementation would, as usual, be irrelevant. Note that this view is highly modular: The real work is done by the symbolic module, with the rest being just implementation or I/O.[1] And it is also homuncular, in that the mental states are attributed to the symbolic module that is doing the real work.[2] The hybrid approach to grounding recommended here is nonmodular, in that it cannot be decomposed into autonomous symbolic and nonsymbolic components, and it is not homuncular, in that you have to be the transducer/effector and analog structures and processes (as well as the symbolic ones) in order to be a mind: It is not that the mind receives the transducer/effector or analog activity (or, for that matter, the symbolic activity) as data. If the mind is grounded this way then it just is the activity of those structures and processes.

So we are really talking about grounding symbols and symbolic capacity in the structures and processes that underlie robotic capacity. The standard verbal version of the Turing Test, being purely symbolic, is subject to the symbol grounding problem: All the pen-pal interactions with the candidate are systematically interpretable by us as meaning something to us, but what would make them systematically meaningful to the candidate as well? The stronger "Total Turing Test" (Harnad 1991) requires the candidate to be indistinguishable from ourselves not only in its symbolic capacities, but also in its robotic capacities, which requires the two sets of capacities to be systematically congruent with one another. The robotic capacities thereby fix the meanings of the symbols. This of course still leaves open the possibility that there is no one home in the robot, but then that is no longer the symbol grounding problem but the other-minds problem (and, apart from Searle's Periscope in the special case of the symbolic theory of mind, there is no solution to that one; Harnad 1984, 1991).

How is one to ground symbolic capacity in robotic capacity? To a first approximation, our robotic capacities consist of our ability to discriminate, identify and manipulate the objects, events and states of affairs in our world. Our symbolic capacities consist of our ability to identify, describe, and respond coherently to descriptions of the objects, events and states of affairs in our world. Notice that the capacity to "identify" figures in both domains. It may also be the key to symbol grounding.

Robotic Capacities: Discrimination and Identification

Discrimination and identification are both perceptual capacities. Discrimination is a relative judgment; identification is an absolute judgment. To be able to discriminate a pair of inputs is to be able to say whether they are the same or different, and, if different, how similar they are (Tversky 1977). To be able to identify an object is to be able to assign to it a unique, arbitrary name. For a robot, to identify is to categorize, because every presentation of an object is unique (if for no other reason than the unique point in time when it is encountered). To be able to categorize is hence to be able to find what in the sensory projection of an object is invariant across presentations and provides a reliable basis for correctly identifying it from among the objects whose sensory projections it could be confused with. What is confusable with what depends on the sample of alternatives that the robot has encountered (Harnad 1987b, 1987c).

Note that if our robotic capacity consisted of nothing but visual discrimination -- i.e., all we ever had to do was make similarity judgments about pairs of visual inputs -- then a very simple, purely analog mechanism could accomplish it all: An analog of the sensory projection of one object could be superimposed on the sensory projection of the other, and the judgment would be "same" if they were congruent, "different" if not, with their degree of similarity signaled by their degree of congruity. Of course, three-dimensional space already complicates this picture, because the same 3-D object might have many 2-D projections. But the discrimination of 3-D objects could also be accomplished by an analog/congruity mechanism, one allowing more complicated analog projections and transformations, such as recovering 3-D shape from 2-D invariants (Ullman 1980) and internal rotation of 3-D shapes (Shepard & Cooper 1982). Let us call such analog structures and processes "iconic representations." If our robotic capacity consisted of nothing but discrimination, iconic representations would suffice to accomplish it.
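A minimal Python sketch of such an analog/congruity mechanism, under simplifying assumptions that are ours rather than the paper's: a 2-D projection is a set of pixel coordinates, congruity is the overlap ratio of two superimposed projections, and 90-degree rotations stand in for the richer analog transformations (3-D shape recovery, internal rotation) cited above. The mechanism only answers relative questions about a pair of inputs; it never names anything.

```python
def normalize(shape):
    """Translate a projection so its top-left corner sits at (0, 0)."""
    min_r = min(r for r, _ in shape)
    min_c = min(c for _, c in shape)
    return frozenset((r - min_r, c - min_c) for r, c in shape)

def rotate90(shape):
    """A crude analog transformation: rotate the projection by 90 degrees."""
    return normalize({(c, -r) for r, c in shape})

def congruity(a, b):
    """Degree of overlap when one projection is superimposed on the other (0..1)."""
    a, b = normalize(a), normalize(b)
    return len(a & b) / len(a | b)

def discriminate(a, b):
    """Relative judgment: best congruity over the four 90-degree rotations of b."""
    best, candidate = 0.0, normalize(b)
    for _ in range(4):
        best = max(best, congruity(a, candidate))
        candidate = rotate90(candidate)
    return ("same" if best == 1.0 else "different"), best

L_SHAPE = {(0, 0), (1, 0), (2, 0), (2, 1)}
SQUARE = {(0, 0), (0, 1), (1, 0), (1, 1)}
print(discriminate(L_SHAPE, rotate90(L_SHAPE)))  # prints: ('same', 1.0)
print(discriminate(L_SHAPE, SQUARE))             # judged "different", with partial congruity
```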

Unfortunately, our robotic capacity consists of more than just relative discrimination. We are also capable of sorting and labeling objects on the basis of their sensory projections. Iconic representations cannot do this for us, except in trivial cases, because they are unique to every sensory presentation, and categorization depends on selective detection of what is invariant within a given category, relative to other categories it could be confused with. Now some sensory invariants are innately detected by our perceptual systems; these were presumably "learned" through evolution. Other sensory invariants we must learn to detect by trial and error or instruction during our lifetimes (Gibson 1969). In either case, it is clear that a learning mechanism is needed that can extract invariant features from sensory projections on the basis of supervised learning -- learning in which there is a right or wrong of the matter, and in which the consequences of sorting and labeling incorrectly are sensed somehow, so as to guide the error-correction process.

Let us leave this sensory category learning mechanism unspecified for the moment and simply describe its effects: It somehow reduces iconic representations to the invariant sensory features that will subserve successful categorization. Let us call such selectively reduced representations "categorical representations." Aside from making it possible to sort objects into categories on the basis of their sensory projections, this mechanism also allows us to assign a unique, arbitrary name to each category. Let us call such names "elementary symbols," and certain strings of them, interpretable as propositions about category membership, grounded "symbolic representations." Here is how grounding would work:

Suppose a robot was capable of discriminating objects Turing-indistinguishably from the way we do, on the basis of its iconic representations. Suppose also that it could identify object categories Turing-indistinguishably from the way we do, on the basis of its categorical representations. In particular, suppose it could discriminate and identify sensory presentations of horses on the basis of its iconic and categorical representations of horses and it could also discriminate and identify "stripes" on the basis of its iconic and categorical representations of stripes. Now note that such a robot could in principle decode and use the symbol string: "zebra = horse & stripes" even though "zebra" was a previously undefined symbol and unencountered object.[3] Not only would "zebra" inherit the grounding of "horse" and "stripes," but in principle, a robot that had received and stored such a symbol string could correctly identify zebras as of its very first encounter with them, without the need of trial and error learning, by using the grounding of the constituents of its symbolic representation.[4]
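A minimal Python sketch of how "zebra = horse & stripes" could inherit its grounding. The feature names (legs, hooves, mane, stripes) and the hand-written detectors are hypothetical stand-ins for learned categorical representations; only the composition step is the point. The composed detector accepts a zebra on its first encounter without any trial-and-error training of its own.

```python
# Hypothetical sensory features; in the scheme described above these detectors
# would be learned categorical representations, not hand-written rules.
def detects_horse(p):
    return p["legs"] == 4 and p["hooves"] and p["mane"]

def detects_stripes(p):
    return p["stripes"]

lexicon = {"horse": detects_horse, "stripes": detects_stripes}

def define(name, constituents, lexicon):
    """Ground a new symbol, e.g. "zebra = horse & stripes", in already-grounded ones."""
    parts = [lexicon[c] for c in constituents]  # KeyError if a constituent is not yet grounded
    lexicon[name] = lambda p: all(part(p) for part in parts)

define("zebra", ["horse", "stripes"], lexicon)

first_zebra = {"legs": 4, "hooves": True, "mane": True, "stripes": True}
plain_horse = {"legs": 4, "hooves": True, "mane": True, "stripes": False}
tiger = {"legs": 4, "hooves": False, "mane": False, "stripes": True}
print([lexicon["zebra"](x) for x in (first_zebra, plain_horse, tiger)])  # prints: [True, False, False]
```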

Philosophical Objections to Bottom-Up Grounding of Concrete and Abstract Categories

Now philosophers are fond of raising 300-year-old objections to this kind of bottom-up proposal. It is supposed to be doomed to failure for the same reason that the entire empiricist program of grounding thinking in sense experience failed -- because, in a nutshell, most abstract categories (e.g., goodness, truth, beauty, even games) do not have any shared invariants, sensory or otherwise. Moreover, a zebra is not a striped horse!

Perhaps this is not the place to fight this particular battle, but let it be noted that the feasibility of grounding symbols in robotic capacities has really never yet been tested. Philosophers have concluded that sensory grounding was a dead end from the vantage point of their armchairs, based on introspecting about the definitions and sensory properties of abstract categories. Wittgenstein (1953), for example, concluded that because he could find no common properties among games, such invariants therefore did not exist, and that we therefore categorize games on the basis of vague "family resemblances."

The picture is quite different if one adopts a roboticist's stance (and, paradoxically, this can already be discerned from the armchair), for the roboticist asks: What is it that people can actually sort and label, reliably and "correctly," as "games" and "nongames," and how might they be accomplishing that? We can already eliminate the cases that people cannot sort, or cannot agree upon. We can forget about what a game is "really," sub specie aeternitatis: A roboticist is just modeling performance capacity, not ontology. But among those cases that people can and do sort and label reliably and "correctly," the roboticist is quite justified in assuming that either the success is grounded directly in sensory invariants (as in the hypothetical case of "horse") or it is recursively grounded in labels that are grounded in labels, etc., that are directly grounded (as in the case of "zebra"). Otherwise the robot's success in sorting and labeling would be completely inexplicable -- for it certainly could not be hanging from a skyhook of ungrounded symbolic representations.[5]
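The roboticist's assumption amounts to a recursive condition, sketched below in Python under our own simplifying assumptions: a symbol counts as grounded if it is directly grounded in sensory invariants, or if it has a definition all of whose constituents are themselves grounded. The symbols "zebra_foal", "young", "glorp" and "blick" are hypothetical; the last two are defined only in terms of each other, as in the skyhook case.

```python
def is_grounded(symbol, definitions, direct, visiting=None):
    """Recursive grounding check: direct grounding, or a definition whose parts are all grounded."""
    if symbol in direct:
        return True                      # bottoms out in sensory invariants
    if visiting is None:
        visiting = set()
    if symbol in visiting or symbol not in definitions:
        return False                     # circular or undefined: hanging from a skyhook
    visiting.add(symbol)
    grounded = all(is_grounded(c, definitions, direct, visiting) for c in definitions[symbol])
    visiting.discard(symbol)
    return grounded

direct = {"horse", "stripes", "young"}
definitions = {
    "zebra": ["horse", "stripes"],
    "zebra_foal": ["zebra", "young"],    # grounded at one remove, through "zebra"
    "glorp": ["blick"],
    "blick": ["glorp"],
}
print({s: is_grounded(s, definitions, direct) for s in definitions})
# prints: {'zebra': True, 'zebra_foal': True, 'glorp': False, 'blick': False}
```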

So, on the assumption that the viability of this bottom-up robotic grounding scheme is an empirical question rather than an a priori one that has already been decided, let us examine more closely how it might be implemented and tested: The crucial component that is still missing is the learning mechanism that will find the invariants in the sensory projections of objects that will allow the robot to identify what category they belong to. Here is a function for which neural nets are a natural candidate. Whether or not they are brainlike, whether or not they are symbolic, and whether or not they have the power to do other things entirely on their own, neural nets seem well-suited to the task of sensory category learning. Whether they will have sufficient learning power to accomplish human-scale category learning is of course likewise an empirical question, but this certainly seems worth exploring.
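As a minimal stand-in for such a learning mechanism, the Python sketch below uses a single-layer perceptron rather than the backpropagation nets discussed later. The three binary "sensory features" are hypothetical: feature 0 is the category invariant and features 1 and 2 vary freely. Weights change only when a sort is wrong, which is the supervised, error-correcting regime described earlier.

```python
from itertools import product

def train_perceptron(samples, labels, epochs=100, lr=1.0):
    """Error-correcting supervised learning: adjust weights only after a wrong label."""
    w, b = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(samples, labels):      # y is +1 or -1
            if classify(w, b, x) != y:         # the consequence of a wrong sort is "sensed"
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                errors += 1
        if errors == 0:                        # every sample sorted correctly
            break
    return w, b

def classify(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Hypothetical projections: feature 0 is invariant within each category; 1 and 2 are noise.
samples = [list(x) for x in product([0, 1], repeat=3)]
labels = [1 if x[0] == 1 else -1 for x in samples]
w, b = train_perceptron(samples, labels)
print(w, b)  # prints: [1.0, 0.0, 0.0] 0.0
```

In this run all of the learned weight ends up on the invariant feature, a toy version of reducing an iconic representation to a categorical one.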

Categorical Perception (CP) and Category Learning

Modeling always begins with a toy task, and one of the simplest possible category learning problems is to split a one-dimensional continuum into two or more categories. The nervous system does this innately for several sensory dimensions, such as color, voicing, and acoustic formant transitions (Berlin & Kay 1969; Boynton 1979; Harnad 1987a). Physically, each of these is a one-dimensional continuum, but our brains subdivide them into distinct identifiable subregions (red, orange, yellow, etc.; /ba/ (voiced) and /pa/ (voiceless); and /ba/, /da/, /ga/, respectively). Associated with this segmentation is a striking interaction between our seemingly independent capacities to discriminate and to identify, an interaction that has been called "categorical perception" (CP). Normally, equal-sized (logarithmic) differences along a one-dimensional physical continuum are perceived as psychologically equal: According to Weber's Law, the relation between the physical magnitude of stimulation and the psychological magnitude of sensation is homogeneous and log-linear. In the case of CP, however, there is inhomogeneity, and Weber's Law does not hold along the continuum: Discriminability is compressed within categories and dilated between them. The continuum has somehow been warped as a function of where the category boundaries are. Between-category differences "look" bigger than within-category differences; indeed, they are often perceived as qualitative rather than quantitative.
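The contrast between a homogeneous log-linear scale and a warped one can be illustrated numerically. In the Python sketch below the warped scale is a hypothetical construction of ours, a blend of the log scale with a sigmoid step at the category boundary; `alpha`, `width` and the boundary location are illustrative choices, not values fitted to any data. Stimuli 1, 2, 4, ..., 256 are equal logarithmic steps apart, so every step is 0.125 on the homogeneous scale.

```python
import math

def log_position(s, s_min=1.0, s_max=256.0):
    """Homogeneous, log-linear scale: position of stimulus s on 0..1."""
    return math.log(s / s_min) / math.log(s_max / s_min)

def warped_position(s, boundary, alpha=0.6, width=0.02):
    """Hypothetical CP scale: blend the log scale with a sigmoid step at the boundary."""
    x, b = log_position(s), log_position(boundary)
    step = 1.0 / (1.0 + math.exp(-(x - b) / width))
    return (1 - alpha) * x + alpha * step

stimuli = [2 ** k for k in range(9)]   # 1, 2, 4, ..., 256: equal log-sized steps
boundary = 2 ** 4.5                    # hypothetical boundary between 16 and 32
pairs = list(zip(stimuli, stimuli[1:]))
homogeneous = [log_position(b) - log_position(a) for a, b in pairs]
warped = [warped_position(b, boundary) - warped_position(a, boundary) for a, b in pairs]
print([round(d, 3) for d in homogeneous])
print([round(d, 3) for d in warped])
```

With these settings the step that straddles the boundary (16 to 32) comes out several times larger than 0.125, while every within-category step comes out smaller: compression within, dilation between.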

The one-dimensional CP effects studied so far have mostly been innate ones (though there is evidence for modulatory effects of learning). There have also been reports of learned CP for pitch categories (Siegel & Siegel 1977) and for sectored circles (Lane 1965; cf. Lawrence 1950, Gibson 1969). In both cases it is assumed that the dramatic "warping" of similarity space in the service of categorization might be performing some useful function, perhaps providing compact, bounded "chunks" that can then be combined into higher-order categories (Miller 1956). Just as the "just-noticeable-difference" represents the resolution grain of our discriminative capacity, bounded CP categories may represent the resolution grain of our identification capacity.

Neural Nets and CP

Neural nets were tested to determine whether they produce CP effects as a consequence of category learning and, if so, whether they might give us an idea of what function CP effects may be performing (Harnad, Hanson & Lubin 1991). Backpropagation nets (McClelland & Rumelhart 1986) were trained to split a continuum of "line lengths" into two categories ("short" and "long"). There were 8 lines, increasing in length in equal-sized increments from the shortest to the longest, with the category boundary between lines 4 and 5. The lines were represented in 6 different ways: (1) was a discrete place code (e.g., a line of length 4 would be 00010000); (2) was a discrete thermometer code (11110000); (3) and (4) were coarse-coded (pseudocontinuous) versions of (1) and (2); and (5) and (6) added lateral inhibition. Nets with 8 input units and 8 output units were trained with various numbers of hidden units (2 to 12) to perform autoassociation (Cottrell, Munro & Zipser 1987; Elman & Zipser 1987; Hanson & Burr 1990), i.e., to produce output identical to their input. The pairwise distances between the lines in hidden-unit space (i.e., the Euclidean distance between the vectors formed by the activations of the hidden units by each of the lines) for the trained nets were then taken as the precategorization baseline. The nets were then trained, with supervision, to categorize the lines using one additional output unit to signal "short" or "long." The postcategorization distances between all possible pairs of the 8 lines were then compared with their precategorization distances. For all six representations, within-category distances were compressed and between-category distances were increased after categorization training -- a sizeable CP effect (see Figure 1).
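A minimal Python sketch of the two discrete input codes and of one way to summarize the pre/post comparison. The coarse-coded and lateral-inhibition variants are not reproduced because their parameters are not given here, the nets themselves are not reimplemented, and summarizing "all possible pairs" by mean within- and between-category distances is our simplification. Even in input space the codes differ: under the place code every pair of lines is equally far apart, whereas the thermometer code preserves the analog ordering of lengths.

```python
import math
from itertools import combinations

N_LINES = 8

def place_code(length, n=N_LINES):
    """Discrete place code: one active unit at the line's position (4 -> 00010000)."""
    return [1 if i == length - 1 else 0 for i in range(n)]

def thermometer_code(length, n=N_LINES):
    """Discrete thermometer code: the first `length` units active (4 -> 11110000)."""
    return [1 if i < length else 0 for i in range(n)]

def mean_distances(reps, labels):
    """Mean pairwise Euclidean distance within and between categories."""
    within, between = [], []
    for i, j in combinations(range(len(reps)), 2):
        d = math.dist(reps[i], reps[j])
        (within if labels[i] == labels[j] else between).append(d)
    return sum(within) / len(within), sum(between) / len(between)

def shows_cp(pre, post, labels):
    """Within-category compression plus between-category separation (summarized by means)."""
    w_pre, b_pre = mean_distances(pre, labels)
    w_post, b_post = mean_distances(post, labels)
    return w_post < w_pre and b_post > b_pre

LABELS = ["short"] * 4 + ["long"] * 4   # category boundary between lines 4 and 5

for a, b in [(4, 5), (1, 8)]:
    print(a, b, round(math.dist(place_code(a), place_code(b)), 3),
          round(math.dist(thermometer_code(a), thermometer_code(b)), 3))
# prints:
# 4 5 1.414 1.0
# 1 8 1.414 2.646
```

In use, `pre` and `post` would be the hidden-unit activation vectors for the 8 lines before and after categorization training.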

-- Figures 1 and 2 about here --

We then looked at the actual course of the evolution of the autoassociation learning and the categorization learning in hidden-unit activation space for 3-hidden-unit nets, where this could be easily visualized as points in the unit cube (Figure 2). During autoassociation, the hidden-unit representations of the 8 lines, beginning from random initial locations prior to learning, expanded towards maximal pairwise separation in the corners and edges of the bounded unit cube (to which they were limited by the logistic function). Separation was maximal with the discrete place code. Both thermometer and coarse coding added some analog constraints, keeping some lines closer to one another than they would have "liked," in order to preserve the analog structure of the representation. This maximal separation (together with the analog constraints) was what the autoassociation net ended with and the categorization net began with.

The CP effects turned out to arise from three factors:

(a) Categorization was successful once "linear separability" was achieved, i.e., when the position of the 8 hidden-unit representations of the "short" and "long" lines in the hypercube was such that they could be separated into their respective categories by a plane: The principal source of the CP effect was the "movement" of the lines from their initial post-autoassociation configurations to a configuration that was linearly separable, because this movement consisted largely of within-category compression and between-category separation.

(b) In addition, the "repulsive" force of the separating plane was inversely proportional to the distance of each line-representation from it, i.e., it was maximal at the category boundary.

(c) Finally, where there was maximal separation after autoassociation, as with the discrete place codes, the only way to get categorization was to move some lines closer together than they would have "liked" to be, again resulting in within-category compression and between-category separation.
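Factor (a) turns on linear separability, and factor (b) on each representation's distance from the separating plane. A minimal sketch of both follows, assuming the classic perceptron rule as the separability test (a swapped-in technique; in a backpropagation net the plane would instead correspond to the threshold of the logistic category unit) and hypothetical function names.

```python
import math


def perceptron_separates(points, labels, max_epochs=1000, lr=0.1):
    """Look for a plane w.x + b = 0 with label-1 points on its positive side.

    Returns (w, b) once every point lies strictly on its correct side, else None.
    The perceptron rule always converges on linearly separable data, so None means
    "no plane found within max_epochs", not a formal proof of inseparability.
    """
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for p, y in zip(points, labels):
            s = 1 if y == 1 else -1
            if s * (sum(wi * pi for wi, pi in zip(w, p)) + b) <= 0:
                # Misclassified (or on the plane): nudge the plane toward the point's side.
                w = [wi + lr * s * pi for wi, pi in zip(w, p)]
                b += lr * s
                errors += 1
        if errors == 0:
            return w, b
    return None


def signed_distance(point, w, b):
    """Signed Euclidean distance of a point from the plane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * pi for wi, pi in zip(w, point)) + b) / norm
```

Applied to 8 hidden-unit points in the unit cube, a non-`None` result marks the configuration as linearly separable, and the absolute signed distances show which lines sit closest to the category boundary.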

So here in this toy model of the simplest form of categorization performed by neural nets, CP effects arise as a natural side-effect of the way these particular nets accomplish categorization. Whether the CP effect is universal or peculiar to some kinds of nets (cf. Grossberg 1984), whether the nets' capacity to do simple one-dimensional categorization will scale up to the full multidimensional categorization capacities of human beings, how the grounded labels of these sensory categories are to be combined into strings of symbols that function as propositions about higher-order category membership, and how the nonarbitrary "shape" constraints these symbols inherit from their grounding will affect the functioning of such a hybrid symbol system remain questions for future research. If these results can be generalized, however, the "warping" of analog similarity space may be a significant factor in grounding.

Analog Constraints on Symbols

Recall that the shapes of the symbols in a pure symbol system are arbitrary in relation to what they stand for. The syntactic rules, operating on these arbitrary shapes, are the only constraint on the manipulation of the symbols. In the kind of hybrid system under consideration here, however, there is an additional source of constraint on the symbols and their allowable combinations, and that is the nonarbitrary shape of the categorical representations that are "connected" to the elementary symbols: the sensory invariants that can pick out the object to which the symbol refers on the basis of its sensory projection. The constraint is bidirectional. The analog space of resemblances between objects is warped in the service of categorization -- similarities are enhanced and diminished in order to produce compact, reliable, separable categories. Objects are no longer free to look quite the same after they have been successfully sorted and labeled in a particular way. But symbols are not free to be combined purely on the basis of syntactic rules either. A symbol string must square not only with its syntax but also with its meaning, i.e., with what it, or the elements of which it is composed, refer to. And what they refer to is fixed by what they are grounded in, i.e., by the nonarbitrary shapes of the iconic projections of objects, and especially by the invariants picked out by the neural net that has accomplished the categorization.

If a grounding scheme like this were successful, it would be incorrect to say that the grounding was the neural net. The grounding includes, inseparably (on pain of reverting to the ungrounded symbolic circle) and nonmodularly, the analog structures and processes that the net "connects" to the symbols and vice versa, as well as the net itself. And the system that a candidate would have to be in order to have a mind (if this hybrid model captures what it takes to have a mind) would have to include all three components. Neither connectionism nor computationalism, according to this proposal, could claim hegemony in modeling cognition; both would have to share the stage with the crucial contribution of the analog component in connecting mental symbols to the real world of objects to which they refer.

Figure 1.

Pairwise distances between the 8 lines in hidden-unit space (3 hidden units) for the discrete thermometer representation (10000000, 11000000, 11100000, etc.). The effect was substantially the same for the other five input representations. 1a shows the pairwise distances following auto-association alone, and 1b shows the difference between auto-association alone and auto-association plus categorization. The polarity of these differences is positive if the interstimulus distance has become smaller (compression) and negative if it has become larger (separation). To make within-category and between-category effects easier to see, the comparisons have all been ordered as follows: first the one-unit comparisons 1-2, 2-3, ..., 7-8; then the two-unit comparisons 1-3, 2-4, etc.; and so on until the single seven-unit comparison, 1-8. Note that the category boundary is between stimuli 4 and 5; hence all pairs that cross that boundary are between-category comparisons, and all others are within-category comparisons. Almost without exception, within-category distances are compressed and between-category distances are expanded by the categorization learning. The interstimulus distances before categorization (auto-association alone) tended to be equal (flat) for the more arbitrary codes (discrete/place, lateral-inhibition/place) and to ascend with increasing distance in units for the more iconic representations (thermometer and coarse codes), as in this example. The distance scale is arbitrary; standard errors are shown atop each bar.
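The ordering of the 28 pairwise comparisons in Figure 1 can be generated mechanically, which also makes it easy to count how many cross the category boundary. A small sketch, with hypothetical function names:

```python
def comparison_order(n=8):
    """Pairs in Figure 1 order: all one-unit comparisons, then two-unit, ..., up to (n-1)-unit."""
    return [(i, i + gap) for gap in range(1, n) for i in range(1, n - gap + 1)]


def is_between(pair, boundary=4):
    """True if the pair straddles the category boundary (here between stimuli 4 and 5)."""
    i, j = pair
    return (i <= boundary) != (j <= boundary)


order = comparison_order()
between_pairs = [p for p in order if is_between(p)]
```

With the boundary between stimuli 4 and 5, 16 of the 28 pairs are between-category comparisons and the remaining 12 are within-category.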

Figure 2.

Evolution of the 8 line representations in hidden-unit space that are formed during learning by 3-hidden-unit nets, progressing from (a) pre-autoassociation to (b) postautoassociation to (c) postcategorization. The upper three cubes are examples of what happens with the more arbitrary input codings (lateral-inhibition/place) and the lower three with the more analog codings (coarse/thermometer). Each line's hidden-unit representation is displayed as a point in the unit cube, its value on each axis corresponding to the activation of one of the 3 hidden units (the connecting lines, whose darkness is proportional to the number of each line, are just there to make 3-dimensional visualization easier). The upper three cubes show how the arbitrary lateral-inhibition/place representations evolve from their initial random configuration (a) to extreme separation in the corners and edges of the space after auto-association learning (b), and finally to categorization (c), which for this particular net required the movement of lines 6 and 2 before linear separability was accomplished. The corresponding lower three cubes show how the analog factors in the coarse/thermometer input coding constrain the configuration and thus facilitate category separation. Categorical perception effects (within-category compression and between-category separation) as a result of categorization training (see Figure 1) nevertheless also occur with the more analog input codings.


Berlin, B. & Kay, P. (1969) Basic color terms: Their universality and evolution. Berkeley: University of California Press.

Boynton, R. M. (1979) Human color vision. New York: Holt, Rinehart & Winston.

Cottrell, G. W., Munro, P. & Zipser, D. (1987) Image compression by back propagation: An example of extensional programming. ICS Report 8702, Institute for Cognitive Science, UCSD.

Davis, M. (1958) Computability and unsolvability. Manchester: McGraw-Hill.

Dietrich, E. (1990) Computationalism. Social Epistemology 4: 135-154.

Elman, J. L. & Zipser, D. (1987) Learning the hidden structure of speech. ICS Report 8701, Institute for Cognitive Science, UCSD.

Fodor, J. A. & Pylyshyn, Z. W. (1988) Connectionism and cognitive architecture: A critical analysis. Cognition 28: 3-71.

Fodor, J. A. (1975) The language of thought. New York: Thomas Y. Crowell.

Fodor, J. A. (1985) Précis of "The Modularity of Mind." Behavioral and Brain Sciences 8: 1-42.

Gibson, E. J. (1969) Principles of perceptual learning and development. Englewood Cliffs NJ: Prentice Hall.

Grossberg, S. G. (1984) Some physiological and pharmacological correlates of a developmental, cognitive, and motivational theory. In: R. Karrer, J. Cohen & P. Tueting (Eds.) Brain and information: Event-related potentials. Annals of the New York Academy of Sciences 425: 58-151.

Hanson, S. J. & Burr, D. J. (1990) What connectionist models learn: Learning and representation in connectionist networks. Behavioral and Brain Sciences 13: 471-518.

Harnad, S. (1984) Verifying machines' minds. Contemporary Psychology 29: 389-391.

Harnad, S. (1987a) (Ed.) Categorical perception: The groundwork of cognition. Cambridge University Press.

Harnad, S. (1987b) Category induction and representation. In: Harnad 1987a.

Harnad, S. (1987c) Uncomplemented categories, or, what is it like to be a bachelor? Presidential Address, 13th Annual Meeting of the Society for Philosophy and Psychology, UCSD, 1987.

Harnad, S. (1989) Minds, machines and Searle. Journal of Experimental and Theoretical Artificial Intelligence 1: 5-25.

Harnad, S. (1990a) The symbol grounding problem. Physica D 42: 335-346.

Harnad, S. (1990b) Lost in the hermeneutic hall of mirrors. Journal of Experimental and Theoretical Artificial Intelligence 2: 321-327.

Harnad, S. (1990c) Commentary on Dietrich's (1990) "Computationalism." Social Epistemology 4: 167-172.

Harnad, S. (1990d) Symbols and nets: Cooperation vs. competition. Review of S. Pinker and J. Mehler (Eds.) (1988) "Connections and Symbols." Connection Science 2: 257-260.

Harnad, S. (1991) Other bodies, other minds: A machine reincarnation of an old philosophical problem. Minds and Machines 1: 43-54.

Harnad, S., Hanson, S. J. & Lubin, J. (1991) Categorical perception and the evolution of supervised learning in neural nets. Presented at the American Association for Artificial Intelligence Symposium on Symbol Grounding: Problem and Practice, Stanford University, March 1991.

Kleene, S. C. (1969) Formalized recursive functionals and formalized realizability. Providence: American Mathematical Society.

Lane, H. (1965) The motor theory of speech perception: A critical review. Psychological Review 72: 275-309.

Lawrence, D. H. (1950) Acquired distinctiveness of cues: II. Selective association in a constant stimulus situation. Journal of Experimental Psychology 40: 175-188.

Lucas, J. R. (1961) Minds, machines and Gödel. Philosophy 36: 112-127.

McClelland, J. L., Rumelhart, D. E. & the PDP Research Group (1986) Parallel distributed processing: Explorations in the microstructure of cognition, Volume 1. Cambridge MA: MIT/Bradford.

Miller, G. A. (1956) The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review 63: 81-97.

Minsky, M. & Papert, S. (1969) Perceptrons: An introduction to computational geometry. Cambridge MA: MIT Press.

Newell, A. (1980) Physical symbol systems. Cognitive Science 4: 135-183.

Penrose, R. (1989) The emperor's new mind. Oxford: Oxford University Press.

Penrose, R. (1990) Précis of "The emperor's new mind." Behavioral and Brain Sciences 13: 643-705.

Pylyshyn, Z. W. (1984) Computation and cognition. Cambridge MA: Bradford Books.

Searle, J. R. (1980a) Minds, brains and programs. Behavioral and Brain Sciences 3: 417-424.

Searle, J. R. (1980b) Intrinsic intentionality. Behavioral and Brain Sciences 3: 450-457.

Shepard, R. N. & Cooper, L. A. (1982) Mental images and their transformations. Cambridge MA: MIT Press/Bradford.

Siegel, J. A. & Siegel, W. (1977) Absolute identification of notes and intervals by musicians. Perception & Psychophysics 21: 143-152.

Smolensky, P. (1988) On the proper treatment of connectionism. Behavioral and Brain Sciences 11: 1-74.

Turing, A. M. (1964) Computing machinery and intelligence. In: A. R. Anderson (Ed.) Minds and machines. Englewood Cliffs NJ: Prentice Hall.

Tversky, A. (1977) Features of similarity. Psychological Review 84: 327-352.

Ullman, S. (1980) Against direct perception. Behavioral and Brain Sciences 3: 373-415.

Wittgenstein, L. (1953) Philosophical investigations. New York: Macmillan.

Zadeh, L. A. (1965) Fuzzy sets. Information & Control 8: 338-353.


1. Fodor's (1985) view of modularity makes a similar partition, but he is not very optimistic that symbolic modeling will capture the workings of the mind's central processor.

2. Another way of putting it is that the mind is really just a symbol system and the rest is just a matter of getting it connected up to the outside world in the right way. The right rejoinder is that getting "connected up in the right way" is what most of cognition is about, and that that is in fact the symbol grounding problem. It is surely noteworthy that if you peel off the sensorimotor parts of the brain you don't have much brain left, and certainly nothing that looks like a homuncular symbol system.

3. I am not actually proposing that "horse" is a ground-level category. I am just suggesting that some things must be ground-level categories; I don't even know how many are needed to form a "basis" for a grounding system. Moreover, I am not even suggesting that the ground level must be eternal and immutable. Provisional categories that are reliable now, but eventually fail once the sample of confusable alternatives is extended, can nevertheless provide grounding until they are revised. Categories are approximate, and category representation is cumulative and convergent, subsuming prior errors as special cases. This is how I would explain, for example, how a horse with painted stripes comes to be disambiguated as not being a zebra. But revisions of scientific categories such as motion, water and heat are also examples (Harnad 1987b).

4. It would also need connectives such as "and" and "or" and quantifiers such as "some" and "all." Although these functional symbols too could in principle be grounded in instances, they would appear to be prerequisites for detecting and marking conjunctive and disjunctive invariants and expressing propositions containing content symbols. So they would probably have to be available innately in the form of primitive logical and syntactic constraints.

5. The anti-empiricist objections can be summarized as follows: For most categories, necessary and sufficient conditions for category membership, and especially sensory ones, simply do not exist. The evidence for this is that we are not aware of using any, and when we think about what they might be, we can't think of any. In addition, categories are often graded or fuzzy, membership being either a matter of degree (Zadeh 1965) or even uncertain or arbitrary in some cases. Sensory invariants are even less likely to exist: The intersection of all the properties of the sensory projections of the members of the category "good" is surely empty. Moreover, sensory appearances are often deceiving, and rarely if ever decisive: A painted horse that looks just like a zebra is still not a zebra. The roboticist's reply is that introspection is unlikely to reveal the mechanisms underlying our robotic and cognitive capacities, otherwise the empirical task would be much easier. Disjunctive, negative, conditional, relational, polyadic, and even constructive invariants (in which the input must undergo considerable processing to extract the information inherent in it) are just as viable, and sensory-based, as the simple, monadic, conjunctive ones that introspection usually looks for. There are graded categories like "big," in which membership is relative and a matter of degree, but there are also all-or-none categories like "bird," for which invariants exist. There may be cases of "bird" we're not sure about, but we're not answerable to God's omniscience about what's what, only to the consequences of miscategorization insofar as they exist and matter to us. And it's our successful categorization performance that a robotic model must be able to capture -- including our capacity to revise our provisional, approximate category invariants in the face of error. 
As to goodness, truth and beauty: There is no reason to doubt that -- insofar as they are objective rather than subjective categories -- they too are up there somewhere, firmly grounded in the zebra hierarchy, just as the "peekaboo unicorn" is: The peekaboo unicorn is "a horse with a horn that vanishes without a trace whenever senses or measuring instruments are trained on it." Unverifiable in principle, this category is nevertheless as firmly grounded (and meaningful) as "zebra" -- as long as "horse," "horn," "vanish," "trace," "senses" and "measuring instrument" are grounded. And we could identify its members on first encounter -- if we ever could encounter them -- as surely as we could identify a zebra. The case of the painted horse and of goodness, truth and beauty is left to the reader as an exercise in exploring the recursive possibilities of grounded symbols.