I read it quite a few years ago, but as far as I remember, Turing is quite vague in his text about what kind of tests the computer should pass. He insists on only two things: first, that the computer should be hidden in some way to avoid bias (hence the symbolic communication), and second, that it should be mistaken for a human by a human.
I don't remember Turing ever forbidding the computer robotic abilities, nor did he ever forbid trying to trick the machine. For me, passing the TT means answering in symbols to a tricky observer who really pushes his or her opponent to its limits. This is what characterizes a real TT, as opposed to a soft TT in which nobody ever tries to trick the machine. Now, my point is that, facing such a tricky observer, the machine will have to be a complex robot, able to adapt well and to explain its behavior correctly.
Hence, for me a TTT is just a sort of TT in which one more constraint is made explicit. To come back to the Chinese room, suppose that my tricky observer (who has to be external to the game with the computers, to stick to the strict TT definition) puts some unexpected, disgusting thing into the baskets of the Chinese room; the machine should complain about it, otherwise it is caught out as non-human. This is exactly why Harnad introduces his TTT (he speaks rather of a Buddha, but the principle is the same: the human faking a computer will have to see the garbage among the Chinese signs, or to miss it. Notice that a robot submitted to the TT would have to be able to tell the difference among, say, Chinese signs, garbage, Buddha statues, Christian crosses, etc., each demanding a different symbolic reaction). There is no need to introduce a TTT; it is already implied by a hard TT. In passing, there exist some mixed architectures like the one Harnad proposes, but they are far from instantiating Harnad's TTT (specifically: NNs have been a definite advance in vision, but they do not solve the whole problem, even when coupled with a more symbolic system).
I usually use another example than the Chinese room; it will be useful again when we come to learning. I call it the simultaneous-actions argument. Suppose that the observer sends the following (symbolic) message to the machine: John has been driving from JFK to RPI, all the time he has been scratching his head. The machine should emit the equivalent of some non-committal grunts to look human. Now, if the observer sends John has been driving from JFK to RPI, all the time he has been scratching his toes, the machine should protest about the validity of this sentence to avoid being caught lacking humanity. Think of the infinite number of things you cannot do while driving, or while swimming butterfly, etc. (you can indeed scratch your toes briefly while driving if you are supple and unwise, so you cannot forbid it in all cases; on the contrary, scratching one's toes while swimming butterfly implies stopping swimming altogether, etc.). It is impossible to store all that in advance (1).
The machine thus needs the ability to drive in order to simulate driving, or driving while scratching one's toes, and to test the lack of comfort and safety that this implies. Some claim in the review that a virtual reality would be quite enough, but one must concede that even for sentences as simple as those above, you need quite an extensive one (see also infra).
This need accords perfectly with AI results (and partial failures), since everyone knows that non-simulated robotics, vision, and audition are far from simple problems, and that no existing AI system handles them correctly. By contrast, we have systems that reason on symbols fairly well. Thus, it is not surprising that robotic abilities should be seen as a normal requirement for intelligence.
Another argument on which I begin to follow Harnad is that one should make use of grounded symbols. However, I must insist that the AI community recognized a very long time ago the necessity of grounding its symbols (or, at least, of faking such grounding); it calls this including semantics in its programs. For instance, as Roitblatt hints, nobody in the AI community believes it will ever be possible to understand a language without grounding its symbols. We call that using the semantics of the domain; is that so terribly far from what philosophers understand by semantics? All of this is so well known that I am reluctant to insist, but Harnad's dry answer to Roitblatt's quite sensible arguments makes it necessary. Let us consider the driving example above. The symbols driving and scratching will be really grounded if the robot is capable of executing (or of simulating the execution of) the concepts, and is thus able to envisage all possible consequences of the situation. The AI approach is to associate with driving or scratching what they really mean (i.e., their semantics), used as constraints on the execution of the action. For instance, the semantics of driving could contain being seated in a moving car, holding the steering wheel, etc. For its part, scratching, in the sense I use here, can be performed with nails slightly in contact with the skin, etc. Now, when you are driving and scratching, both sets of constraints have to be true, along with the constraints associated with the semantics of the concepts they include, like holding (one normally holds with the hands, etc.) or nails (they are part of the hand, etc.). The bet of AI is that it is possible to represent everything of significance in this way, so that a concept is true when it generates no conflicting constraints, and false when it does. I think this is quite enough for many industrial applications, but that it caves in when we come to passing the TT with a nasty observer. It caves in because it must go into such a level of detail in the semantics it associates with each concept, in each context, that the amount of memory involved becomes enormous.
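To fix ideas, here is a minimal sketch, in Python, of the constraint view just described. All concept names and constraint values are invented for the illustration; this is a caricature of the principle, not anybody's actual system.

    # Minimal sketch: concepts as bundles of constraints, with a naive
    # conflict check. Names and values are illustrative assumptions only.
    CONCEPTS = {
        "driving": {"posture": "seated_upright_facing_road",
                    "one_hand": "on_steering_wheel"},
        "scratching_head": {"other_hand": "nails_on_scalp"},
        "scratching_toes": {"posture": "bent_down_toward_feet",
                            "other_hand": "nails_on_toes"},
    }

    def conflicts(concept_names):
        """List the constraints that clash when the concepts must hold at once."""
        merged, clashes = {}, []
        for name in concept_names:
            for attribute, value in CONCEPTS[name].items():
                if attribute in merged and merged[attribute] != value:
                    clashes.append((attribute, merged[attribute], value))
                else:
                    merged[attribute] = value
        return clashes

    print(conflicts(["driving", "scratching_head"]))  # [] -> plausible, answer with a grunt
    print(conflicts(["driving", "scratching_toes"]))  # posture clash -> protest

The point of the sketch is only that driving-while-scratching-one's-head merges without contradiction, whereas driving-while-scratching-one's-toes produces a conflicting constraint; the enormous difficulty lies in writing constraints at the right level of detail for every concept in every context.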
To conclude this first part, I rather agree with Harnad that a purely symbolic system is not enough to pass the TT, but not in order to avoid falling into the trap of the Chinese room. The real reason is an intrinsic limit to the simplification brought by symbolism: when it starts making matters more complex, we should obviously stop using symbols.
My last point is that, by considering AI rather than philosophy alone, one could have realized that two other extensions are needed: adaptability and explainability.
From the very beginning of AI, it has been argued that even in very simple problems the number of possible squiggles is far above the number of particles in the universe, which makes it difficult to store them all in advance in Searle's notebook (a back-of-the-envelope count follows below). Notice that this is a different combinatorial explosion from the one I invoked for the need for robotics. Robotics is needed because of the complexity of the possible situations in the real world; here we consider the variations around one possible situation met by our robot, and the need to learn how to handle them. Of course, there is also a need to learn to drive, which we did not mention before. That is the meaning of adaptability: the ability to meet unknown situations successfully, really unknown ones, which call for simple but genuine creativity. To use the above example again, I guess you have never in your life driven while scratching your toes, yet you know it is impossible to drive for 3 hours doing so. One of the cornerstones of AI's relative failure in robotics is exactly that we are not good at programming adaptability. Once more, NNs have improved the situation a bit, but nobody can say they have solved the problem. Hence I propose an ATTT, which is just one more precision added to the TT.
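The back-of-the-envelope count, with figures assumed only for illustration (the exact numbers matter little, the order of magnitude does):

    # Assumed figures: a working set of 3,000 Chinese characters and
    # messages of 30 characters already yield about 10^104 distinct
    # messages, far beyond the ~10^80 particles usually estimated for
    # the observable universe, so no notebook can list them in advance.
    alphabet_size = 3000
    message_length = 30
    possible_messages = alphabet_size ** message_length
    print(possible_messages > 10 ** 80)  # True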
Soon after the first expert systems were born, it was recognized that explainability is a necessary component of a thinking machine. The first reason is linked to the use of expert systems for teaching (Clancey, 1983; a more tutorial paper is Swartout and Moore, 1993). Another reason is that even the best expert system is simply useless if it gives bad explanations of its behavior, because you cannot maintain it properly: only the original author (by now usually promoted up the hierarchy) is able to maintain it.
These reasons are technical ones; it is not that humans explain themselves particularly well. Still, some ability to explain is necessary to fake humanity. Suppose Searle's machine is asked why it gave a particular answer; it would say: because of such squiggles in such signs, and because my book told me to do this and that. At once the observer would recognize a non-human answer, and the machine would fail the TT. This is why I can go on joking by proposing an ATTTE, which is nothing but one more precise TT! Imagine what is needed to build a machine that explains everything correctly, especially if you want explanations of its behavior in unexpected situations. For instance, after it recognizes a misshapen Chinese letter, you ask why it recognized it; it should be able to give answers at all levels of understanding: the signs, the words, the sentences, the context of the sentence, the goals of the writer and of the reader, etc. (not all at once, otherwise it would fail the TT on the grounds of machine fussiness!).
Yet another interesting piece of information relating to explanations in the symbolic/neural debate is that some people at CMU built such a mixed system. The NN was used to make the decisions, and the symbolic part provided the explanations. The explanations were adapted, that is to say bent, to fit the answer of the NN (Knight and Gil, 1991). The symbolic system was thus a sophist, able to explain a result but also its contrary. This interesting experiment was received with much hostility by the AI community because of its sophistry. It nevertheless illustrates well one way for the neural and the symbolic parts to share the work.
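For concreteness, here is a small Python caricature of that division of labor, with invented names and rules; it is not the Knight and Gil system itself, only an illustration of how a symbolic rationalizer can be bent to fit whatever the network decides.

    # Caricature of the division of labor described above (assumed names,
    # not the actual CMU system): a "neural" component makes the decision,
    # and a symbolic component rationalizes whichever decision was made.
    import random

    def neural_decision(features):
        # Stand-in for a trained network: an opaque numeric decision.
        score = sum(features) + random.uniform(-1.0, 1.0)
        return "approve" if score > 0 else "reject"

    RULES = {
        "approve": "the salient features were judged strong enough to support approval",
        "reject": "the salient features were judged too weak to support approval",
    }

    def symbolic_explanation(decision):
        # The explanation is bent to fit the answer: it can justify either outcome.
        return "Decision '{}' because {}.".format(decision, RULES[decision])

    decision = neural_decision([0.2, -0.5, 0.9])
    print(symbolic_explanation(decision))

The sophistry is visible in the code: the rationalizer never checks the decision, it merely dresses it up, which is exactly what the AI community objected to.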
Now, on top of everything, this ATTTE of mine is far from existing. To me it is as real as a unicorn (grounded by its one horn, etc.). Does this unicorn have blue symbolic eyes or green neural ones? Yet another unexpected strength of the good old TT is that a machine able to adapt so perfectly and to explain so well would be too clever and too pompous to be really human! It might fail the TT on those grounds. I am sure that new developments in AI will bring unexpected problems to the fore, and that their solutions (towards which AI always tends without reaching them completely) will have to be included as new features expected of a machine that is to pass the TT.