The Embodied Intelligent
Elephant in the Room

Dr. Saty Raghavachary, CS Dept, saty@usc.edu

The body MATTERS!

Intelligence is not 'all in the head'...

The brain's purpose is to help/sustain the body - to survive, reproduce... in nature, there are no brains without bodies.

Rodney Brooks: 'Elephants Don't Play Chess'! ['... it is unfair to claim that an elephant has no intelligence worth studying just because it does not play chess.']

Embodied cognition is not a new topic - the '4E' framing (embodied, embedded, extended and enactive cognition) was popularized around 2010, and the embodied strand by itself dates back to the 1990s.

This talk aims to explain how embodiment is integral to robust, general (AGI-level) intelligence, using the twin notions of structures and phenomena:

Phenomena

The physical/material world is full of phenomena - a ball rolling, drum vibrating, water boiling, light rays refracting... These are universal, and likely include ones that humans are not even aware of.

Structures (form/arrangement...) give rise to phenomena (behavior) - this is indeed the fundamental premise of materials science.

Structures and their associated phenomena involve interplay (self-organization, transformation, transport, etc.) between matter, energy, and information.

We sense and perceive via phenomena

Be it touch, taste, smell, hearing or vision, our senses function on account of phenomena that our own bodily structures (cornea, eardrum...) give rise to (light focusing, vibrating...).

Perception likewise involves complex phenomena in the brain: neuronal conduction, modulation via neurotransmitters, oscillations, long-range waves traveling across brain regions, etc.

We stay alive on account of phenomena

The heart beating, lungs exchanging oxygen and CO2, kidney function, balancing, standing, heat regulation and more are all governed by appropriate material phenomena that occur in our bodies.

'Life' can be regarded as a collection of structures which mutually support each other, via associated phenomena, to counter entropy increase (ie. to create and maintain order).

We interact with ('experience') the world via phenomena

'Experience' is the term we use to refer to first-hand (direct) interaction with the environment, in terms of phenomena - eg. glass breaking, wind blowing, dog growling...

We REPRESENT phenomena in our brains

The phenomena we experience (verb) are 'stored' (represented in the brain) as experiences (noun), in terms of object properties/qualities/features - eg. a paper bag rustles, juicy apples taste sweet, liquids are easily spillable, etc. These memories (that can change over time) help us negotiate the environment.

We do NOT learn object qualities indirectly ('second hand') solely in terms of data, rules, or reinforcement - direct 'experience' makes this unnecessary!

Learning about (experiencing, representing, recalling, reasoning about) the world in terms of direct physical experience of object qualities is what lets us analogize, abstract, group, transfer, compose, invent... the meaning of things! Note that these are the very aspects that have continued to confound/elude AI from its inception, right up to the present.

The above also relates to Gibson's and Norman's theories of 'affordances' (the possibilities for action that an object offers an agent), and to von Uexküll's 'Umwelt' - his theory of meaning, in which an agent's body+brain architecture gives rise to its unique experience of a shared environment - on account of the reciprocity between an embodied being and its environment.

AI without physical embodiment

An AI with no physical embodiment, that is made to learn via rules ('symbolic'), data ('connectionist') or goals ('reinforcement'), could be regarded as functioning in a derivative (of the real) world!

In such a derivative world, we humans are the intermediary between the environment and the agent. By specifying the world for the AI (via OUR rules/data/goals, in terms of OUR representations), we are, in effect, making it live in a 'simulation' of the phenomena-rich, structure-based, physical world!

Our representations (for AI) might even be incomplete, incorrect, or simplistic compared to their real-life counterparts. Further, such representations necessitate computation (eg. to label an incoming image) - but biological intelligence might not be based on computation, as Rodney Brooks speculated in IEEE Spectrum, Oct '21.

THIS (operating in a derivative, computation-based environment) is the root cause of AI's deep-but-narrow wins, which have never generalized in any useful way.

Simpler animals vs humans vs AI

The problem?

So we employ rules/data/goals that enable AI - what is the issue with that?

The problem: lack of 'grounded' meaning! In a derivative world (eg. that of an AI trained on a massive text corpus), an agent does not interact directly, physically and continuously with the world. Rather, what the AI 'knows' is second-hand at best, since it lacks the agency/means to learn via direct interaction.

A large language model (LLM) trained this way, for example, ends up learning language features (word ordering, sentence structure etc.) rather than what the words actually mean. Phrases such as "looking up to somebody", "hit it out of the park" or "as clear as day", along with the nouns, verbs, adjectives and adverbs that reference real-world entities and actions, inherently mean nothing to such an AI. The reason: meaning does not lie in a word, phrase or sentence structure; rather, it is obtained from directly and physically experiencing what the words denote. This is made glaringly obvious by the new crop of text-to-image generators.
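The point can be made concrete with a deliberately tiny sketch - a bigram model over an invented three-sentence corpus (not any actual LLM): it reproduces plausible word order purely from co-occurrence statistics, while 'dog', 'wind' and 'glass' remain ungrounded tokens to it.

```python
from collections import defaultdict
import random

# A toy bigram "language model": it learns only which word tends to
# follow which (surface word order) from a tiny invented corpus.
# Nothing in the model connects any word to a real-world referent.
corpus = "the dog growls . the wind blows . the glass breaks .".split()

follows = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev].append(nxt)

def generate(start, n=5, seed=0):
    """Emit up to n words by sampling successors - pure word-order statistics."""
    random.seed(seed)
    out = [start]
    for _ in range(n):
        nxt = random.choice(follows.get(out[-1], ["."]))
        out.append(nxt)
        if nxt == ".":
            break
    return " ".join(out)

print(generate("the"))  # a plausibly ordered, entirely ungrounded sentence
```

The generated output is well-formed with respect to the corpus, yet the model has no access to what growling, blowing or breaking are - exactly the second-hand situation described above, at toy scale.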

The solution!

Construct an agent with a suitable physical (not merely virtual) body+brain architecture, so that it learns directly from its own interactions with the world - with no intermediary (specifically, us). The architecture should be one that enables direct representation of world features useful to the agent, no matter how incomplete/incorrect that representation might be.

Such physically embodied agents will be able to acquire the Holy Grail - grounded meaning!

The agents do not need to be anthropomorphic or human-level - they could be simpler. Such architectures would lead to simple forms of intelligence that would still be grounded (similar to that of insects).

Rodney Brooks' subsumption architecture was an attempt in this direction, as is the emerging field of developmental robotics.

Human-level embodied agents will require advanced architectures that incorporate pliable, sensor-laden skin, brain plasticity that permits modification, autopoiesis, homeostasis, self-preservation instincts, etc. - not unlike how we humans are constructed.

So what does embodiment provide, again?

Our embodied evolution

Factors that have contributed to humankind's collective progress [via an appropriate (sufficiently complex) body+brain architecture] include the capacities to explore, experience, feel, discover, invent and anticipate.

An agent that is unembodied, or only virtually embodied, has no means to explore, experience, feel, discover, invent, or anticipate/predict behaviors in the world the way a physically embodied agent would - there is no spatial or temporal scale to use as reference, no richness and multitude of phenomena, no irreversible passage of time, no inherent urge to understand the world and survive in it, etc.

Analog or digital?

The world's structures and associated phenomena are analog in nature - not digital in the sense of a von Neumann stored-program architecture (separation of memory and processing, with fetch-decode-execute compute cycles).
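For contrast, the stored-program cycle referred to above can be sketched in a few lines - a hypothetical three-instruction machine, purely illustrative, in which one memory array holds both the program and the data:

```python
# A minimal von Neumann-style machine: program and data share one memory,
# and a fetch-decode-execute loop steps through it. This is exactly the
# kind of symbol-shuffling the analog world does NOT do.
memory = [
    ("LOAD", 7),   # acc <- memory[7]
    ("ADD", 8),    # acc <- acc + memory[8]
    ("STORE", 9),  # memory[9] <- acc
    ("HALT", 0),
    0, 0, 0,       # unused padding (cells 4-6)
    2, 3, 0,       # data cells 7, 8, 9
]

acc, pc = 0, 0
while True:
    op, addr = memory[pc]   # fetch and decode
    pc += 1
    if op == "LOAD":        # execute
        acc = memory[addr]
    elif op == "ADD":
        acc += memory[addr]
    elif op == "STORE":
        memory[addr] = acc
    elif op == "HALT":
        break

print(memory[9])  # 2 + 3 = 5
```

Every step here is an explicit manipulation of stored symbols - the contrast the next paragraph draws with analog, autopoietic functioning.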

Given the above, an agent with an analog brain architecture would seem to be a better 'match' for autopoietic functioning in the world, as opposed to needing allopoietic human involvement; such an agent would also be able to actually experience phenomena, as opposed to carrying out digital computation of it.

Analog, symbol-free computation

Analog computing architectures do not make use of **explicit** symbolic computation - eg. consider a wind-up kitchen timer, a Rube Goldberg device, a Braitenberg vehicle, or a sandglass. In each of these, the mechanism is the computer - there is no explicit generation, storage or manipulation of symbols (including numbers); yet the system displays useful/desirable (intelligent) behavior. How? Via its design.
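As a rough illustration of design-as-computation, consider Braitenberg's light-seeking vehicle: each wheel is driven by the opposite side's light sensor, so the machine turns toward light with no decisions or symbols anywhere in it. The sketch below necessarily simulates this digitally (all parameter values are arbitrary choices), but it shows how the crossed wiring alone yields the goal-seeming behavior:

```python
import math

# Braitenberg vehicle 2b: crossed sensor-to-wheel wiring makes the
# vehicle steer toward a light source. The "intelligence" is entirely
# in the wiring - no symbols are generated, stored, or manipulated.
LIGHT = (5.0, 5.0)  # light source position (arbitrary)

def intensity(px, py):
    """Toy light intensity, falling off with distance from the source."""
    return 1.0 / (1.0 + math.hypot(px - LIGHT[0], py - LIGHT[1]))

def run(steps=300):
    x, y, heading = 0.0, 0.0, 0.0
    closest = math.hypot(LIGHT[0], LIGHT[1])  # initial distance to light
    for _ in range(steps):
        # two light sensors, mounted left and right of the heading
        left = intensity(x + 0.5 * math.cos(heading + 0.5),
                         y + 0.5 * math.sin(heading + 0.5))
        right = intensity(x + 0.5 * math.cos(heading - 0.5),
                          y + 0.5 * math.sin(heading - 0.5))
        # crossed wiring: left sensor drives right wheel, and vice versa,
        # so a brighter left side turns the vehicle leftward (toward light)
        heading += 4.0 * (left - right)   # differential steering
        speed = 0.5 * (left + right)      # both wheels push forward
        x += speed * math.cos(heading)
        y += speed * math.sin(heading)
        closest = min(closest, math.hypot(x - LIGHT[0], y - LIGHT[1]))
    return closest

print(run() < 5.0)  # the vehicle closes in on the light
```

That the behavior only appears once the wiring is *physically enacted* (here, approximated by simulation) is precisely the mechanism-is-the-computer point made above.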

Since symbols are human-originated artifacts, computing via them is necessarily 'second hand', ie derivative. Provocatively speaking, symbol-manipulating machines are ALL just glorified/fancy calculators!

An analog brain architecture, on the other hand, coupled with a suitable body, would engage with the world directly - in terms of physical structures and their non-symbolic phenomena, rather than data structures and their symbolic computation.

Conclusions

Embodiment is what permits the acquisition and representation of 'experience' - direct, physical, continuous and interactive exposure to structures and phenomena.

Experience provides an agent direct, grounded meaning, rather than a human-originated, interpreted/abstracted/simplified substitute.

Possessing an analog brain better situates an agent in the world, compared to a digital one that replaces natural phenomena with computations of them in order to perceive and represent the world, ie. in order to experience.

Bottom line: embodiment 'matters' (it IS the big elephant in the room)! Cognition isn't merely cerebral, and isn't digital-computational either.