
Can an AI System Exhibit Commonsense Intelligence?


Source: Irving Wladawsky-Berger, CogWorld think tank member

“One of the fundamental limitations of AI can be characterized as its lack of commonsense intelligence: the ability to reason intuitively about everyday situations and events, which requires rich background knowledge about how the physical and social world works,” wrote University of Washington professor Yejin Choi in “The Curious Case of Commonsense Intelligence,” an essay published in the Spring 2022 issue of Dædalus. “Trivial for humans, acquiring commonsense intelligence has been considered a nearly impossible goal in AI,” added Choi.

What is commonsense intelligence? In a 2020 presentation, Choi defined common sense as “the basic level of practical knowledge and reasoning concerning everyday situations and events that are commonly shared among most people.” Common sense is essential for humans “to live and interact with each other in a reasonable and safe way,” and for an AI system, common sense is essential “to understand human needs and actions better.” She further explained the difference between intuitive commonsense reasoning and analytical rational reasoning by discussing the pioneering research of Daniel Kahneman, Princeton Professor Emeritus and recipient of the 2002 Nobel Prize in Economics, and his longtime collaborator Amos Tversky, who died in 1996.

In his excellent 2011 bestseller Thinking, Fast and Slow, Kahneman explained that our mind is composed of two very different systems of thinking, System 1 and System 2.  System 1 is the intuitive, unconscious, fast, and effortless part of the mind. Thoughts come automatically and very quickly to System 1 without us doing anything to make them happen.

System 1 typically works by developing a coherent story based on its perceptions of what’s going on around us, filling the gaps using its vast amounts of commonsense knowledge about how the physical world works and how people generally behave. Our minds are constantly developing intuitive stories whenever we perceive an event, including what caused the event, what will happen afterwards, as well as the motivations and emotional states of anyone involved. Such stories help us deal efficiently with the myriad simple situations we encounter in everyday life. But, while enabling us to act quickly, the simple, coherent stories System 1 comes up with can be wrong and lead to mistakes.

System 2 is the slower, logical, deliberate, and effortful part of the mind. It’s where we evaluate and choose between multiple options. But it’s also lazy and tires easily, so we don’t generally invoke System 2 unless rigorous, demanding rational thinking is necessary for activities like solving puzzles, reading and writing articles, solving math problems, or taking a test.

Common sense is shaped by evolutionary biology and social context. We’re born with the ability to quickly learn from and adapt to the social environment around us. In “The Ultimate Learning Machines,” a 2019 WSJ essay, UC Berkeley psychologist Alison Gopnik noted that whereas training deep learning algorithms to recognize cats and dogs requires huge numbers of labeled pictures, young children “can learn new categories from just a small number of examples. A few storybook pictures can teach them not only about cats and dogs but jaguars and rhinos and unicorns.”

“One of the secrets of children’s learning is that they construct models or theories of the world,” she added. “Toddlers may not learn how to play chess, but they develop common-sense ideas about physics … even 1-year-old babies know a lot about objects: they are surprised if they see a toy car hover in midair or pass through a wall, even if they’ve never seen the car or the wall before.” A major grand challenge in AI is how to build systems that can think, learn, and understand how the world works as resourcefully as an 18-month-old.

AI research in the 1960s, ’70s, and ’80s focused on developing symbolic representations of the world, using knowledge frameworks like semantic networks and programming languages like LISP and Prolog to build reasoning systems. “But despite their intellectual appeal, logic-based formalisms proved too brittle to scale beyond experimental toy problems,” wrote Choi in her Dædalus essay.

Formal logic methods are most appropriate for problems whose solution can be arrived at from their initial premises through deductive inferences, like proving a mathematical theorem. But intuitive, commonsense reasoning is quite different. “The purpose of intuitive reasoning is to anticipate and predict what might be plausible explanations for our partial observations, so we can read between the lines in text and see beyond the frame of the image,” explained Choi. Moreover, intuitive reasoning draws on our commonsense knowledge about the world to fill in the blanks, and is thus defeasible: as we come to better understand the actual context of a situation, the correct explanation may turn out to be quite different from the one our intuition originally arrived at.
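To make the notion of defeasibility concrete, here is a toy sketch, not drawn from Choi’s essay, written in Python: a default rule licenses a conclusion that is quietly withdrawn the moment one new fact about the situation arrives, something a classical deductive proof never permits.

```python
# Toy illustration of defeasible (default) reasoning.
# The animal, facts, and rule are purely illustrative assumptions.

def flies(animal, facts):
    """Default rule: birds fly, unless we know of an exception."""
    known = facts.get(animal, set())
    if "penguin" in known:          # an exception overrides the default
        return False
    return "bird" in known          # otherwise, conclude by default

facts = {"tweety": {"bird"}}
print(flies("tweety", facts))       # True: the intuitive default conclusion

facts["tweety"].add("penguin")      # new information about the same situation
print(flies("tweety", facts))       # False: the earlier conclusion is retracted
```

Classical deduction is monotonic: adding new facts can never remove a conclusion, which is exactly the property intuitive, commonsense reasoning has to give up.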

Scale is another major reason why formal logic fails when applied to intuitive, commonsense reasoning. “The reasoning framework, to be practically useful, should be ready to cover the full spectrum of concepts and compositions of concepts that we encounter in our everyday physical and social interactions with the world. In addition, the real world is filled with previously unseen situations, which require creative generation of hypotheses, novel compositions of concepts, and novel discovery of reasoning rules.”

Choi added that “language-based formalisms, despite their apparent imprecision and variability, are sufficiently expressive and robust to encompass the vast number of commonsense facts and rules about how the world works. After all, it is language, not logical forms, through which humans acquire knowledge about the world.”

Foundation models, like GPT-3, are one such language-based formalism. While based on deep learning (DL) technologies, these large language models (LLMs) have gotten around previous DL limitations by leveraging two recent advances: huge scale and transfer learning. Foundation models are trained with over 10X more data than previous DL models, including big chunks of the information on the internet as well as digital books, articles, reports, and other digital media. And unlike the task-specific training of earlier AI systems, transfer learning takes the knowledge learned from training on one task and applies it to different but related tasks.
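As a rough sketch of what transfer learning looks like in practice, the snippet below, which assumes the open-source Hugging Face transformers library and an illustrative model and task not mentioned in the essay, loads weights pretrained on a large general-purpose corpus and attaches a small task-specific classification head that would then be fine-tuned on a related task.

```python
# A minimal transfer-learning sketch (illustrative assumptions: the
# "bert-base-uncased" checkpoint and a two-class plausibility task).
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "bert-base-uncased"     # weights pretrained on large text corpora
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# The pretrained body is reused as-is; only the small classification head is new
# and would be fine-tuned on a modest amount of task-specific data.
inputs = tokenizer("He put the turkey in the oven to cool it down.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits)   # meaningless until the head is fine-tuned on the new task
```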

Shortly after GPT-3 went online in 2020, its creators at the AI research company OpenAI discovered that not only could GPT-3 generate whole sentences and paragraphs in English in a variety of styles, but it had developed surprising skills at writing computer software even though the training data was focused on the English language, not on examples of computer code. But, as it turned out, the vast amounts of data used in its training included many examples of computer programming accompanied by descriptions of what the code was designed to do, thus enabling GPT-3 to teach itself how to program. Similarly, GPT-3 taught itself a number of other tasks like generating legal documents.

Choi leads Mosaic, a project at the Allen Institute for AI that’s building a language-based prototype of a commonsense knowledge and intuitive reasoning system. The prototype couldn’t be based on existing large language models like GPT-3 because such models are typically trained to generate the next English word, sentence, or paragraph sequentially from left to right, a technique that doesn’t work for commonsense models.

GPT-3 works remarkably well for generating sequential English text on many topics and styles in answer to a question or a prompt. But everyday human cognition is far from sequential and requires flexible reasoning over events that may not have occurred sequentially, such as counterfactual reasoning, which involves considering possible alternatives to an event that has already occurred, and abductive reasoning, which seeks the simplest, most likely explanation for an observation.

“Although most of our day-to-day reasoning is a form of abductive reasoning, it is relatively less known to most people. For example, Arthur Conan Doyle, the author of the Sherlock Holmes canon, mistakenly wrote that Sherlock used deductive reasoning to solve his cases. On the contrary, the key to solving Holmes’s mysteries was almost always abductive reasoning, which requires a nontrivial dose of imagination and causal reasoning to generate explanatory hypotheses that may not seem obvious to others.” 
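One simple way to see what abductive reasoning looks like computationally, sketched below with an off-the-shelf GPT-2 model as an assumption rather than as a description of Mosaic’s methods, is to treat it as hypothesis selection: given two observations, score the candidate explanations that could sit between them and keep the one the language model finds most plausible.

```python
# Abductive inference as hypothesis selection (illustrative observations and
# hypotheses; assumes the Hugging Face transformers library and PyTorch).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

obs1 = "The kitchen floor was covered in flour when we got home."
obs2 = "The children promised to clean everything up before dinner."
hypotheses = [
    "The kids had tried to bake a cake by themselves.",
    "A snowstorm had passed through the kitchen.",
]

def plausibility(story):
    # Lower language-model loss means the model finds the full story more plausible.
    ids = tokenizer(story, return_tensors="pt").input_ids
    with torch.no_grad():
        return model(ids, labels=ids).loss.item()

best = min(hypotheses, key=lambda h: plausibility(f"{obs1} {h} {obs2}"))
print("Most plausible explanation:", best)
```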

As part of their research on commonsense AI, Choi and her colleagues have developed a number of innovative language-based systems like ATOMIC, a collection of textual descriptions of commonsense rules and facts about everyday objects and events, and new inference algorithms that can flexibly incorporate the non-sequential nature of intuitive reasoning.
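To give a flavor of what such a resource contains, the sketch below stores a few ATOMIC-style if-then tuples in a plain Python dictionary. The relation names (xIntent, oReact, oEffect) follow the published ATOMIC work, but the data structure and the example event are illustrative, not the project’s actual code.

```python
# A toy ATOMIC-style store of if-then commonsense tuples (illustrative only).
from collections import defaultdict

# knowledge[event][relation] -> set of free-text inferences
knowledge = defaultdict(lambda: defaultdict(set))

def add_fact(event, relation, inference):
    knowledge[event][relation].add(inference)

event = "PersonX pays PersonY a compliment"
add_fact(event, "xIntent", "to be nice")          # why PersonX did it
add_fact(event, "oReact", "feels flattered")      # how PersonY is likely to feel
add_fact(event, "oEffect", "smiles back")         # what PersonY is likely to do

def effects_on_others(event):
    """Collect inferences about how others react to, or are affected by, an event."""
    return knowledge[event]["oReact"] | knowledge[event]["oEffect"]

print(effects_on_others(event))
```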

“While the research highlighted in this essay demonstrates potential new paths forward, we are far from solving commonsense AI,” wrote Choi in conclusion. “Numerous open research questions remain, including computational mechanisms to ensure consistency and interpretability of commonsense knowledge and reasoning, deep representational integration between language and perception for multimodal reasoning, new learning paradigms for abstraction and analogies, and advanced learning methods for interactive and lifelong learning of knowledge and reasoning.”


Irving Wladawsky-Berger is a Research Affiliate at MIT's Sloan School of Management and at Cybersecurity at MIT Sloan (CAMS) and Fellow of the Initiative on the Digital Economy, of MIT Connection Science, and of the Stanford Digital Economy Lab.