We would not be comfortable giving emotionally impaired people real-time decision-making authority over our family members’ health, our life savings, our cars or our missile defense systems. Yet we are hurtling in that direction with today’s emotionally impaired AIs.
They — those people, and those AI programs — have trouble doing multi-step abstract reasoning, and that limitation makes them brittle, especially when confronted by unfamiliar, unexpected and unusual situations.
Don’t worry, this is not one of those “Oh, woe is us!” AI fear-mongering articles such as we have been graced with by such uniquely qualified AI researchers as Henry Kissinger, Stephen Hawking, and Elon Musk. Yes, we are moving toward a nightmarish AI crisis, but no, it is not unavoidable: there is a clear path out of this devil’s bargain, and I’m going to articulate exactly what it is and how and when it’s going to save us.
Before I can explain that though, I need to say a few more words about today’s AIs.
I knew, and worked on, machine learning as a Stanford professor in the 1970s, decades before it was a new thing. Machine learning algorithms have scarcely changed at all in the last 40 years. But several big things have happened in that time period that have breathed new life into applying that old AI technology:
(i) Computers are a hundred thousand times faster, and on top of that the video game market has given birth to cheap, fast, parallel GPUs which turned out to be well-matched to the voracious appetites of these AIs
(ii) Data storage costs and transmission speeds have likewise improved by orders of magnitude
(iii) the internet has grown up (well, at least grown), and
(iv) “big data” has gone from scarce to scarcely avoidable. This means there are lots of patches of fertile ground, now, for successfully applying machine learning; I don’t need to survey them here — just try to avoid hearing about them these days.
Current AIs can form and recognize patterns, but they don’t really understand anything. That’s what we humans use our left brain hemispheres for — what Dan Kahneman calls “thinking slow.” That’s the other kind of thinking we do, and that’s also the other kind of AI technology that exists in the world. It involves representing pieces of knowledge explicitly, symbolically, to build a model of (part of) the world, and then doing logical inference step by step to conclude things which can then become the grist for even deeper logical reasoning. Think, e.g., of the Sherlock Holmes character’s dazzling displays of deduction.
For most of this article, I want to talk about symbolic representation and reasoning (SR&R) — the “other AI” besides machine learning. So let’s try to contrast those two types of thinking; those two types of AI.
ML is a form of statistical inference: multi-layer neural networks trained on big data. By contrast, what I’m talking about here is knowledge-based inference. It’s much like the difference between correlation and causation. Here are a handful of examples to illustrate the difference between these two very different types of thinking:
A police officer may statistically profile a person based on his/her appearance and body language (correlation), versus actually investigating and deducing the person’s guilt or innocence (causation).
Until WWII, the “engineering” of large bridges was done mostly by imitating other bridges and just hoping for the best (correlation). Today, we understand the material science of stress, load, elasticity, shear, etc., so mechanical engineering models can be built that prevent tragedies like those that the purely statistical approach led to (e.g., the 1940 collapse of the Tacoma Narrows Bridge) and can go back and analyze what went wrong in those cases (causation).
700 years ago, sometime between Giotto and Brunelleschi, the creation of perspective in paintings went from a mysterious art, only transmitted via years of apprenticeship, to a well-understood technique mechanically created via horizon lines and geometric projections.
For millennia, people observed that if two non-redheads had a red-haired child, then about ¼ of all their children would turn out to be red-headed. Now that we understand genetics, we understand how and why a “rR” carrier of the recessive red-headed gene “r” has zero chance of having red hair themselves but if two carriers have offspring, then half their children will be “rR” carriers and one quarter of their children will actually be “rr” and therefore have red hair.
Amazon or Netflix might strongly recommend Private School because you enjoyed the first two Hannah Kline Mysteries, but your friend – who knows that you just lost a baby, and that that’s an element of Private School – would understand it’s a terrible recommendation for you now.
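The red-hair arithmetic in the genetics example above can be checked mechanically. The following is a minimal Python sketch, assuming the simple one-gene recessive model described there; the function name `cross` and the genotype strings are illustrative choices, not part of any genetics library.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent_a: str, parent_b: str) -> dict:
    """Return the probability of each offspring genotype for two parents.

    Each parent is a two-letter genotype such as "rR"; each passes on one
    of its two alleles with equal probability (simplified one-gene model).
    """
    # sorted(..., reverse=True) puts lowercase first, so a carrier is
    # always written "rR", matching the notation used in the text.
    counts = Counter(
        "".join(sorted(pair, reverse=True))
        for pair in product(parent_a, parent_b)
    )
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


# Two non-red-haired carriers
offspring = cross("rR", "rR")
for genotype, probability in sorted(offspring.items()):
    print(genotype, probability)
```

The model, not a tally of past observations, is what yields the one-quarter "rr" and one-half "rR" figures, and the same function answers questions nobody has tabulated yet, such as `cross("rr", "rR")`.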
It may surprise you that both types of reasoning have been harnessed in AIs since the 1970s. Both paradigms looked promising, at first, back then, but then each approach encountered enormous obstacles which stalled their progress for several decades. Several things have changed, in the last 50 years, which have made it cost-effective, finally, to revisit — and harness — both sources of power.
I’ve already described the changes that led to a resurgence of ML applications ((i)-(iv), above). What has changed that leads me to say that the knowledge-based AI solutions approach — what used to be called “expert systems” — is viable, finally?
It turns out that there weren’t four roadblocks and missing technologies in this case; there were about 150 (in addition to the need for 100,000x faster/cheaper computers and storage, and access to online data). One by one, large-scale engineering efforts have found adequate engineering solutions (not scientific breakthroughs) for all 150! I won’t go through them all, but here are a handful of the more important problems, and for each, a description of the engineering solution that successfully tamed it:
“It turns out that in 1969 there were 150 different roadblocks to knowledge-based expert systems succeeding; one by one each has since been removed by treating it as a large-scale engineering (not scientific) problem to overcome.”
Reusability. Each new “expert system” application had to be built from scratch. And each of those was a long, labor-intensive process, so expert system knowledge engineers inevitably “cut corners” in ways that made their IF/THEN rules almost never reusable in later systems. For instance, one EMYCIN-based system about blood diseases had rules which acted as though all of a patient’s data was obtained on the same day; a different EMYCIN-based system about pulmonary dysfunction needed rules that carefully indicated what measurements were taken exactly when (e.g., tracking the patient’s smoking history over time). Each system performed well, but simply unioning those two rule-sets would have led to horrible errors of commission when trying to get that mash-up to perform either application task. The large-scale engineering approach to remediating this problem was to painstakingly identify, collect, and formalize – once and for all, thankfully – the tens of millions of general rules of good guessing and good judgment that comprise human common sense and human expert knowledge in dozens of different application domains. This is a case of making a problem harder in order to solve it: for the last 35 years that Manhattan-Project-like effort has occupied a team of over a hundred knowledge engineers (whom I dubbed “ontologists” back then), which amounts to millions of person-hours of writing and testing and debugging IF/THEN rules. The requirement was that the growing system continue to perform well on all of its past and present domains, plus common sense, and that requirement in turn forced all the rules to be stated in a sufficiently general, domain-independent, and hence reusable form.
Efficiency. Automated logical reasoning (running a set of IF/THEN rules, doing “Resolution” theorem-proving on them) was painfully slow, even when there were only a few hundred rules, and a few hundred “facts” (ground assertions, such as a patient’s medical data). The theory behind this automatic theorem proving was well understood, but in practice (especially with tens of millions of rules and billions of facts) it almost never would have returned answers to questions before the heat death of the universe.
There were two independent large-scale engineering approaches that, working together, finally remediated this problem. The first half of the solution was inspired by the insight that we could separate the epistemological problem (what should the system know?) from the heuristic problem (how can the system represent that knowledge in a way that enables inference to happen fast, i.e., fast enough, on it?). While every rule can and should be represented in a nice, clean, logical “epistemological level” language (more on this later, and in my next posting), on which a general theorem prover could operate, it is also possible to redundantly represent the same rule or fact in many ways, each with its own idiosyncratic data structures and algorithms (that operate on those data structures) for doing certain kinds of reasoning super-fast. By 1989, we had identified and implemented about 20 such special-case reasoners, each with its own data structures and algorithms. Today there are over 1100 of these “heuristic level reasoning modules.” These work together cooperatively as a sort of community of agents to obviate the need for a general (but hopelessly slow) theorem prover. Some of these stylized reasoning agents are narrowly domain-dependent, such as how to efficiently balance a chemical equation, and some agents are very general, such as caching transitive binary relations like during and subOrganizations.
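To give a flavor of what one such special-case reasoner might look like, here is a minimal Python sketch of a cache for a single transitive binary relation. It is an illustration only, not the actual module from any existing system: the class name, the method names, the organization names, and the argument order (parent organization first) are all assumptions made for the example.

```python
from collections import defaultdict


class TransitiveRelationCache:
    """Toy special-purpose reasoner for one transitive binary relation.

    Instead of proving (rel A C) from (rel A B) and (rel B C) with a
    general theorem prover, it walks a graph and memoizes the result.
    """

    def __init__(self):
        self._direct = defaultdict(set)  # explicitly asserted pairs
        self._closure = {}               # memoized reachable sets

    def assert_fact(self, x, y):
        self._direct[x].add(y)
        self._closure.clear()  # crude invalidation; fine for a sketch

    def _reachable(self, x):
        if x not in self._closure:
            seen = set()
            stack = list(self._direct.get(x, ()))
            while stack:
                node = stack.pop()
                if node not in seen:
                    seen.add(node)
                    stack.extend(self._direct.get(node, ()))
            self._closure[x] = frozenset(seen)
        return self._closure[x]

    def holds(self, x, y):
        """True if (rel x y) follows by transitivity from asserted facts."""
        return y in self._reachable(x)


# Hypothetical organizations, assuming (subOrganizations Parent Child)
orgs = TransitiveRelationCache()
orgs.assert_fact("Acme", "AcmeResearch")
orgs.assert_fact("AcmeResearch", "AcmeAILab")
print(orgs.holds("Acme", "AcmeAILab"))   # follows via AcmeResearch
print(orgs.holds("AcmeAILab", "Acme"))   # the relation is not symmetric
```

The design point is the redundancy: the same facts could still live in the clean logical language, while this structure answers one narrow kind of question quickly.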
That sped up reasoning, but frustratingly it was still the case that one could speed it up even more by excising portions of the knowledge base — i.e., by removing parts of its brain! This radical surgical approach seems like a step in the wrong direction, whether one is dealing with AI programs or human beings. So why doesn’t our having more knowledge slow us all down, all the time? We don’t become an expert at some task by forgetting everything we know about lots of other topics.
So what happens with humans, as we become an expert at some complicated task? We learn the new domain concepts, rules, and so on, but we also learn new rules of thumb, rules of good guessing, rules of good judgment for how to approach problems in that domain, how to prioritize and so on.
We’ve been able to take that same approach successfully with our symbolic AI reasoners: whenever the system slows down, we just add more knowledge, more rules, to speed it up. If it’s working in some domain application, we ask the human experts to look over its step-by-step reasoning trace, to diagnose where it was wasting time. Typically, there was some missing rule of thumb, using which the expert could get to an answer in a few seconds whereas it took the program minutes to deduce the same answer. Adding that meta-level knowledge speeds the program up, incrementally approaching both the correctness and the efficiency of the best humans who solve that sort of domain problem.
In other words, we keep in the system’s knowledge base many meta-level rules that tactically plan and coordinate an attack on the current problem, much like a quarterback does in football. Sometimes we even need to get experts to articulate their meta-meta-rules — strategies — that monitor how the tactician is doing and, like a sideline coach, decide when it’s time to pull the current quarterback from the game and let some other tactician take over. The largest symbolic representation and reasoning system today spends about 90% of its time working on one or another application domain problem, 9% of its time sitting back and doing meta-level tactical reasoning, and 1% of its time sitting even farther back and metaphorically puffing on its Meerschaum pipe and doing meta-meta-level strategic reasoning.
So adding more and more meta-knowledge, then, is the basis of the second way that symbolic AI systems can be massively sped up.
Inconsistency. Rule-based systems did not deal well with the inevitable inconsistencies of rich, real-world information: once an expert system concluded False, bad things inevitably happened. But the real world is full of inconsistency! How can we reconcile this with the need for knowledge bases to be logically consistent if we’re going to use anything like logic to infer new content?
To remediate this problem of ubiquitous inconsistency, we had to replace the requirement of global consistency of the knowledge base with the notion of local consistency. Every rule and ground assertion in the knowledge base then has n labels or tags that identify what portion of this n-dimensional knowledge base that rule or assertion holds true in. A rule or assertion might be true at some time, in some place, in someone’s belief system or ideology, up to some level of granularity, etc. Each of those – time, space, level of granularity, etc. – is a dimension of context-space, a dimension of the knowledge base. This explicitly models the context in which the rules’ premises and conclusions are true, and that ripples out to conclude, mechanically and automatically, in what context a final answer can and should be safely assumed to be valid. For instance, the standard set of modern rules of thumb about bridge-building is going to get you into trouble if you’re bridging an active volcano in Hawaii, or you’re bridging a fissure on Venus, or you are a child trying to bridge from your bed to your chair. John McCarthy, Guha, and others working on our team also had to figure out a way for our symbolic AI to reason not by theorem-proving – manipulating rigid “True” and “False” tokens – but rather by something called argumentation: coming up with all the pro- and con- arguments it possibly can, in each situation, eliminating the self-contradictory ones, and then reasoning about the remaining arguments to decide what to believe in that context.
Each context, also sometimes called a “micro-theory,” is a first class object in the system’s ontology of terms, and can be reasoned about just like oil wells and diseases. That enables the symbolic AI to carry out the meta-level reasoning it needs to do: reasoning about arguments.
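The idea of local rather than global consistency can be illustrated with a deliberately tiny Python sketch. It is a simplification under stated assumptions: the statement name `StandardBridgeRulesApply`, the single `place` dimension, and the helper names are invented for illustration, and the sketch does plain lookup rather than the argumentation described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    statement: str
    truth: bool
    context: tuple  # sorted (dimension, value) pairs, e.g. (("place", "Earth"),)


def tagged(statement, truth, **context):
    """Build an assertion labeled with the context it holds in."""
    return Assertion(statement, truth, tuple(sorted(context.items())))


def visible(assertion, query_context):
    # An assertion applies only if every dimension it is tagged with
    # matches the context the question is being asked in.
    return all(query_context.get(dim) == val for dim, val in assertion.context)


def ask(kb, statement, **query_context):
    """Return True/False if the context settles the matter, else None."""
    answers = {
        a.truth
        for a in kb
        if a.statement == statement and visible(a, query_context)
    }
    return answers.pop() if len(answers) == 1 else None


# Globally these two assertions contradict each other;
# within any single context they do not.
kb = [
    tagged("StandardBridgeRulesApply", True, place="Earth"),
    tagged("StandardBridgeRulesApply", False, place="Venus"),
]

print(ask(kb, "StandardBridgeRulesApply", place="Earth"))
print(ask(kb, "StandardBridgeRulesApply", place="Venus"))
print(ask(kb, "StandardBridgeRulesApply"))  # no context given: undetermined
```

A fuller treatment would add more dimensions (time, belief system, granularity) and would weigh competing arguments instead of returning None when answers conflict.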
Automatically using “big data” as though it were part of the knowledge base. The general rules in a symbolic representation and reasoning system need to “run on data” – individual patient data, stock data, oil well sensor readings, etc. And most of that data in the world “out there” is in the form of database content or accessible via web services, where the meaning of the data is a combination of the data itself plus the meaning of the relations, search fields, etc. A human, or a custom-built application program, interprets the data accordingly; e.g., in one table of one relational database, a cell with the number “48.3” means “the employee represented by this row has an annual salary of USD $48,300.” Often that slightly interpreted data is referred to as information. The human (or custom program) further contextualizes that information: e.g., that entire database table contains information which was true in 2014, or represents what some company’s marketing department today wants potential customers to believe. That multi-step interpretation process needs to happen, somehow, before the results of a symbolic knowledge representation and reasoning system can and should be trusted. I.e., there needs to be some semantic mapping between the terms in a symbolic knowledge representation and reasoning system ontology, and the schema elements in third-party information sources such as databases and web services. Without that, the system is like a human who, no matter how smart they are, is limiting themselves by never accessing the wealth of relevant information available online.
To remediate this in the case of small data (say hundreds of megabytes or less) one can – once the above ontology alignment has been done – simply import 100% of that data into the knowledge base. But in the case of terabytes/petabytes/exabytes of data that approach becomes, respectively, undesirable/unacceptable/unimaginable. To remediate this problem of big data, the knowledge-based AI system can have rules which effectively say “to find out the number of inhabitants of any geopolitical US entity, generate the following type of SQL query, where the table is the NGA-pop table, the relation is POP, etc., and ask that of the following database which can be reached via the following protocol…” In other words, the knowledge-based AI system remotely queries relevant third-party information sources when/as appropriate, just as you or I or a subject matter expert would.
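A rule of that kind can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the predicate name `numberOfInhabitants`, the table and column names (`nga_pop`, `entity`, `pop`), and the placeholder row are invented, and an in-memory SQLite database stands in for the remote source.

```python
import sqlite3

# Hypothetical semantic mapping from an ontology predicate to the schema
# elements of a third-party database (names invented for this sketch).
SEMANTIC_MAP = {
    "numberOfInhabitants": {
        "table": "nga_pop",
        "key_column": "entity",
        "value_column": "pop",
    },
}


def build_query(predicate):
    """Generate the SQL that answers questions about this predicate."""
    m = SEMANTIC_MAP[predicate]
    # Identifiers come from our own trusted mapping; the entity itself is
    # bound later as a parameter rather than pasted into the string.
    return f'SELECT {m["value_column"]} FROM {m["table"]} WHERE {m["key_column"]} = ?'


def ask_remote(conn, predicate, entity):
    """Fetch one value on demand instead of importing the whole table."""
    row = conn.execute(build_query(predicate), (entity,)).fetchone()
    return None if row is None else row[0]


# Stand-in for the remote database, holding one made-up placeholder row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nga_pop (entity TEXT PRIMARY KEY, pop INTEGER)")
conn.execute("INSERT INTO nga_pop VALUES (?, ?)", ("ExampleCounty", 12345))

result = ask_remote(conn, "numberOfInhabitants", "ExampleCounty")
print(result)
```

The knowledge base stores only the mapping and the recipe for the query; the data stays where it is and is consulted when a line of reasoning actually needs it.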
Explanation to end users (and browsing/editing/querying of the KB by end users). The vast majority of end users of these symbolic representation and reasoning AIs won’t want to make the effort to – and even if they tried wouldn’t be able to – make heads or tails of some long sequence of IF/THEN-rule-firings, especially if those rules are written in some sort of logical language. But this functionality – explanation of the system’s line of reasoning that led it to an answer – can’t be omitted: it is exactly that step-by-step reasoning chain which users need to audit, and therefore trust, the system. In cases where the user disagrees with the system’s reasoning, if he or she can follow the line of reasoning then he or she is easily able to offer feedback and provide his or her own knowledge to override and improve the system (at least in that context or any context in which that user is trusted).
So, for multiple reasons, it is imperative that each long rule trace of formal rule-firings can be automatically converted, somehow, into a terse, readable, understandable explanation, ideally in some natural language like English. So how is the remediation of this coming? Well, there is bad news and good news.
The bad news: Unfortunately, open-ended unrestricted NLU (complete automatic translation of a natural language text into a formal representation language, without throwing away a lot of the meaning) is still years away from being a reality – the current state of the art is to recognize entities in text, recognize sentiment, recognize very simple binary relations (often with important modifiers like “not” missed!), and notice degrees of co-occurrence and frequency of word combinations. In a typical English paragraph, this throws out about 90% of the baby – the meaning of the text – with the bathwater.
The good news: Fortunately, what’s needed to remediate the Explanation problem is not NLU but just NLG (natural language generation), and for that a surprisingly simple compositional recursive algorithm succeeds quite well. E.g., the logical expression (biologicalMother X Y) can be translated into English as “Y is the biological mother of X” where X and Y are, recursively, the translations of the expressions X and Y. For example, the nested expression (biologicalMother (winnerOfIn USPresidentialElection 2016) MaryAnneMcLeod) turns into “Mary Anne McLeod is the biological mother of the winner of the 2016 US Presidential Election,” which is a bit stilted but fully understandable by an English speaker unfamiliar with formal logic. This also forms the heart of an interface whereby such individuals can query, browse, and edit the knowledge base.
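The compositional recursion is simple enough to sketch in a few lines of Python. This is an illustrative toy, not the production generator: the template table, the name table, and the tuple encoding of logical expressions are assumptions, and the argument order follows the template given above, in which (biologicalMother X Y) reads “Y is the biological mother of X.”

```python
# One English template per predicate; {0}, {1} refer to the translated
# arguments in the order they appear in the logical expression.
TEMPLATES = {
    "biologicalMother": "{1} is the biological mother of {0}",
    "winnerOfIn": "the winner of the {1} {0}",
}

# English names for constants (a hypothetical lexicon for this sketch).
NAMES = {
    "MaryAnneMcLeod": "Mary Anne McLeod",
    "USPresidentialElection": "US Presidential Election",
}


def generate(expr):
    """Translate a nested logical expression into English, recursively."""
    if isinstance(expr, tuple):
        predicate, *args = expr
        # Translate each argument first, then slot the results into the
        # predicate's template: that is the whole compositional trick.
        return TEMPLATES[predicate].format(*(generate(a) for a in args))
    return NAMES.get(expr, str(expr))


expression = (
    "biologicalMother",
    ("winnerOfIn", "USPresidentialElection", 2016),
    "MaryAnneMcLeod",
)
sentence = generate(expression)
print(sentence)
```

Adding a new predicate to such a generator means adding one template, which is why the approach scales well as long as the templates compose into readable English.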
The small residue of cases where this compositional approach fails – commonly occurring cases that lead to confusing or bizarre English sentences being generated – can be handled by idiosyncratic rules that generate natural-sounding glosses for those logical expressions.
This simple compositional approach to NLG also performs poorly on very long sentences, which can run to dozens of words. One way to remediate this is to automatically break them into a set of smaller logical pieces – the nested components of the compound logical expression – and then translate those short logical expressions into short natural language sentences one at a time. This approach works but generally leads to translations where a single long sentence gets turned into a series of several short sentences that sound a bit like My First Reader but are nevertheless both understandable and complete (i.e., they do not omit any of the intended content which is present in the logical form of the representation).
Next time: The other half of the story. Everything I’ve discussed so far is only half of my argument about when and how we will have AIs with functioning left brain hemispheres, AIs which are not brittle in the face of novel situations. In my next posting, I will go through the other half of the argument, the teaser for which is this:
Some of the best AI systems today do have and make heavy use of some sort of symbolic representation and reasoning engine, but the representations of knowledge that they use (triple stores, RDF/OWL ontologies, knowledge graphs, etc.) are much too shallow. They make those choices for efficiency reasons, but the result is a lot like the joke “We’re lost, but we’re making good time!” Researchers and application builders tolerate their AI systems having just the thinnest veneer of intelligence, and that may be adequate for fast internet searching or party conversation or New York Times op-ed pieces, but that simple representation leads to inferences and answers which fall far short of the levels of competence and insight and adaptability that expert humans routinely achieve at complicated tasks, and leads to shallow explanations and justifications of those answers.
There is a way out of that trap, though it’s not pleasant or elegant or easy. The solution is not a machine-learning-like “free lunch” or one clap-of-thunder insight about a clever algorithm: it requires a lot of hard work just like all 5 of the bottleneck remediations I have discussed above, hard work involving higher order (e.g., modal) logics, writing down the formal statements in that language that capture the pragmatics of the real world (and, if we want to reason about it, the Marvel universe and other fictional worlds), and getting serious about pro- and con- argumentation. The path is uphill and long, but it’s there, and it’s clear, and we can already see the first signs of successfully traversing it: Yes, there are finally some AIs – AIs you’ve probably not heard about yet – on earth today that truly understand.
 A few tweaks have been made, such as increasing the number of hidden neural net layers, convolution, and rectified linear activation, but overall ML has changed much less than, say, the Honda Accord since 1982.
 Holmes’s “deductions” are actually something logicians call “abduction,” but let’s not worry about that yet.
 AI researchers started out forty years ago with object/attribute/value triples – much like today’s knowledge graphs – but it turned out to require more and more expressive logics to represent the full meaning of utterances and writings as tersely as they can be expressed in a natural language such as English. I’ll discuss this more in my next posting.
 This is just another instance of W. Pascal’s well-known observation: “In theory, there is no difference between theory and practice. But, in practice, there is.”
 Think of what happens in algebra when you accidentally divide by zero, or Tevye’s grappling with contradiction in Fiddler on the Roof, or almost any episode of Star Trek where a computer is inconsistent.
 A good analogy is how we all know that the surface of the earth is roughly spherical, but we live our everyday lives as though it were flat, and that works well for us almost all the time because it is locally flat. In much the same way, we can organize our symbolic AI’s knowledge base into a multidimensional context space, with nearby contexts being mostly consistent with each other. As inference proceeds, it reaches farther- and farther-flung contexts, and the inevitable contradictions that are encountered are treated just as a sign to stop reasoning in that “direction.” All symbolic reasoners are resource-limited, so this is just a hint for it to “search elsewhere!”
Doug Lenat, Ph.D., contributor, is a pioneer in artificial intelligence. Dr. Lenat is founder of the long-standing Cyc project and CEO of Cycorp, a provider of semantic technologies for unprecedented common sense reasoning. He has been a Professor of Computer Science at Carnegie-Mellon and Stanford and has received numerous honors, including the bi-annual IJCAI Computers and Thought Award (the highest honor in Artificial Intelligence). Dr. Lenat is a founder and Advisory Board member of TTI Vanguard, where he continues to co-run four conferences each year. Doug also holds the distinction of being the only individual to have served on the Scientific Advisory Boards of both Microsoft and Apple.