How Can You Trust the Predictions of a Large Machine Learning Model?


Source: Irving Wladawsky-Berger, CogWorld Think Tank member

Artificial intelligence has emerged as the defining technology of our era, as transformative over time as the steam engine, electricity, computers, and the Internet. AI technologies are approaching or surpassing human levels of performance in vision, speech recognition, language translation, and other human domains. Machine learning (ML) advances, like deep learning, have played a central role in AI’s recent achievements, giving computers the ability to be trained by ingesting and analyzing large amounts of data instead of being explicitly programmed.

Deep learning is a powerful statistical technique for classifying patterns using large training data sets and multi-layer artificial neural networks. Each artificial neural unit is connected to many other such units, and the links can be statistically strengthened or weakened based on the data used to train the system. But such statistical methods are not equally suitable for all tasks. Tasks that are particularly well suited to machine learning share several key criteria, such as the availability of large data sets of well-defined input-output pairs for training ML classifiers: carefully labeled cat and not-cat pictures for a cat recognition classifier, for example, or English-French document pairs for a machine translation algorithm.
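To make the input-output pairing concrete, here is a minimal sketch, assuming scikit-learn and made-up toy data (none of this comes from the article), of training a classifier from labeled examples rather than explicit rules:

    # Minimal sketch: a classifier learns from labeled input-output pairs
    # (toy data, scikit-learn); it is trained on examples, not explicitly programmed.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Hypothetical labeled pairs: each input text has a well-defined output label.
    texts = ["a fluffy cat on the sofa", "a dog chasing a ball",
             "the cat sleeps in the sun", "a dog barking at the mailman"]
    labels = ["cat", "not-cat", "cat", "not-cat"]

    model = make_pipeline(TfidfVectorizer(), LogisticRegression())
    model.fit(texts, labels)

    print(model.predict(["a small cat in a box"]))  # likely ['cat'] on this toy data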

The methods behind a machine learning prediction (subtle adjustments to the numerical weights that interconnect its huge number of artificial neurons) are very difficult to explain because they’re so different from the methods used by humans. The bigger the training data set, the more accurate the prediction, but the more difficult it will be to provide a detailed, understandable explanation to a human of how the prediction was made.

A few weeks ago I attended an online seminar on the difficulty of understanding machine learning predictions, “How Can You Trust Machine Learning?,” by Stanford professor Carlos Guestrin. Guestrin’s seminar was based on a 2016 article he co-authored with Marco Tulio Ribeiro and Sameer Singh, “‘Why Should I Trust You?’: Explaining the Predictions of Any Classifier.”

“Despite widespread adoption, machine learning models remain mostly black boxes,” said the article. “Understanding the reasons behind predictions is, however, quite important in assessing trust, which is fundamental if one plans to take action based on a prediction, or when choosing whether to deploy a new model. Such understanding also provides insights into the model, which can be used to transform an untrustworthy model or prediction into a trustworthy one.”

“Unfortunately, the important role of humans is an oft-overlooked aspect in the field. Whether humans are directly using machine learning classifiers as tools, or are deploying models within other products, a vital concern remains: if the users do not trust a model or a prediction, they will not use it.”

It’s important to understand the difference between trusting a model and trusting the individual predictions that model makes. A user trusts an individual prediction if the user is willing to take some action based on it, and trusts the model if the user believes it will behave in reasonable ways when deployed. “Both are directly impacted by how much the human understands a model’s behaviour, as opposed to seeing it as a black box.”

Determining whether a prediction is trustworthy is particularly important when a machine learning model is used in medical diagnosis, sentencing guidelines, terrorism detection, and similar applications that require human judgment. In such cases, the consequences of not understanding the model’s behavior and just acting on its prediction based on blind faith could be very serious.

“Apart from trusting individual predictions, there is also a need to evaluate the model as a whole before deploying it in the wild. To make this decision, users need to be confident that the model will perform well on real-world data, according to the metrics of interest. Currently, models are evaluated using accuracy metrics on an available validation dataset. However, real-world data is often significantly different, and further, the evaluation metric may not be indicative of the product’s goal. Inspecting individual predictions and their explanations is a worthwhile solution, in addition to such metrics. In this case, it is important to aid users by suggesting which instances to inspect, especially for large datasets.”
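For context, the standard practice the authors are questioning looks roughly like the following generic sketch (toy data, scikit-learn; not code from the paper): the model is judged by a single accuracy number on a held-out validation split, which says nothing about whether individual predictions are made for trustworthy reasons.

    # Generic sketch of the usual evaluation practice: one accuracy number
    # computed on a held-out validation split (toy data).
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
    X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

    clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
    print("validation accuracy:", accuracy_score(y_val, clf.predict(X_val)))
    # A high score says nothing about why any individual prediction was made,
    # or how the model will behave on real-world data that differs from X_val.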

The paper introduces LIME (Local Interpretable Model-agnostic Explanations), a novel method for explaining the predictions of any ML classifier in order to increase human trust and understanding. “By explaining a prediction, we mean presenting textual or visual artifacts that provide qualitative understanding of the relationship between the instance’s components (e.g. words in text, patches in an image) and the model’s prediction.”

Let me briefly summarize how LIME works.

Using LIME, you can approximate the behavior of any machine learning model, no matter how complex, with simpler local models whose predictions are similar to those of the original model near a given input. You build a local model by perturbing or varying the input, seeing how the original model’s predictions change, and then fitting a simple model (such as a sparse linear model) to those perturbed examples, weighted by how close they are to the original input. Being considerably simpler, the local models should be understandable and make sense to humans. Thus, while understanding the original machine learning model is a daunting task, it should be much easier to understand the simpler local models whose predictions approximate it.
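As a rough illustration of that idea (a sketch of my own, not the authors’ reference implementation), the following Python function perturbs a numeric input, asks the black-box model for its predictions on the perturbed copies, weights each copy by its proximity to the original, and fits a simple weighted linear model whose coefficients serve as the explanation. The name black_box_predict is a hypothetical placeholder for any model’s prediction function.

    import numpy as np
    from sklearn.linear_model import Ridge

    def lime_style_explanation(instance, black_box_predict,
                               num_samples=1000, kernel_width=0.75):
        """Rough sketch of the LIME idea for a 1-D numeric feature vector."""
        n_features = instance.shape[0]

        # 1. Perturb the input: randomly switch individual features "off"
        #    (here, by zeroing them) to create nearby variations of the instance.
        masks = np.random.randint(0, 2, size=(num_samples, n_features))
        perturbed = masks * instance

        # 2. See how the original (black-box) model's predictions change.
        predictions = black_box_predict(perturbed)

        # 3. Weight each perturbed sample by its proximity to the original input.
        distances = np.sqrt((1 - masks).sum(axis=1) / n_features)
        weights = np.exp(-(distances ** 2) / (kernel_width ** 2))

        # 4. Fit a simple, interpretable local model (weighted linear regression)
        #    whose predictions approximate the black box near this instance.
        local_model = Ridge(alpha=1.0)
        local_model.fit(masks, predictions, sample_weight=weights)

        # Each coefficient indicates how much the presence of that feature
        # pushed this particular prediction up or down.
        return local_model.coef_

In the paper the interpretable representation is typically binary (a word or image patch is present or absent), the local model is sparse, and the perturbed samples are weighted by a proximity kernel; the sketch above only mirrors that structure in the simplest possible way.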

For a detailed explanation of LIME, please watch the actual seminar by Guestrin here.

According to the paper, an effective explanation should exhibit four key characteristics:

  • Interpretable. The explanation must provide a qualitative, comprehensible understanding of the relationship between the input variables and the prediction, one that takes into account the limitations of the target audience. “For example, if hundreds or thousands of features significantly contribute to a prediction, it is not reasonable to expect any user to comprehend why the prediction was made, even if individual weights can be inspected.”

  • Local fidelity. This is LIME’s central concept. “Although it is often impossible for an explanation to be completely faithful unless it is the complete description of the model itself, for an explanation to be meaningful it must at least be locally faithful, i.e. it must correspond to how the model behaves in the vicinity of the instance being predicted. … features that are globally important may not be important in the local context, and vice versa.”

  • Model-agnostic. The explanation method should treat the original machine learning model as a black box, and thus be able to explain any existing and future model.

  • Global perspective. Rather than only explaining one prediction, several explanations that are representative of the overall model should be presented to the user.
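The authors’ open-source lime package (a Python library) makes these characteristics tangible. Assuming the toy cat / not-cat pipeline sketched earlier and the package’s LimeTextExplainer interface, a single prediction can be explained roughly as follows: the num_features argument caps how many words appear in the explanation (interpretability), the explanation is fit around this one document (local fidelity), and the classifier is passed in only as a prediction function (model-agnosticism).

    # Sketch using the authors' open-source lime package (pip install lime);
    # `model` is assumed to be the toy cat / not-cat scikit-learn pipeline
    # sketched earlier, exposing predict_proba.
    from lime.lime_text import LimeTextExplainer

    explainer = LimeTextExplainer(class_names=["cat", "not-cat"])

    explanation = explainer.explain_instance(
        "a fluffy cat sleeping on the sofa",
        model.predict_proba,      # the classifier is only a black-box function here
        num_features=3,           # keep the explanation small enough to read
    )
    print(explanation.as_list())  # (word, weight) pairs for the explained class

For the global-perspective requirement, the paper additionally proposes selecting a small, diverse set of such explanations (its submodular pick procedure, SP-LIME) to summarize the model as a whole.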

To make sure that LIME’s explanations are understandable and increase trust, the authors validated the utility of LIME “via novel experiments, both simulated and with human subjects, on various scenarios that require trust: deciding if one should trust a prediction, choosing between models, improving an untrustworthy classifier, and identifying why a classifier should not be trusted.”


Irving Wladawsky-Berger is a Research Affiliate at MIT's Sloan School of Management and at Cybersecurity at MIT Sloan (CAMS), and a Fellow of the Initiative on the Digital Economy, of MIT Connection Science, and of the Stanford Digital Economy Lab.