Credit: Pixabay/CC0 Public Domain
About a decade ago, deep-learning models started achieving superhuman results on all sorts of tasks, from beating world-champion board game players to outperforming doctors at diagnosing breast cancer.
These powerful deep-learning models are usually based on artificial neural networks, which were first proposed in the 1940s and have become a popular type of machine learning. A computer learns to process data using layers of interconnected nodes, or neurons, that mimic the human brain.
As the field of machine learning has grown, artificial neural networks have grown along with it.
Deep-learning models are now often composed of millions or billions of interconnected nodes in many layers that are trained to perform detection or classification tasks using vast amounts of data. But because the models are so enormously complex, even the researchers who design them don't fully understand how they work. This makes it hard to know whether they are working correctly.
For instance, maybe a model designed to help physicians diagnose patients correctly predicted that a skin lesion was cancerous, but it did so by focusing on an unrelated mark that frequently appears when there is cancerous tissue in a photo, rather than on the cancerous tissue itself. This is known as a spurious correlation. The model gets the prediction right, but it does so for the wrong reason. In a real clinical setting where the mark does not appear on cancer-positive images, this could result in missed diagnoses.
With so much uncertainty swirling around these so-called "black-box" models, how can one unravel what's going on inside the box?
This puzzle has led to a new and rapidly growing area of study in which researchers develop and test explanation methods (also called interpretability methods) that seek to shed some light on how black-box machine-learning models make predictions.
What are explanation methods?
At their most basic level, explanation methods are either global or local. A local explanation method focuses on explaining how the model made one specific prediction, while global explanations seek to describe the overall behavior of an entire model. This is often done by developing a separate, simpler (and hopefully understandable) model that mimics the larger, black-box model.
But because deep-learning models work in fundamentally complex and nonlinear ways, developing an effective global explanation model is particularly challenging. This has led researchers to turn much of their recent focus onto local explanation methods instead, explains Yilun Zhou, a graduate student in the Interactive Robotics Group of the Computer Science and Artificial Intelligence Laboratory (CSAIL) who studies models, algorithms, and evaluations in interpretable machine learning.
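The idea of a global surrogate can be sketched in a few lines of Python. In this illustrative example, `black_box` is a hypothetical stand-in for an opaque model, and the surrogate is a one-feature threshold rule (a "decision stump") fitted to mimic the black box's answers on random inputs. Fidelity, the fraction of inputs on which surrogate and black box agree, is assumed here as the quality measure. Because the hypothetical black box is nonlinear, the simple rule can only approximate it, which is the difficulty described above in miniature.

```python
import random

def black_box(x):
    # Hypothetical opaque classifier that we pretend we cannot inspect.
    x1, x2 = x
    return 1 if x1 * x1 + 0.3 * x2 > 0.5 else 0

def fit_stump(samples, labels):
    """Return (feature, threshold, agreement) for the rule `x[feature] > threshold`
    that agrees with the given labels most often."""
    best = (0, 0.0, -1.0)
    for feature in range(len(samples[0])):
        for threshold in sorted({s[feature] for s in samples}):
            preds = [1 if s[feature] > threshold else 0 for s in samples]
            agreement = sum(p == y for p, y in zip(preds, labels)) / len(labels)
            if agreement > best[2]:
                best = (feature, threshold, agreement)
    return best

random.seed(0)
# Query the black box on random inputs in [0, 1]^2 and record its answers.
samples = [(random.random(), random.random()) for _ in range(500)]
labels = [black_box(s) for s in samples]

feature, threshold, fidelity = fit_stump(samples, labels)
print(f"surrogate: predict 1 if x{feature + 1} > {threshold:.2f} (fidelity {fidelity:.0%})")
```

Raising fidelity generally requires a more complex surrogate, which in turn is harder for a person to read; that trade-off is one reason global explanations of deep networks are hard to get right.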
The most popular types of local explanation methods fall into three broad categories.
The first and most widely used type of explanation method is known as feature attribution. Feature attribution methods show which features were most important when the model made a specific decision.
Features are the input variables that are fed to a machine-learning model and used in its prediction. When the data are tabular, features are drawn from the columns in a dataset (they are transformed using a variety of techniques so the model can process the raw data). For image-processing tasks, on the other hand, every pixel in an image is a feature. If a model predicts that an X-ray image shows cancer, for instance, the feature attribution method would highlight the pixels in that specific X-ray that were most important for the model's prediction.
Essentially, feature attribution methods show what the model pays the most attention to when it makes a prediction.
"Using this feature attribution explanation, you can check to see whether a spurious correlation is a concern. For instance, it will show if the pixels in a watermark are highlighted or if the pixels in an actual tumor are highlighted," says Zhou.
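One simple way to compute a feature attribution is occlusion: blank out one input feature at a time and measure how much the model's output drops. The Python sketch below applies it to a tiny 4x4 "X-ray" and a hypothetical scoring function, `model_score`, that was deliberately built to lean on a watermark pixel in the corner, the kind of spurious correlation Zhou describes. Occlusion is only one attribution technique among several (gradient-based saliency maps are another common family); it is used here because it fits in a few lines.

```python
def model_score(image):
    # Hypothetical "cancer" scorer on a 4x4 grayscale image. By construction it
    # weights the bottom-right pixel (a stand-in for a watermark) far more than
    # the central 2x2 "tumor" region.
    tumor_region = sum(image[r][c] for r in (1, 2) for c in (1, 2))
    watermark = image[3][3]
    return 0.1 * tumor_region + 0.9 * watermark

def occlusion_attribution(model, image, baseline=0.0):
    """Score each pixel by how much the model output drops when that pixel is
    replaced with `baseline`."""
    original = model(image)
    rows, cols = len(image), len(image[0])
    attributions = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            occluded = [row[:] for row in image]  # copy so the input is untouched
            occluded[r][c] = baseline
            attributions[r][c] = original - model(occluded)
    return attributions

image = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
attr = occlusion_attribution(model_score, image)
top = max(((r, c) for r in range(4) for c in range(4)), key=lambda rc: attr[rc[0]][rc[1]])
print("most important pixel:", top)  # prints: most important pixel: (3, 3)
```

Here the watermark pixel receives about nine times the attribution of any single tumor pixel, which is exactly the red flag a practitioner would look for.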
A second type of explanation method is known as a counterfactual explanation. Given an input and a model's prediction, these methods show how to change that input so it falls into another class. For instance, if a machine-learning model predicts that a borrower would be denied a loan, the counterfactual explanation shows what factors need to change so her loan application is accepted. Perhaps her credit score or income, both features used in the model's prediction, need to be higher for her to be approved.
"The good thing about this explanation method is it tells you exactly how you need to change the input to flip the decision, which could have practical usage. For someone who is applying for a mortgage and didn't get it, this explanation would tell them what they need to do to achieve their desired outcome," he says.
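A counterfactual explanation can be found by searching for the smallest change to the input that flips the model's decision. The Python sketch below runs a brute-force grid search over increases to a borrower's credit score and income. The `approve` model, its weights and threshold, the step sizes, and the cost measure (total number of steps moved) are all hypothetical choices made for illustration; practical methods typically optimize a distance measure and may also restrict which features are allowed to change.

```python
from itertools import product

def approve(applicant):
    # Hypothetical linear loan model; income is in thousands of dollars.
    score = 0.005 * applicant["credit_score"] + 0.02 * applicant["income"]
    return score >= 4.45

def counterfactual(applicant, steps, max_steps=20):
    """Return (cost, changed_applicant) for the cheapest feature increase that
    turns a denial into an approval, or None if nothing within range works.
    Cost is the total number of steps moved across all features."""
    names = list(steps)
    best = None
    for moves in product(range(max_steps + 1), repeat=len(names)):
        candidate = dict(applicant)
        for name, k in zip(names, moves):
            candidate[name] += k * steps[name]
        cost = sum(moves)
        if approve(candidate) and (best is None or cost < best[0]):
            best = (cost, candidate)
    return best

applicant = {"credit_score": 620, "income": 50}
print(approve(applicant))  # prints: False (the application is denied)
cost, changed = counterfactual(applicant, {"credit_score": 20, "income": 10})
print(cost, changed)  # prints: 2 {'credit_score': 620, 'income': 70}
```

In this toy setup the cheapest flip is raising income from 50 to 70 (thousand) while leaving the credit score alone; a different cost measure could just as easily favor changing the credit score, which is why the choice of distance matters in practice.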
The third category of explanation methods is known as sample importance explanations. Unlike the others, this method requires access to the data that were used to train the model.
A sample importance explanation will show which training sample a model relied on most when it made a specific prediction; ideally, this is the most similar sample to the input data. This type of explanation is particularly useful if one observes a seemingly irrational prediction. There may have been a data entry error that affected a particular sample that was used to train the model. With this knowledge, one could fix that sample and retrain the model to improve its accuracy.
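For a 1-nearest-neighbor classifier, the training sample a prediction "relied on" is exactly the closest one, so the idea can be shown without any approximation; for deep networks, techniques such as influence functions estimate the same quantity. The Python sketch below uses a small hypothetical training set in which one label was entered incorrectly. The explanation points straight at that sample, and fixing it changes the prediction.

```python
import math

# Hypothetical training set of (features, label) pairs. The third entry simulates
# a data-entry error: a profile inside the healthy cluster was recorded as "sick".
training = [
    ((1.0, 1.0), "healthy"),
    ((1.5, 1.2), "healthy"),
    ((1.1, 1.4), "sick"),  # mislabeled
    ((5.0, 5.0), "sick"),
    ((5.5, 4.8), "sick"),
]

def predict_with_explanation(training, query):
    """1-nearest-neighbor classifier. Returns the predicted label together with
    the index of the training sample the prediction relied on."""
    index = min(range(len(training)), key=lambda i: math.dist(training[i][0], query))
    return training[index][1], index

query = (1.2, 1.5)
label, culprit = predict_with_explanation(training, query)
print(label, "because of training sample", culprit, training[culprit])
# prints: sick because of training sample 2 ((1.1, 1.4), 'sick')

# Fix the data-entry error. For 1-NN, "retraining" just means using the corrected data.
training[culprit] = (training[culprit][0], "healthy")
print(predict_with_explanation(training, query))  # prints: ('healthy', 2)
```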
How are explanation methods used?
One motivation for developing these explanations is to perform quality assurance and debug the model. With more understanding of how features impact a model's decision, for instance, one could identify that a model is working incorrectly and intervene to fix the problem, or toss the model out and start over.
Another, more recent, area of research is exploring the use of machine-learning models to discover scientific patterns that humans haven't uncovered before. For instance, a cancer-diagnosing model that outperforms clinicians could be faulty, or it could actually be picking up on some hidden patterns in an X-ray image that represent an early pathological pathway for cancer that were either unknown to human doctors or thought to be irrelevant, Zhou says.
It's still very early days for that area of research, however.
Words of warning
While explanation methods can sometimes be useful for machine-learning practitioners when they are trying to catch bugs in their models or understand the inner workings of a system, end-users should proceed with caution when trying to use them in practice, says Marzyeh Ghassemi, an assistant professor and head of the Healthy ML Group in CSAIL.
As machine learning has been adopted in more disciplines, from health care to education, explanation methods are being used to help decision makers better understand a model's predictions so they know when to trust the model and use its guidance in practice. But Ghassemi warns against using these methods in that way.
"We have found that explanations make people, both experts and nonexperts, overconfident in the ability or the advice of a specific recommendation system. I think it is very important for humans not to turn off that internal circuitry asking, 'let me question the advice that I am given,'" she says.
Scientists know explanations make people overconfident based on other recent work, she adds, citing some recent studies by Microsoft researchers.
Far from a silver bullet, explanation methods have their share of problems. For one, Ghassemi's recent research has shown that explanation methods can perpetuate biases and lead to worse outcomes for people from disadvantaged groups.
Another pitfall of explanation methods is that it is often impossible to tell if the explanation method is correct in the first place. One would need to compare the explanations to the actual model, but since the user doesn't know how the model works, this is circular logic, Zhou says.
He and other researchers are working on improving explanation methods so they are more faithful to the actual model's predictions, but Zhou cautions that even the best explanation should be taken with a grain of salt.
"In addition, people generally perceive these models to be human-like decision makers, and we are prone to overgeneralization. We need to calm people down and hold them back to really make sure that the generalized model understanding they build from these local explanations are balanced," he adds.
Zhou's most recent research seeks to do just that.
What's next for machine-learning explanation methods?
Rather than focusing on providing explanations, Ghassemi argues that the research community needs to put more effort into studying how information is presented to decision makers so they understand it, and that more regulation needs to be put in place to ensure machine-learning models are used responsibly in practice. Better explanation methods alone aren't the answer.
"I have been excited to see that there is a lot more recognition, even in industry, that we can't just take this information and make a pretty dashboard and assume people will perform better with that. You need to have measurable improvements in action, and I'm hoping that leads to real guidelines about improving the way we display information in these deeply technical fields, like medicine," she says.
And in addition to new work focused on improving explanations, Zhou expects to see more research related to explanation methods for specific use cases, such as model debugging, scientific discovery, fairness auditing, and safety assurance. By identifying fine-grained characteristics of explanation methods and the requirements of different use cases, researchers could establish a theory that would match explanations with specific scenarios, which could help overcome some of the pitfalls that come from using them in real-world scenarios.