Sophisticated statistical models are widely believed to bring improved accuracy and efficiency. However, because their outputs are hard to interpret, they see limited adoption by organizations, institutions, and governments; they are hence called black boxes. Model interpretability is desirable in real-world problems where decisions can have a large impact (e.g., criminal justice, credit scoring, health-risk estimation). This survey reviews novel methods that form the state of the art for addressing this problem, aiming to guide practitioners toward methods appropriate to their problems.
Over the past two decades, machine learning applications have advanced prominently across many domains. Together with the availability of large datasets and improved algorithms, the exponential growth in computing power led to an unparalleled surge of interest in machine learning. Representation, evaluation, and optimization are the basic principles of machine learning, a framework for understanding all of its algorithms. However, most models are not very transparent, and while recent additions such as deep neural networks and NLP have raised accuracy dramatically, their opacity has discouraged many practitioners from using them.
Local vs. Global Models — For very complex models, the scope of a local model is restricted to a particular neighborhood, within which the best approximation of the prediction is determined. In contrast, global models aim to understand the model as a whole; they aim at understanding how the features affect the result rather than at interpretability of individual predictions.
Model-Agnostic Methods — the general desirable properties for interpretability methods are as follows:
(i) Model flexibility: It refers to the ability of a method to work with any kind of model.
(ii) Explanation flexibility: It refers to the fact that explanations can take several forms, such as natural language, visualizations of learned representations or models, or explanation by example, contrasting an observation of interest against others.
(iii) Representation flexibility: It refers to the capacity of the method to explain without necessarily using the input features. For example, a text classifier can receive sentences as input while the explanation is given in terms of individual words.
The two pros of these model-agnostic methods are (I) flexibility: any model of our choice can be used, since only the inputs and outputs are needed for interpretation rather than a model's internal mechanisms; and (II) compatibility: comparing different models is very simple. Their cons, on the other hand, are (I) time consumption: these methods can be very time consuming due to their complex nature; and (II) sampling variability: because a method samples only a small fraction of the data, the interpretation may vary between runs.
Model-Agnostic Method Approaches
(A) Perturbation Approach: The effect and contribution of a feature on the output, and its role in interpreting a model, are explained by the following methods:
(i) Partial Dependence Plots (PDP) — A global method that averages the predicted outcome over the data while varying the feature of interest. It has two disadvantages: if two variables are correlated it gives poor estimates, and since the average ignores variance, positive and negative output values may cancel out and yield inconsistent results.
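The averaging step behind a PDP can be sketched in a few lines; the model and data below are invented purely for illustration.

```python
def f(x1, x2):
    # toy "black-box" model: a linear term plus an interaction
    return 2.0 * x1 + x1 * x2

data = [(1.0, 0.5), (2.0, -0.5), (3.0, 1.5)]  # observed (x1, x2) rows

def partial_dependence_x1(grid):
    # For each grid value of x1, average predictions over the
    # observed values of the other feature.
    return [sum(f(g, x2) for _, x2 in data) / len(data) for g in grid]

print(partial_dependence_x1([0.0, 1.0, 2.0]))
```

Note that the average is taken over all observed x2 values at every grid point, even combinations (g, x2) that never occur together in the data; this is exactly why correlated features make the estimate unreliable.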
(ii) Individual Conditional Expectation (ICE) — The difference between ICE and PDP is that ICE draws one line per observation, varying the feature of interest while keeping that observation's other variables constant, whereas PDP plots only the average of all those lines. As a result, the cancelling-out problem is prevented, but the correlation problem remains.
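The contrast with PDP can be made concrete: one curve per observation, with the PDP recovered as their pointwise average. The model and data are again invented for illustration.

```python
def f(x1, x2):
    # toy black-box model (invented for illustration)
    return 2.0 * x1 + x1 * x2

data = [(1.0, 0.5), (2.0, -0.5)]  # observed (x1, x2) rows
grid = [0.0, 1.0]

# ICE: vary x1 over the grid, holding each row's own x2 fixed
curves = [[f(g, x2) for g in grid] for _, x2 in data]

# PDP is the pointwise average of the ICE curves
pdp = [sum(c[i] for c in curves) / len(curves) for i in range(len(grid))]
print(curves, pdp)
```

If the two curves had opposite slopes, the PDP would average them toward zero; plotting the individual curves is what reveals such heterogeneity.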
(iii) M-Plots — Similar to PDP, but predictions are averaged only over observations in a neighbourhood of the feature value, rather than over artificial combinations far outside the neighborhood. This mitigates the averaging problem to some extent, but it only partially solves the correlation problem, since the effects of correlated features become mixed.
(iv) Accumulated Local Effects (ALE) — In ALE plots, differences between predictions are accumulated instead of the values themselves. This reduces the averaging problem and shows the relative effect on the output when feature values change.
(v) Shapley Values (SHAP) — A Shapley value is calculated for each feature value; the contribution may be positive or negative. The drawback is that with a large number of features the computation becomes costly.
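A minimal sketch of exact Shapley values for a two-feature toy model makes the exponential cost explicit: the sum runs over all subsets of the remaining features. The model, baseline, and instance are invented for illustration.

```python
import math
from itertools import combinations

def f(x):
    # toy model with an interaction term
    return 3.0 * x[0] + 2.0 * x[1] + x[0] * x[1]

baseline = [0.0, 0.0]   # reference input
instance = [1.0, 2.0]   # instance to explain
n = len(instance)

def v(S):
    # prediction with features in S taken from the instance,
    # the rest from the baseline
    x = [instance[i] if i in S else baseline[i] for i in range(n)]
    return f(x)

def shapley(i):
    others = [j for j in range(n) if j != i]
    total = 0.0
    for k in range(len(others) + 1):
        for S in combinations(others, k):
            w = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            total += w * (v(set(S) | {i}) - v(set(S)))
    return total

phi = [shapley(i) for i in range(n)]
print(phi)  # Shapley values sum to f(instance) - f(baseline)
```

The efficiency property (attributions summing to the difference between the instance and baseline predictions) is what makes the values comparable across features; practical SHAP implementations approximate this sum rather than enumerating subsets.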
(vi) LOCO — The leave-one-covariate-out method leaves out one covariate and recalculates the error in the output. As a result, the significance of each covariate with respect to the output can be found.
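A LOCO sketch under invented data: a tiny least-squares model is refit with each covariate removed, and the increase in error measures that covariate's significance. Here the target depends only on the first covariate, so dropping the second should cost nothing.

```python
def ols(X, y):
    # ordinary least squares via normal equations (no intercept)
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    for i in range(p):                      # Gaussian elimination
        for j in range(i + 1, p):
            m = A[j][i] / A[i][i]
            for k in range(p):
                A[j][k] -= m * A[i][k]
            b[j] -= m * b[i]
    beta = [0.0] * p
    for i in reversed(range(p)):
        beta[i] = (b[i] - sum(A[i][k] * beta[k]
                              for k in range(i + 1, p))) / A[i][i]
    return beta

def mse(X, y):
    beta = ols(X, y)
    return sum((sum(bi * xi for bi, xi in zip(beta, r)) - yi) ** 2
               for r, yi in zip(X, y)) / len(y)

X = [[1.0, 1.0], [2.0, -1.0], [3.0, 2.0], [4.0, 0.0]]
y = [2.0, 4.0, 6.0, 8.0]                    # y = 2 * x1; x2 is irrelevant

full_error = mse(X, y)
loco = []
for j in range(2):
    Xj = [[v for i, v in enumerate(r) if i != j] for r in X]
    loco.append(mse(Xj, y) - full_error)    # error increase without covariate j
print(loco)  # dropping x1 hurts, dropping x2 does not
```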
(vii) Decomposition of Predictor — This uses decomposition models in which a feature's relevance is found by perturbing that feature's values while keeping the rest constant. The outcome is recalculated, and the difference between the observed and the actual output gives the feature's relevance. The drawback is that this method cannot be used on highly correlated datasets.
(viii) Feature Importance — This method uses PDP, ICE plots, and SHAP values to obtain ICI (individual conditional importance) curves; the average of the ICI curves gives the global feature importance.
(ix) Sensitivity Analysis — Sensitivity measures are used to observe how predictions vary as features change, using variable-effect characteristic curves that show the impact of a variable throughout its domain.
(x) LIME — Local Interpretable Model-agnostic Explanations uses local surrogate models. The features are perturbed in the vicinity of the instance of interest, and repeating this randomly generates several datasets. Simple models such as linear regression are then trained on these to check whether the model is right for the wrong reasons.
(xi) Explanation Vectors — This method finds an explanation vector based on the conditional probability of a Bayes classifier. The explanation vector describes the local decisions taken in classification problems. The drawback is that calculating the gradient coefficients is not easy.
(xii) Counterfactuals — The minimum changes that must be made to the feature values so that the predicted output label flips to the desired label are calculated; this helps to explain the prediction. However, finding them is an NP-hard problem.
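A brute-force counterfactual search over binary features shows the idea (and why the general problem is hard: the search space is exponential). The toy classifier, weights, and threshold are invented for illustration.

```python
from itertools import combinations

def model(x):
    # toy classifier: approve (1) if the weighted sum crosses a threshold
    w = [2.0, 1.0, 0.5]
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 2.5 else 0

def counterfactual(x, target):
    n = len(x)
    for k in range(1, n + 1):          # try the smallest changes first
        for idxs in combinations(range(n), k):
            cand = list(x)
            for i in idxs:
                cand[i] = 1 - cand[i]  # flip a binary feature
            if model(cand) == target:
                return cand            # minimal flip set found
    return None

print(counterfactual([0, 1, 1], 1))
```

Scanning change sizes in increasing order guarantees the returned counterfactual is minimal in the number of flipped features; practical methods replace this enumeration with optimization over a distance-plus-validity objective.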
(xiii) Anchors — An anchor is a set of if-then rules. This extends LIME with rule-based conditions: if the conditions are met, the other features are ignored, since they will not impact the output significantly.
(B) Contrastive Approach :
(i) Counterfactuals Naturally Observed — To observe similar instances with different output labels, the optimization formulation is not applicable; instead, a contrastive approach is applied in which the instance is compared with real instances rather than artificial ones. Sparse instances due to the curse of dimensionality are a drawback.
(ii) Prototypes and Criticisms — We start by finding instances that represent the data well (prototypes) and then instances that are not well represented (criticisms).
Model-Specific Methods:
(A) Machine vision models
These methods concentrate on finding the parts of images that influence the resulting classification. In general, saliency maps are used for such cases; they are also called sensitivity maps or pixel attribution maps. "Importance" values are assigned to individual pixels with the help of occlusion calculations or gradients; these values quantify each pixel's impact on the resulting classification, and masking the important pixels causes a significant drop in the classification score. These maps can also expose the flaws of a model: by slightly offsetting an image, say by one pixel, an algorithm can be tricked into making a different prediction. As there is no guarantee such issues will not arise in real time, such a model cannot be considered trustworthy.
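The occlusion idea can be sketched with a toy stand-in for an image classifier: mask one "pixel" at a time and record the drop in the class score. The model and image are invented for illustration.

```python
def model(img):
    # pretend class score: the third pixel matters most
    weights = [0.1, 0.2, 1.5, 0.1]
    return sum(w * p for w, p in zip(weights, img))

image = [1.0, 1.0, 1.0, 1.0]
base = model(image)

saliency = []
for i in range(len(image)):
    occluded = list(image)
    occluded[i] = 0.0                        # mask out one pixel
    saliency.append(base - model(occluded))  # score drop = importance

print(saliency)  # the largest drop marks the most influential pixel
```

Real occlusion maps slide a patch over a 2-D image and evaluate the network at each position; the principle, measuring the score drop when information is removed, is the same.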
(i) Masks — Fong and Vedaldi proposed a method similar to LIME: by editing an image, they search for the perturbation mask that most decreases the class score, thereby focusing on the parts of the image that the black box uses to produce its output. The main difference from LIME is that here the images are explicitly edited.
(ii) Real-Time Saliency Maps — A fast saliency-detection method developed in 2017 that can be applied to any differentiable image classifier. Earlier approaches were costly because they build a saliency map by repeatedly removing parts of the input image; here, a second model is trained to predict the saliency map for an input image in a single feed-forward pass. This method is both affordable and fast.
(iii) SmoothGrad — Although saliency maps generally make sense, the important parts highlighted by the algorithm sometimes seem randomly chosen. SmoothGrad addresses this: to make the result more trustworthy, noise is injected into the original image and the resulting fluctuations are averaged out, reducing the noise in the saliency map and yielding the genuinely important parts.
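The averaging step is simple to sketch on a one-dimensional stand-in for a class score, using a numerical gradient; the function and noise scale are invented for illustration.

```python
import random

def f(x):
    return x ** 3  # stand-in for a class score

def grad(x, eps=1e-5):
    # central-difference numerical gradient
    return (f(x + eps) - f(x - eps)) / (2 * eps)

def smooth_grad(x, sigma=0.1, n=200, seed=0):
    # average gradients over noisy copies of the input
    rng = random.Random(seed)
    return sum(grad(x + rng.gauss(0.0, sigma)) for _ in range(n)) / n

print(grad(1.0), smooth_grad(1.0))
```

For an image model the same loop runs over noisy copies of the whole image, averaging the per-pixel gradient maps instead of a scalar.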
(iv) Layer-wise Relevance Propagation — Proposed by Bach et al. in 2015, this method maps the role of each pixel in the form of a heat map. It is described as a tool set for deconstructing a non-linear decision and fostering transparency for the user. Two ways of finding the pixel contributions were proposed: one uses Taylor decomposition, while the other, which is more efficient, resembles back-propagation.
(v) Heat Maps — Proposed by Zeiler and Fergus in 2013, this relates the most important parts of the input image to the neural network's decision for any target class. Back-propagation and gradients are used to find the relevant pixels.
(B) General Neural Networks
(i) Differentiable Models — Ross et al. (2017) stated that this method not only points out errors in LIME or saliency maps but also provides a solution. It also shows that LIME sometimes gives wrong results when training and testing data differ from each other. When provided with annotations marking explanations that are right for the wrong reasons, the method helps the classifier explore alternative possibilities. When annotations are not provided, a sample of equally valid explanations is produced so that an expert can decide which is most reasonable.
(ii) DeepLIFT — Proposed by Shrikumar et al. in 2017, the name stands for Deep Learning Important FeaTures. Importance scores are computed within a deep neural network by comparing each neuron with its reference activation, which is obtained by recording the activation of every neuron when a reference input is applied.
(iii) Taylor Decomposition — The classification output is broken down into the contributions of each input element. This method, called deep Taylor decomposition, assesses the importance of the most relevant pixels in an image, and the explanation is shown in the form of a heat map.
(iv) Integrated Gradients — A simple method that can be implemented quickly for a deep network; it is well grounded because it satisfies the implementation-invariance and sensitivity axioms. A baseline image is chosen with the least prediction score in the n-dimensional input space, where n is the size of the image. Given an input, a line segment is defined in that space joining the two images, and the path integral of the gradients along the line is computed. The result can be visualized as a heat map, which can be somewhat difficult for humans to interpret.
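The path integral reduces to a Riemann sum over gradients along the baseline-to-input line; the toy differentiable function below is invented to illustrate the mechanics and the completeness property.

```python
def f(x):
    # toy differentiable "score" function (invented for illustration)
    return x[0] ** 2 + 3.0 * x[1]

def grad(x, eps=1e-6):
    # central-difference numerical gradient
    g = []
    for i in range(len(x)):
        xp = list(x); xp[i] += eps
        xm = list(x); xm[i] -= eps
        g.append((f(xp) - f(xm)) / (2 * eps))
    return g

def integrated_gradients(baseline, inp, steps=100):
    attrs = [0.0] * len(inp)
    for s in range(steps):
        alpha = (s + 0.5) / steps          # midpoint Riemann sum
        point = [b + alpha * (xi - b) for b, xi in zip(baseline, inp)]
        g = grad(point)
        for i in range(len(inp)):
            attrs[i] += g[i] * (inp[i] - baseline[i]) / steps
    return attrs

attrs = integrated_gradients([0.0, 0.0], [2.0, 1.0])
print(attrs)  # completeness: attributions sum to f(input) - f(baseline)
```

Completeness, the attributions summing to the difference in scores between input and baseline, is the property that lets the per-pixel values be read as shares of the prediction.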
(v) I-GOS — It was observed that heat maps that do not correlate with the network's behavior may mislead humans; at times heat maps do not provide the true explanation. A method was therefore proposed to find the smallest, smoothest area whose removal maximally decreases the output of the neural network. That optimization can be inefficient and can get stuck in a local minimum, so I-GOS was proposed: it improves the mask-optimization process by computing descent directions based on integrated gradients instead of normal gradients.
(vi) Grad-CAM — Gradient-weighted Class Activation Mapping highlights the regions of the input that are crucial for predicting the class. Whenever an unreasonable prediction is made, this algorithm gives a relevant explanation of why it happened.
(C) Decision Tree Methods
Tree Explainer — This group of models contains random forests, gradient-boosted trees, and other tree-based models. They are known for being both interpretable and accurate, i.e., it is understandable which features were used in making a prediction. Current local explanations for such models are:
1. Reporting the decision path
2. Assigning the contribution of individual feature
3. Applying model agnostic approach
The respective drawbacks of these are:
1. Not useful when the model utilizes multiple trees for the final prediction
2. The explanation might be biased
3. Might be slow and suffer from sampling variability
This survey reviewed relevant and novel approaches that shed light on the problem of explaining individual instances in machine learning. Explaining model predictions has become increasingly desirable as the use of highly complex models has spread. Some interpretation methods use natural language, while others use visualizations of models or learned representations. The methods are divided into model-agnostic and model-specific approaches: a model-agnostic approach can be used with any type of machine-learning model, while a model-specific approach applies only to a particular family of models. The model-agnostic taxonomy includes methods such as SHAP and LIME; the model-specific methods were sub-classified into machine vision models, general neural networks, and decision trees. Recently, the family of tree-based approaches has outperformed neural networks.
Alfredo Carrillo, Luis F. Cantú, Alejandro Noriega. https://arxiv.org/pdf/2104.04144.pdf