Trusting a Neural Network [In Progress]

General

While the thumbnail for this page is Skynet, trusting a neural network is very different from fighting T-1000s as John Connor. It looks a little more like the image below.

[Figure: "Husky vs. Wolf" explanation from Ribeiro et al., highlighting that the classifier relies on the snowy background]

The image above is from an experiment in the paper by Ribeiro et al. (University of Washington), where the researchers trained an intentionally bad classifier that is biased to believe the difference between a husky and a wolf is a snowy background.

Due to issues like non-generalized training sets (e.g., under-representation of certain skin tones in melanoma detection or face detection), or inherent biases introduced during data collection (the malignant-cancer detector that doubled as a ruler detector) and labeling (see ImageNet Roulette), blindly trusting the output of a NN isn't well advised.

Yes, a neural network is a universal function approximator, but does it learn what you actually expect it to learn (a flavor of the Garbage-In, Garbage-Out problem)?

Interpretability
For image datasets, this is slightly easier: assuming you have a consistent dataset and a well-fitted neural network, you can pull out activations at different stages of the network, as well as look at a final heatmap showing which parts of an input image the NN focuses on (see FastAI). Additionally, in my opinion, varying random seeds and rerunning the training/testing process helps reveal what the NN actually learns. For time-series datasets, however, this is a little trickier.
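
As a minimal sketch of the activation-inspection idea (the ResNet-18, the choice of layer4, and the random tensor standing in for a preprocessed image are all placeholder assumptions, not part of the original write-up), a PyTorch forward hook can capture what an intermediate layer responds to; libraries like FastAI or Captum wrap this kind of plumbing for you.

    # Sketch: capture intermediate activations with a forward hook.
    # Model, layer choice, and input tensor are placeholders.
    import torch
    import torchvision.models as models

    model = models.resnet18(pretrained=True).eval()
    activations = {}

    def save_activation(name):
        def hook(module, inputs, output):
            activations[name] = output.detach()
        return hook

    # Register a hook on the last convolutional block
    model.layer4.register_forward_hook(save_activation("layer4"))

    img = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed image
    with torch.no_grad():
        _ = model(img)

    # Average over channels to get a coarse spatial map of where the
    # network responds strongly; this can be upsampled and overlaid
    # on the input image as a heatmap
    heatmap = activations["layer4"].mean(dim=1).squeeze()
    print(heatmap.shape)  # e.g. torch.Size([7, 7])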

Monte Carlo Dropout for Epistemic Uncertainty
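
Dropout is normally switched off at test time, but Monte Carlo (MC) dropout keeps it active and runs many stochastic forward passes through the network. The spread of the resulting predictions, summarized here by the entropy of the mean predicted probability, acts as a rough proxy for epistemic uncertainty: inputs unlike anything the model has seen tend to produce wildly varying predictions. The snippet below wires this up in PyTorch for a binary classifier with a two-hidden-layer MLP.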


    # Using Python 3.6, PyTorch 1.0
    import numpy as np
    import torch
    import torch.nn as nn

    # Placeholder dimensions; set these to match your dataset
    input_size, hidden_size, output_size = 20, 64, 1

    def apply_dropout(m):
        # Re-activate dropout layers after model.eval() so that every forward
        # pass at inference time samples a different sub-network (MC dropout)
        if type(m) == nn.Dropout:
            m.train()
            print("Setting dropout as active during inference time")

    def predictive_entropy(proba):
        # Entropy of the predictive distribution, one value per sample
        # (small epsilon guards against log(0))
        return -1 * np.sum(np.log(proba + 1e-12) * proba, axis=1)

    def mc_predictions(model, X, nsim=100):
        # Stack of stochastic forward passes:
        # shape (number of simulations, input data size, 1)
        predictions = np.array([torch.sigmoid(model(X)).cpu().detach().numpy()
                                for _ in range(nsim)])
        # Mean over simulations -> predicted probability of the positive class
        predictions_proba = np.mean(predictions, axis=0)
        predictions_proba = predictions_proba.reshape(len(predictions_proba), 1)
        # Entropy over both classes (p, 1-p) as the uncertainty estimate
        both_classes = np.hstack([predictions_proba, 1 - predictions_proba])
        predictions_variance = predictive_entropy(both_classes)
        return (predictions_proba, predictions_variance)

    # Dropout layer at the end of every weight layer
    class MLP_2H_mcdrop(nn.Module):
        def __init__(self, p=0.0):
            super(MLP_2H_mcdrop, self).__init__()
            self.p = p
            self.hidden0 = nn.Sequential(
                nn.Linear(input_size, hidden_size),
                nn.LeakyReLU(negative_slope=0.1)
            )
            self.hidden1 = nn.Sequential(
                nn.Linear(hidden_size, hidden_size),
                nn.LeakyReLU(negative_slope=0.1)
            )
            self.out = nn.Sequential(
                nn.Linear(hidden_size, output_size)
            )
            self.Dropout = nn.Dropout(p=p)

        def forward(self, x):
            x = self.hidden0(x)
            x = self.Dropout(x)
            x = self.hidden1(x)
            x = self.Dropout(x)
            return self.out(x)

    # Train for MLP with 2 hidden layers
    model = MLP_2H_mcdrop(p=0.5)
    # print(model)

    # After training: freeze everything with eval(), then switch only the
    # dropout layers back on and run the stochastic forward passes
    model.eval()
    model.apply(apply_dropout)
    # proba, uncertainty = mc_predictions(model, X_test, nsim=100)
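
Inputs that come back with high predictive entropy are the ones the network is least sure about; in practice, those are the predictions you would flag for human review rather than trust blindly.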

References
  • Ribeiro, M., Singh, S., & Guestrin, C. 2016. "Why Should I Trust You?": Explaining the Predictions of Any Classifier. KDD '16. DOI: https://doi.org/10.1145/2939672.2939778
  • Narla, A., Kuprel, B., Sarin, K., Novoa, R., & Ko, J. 2018. Automated Classification of Skin Lesions: From Pixels to Practice. Journal of Investigative Dermatology, 2018. DOI: https://doi.org/10.1016/j.jid.2018.06.175
  • Shyue, J., Bishara, H., Maine, S., Vilas-Boas, E., & Rodney, S. 2019. 600,000 Images Removed from AI Database After Art Project Exposes Racist Bias. Hyperallergic. Retrieved from https://hyperallergic.com/518822/600000-images-removed-from-ai-database-after-art-project-exposes-racist-bias/