On Interpretability and Feature Representations: An Analysis of the Sentiment Neuron

Jonathan Donnelly and Adam Roegiest • April 2019 • ECIR 2019

We investigate the apparent effectiveness of Radford et al.’s “Sentiment Neuron,” which they claim encapsulates sufficient knowledge to accurately predict sentiment in reviews. In our analysis of the Sentiment Neuron, we find that removing the neuron only marginally affects a classifier’s ability to detect and label sentiment, and may even improve performance. Moreover, the effectiveness of the Sentiment Neuron can be surpassed simply by using 100 random neurons as features for the same classifier. Using adversarial examples, we show that the generated representation containing the Sentiment Neuron (i.e., the final hidden cell state in an LSTM) is particularly sensitive to the end of a processed sequence. Accordingly, we find that caution is needed when interpreting neuron-based feature representations, and that potential flaws should be addressed before such representations are relied on in real-world applications.
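The three feature sets compared in the abstract (the single neuron, the representation with that neuron removed, and 100 random neurons) can be illustrated with a minimal sketch. This is not the authors' code: the helper names, the 512-unit width, the neuron index, and the random toy data are all hypothetical stand-ins, and excluding the target neuron from the random sample is an assumption made here for illustration.

```python
import random

def single_neuron(states, idx):
    """Keep only one unit of each hidden-state vector."""
    return [[row[idx]] for row in states]

def ablate_neuron(states, idx):
    """Drop one unit and keep all the others."""
    return [row[:idx] + row[idx + 1:] for row in states]

def random_neurons(states, k, seed=0, exclude=None):
    """Keep k randomly chosen units, optionally never picking `exclude`."""
    width = len(states[0])
    candidates = [j for j in range(width) if j != exclude]
    chosen = sorted(random.Random(seed).sample(candidates, k))
    return [[row[j] for j in chosen] for row in states], chosen

# Toy stand-in for final LSTM cell states: 4 "reviews", 512 units each.
# Real states would come from the trained language model, not from random numbers.
rng = random.Random(42)
states = [[rng.uniform(-1.0, 1.0) for _ in range(512)] for _ in range(4)]

NEURON = 7  # hypothetical index, not the real Sentiment Neuron's position

only = single_neuron(states, NEURON)      # 1 feature per review
without = ablate_neuron(states, NEURON)   # 511 features per review
subset, chosen = random_neurons(states, 100, seed=1, exclude=NEURON)  # 100 features
```

Each of the three feature matrices would then be passed to the same classifier, so that any difference in accuracy reflects the choice of features and not the model.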
