Nathan Butters

NLC-Gen

A Natural Language Counterfactual Generator for Exploring Bias in Sentiment Analysis Algorithms

Overview

This project is an extension of Interactive Model Cards. It gives people more ways to explore a model's bias by generating alternatives (technically, counterfactuals). We believe that by working with alternatives, people can better understand the limitations of a model and develop productive skepticism about its usage and trustworthiness.

Set up

Download the files from GitHub, then run the commands below in a terminal:

cd NLC-Gen
pipenv install
pipenv shell
python -m spacy download en_core_web_lg
streamlit run NLC-app.py

Known Limitations

  • Words that are not in the spaCy vocabulary for en_core_web_lg have no vectors, so similarity scores cannot be computed for them.
  • WordNet has many limitations due to its age and the lack of funding for ongoing maintenance. It covers a large portion of the English language, but certain words simply do not exist in it.
  • There are currently only two lists (Countries and Professions). We would like to find community-curated lists for race, sexual orientation and gender identity (SOGI), religion, age, and protected status.
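
To make the role of the term lists concrete, here is a minimal sketch of list-based substitution. It assumes a hypothetical sample list `COUNTRIES` and a hypothetical helper `generate_alternatives`; neither is taken from the project's code, and the app itself may work quite differently. The helper finds a listed term in a sentence and produces one alternative sentence per remaining term.

```python
import re

# Hypothetical sample list; the project's real Countries list is not reproduced here.
COUNTRIES = ["France", "Japan", "Brazil", "Kenya"]


def generate_alternatives(sentence, terms):
    """Swap the first term (in list order) found in the sentence for every other term."""
    for term in terms:
        # Whole-word match, so "Japan" does not match inside "Japanese".
        pattern = re.compile(rf"\b{re.escape(term)}\b")
        if pattern.search(sentence):
            return [
                # A function replacement keeps the new term literal (no escape handling).
                pattern.sub(lambda _match, new=other: new, sentence)
                for other in terms
                if other != term
            ]
    return []  # no listed term appears in the sentence


if __name__ == "__main__":
    for alternative in generate_alternatives("The food in France was wonderful.", COUNTRIES):
        print(alternative)
```

A sentence that contains none of the listed terms yields an empty list, which is one reason broader, community-curated lists would help.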

Key Dependencies and Packages

  1. Hugging Face Transformers - The model we've designed this iteration for is hosted on Hugging Face: distilbert-base-uncased-finetuned-sst-2-english.
  2. Streamlit - This is the library we're using to build the prototype app because it is easy to stand up and quick to fix.
  3. spaCy - This is the main NLP library we're using; it handles most of the text manipulation in the project.
  4. NLTK + WordNet - This is the initial lexical database we're using because it is free and accessible directly through Python. We are considering a move to ConceptNet in future iterations because it offers better lateral movement across edges.
  5. LIME - We chose LIME over SHAP because LIME has more of the functionality we need. SHAP appears to offer better performance but is not as well suited to our original designs.
  6. Altair - We're using Altair because it's well integrated with Streamlit.
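
As an illustration of how the sentiment model fits in, the sketch below scores an original sentence and a few alternatives and flags any alternative whose predicted label differs. It uses the Transformers `pipeline` API with the model named above. The helper names `load_default_classifier` and `compare_predictions` and the example sentences are assumptions made for this sketch, not the app's actual code.

```python
def load_default_classifier():
    """Load the sentiment model named in this README (downloads weights on first use)."""
    # Imported lazily so compare_predictions can be used with any classifier callable.
    from transformers import pipeline

    return pipeline(
        "sentiment-analysis",
        model="distilbert-base-uncased-finetuned-sst-2-english",
    )


def compare_predictions(original, alternatives, classify):
    """Score the original and each alternative; flag alternatives whose label differs."""
    # The pipeline returns a list with one {"label", "score"} dict per input string.
    base = classify(original)[0]
    rows = []
    for text in alternatives:
        prediction = classify(text)[0]
        rows.append({
            "text": text,
            "label": prediction["label"],
            "score": prediction["score"],
            "label_changed": prediction["label"] != base["label"],
        })
    return base, rows


if __name__ == "__main__":
    classifier = load_default_classifier()
    base, rows = compare_predictions(
        "The food in France was wonderful.",
        ["The food in Japan was wonderful.", "The food in Kenya was wonderful."],
        classifier,
    )
    print(base)
    for row in rows:
        print(row)
```

Passing the classifier in as an argument keeps the comparison logic independent of the model, so a different model or a stub can be swapped in for testing.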