---
title: NLC Explorer
emoji: 🧭 🔍 ⁉️
colorFrom: gray
colorTo: purple
sdk: streamlit
sdk_version: 1.10.0
app_file: app.py
pinned: false
license: mit
---

# NLC-Explorer
### A Natural Language Counterfactual Generator for Exploring Bias in Sentiment Analysis Algorithms

##### Overview
This project is a digression from the project on [Interactive Model Cards](https://github.com/amcrisan/interactive-model-cards). It focuses on providing a person more ways to explore a model's outputs through the generation of alternatives (technically [counterfactuals](https://plato.stanford.edu/entries/counterfactuals/#WhatCoun)). We believe the use of multiple alternatives may allow people to better understand the limitations of a model and develop a sense of its trustworthiness and bias.

##### Known Limitations
* Words not in the spaCy vocab for `en_core_web_lg` won't have vectors and so won't have the ability to create similarity scores.
* WordNet provides many limitations due to its age and lack of funding for ongoing maintenance. It provides access to a large variety of the English language but certain words simply do not exist.
* There are currently only 2 lists (Countries and Professions). We would like to find community curated lists for: Race, Sexual Orientation and Gender Identity (SOGI), Religion, age, and other protected statuses.
* We do not have a custom pipeline for Named Entity Recognition (NER), or a matcher, to identify complex terms (ex. "two spirit",  "male to female", "Asian American", etc.) and so these will not be fully available for interrogation. 


##### Key Dependencies and Packages

1. [Hugging Face Transformers](https://huggingface.co/) - the model we've designed this iteration for is hosted on hugging face. It is: [distilbert-base-uncased-finetuned-sst-2-english](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
2. [Streamlit](https://streamlit.io) - This is the library we're using to build the prototype app because it is easy to stand up and quick to fix.
3. [spaCy](https://spacy.io) - This is the main NLP Library we're using and it runs most of the text manipulation we're doing as part of the project.
4. [NLTK + WordNet](https://www.nltk.org/howto/wordnet.html) - This is the initial lexical database we're using because it is accessible directly through Python and it is free. We will be considering a move to [ConceptNet](https://conceptnet.io/) for future iterations based on better lateral movement across edges.
5. [Lime](https://github.com/marcotcr/lime) - We chose Lime over Shap because Lime has more of the functionality we need. Shap appears to provide greater performance but is not as easily suited to our original designs.
6. [Altair](https://altair-viz.github.io/user_guide/encoding.html) - We're using Altair because it's well integrated into Streamlit.