metadata
title: NLC Explorer
emoji: π§ π βοΈ
colorFrom: gray
colorTo: purple
sdk: streamlit
sdk_version: 1.10.0
app_file: app.py
pinned: false
license: mit
NLC-Explorer
A Natural Language Counterfactual Generator for Exploring Bias in Sentiment Analysis Algorithms
Overview
This project is a digression from the project on Interactive Model Cards. It focuses on giving people more ways to explore a model's outputs by generating alternatives (technically, counterfactuals). We believe that seeing multiple alternatives may help people better understand a model's limitations and develop a sense of its trustworthiness and bias.
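As a rough illustration of what generating alternatives means here, the sketch below swaps one term in a sentence for each entry in a small list, producing one counterfactual per entry. The function name `generate_counterfactuals` and the sample country list are assumptions made for illustration; they are not taken from the project's code.

```python
import re


def generate_counterfactuals(sentence, target, alternatives):
    """Return one variant of `sentence` per alternative, replacing `target`.

    Hypothetical helper for illustration only. Matching is whole-word and
    case-insensitive, and alternatives equal to the target are skipped.
    """
    pattern = re.compile(r"\b" + re.escape(target) + r"\b", flags=re.IGNORECASE)
    if not pattern.search(sentence):
        return []  # nothing to substitute
    return [
        # A function replacement avoids treating backslashes in `alt` as escapes.
        pattern.sub(lambda _match: alt, sentence)
        for alt in alternatives
        if alt.lower() != target.lower()
    ]


# Sample list, assumed for this sketch (the project ships its own lists).
countries = ["France", "Kenya", "Brazil"]
variants = generate_counterfactuals(
    "I loved the food when I visited France.", "France", countries
)
for variant in variants:
    print(variant)
```

Each variant could then be scored by the sentiment model and compared against the score for the original sentence, which is one way differences in treatment across terms might surface.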
Known Limitations
- Words not in the spaCy vocab for en_core_web_lg won't have vectors, so similarity scores cannot be computed for them.
- WordNet has many limitations due to its age and the lack of funding for ongoing maintenance. It covers a large portion of the English language, but certain words simply do not exist in it.
- There are currently only two lists (Countries and Professions). We would like to find community-curated lists for Race, Sexual Orientation and Gender Identity (SOGI), Religion, Age, and other protected statuses.
- We do not have a custom pipeline for Named Entity Recognition (NER), or a matcher, to identify complex terms (e.g. "two spirit", "male to female", "Asian American"), so these will not be fully available for interrogation.
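The first limitation can be illustrated without loading a spaCy model. The toy sketch below uses a plain dictionary in place of the en_core_web_lg vector table; the vectors and the helper `similarity_or_none` are invented for illustration. A word with no entry has no vector, so no similarity score can be computed for it.

```python
import math

# Toy stand-in for a vector table; the values are invented for this sketch.
toy_vectors = {
    "doctor": [0.9, 0.1, 0.3],
    "nurse": [0.8, 0.2, 0.4],
}


def similarity_or_none(word_a, word_b, vectors):
    """Cosine similarity of two words, or None if either lacks a vector."""
    vec_a = vectors.get(word_a)
    vec_b = vectors.get(word_b)
    if vec_a is None or vec_b is None:
        return None  # out-of-vocabulary: no score is possible
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    if norm_a == 0 or norm_b == 0:
        return None  # an all-zero vector carries no direction
    return dot / (norm_a * norm_b)


in_vocab = similarity_or_none("doctor", "nurse", toy_vectors)
out_of_vocab = similarity_or_none("doctor", "doula", toy_vectors)
print(in_vocab, out_of_vocab)
```

In spaCy itself, the `Token.has_vector` and `Token.is_oov` attributes can be checked before calling `similarity`, which serves the same purpose as the `None` check above.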
Key Dependencies and Packages
- Hugging Face Transformers - The model we've designed this iteration for is hosted on Hugging Face: distilbert-base-uncased-finetuned-sst-2-english.
- Streamlit - This is the library we're using to build the prototype app because it is easy to stand up and quick to fix.
- spaCy - This is the main NLP Library we're using and it runs most of the text manipulation we're doing as part of the project.
- NLTK + WordNet - This is the initial lexical database we're using because it is free and accessible directly through Python. We will consider moving to ConceptNet in future iterations because it offers better lateral movement across edges.
- LIME - We chose LIME over SHAP because LIME has more of the functionality we need. SHAP appears to offer better performance but is not as well suited to our original designs.
- Altair - We're using Altair because it's well integrated into Streamlit.