butterswords

Update README.md

7333849 almost 2 years ago

	---
	title: NLC Explorer
	emoji: 🧭 🔍 ⁉️
	colorFrom: gray
	colorTo: purple
	sdk: streamlit
	sdk_version: 1.10.0
	app_file: app.py
	pinned: false
	license: mit
	---

	# NLC-Explorer
	### A Natural Language Counterfactual Generator for Exploring Bias in Sentiment Analysis Algorithms

	##### Overview
	This project is an offshoot of the [Interactive Model Cards](https://github.com/amcrisan/interactive-model-cards) project. It gives a person more ways to explore a model's outputs by generating alternatives (technically [counterfactuals](https://plato.stanford.edu/entries/counterfactuals/#WhatCoun)). We believe that seeing multiple alternatives may help people better understand the limitations of a model and develop a sense of its trustworthiness and bias.
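
As a rough illustration of what generating alternatives can mean here, the sketch below swaps one term in a sentence for other members of a list, producing one counterfactual sentence per substitute. The function name, the sample list, and the whole-word replacement strategy are assumptions made for illustration; they are not taken from the app's code.

```python
import re

# Hypothetical sample list; the app's actual lists are not reproduced here.
COUNTRIES = ["Canada", "Mexico", "Japan", "Kenya"]


def generate_alternatives(sentence, term, substitutes):
    """Return one counterfactual sentence per substitute, skipping the original term."""
    # Match the term only as a whole word so that substrings are left alone.
    pattern = re.compile(r"\b" + re.escape(term) + r"\b")
    if pattern.search(sentence) is None:
        return []
    alternatives = []
    for substitute in substitutes:
        if substitute == term:
            continue
        # A function replacement avoids backslash handling in the substitute text.
        alternatives.append(pattern.sub(lambda _match: substitute, sentence))
    return alternatives


if __name__ == "__main__":
    original = "The food in Japan was wonderful."
    for alternative in generate_alternatives(original, "Japan", COUNTRIES):
        print(alternative)
```

Each alternative could then be scored by the sentiment model and compared with the score of the original sentence; a large gap between sentences that differ only in the swapped term would be a signal worth inspecting.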

	##### Known Limitations
	* Words outside the spaCy `en_core_web_lg` vocabulary have no vectors, so similarity scores cannot be computed for them.
	* WordNet has many limitations due to its age and the lack of funding for ongoing maintenance. It covers a large portion of the English language, but certain words simply do not exist in it.
	* There are currently only 2 lists (Countries and Professions). We would like to find community-curated lists for: Race, Sexual Orientation and Gender Identity (SOGI), Religion, Age, and other protected statuses.
	* We do not have a custom pipeline for Named Entity Recognition (NER), or a matcher, to identify complex terms (e.g., "two spirit", "male to female", "Asian American"), so these will not be fully available for interrogation.
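
The last limitation can be made concrete with a small sketch. Splitting on whitespace breaks a term such as "two spirit" into two separate words, while a greedy longest-match lookup against a phrase list keeps it whole. This is plain Python standing in for a real matcher, not a spaCy component, and the phrase list and function name are hypothetical.

```python
# Hypothetical phrase list, lowercased and pre-split into words.
COMPLEX_TERMS = [
    ("two", "spirit"),
    ("male", "to", "female"),
    ("asian", "american"),
]


def merge_complex_terms(text, phrases=COMPLEX_TERMS):
    """Split text on whitespace, but keep known multi-word terms as single items."""
    words = text.split()
    # Try longer phrases first so the longest match wins.
    by_length = sorted(phrases, key=len, reverse=True)
    merged = []
    i = 0
    while i < len(words):
        for phrase in by_length:
            candidate = words[i:i + len(phrase)]
            # Compare case-insensitively and ignore surrounding punctuation.
            window = tuple(w.lower().strip(".,;:!?") for w in candidate)
            if window == phrase:
                merged.append(" ".join(candidate))
                i += len(phrase)
                break
        else:
            # No phrase starts at this position; keep the single word.
            merged.append(words[i])
            i += 1
    return merged


if __name__ == "__main__":
    print(merge_complex_terms("She is Asian American and proud"))
```

Without a step like this, only the individual words ("Asian", "American") would be candidates for substitution, which is why such terms are not fully available for interrogation at present.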


	##### Key Dependencies and Packages

	1. [Hugging Face Transformers](https://huggingface.co/) - The model this iteration is designed for is hosted on Hugging Face: [distilbert-base-uncased-finetuned-sst-2-english](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
	2. [Streamlit](https://streamlit.io) - This is the library we're using to build the prototype app because it is easy to stand up and quick to fix.
	3. [spaCy](https://spacy.io) - This is the main NLP library we're using, and it handles most of the text manipulation in the project.
	4. [NLTK + WordNet](https://www.nltk.org/howto/wordnet.html) - This is the initial lexical database we're using because it is accessible directly through Python and it is free. We will be considering a move to [ConceptNet](https://conceptnet.io/) for future iterations based on better lateral movement across edges.
	5. [LIME](https://github.com/marcotcr/lime) - We chose LIME over SHAP because LIME has more of the functionality we need. SHAP appears to provide greater performance but is not as well suited to our original designs.
	6. [Altair](https://altair-viz.github.io/user_guide/encoding.html) - We're using Altair because it's well integrated into Streamlit.