# Duct Tape Pipeline
To explore how users might interact with visualizations of counterfactuals as we evolve the Interactive Model Card, we first need a way to generate counterfactuals from a given input. The user should be able to provide their input and direct the system to generate counterfactuals based on a part of speech that is significant to the model. The system should then return a data frame of counterfactuals for use in an interactive visualization. Below is an example wireframe of the experience based on previous research.

![wireframe](Assets/VizNLC-Wireframe-example.png)

## Goals of this notebook
* Clean up the flow in the "duct tape pipeline".
* See if I can extract the LIME list for visualization

## Loading the libraries and models

In [1]:
#Import the libraries we know we'll need for the Generator.
import pandas as pd, spacy, nltk, numpy as np
from spacy import displacy
from spacy.matcher import Matcher
#!python -m spacy download en_core_web_md
nlp = spacy.load("en_core_web_md")
lemmatizer = nlp.get_pipe("lemmatizer")

#Import the libraries to support the model, predictions, and LIME.
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TextClassificationPipeline
import lime
import torch
import torch.nn.functional as F
from lime.lime_text import LimeTextExplainer

#Import the libraries for generating interactive visualizations.
import altair as alt

In [2]:
#Defining all necessary variables and instances.
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
class_names = ['negative', 'positive']
explainer = LimeTextExplainer(class_names=class_names)

In [3]:
#Defining a Predictor required for LIME to function.
def predictor(texts):
    outputs = model(**tokenizer(texts, return_tensors="pt", padding=True))
    probas = F.softmax(outputs.logits, dim=1).detach().numpy()
    return probas
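
LIME's text explainer calls the classifier function with a list of perturbed strings and expects one row of class probabilities per string. The sketch below illustrates only that input/output contract. `toy_predictor` and its keyword-count logits are hypothetical stand-ins for the real model, and nested lists stand in for the NumPy array that `predictor` returns.

```python
import math

def softmax(logits):
    """Convert a list of raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def toy_predictor(texts):
    """Hypothetical stand-in for `predictor`: one [negative, positive] row per text."""
    rows = []
    for text in texts:
        words = text.lower().split()
        # Made-up logits (counts of "bad" and "good"), for illustration only.
        logits = [float(words.count("bad")), float(words.count("good"))]
        rows.append(softmax(logits))
    return rows

probs = toy_predictor(["a good film", "a bad bad film", ""])
```

The column order matters: it has to line up with `class_names`, so index 0 is treated as negative and index 1 as positive.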

In [4]:
#Instantiate a matcher and use it to test some patterns.
matcher = Matcher(nlp.vocab)
pattern = [{"ENT_TYPE": {"IN":["NORP","GPE"]}}]
matcher.add("proper_noun", [pattern])
pattern_test = [{"DEP": "amod"},{"DEP":"attr"},{"TEXT":"-"},{"DEP":"attr","OP":"+"}]
matcher.add("amod_attr",[pattern_test])
pattern_an = [{"DEP": "amod"},{"POS":{"IN":["NOUN","PROPN"]}},{"DEP":{"NOT_IN":["attr"]}}]
matcher.add("amod_noun", [pattern_an])

In [5]:
def match_this(matcher, doc):
    matches = matcher(doc)
    for match_id, start, end in matches:
        matched_span = doc[start:end]
        print(f"Matched {matched_span.text} by the rule {nlp.vocab.strings[match_id]}.")
    return matches

## Building the Duct-Tape Pipeline cell-by-cell

In [6]:
gender = ["man", "woman","girl","boy","male","female","husband","wife","girlfriend","boyfriend","brother","sister","aunt","uncle","grandma","grandpa","granny","gramps","grandmother","grandfather","mama","dada","Ma","Pa","lady","gentleman"]

In [7]:
def select_crit(document, options=False, limelist=False):
    '''This function is meant to select the critical part of a sentence. Critical, in this context means
    the part of the sentence that is either: A) a PROPN from the correct entity group; B) an ADJ associated with a NOUN;
    C) a NOUN that represents gender. It also checks this against what the model thinks is important if the user defines "options" as "LIME" or True.'''
    chunks = list(document.noun_chunks)
    pos_options = []
    lime_options = []
    
    #Identify what the model cares about.
    if options:
        exp = explainer.explain_instance(document.text, predictor, num_features=15, num_samples=2000)
        lime_results = exp.as_list()
        #prints the results from lime for QA.
        if limelist == True:
            print(lime_results)
        for feature in lime_results:
            lime_options.append(feature[0])
        lime_results = pd.DataFrame(lime_results, columns=["Word","Weight"])
    
    #Identify what we care about "parts of speech"
    for chunk in chunks:
        #The use of chunk[-1] is due to testing that it appears to always match the root
        root = chunk[-1]
        #This currently matches to a list I've created. I don't know the best way to deal with this so I'm leaving it as is for the moment.
        if root.text.lower() in gender:
            cur_values = [token.text for token in chunk if token.pos_ in ["NOUN","ADJ"]]
            if (all(elem in lime_options for elem in cur_values) and ((options == "LIME") or (options == True))) or ((options != "LIME") and (options != True)):
                pos_options.extend(cur_values)
                #print(f"From {chunk.text}, {cur_values} added to pos_options due to gender.") #for QA
        #This is currently set to pick up entities in a particular set of groups (which I recently expanded). Should it just pick up all named entities?
        elif root.ent_type_ in ["GPE","NORP","DATE","EVENT"]:
            cur_values = []
            if (len(chunk) > 1) and (chunk[-2].dep_ == "compound"):
                #creates the compound element of the noun
                compound = [x.text for x in chunk if x.dep_ == "compound"]
                print(f"This is the contents of {compound} and it is {all(elem in lime_options for elem in compound)} that all elements are present in {lime_options}.") #for QA
                #checks to see all elements in the compound are important to the model or use the compound if not checking importance.
                if (all(elem in lime_options for elem in compound) and ((options == "LIME") or (options == True))) or ((options != "LIME") and (options != True)):
                    #creates a span for the entirety of the compound noun and adds it to the list.
                    span = -1 * (1 + len(compound))
                    pos_options.append(chunk[span:].text)
                    cur_values += [token.text for token in chunk if token.pos_ == "ADJ"]
            else: 
                cur_values = [token.text for token in chunk if (token.ent_type_ in ["GPE","NORP","DATE","EVENT"]) or (token.pos_ == "ADJ")]
            if (all(elem in lime_options for elem in cur_values) and ((options == "LIME") or (options == True))) or ((options != "LIME") and (options != True)):
                pos_options.extend(cur_values)
                print(f"From {chunk.text}, {cur_values} and {pos_options} added to pos_options due to entity recognition.") #for QA
        elif len(chunk) > 1:
            cur_values = [token.text for token in chunk if token.pos_ in ["NOUN","ADJ"]]
            if (all(elem in lime_options for elem in cur_values) and ((options == "LIME") or (options == True))) or ((options != "LIME") and (options != True)):
                pos_options.extend(cur_values)
                print(f"From {chunk.text}, {cur_values} added to pos_options due to wildcard.") #for QA
        else:
            print(f"No options added for \'{chunk.text}\' ")
    
    
    #Return the correct set of options based on user input, defaults to POS for simplicity.
    if options == "LIME":
        return pos_options, lime_results
    else:
        return pos_options
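
The same importance check appears in every branch of `select_crit`. One way to make it easier to read is to pull it into a small helper. This is a minimal sketch, not a drop-in replacement: the name `passes_filter` is hypothetical, and it assumes the only values that turn LIME checking on are `"LIME"` and `True`.

```python
def passes_filter(values, lime_options, options=False):
    """Return True when `values` should be added to the candidate list.

    When LIME checking is on (options is "LIME" or True), every value must
    appear in `lime_options`. Otherwise everything passes.
    """
    use_lime = options in ("LIME", True)
    if not use_lime:
        return True
    return all(value in lime_options for value in values)

lime_words = ["film", "Iraq"]
checks = [
    passes_filter(["film"], lime_words, "LIME"),         # all present
    passes_filter(["film", "great"], lime_words, True),  # "great" is missing
    passes_filter(["great"], lime_words, False),         # LIME check is off
    passes_filter([], lime_words, "LIME"),               # empty list passes vacuously
]
```

The last case is worth keeping in mind: `all()` over an empty list is `True`, so a chunk with no matching tokens still passes the check and contributes an empty list.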

In [8]:
#Test to make sure all three options work
text4 = "This film was filmed in Iraq."
doc4 = nlp(text4)
lime4, limedf = select_crit(doc4,options="LIME")

From This film, ['film'] added to pos_options due to wildcard.
From Iraq, ['Iraq'] and ['film', 'Iraq'] added to pos_options due to entity recognition.
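
For reference, the LIME list that `select_crit` works with is a list of `(word, weight)` pairs. The sketch below shows the two shapes derived from it: the plain word list used for filtering and the `Word`/`Weight` rows used for plotting. The weights are made up for illustration and are not real LIME output. Plain dictionaries stand in for data frame rows.

```python
# Made-up (word, weight) pairs in the shape that `exp.as_list()` produces.
lime_results = [("Iraq", -0.41), ("film", 0.12), ("filmed", 0.05), ("This", -0.02)]

# Keep just the words, as `select_crit` does when it builds `lime_options`.
lime_options = [word for word, _ in lime_results]

# Row dictionaries with the same "Word"/"Weight" columns as the data frame,
# sorted from most negative to most positive weight for a bar chart.
rows = sorted(
    ({"Word": word, "Weight": weight} for word, weight in lime_results),
    key=lambda row: row["Weight"],
)
```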


In [9]:
single_nearest = alt.selection_single(on='mouseover', nearest=True)
viz = alt.Chart(limedf).encode(
    alt.X('Weight:Q', scale=alt.Scale(domain=(-1, 1))),
    alt.Y('Word:N', sort='x', axis=None),
    color=alt.Color("Weight", scale=alt.Scale(scheme='blueorange', domain=[0], type="threshold", range='diverging'), legend=None),
    tooltip = ("Word","Weight")
).mark_bar().properties(title ="Importance of individual words")

text = viz.mark_text(
    fill="black",
    align='right',
    baseline='middle'
).encode(
    text='Word:N'
)
limeplot = alt.LayerChart(layer=[viz,text], width = 300).configure_axis(grid=False).configure_view(strokeWidth=0)
limeplot

### Testing predictions and visualization
Here I will attempt to import the model from Hugging Face, generate predictions for each of the sentences, and then visualize those predictions in a dot plot. If this works, I will move on to testing a full pipeline that lets the user pick which part of the sentence to generate counterfactuals for.

In [10]:
#Discovering that there's a pipeline specifically to provide scores. 
#I used it to get a list of lists of dictionaries that I can then manipulate to calculate the proper prediction score.
pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer, return_all_scores=True)

In [11]:
def eval_pred(text):
    '''A basic function for evaluating the prediction from the model and turning it into a visualization friendly number.'''
    preds = pipe(text)
    neg_score = preds[0][0]['score']
    pos_score = preds[0][1]['score']
    if pos_score >= neg_score:
        return pos_score
    #Otherwise the negative class wins, so return its score as a negative number.
    return -1 * neg_score

In [12]:
def eval_pred_test(text, return_all = False):
    '''A basic function for evaluating the prediction from the model and turning it into a visualization friendly number.'''
    preds = pipe(text)
    neg_score = -1 * preds[0][0]['score']
    sent_neg = preds[0][0]['label']
    pos_score = preds[0][1]['score']
    sent_pos = preds[0][1]['label']
    prediction = 0
    sentiment = ''
    if pos_score > abs(neg_score):
        prediction = pos_score
        sentiment = sent_pos
    elif abs(neg_score) > pos_score:
        prediction = neg_score
        sentiment = sent_neg
        
    if return_all:
        return prediction, sentiment
    else:
        return prediction
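
The idea behind both functions is to collapse the two class scores into one signed number: positive predictions land above zero and negative predictions below it. The sketch below shows that mapping on a hand-written input. `signed_score` and `fake_preds` are hypothetical, and the labels and scores are illustrative rather than real model output.

```python
def signed_score(preds):
    """Collapse [[{label, score}, {label, score}]] into (signed score, label).

    Assumes index 0 is the negative class and index 1 the positive class,
    matching `class_names` in this notebook.
    """
    neg, pos = preds[0][0], preds[0][1]
    if pos["score"] >= neg["score"]:
        return pos["score"], pos["label"]
    return -neg["score"], neg["label"]

# Hypothetical pipeline output, for illustration only.
fake_preds = [[{"label": "NEGATIVE", "score": 0.8}, {"label": "POSITIVE", "score": 0.2}]]
score, label = signed_score(fake_preds)

# A tie resolves to the positive class in this sketch.
tie = signed_score([[{"label": "NEGATIVE", "score": 0.5}, {"label": "POSITIVE", "score": 0.5}]])
```

Note the difference in tie handling: this sketch returns the positive class on an exact tie, while `eval_pred_test` returns `0` and an empty sentiment string.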

## Load the dummy countries I created to test generating counterfactuals
I decided to test the pipeline with a known problem space. Taking the text from Aurélien Géron's observations on Twitter, I built a small-scale test, using what I had learned so far, to show that we can identify a particular part of speech, use it to generate counterfactuals, and then build a visualization from the results.

In [13]:
#load my test data from https://github.com/dbouquin/IS_608/blob/master/NanosatDB_munging/Countries-Continents.csv
df = pd.read_csv("Assets/Countries/countries.csv")
df.head()

Unnamed: 0,Country,Continent
0,Algeria,Africa
1,Angola,Africa
2,Benin,Africa
3,Botswana,Africa
4,Burkina,Africa


In [14]:
#Note: we will need to build the function that lets the user choose from the options available. For now I have hard coded it as "selection", from "user_options".
user_options = select_crit(doc4)
print(user_options)
selection = user_options[1]
selection

From This film, ['film'] added to pos_options due to wildcard.
From Iraq, ['Iraq'] and ['film', 'Iraq'] added to pos_options due to entity recognition.
['film', 'Iraq']


'Iraq'

In [15]:
#Create a function that generates the counterfactuals within a data frame.
def gen_cf_country(df,document,selection):
    df['text'] = df.Country.apply(lambda x: document.text.replace(selection,x))
    df['prediction'] = df.text.apply(eval_pred_test)
    #added this because I think it will make the end results better if we ensure the seed is in the data we generate counterfactuals from.
    df['seed'] = df.Country.apply(lambda x: 'seed' if x == selection else 'alternative')
    return df

df = gen_cf_country(df,doc4,selection)
df.head()

Unnamed: 0,Country,Continent,text,prediction,seed
0,Algeria,Africa,This film was filmed in Algeria.,0.806454,alternative
1,Angola,Africa,This film was filmed in Angola.,-0.775854,alternative
2,Benin,Africa,This film was filmed in Benin.,0.962272,alternative
3,Botswana,Africa,This film was filmed in Botswana.,0.785837,alternative
4,Burkina,Africa,This film was filmed in Burkina.,0.87298,alternative
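
One thing to watch with `str.replace` is that it also rewrites substrings: choosing `film` as the selection would turn `filmed` into something else as well. The sketch below is one possible alternative that swaps whole words only, using a regular expression with word boundaries. `make_counterfactuals` is a hypothetical helper, and plain dictionaries stand in for data frame rows.

```python
import re

def make_counterfactuals(sentence, selection, alternatives):
    """Swap `selection` for each alternative, matching whole words only."""
    pattern = re.compile(r"\b" + re.escape(selection) + r"\b")
    rows = []
    for alt_word in alternatives:
        rows.append({
            "Country": alt_word,
            # A function replacement avoids backslash handling in `alt_word`.
            "text": pattern.sub(lambda _match: alt_word, sentence),
            "seed": "seed" if alt_word == selection else "alternative",
        })
    return rows

sentence = "This film was filmed in Iraq."
rows = make_counterfactuals(sentence, "Iraq", ["Algeria", "Iraq", "Benin"])

# Whole-word matching leaves "filmed" alone; plain replace does not.
whole_word = make_counterfactuals(sentence, "film", ["movie"])[0]["text"]
naive = sentence.replace("film", "movie")
```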


In [16]:
single_nearest = alt.selection_single(on='mouseover', nearest=True)
full = alt.Chart(df).encode(
    alt.X('Continent:N'),  # specify nominal data
    alt.Y('prediction:Q'),  # specify quantitative data
    color=alt.Color('seed:N', legend=alt.Legend(title="Seed or Alternative")),
    size='seed:N',
    tooltip=('Country','prediction')
).mark_circle(opacity=.5).properties(width=300).add_selection(single_nearest)

full

In [17]:
df2 = df.nlargest(5, 'prediction')
df3 = df.nsmallest(5, 'prediction')
frames = [df2,df3]
results = pd.concat(frames)
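
Concatenating `nlargest(5)` and `nsmallest(5)` works well for the full country list, but with fewer than ten rows the two slices overlap and the same row appears twice. The sketch below shows one way to guard against that. `extremes` is a hypothetical helper, the rows are made-up values, and plain dictionaries stand in for data frame rows.

```python
def extremes(rows, key, k=5):
    """Return the k highest and k lowest rows by `key`, without repeating a row."""
    ranked = sorted(rows, key=lambda row: row[key], reverse=True)
    if len(ranked) <= 2 * k:
        return ranked  # too few rows to split without overlap
    return ranked[:k] + ranked[-k:]

# Made-up rows, for illustration only.
sample = [
    {"Country": name, "prediction": value}
    for name, value in [("A", 0.9), ("B", -0.7), ("C", 0.4), ("D", -0.2),
                        ("E", 0.1), ("F", 0.95), ("G", -0.9)]
]

picked = [row["Country"] for row in extremes(sample, "prediction", k=2)]
small = [row["Country"] for row in extremes(sample[:3], "prediction", k=2)]
```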

In [18]:
bar = alt.Chart(results).encode(  
    alt.X('prediction:Q'), 
    alt.Y('Country:N', sort="-x"),
    color=alt.Color('seed:N', legend=alt.Legend(title="Seed or Alternative")),
    size='seed:N',
    tooltip=('Country','prediction')
).mark_circle().properties(width=300).add_selection(single_nearest)

bar

In [34]:
def critical_words(document, options=False):
    '''This function is meant to select the critical part of a sentence. Critical, in this context means
    the part of the sentence that is either: A) a PROPN from the correct entity group; B) an ADJ associated with a NOUN;
    C) a NOUN that represents gender. It also checks this against what the model thinks is important if the user sets "options" to True.'''
    if type(document) is not spacy.tokens.doc.Doc:
        document = nlp(document)
    chunks = list(document.noun_chunks)
    pos_options = []
    lime_options = []
    
    #Identify what the model cares about.
    if options:
        exp = explainer.explain_instance(document.text, predictor, num_features=15, num_samples=2000)
        lime_results = exp.as_list()
        for feature in lime_results:
            lime_options.append(feature[0])
        lime_results = pd.DataFrame(lime_results, columns=["Word","Weight"])
    
    #Identify what we care about "parts of speech". The first section focuses on NOUNs and related ADJ.
    for chunk in chunks:
        #The use of chunk[-1] is due to testing that it appears to always match the root
        root = chunk[-1]
        #Named entities are handled first; any non-empty entity type on the root qualifies here.
        if root.ent_type_:
            cur_values = []
            if (len(chunk) > 1) and (chunk[-2].dep_ == "compound"):
                #creates the compound element of the noun
                compound = [x.text for x in chunk if x.dep_ == "compound"]
                print(f"This is the contents of {compound} and it is {all(elem in lime_options for elem in compound)} that all elements are present in {lime_options}.") #for QA
                #checks to see all elements in the compound are important to the model or use the compound if not checking importance.
                if (all(elem in lime_options for elem in compound) and (options is True)) or ((options is False)):
                    #creates a span for the entirety of the compound noun and adds it to the list.
                    span = -1 * (1 + len(compound))
                    pos_options.append(chunk[span:].text)
                    cur_values += [token.text for token in chunk if token.pos_ == "ADJ"]
                else:
                    print(f"The elements in {compound} could not be added to the final list because they are not all relevant to the model.")
            else: 
                cur_values = [token.text for token in chunk if (token.ent_type_) or (token.pos_ == "ADJ")]
            if (all(elem in lime_options for elem in cur_values) and (options is True)) or ((options is False)):
                pos_options.extend(cur_values)
                print(f"From {chunk.text}, {cur_values} added to pos_options due to entity recognition.") #for QA
        elif len(chunk) >= 1:
            cur_values = [token.text for token in chunk if token.pos_ in ["NOUN","ADJ"]]
            if (all(elem in lime_options for elem in cur_values) and (options is True)) or ((options is False)):
                pos_options.extend(cur_values)
                print(f"From {chunk.text}, {cur_values} added to pos_options due to wildcard.") #for QA
        else:
            print(f"No options added for \'{chunk.text}\' ")
    # Here I am going to try to pick up personal pronouns, which refer to people, and adjectival complements.
    for token in document:
        if (token.text not in pos_options) and ((token.text in lime_options) or (options == False)):
            #print(f"executed {token.text} with {token.pos_} and {token.dep_}") #QA
            if (token.pos_ == "ADJ") and (token.dep_ in ["acomp","conj"]):
                pos_options.append(token.text)            
            elif (token.pos_ == "PRON") and ("Prs" in token.morph.get("PronType")):
                pos_options.append(token.text)
    
    #Return the correct set of options based on user input, defaults to POS for simplicity.
    if options:
        return pos_options, lime_results
    else:
        return pos_options

In [20]:
#Testing new code
a = "People are fat and lazy."
b = "I think she is beautiful."
doca = nlp(a)
docb = nlp(b)

In [21]:
optsa, limea = critical_words(doca, True)
optsa

No options added for 'People' 


['fat', 'lazy']

In [22]:
def lime_viz(df):
    single_nearest = alt.selection_single(on='mouseover', nearest=True)
    viz = alt.Chart(df).encode(
        alt.X('Weight:Q', scale=alt.Scale(domain=(-1, 1))),
        alt.Y('Word:N', sort='x', axis=None),
        color=alt.Color("Weight", scale=alt.Scale(scheme='blueorange', domain=[0], type="threshold", range='diverging'), legend=None),
        tooltip = ("Word","Weight")
    ).mark_bar().properties(title ="Importance of individual words")

    text = viz.mark_text(
        fill="black",
        align='right',
        baseline='middle'
    ).encode(
        text='Word:N'
    )
    limeplot = alt.LayerChart(layer=[viz,text], width = 300).configure_axis(grid=False).configure_view(strokeWidth=0)
    return limeplot

In [23]:
test8 = "I saw a white woman walking down the street with an asian man."
opts8, lime8 = critical_words(test8,True)
opts8

No options added for 'I' 
From a white woman, ['white', 'woman'] added to pos_options due to wildcard.
From the street, ['street'] added to pos_options due to wildcard.
From an asian man, ['asian', 'man'] added to pos_options due to wildcard.


['white', 'woman', 'street', 'asian', 'man', 'I']

In [24]:
lime_viz(lime8)

In [25]:
probability, sentiment = eval_pred_test(test8, return_all=True)
options, lime = critical_words(test8,options=True)

No options added for 'I' 
From a white woman, ['white', 'woman'] added to pos_options due to wildcard.
From the street, ['street'] added to pos_options due to wildcard.
From an asian man, ['asian', 'man'] added to pos_options due to wildcard.


In [38]:
bug = "I find men and women deserve the same respect."
options = critical_words(bug)

From I, [] added to pos_options due to wildcard.
From men, ['men'] added to pos_options due to wildcard.
From women, ['women'] added to pos_options due to wildcard.
From the same respect, ['same', 'respect'] added to pos_options due to wildcard.


In [29]:
bug_doc = nlp(bug)

In [35]:
for chunk in bug_doc.noun_chunks:
    print(chunk.text)
    print(chunk[-1].pos_)

I
PRON
a man
NOUN
woman
NOUN
the same respect
NOUN
