metadata

language:
  - en
tags:
  - text-classification
license: apache-2.0
widget:
  - text: sdfsdfa
    example_title: Gibberish
  - text: idkkkkk
    example_title: Uncertainty
  - text: Because you asked
    example_title: Refusal
  - text: Necessity
    example_title: High-risk
  - text: My job went remote and I needed to take care of my kids
    example_title: Valid

SANDS

Semi-Automated Non-response Detection for Surveys model (uncased)

Non-response detection designed to be used for open-ended survey responses in conjunction with human reviewers.

Model Details

Model Description: This model is a fine-tuned version of the supervised SimCSE BERT base uncased model. It was introduced at AAPOR 2022 in the talk "Toward a Semi-automated item nonresponse detector model for open-response data." The model is uncased, so it treats important, Important, and ImPoRtAnT the same.
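
An uncased BERT tokenizer normalizes text before splitting it into tokens, which is why differently cased inputs look identical to the model. The sketch below reproduces only the lowercasing and accent-stripping part of that normalization. It assumes this model's tokenizer keeps the standard BERT-uncased settings. The real tokenizer also cleans whitespace and splits punctuation, and this sketch leaves those steps out.

```python
import unicodedata


def uncased_normalize(text):
    """Lowercase and strip accents, approximating BERT-uncased preprocessing."""
    text = text.lower()
    # NFD splits an accented letter into its base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (Unicode category "Mn").
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


variants = ["important", "Important", "ImPoRtAnT"]
print({uncased_normalize(v) for v in variants})  # {'important'}
```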

Developed by: National Center for Health Statistics, Centers for Disease Control and Prevention
Model Type: Text Classification
Language(s): English
License: Apache-2.0

Parent Model: For more details about SimCSE, we encourage users to check out the SimCSE GitHub repository, the arXiv publication, and the base model on Hugging Face.

How to Get Started with the Model

Example of classification of a set of responses:


```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import pandas as pd

# Load the fine-tuned model and its tokenizer.
# "pretrained/" is a local directory holding the saved model files;
# replace it with wherever the model is stored.
model_location = "pretrained/"
model = AutoModelForSequenceClassification.from_pretrained(model_location)
tokenizer = AutoTokenizer.from_pretrained(model_location)

# Create example responses to test
responses = [
    "sdfsdfa",
    "idkkkkk",
    "Because you asked",
    "Necessity",
    "My job went remote and I needed to take care of my kids",
]

# Run the model and convert the logits to per-category probabilities
with torch.no_grad():
    tokens = tokenizer(responses, padding=True, truncation=True, return_tensors="pt")
    output = model(**tokens)
    scores = torch.softmax(output.logits, dim=1).numpy()

# Display the scores in a table with one row per response.
# The column order must match the order of the model's output labels.
columns = ["Gibberish", "Uncertainty", "Refusal", "High-risk", "Valid"]
df = pd.DataFrame(scores, columns=columns, index=responses).round(3)
df.index.name = "Response"
print(df)
```

Response	Gibberish	Uncertainty	Refusal	High-risk	Valid
sdfsdfa	0.998	0.000	0.000	0.000	0.000
idkkkkk	0.002	0.995	0.001	0.001	0.001
Because you asked	0.001	0.001	0.976	0.006	0.014
Necessity	0.001	0.001	0.002	0.980	0.016
My job went remote and I needed to take care of my kids	0.000	0.000	0.000	0.000	1.000

Uses

Direct Uses

This model is intended for cleaning survey data. It helps researchers filter out non-responsive or junk responses before research and analysis. For each response, the model returns a probability vector over five categories (Gibberish, Uncertainty, Refusal, High-risk, and Valid), and the scores in that vector sum to 1.
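
The probability vector is produced by applying a softmax to the model's five output logits, as the example code does with torch.softmax. The pure-Python sketch below shows that step and then reads off the top category. The logit values are made up for illustration. The category order is assumed to match the column order used in the example.

```python
import math

# Assumed to match the model's output order (same as the example's columns).
CATEGORIES = ["Gibberish", "Uncertainty", "Refusal", "High-risk", "Valid"]


def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    largest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_category(logits):
    """Return the most probable category and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CATEGORIES[best], probs[best]


# Hypothetical logits for one response (not real model output).
example_logits = [4.0, 0.5, -1.0, -2.0, 0.0]
probs = softmax(example_logits)
print(round(sum(probs), 6))  # 1.0
print(top_category(example_logits))
```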

Response types

Gibberish: Nonsensical response where the respondent entered text without regard for English syntax. Examples: ksdhfkshgk and sadsadsadsadsadsadsad
Refusal: Responses in valid English that either directly refuse to answer the question or bear no contextual relationship to the question asked. Examples: Because or Meow.
Uncertainty: Responses where the respondent does not understand the question, does not know the answer to the question, or does not know how to respond to the question. Examples: I dont know or unsure what you are asking.
High-risk: Responses that may be valid depending on the context and content of the question. A human subject matter expert is needed to decide whether these responses are valid. Examples: Necessity or Just isolating.
Valid: Responses that answer the question at hand and provide insight into the respondent's thoughts on the subject of the question. Examples: COVID began for me when my children’s school went online and I needed to stay home to watch them or staying home, avoiding crowds, still wear masks.
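
High-risk responses need a subject matter expert, and the model is designed to work alongside human reviewers. The sketch below shows one possible way to act on these categories. The routing rules and the review_threshold value of 0.9 are hypothetical choices made for illustration, not a policy described in this card. The sample score vectors are taken from the example output.

```python
# Assumed to match the model's output order (same as the example's columns).
CATEGORIES = ["Gibberish", "Uncertainty", "Refusal", "High-risk", "Valid"]
NON_RESPONSE = {"Gibberish", "Uncertainty", "Refusal"}


def triage(scores, review_threshold=0.9):
    """Route one score vector to 'keep', 'non-response', or 'human review'.

    Hypothetical policy: High-risk responses and any low-confidence
    prediction go to a human reviewer.
    """
    best = max(range(len(scores)), key=scores.__getitem__)
    label = CATEGORIES[best]
    if label == "High-risk" or scores[best] < review_threshold:
        return "human review"
    if label in NON_RESPONSE:
        return "non-response"
    return "keep"


examples = {
    "sdfsdfa": [0.998, 0.000, 0.000, 0.000, 0.000],
    "Necessity": [0.001, 0.001, 0.002, 0.980, 0.016],
    "Because you asked": [0.001, 0.001, 0.976, 0.006, 0.014],
}
for response, scores in examples.items():
    print(f"{response!r}: {triage(scores)}")
```

Sending low-confidence predictions to review keeps the human in the loop for ambiguous cases. Reviewers then do not need to check every response by hand.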

Misuses and Out-of-scope Use

The model has been trained to identify survey non-response, or junk responses, in open-ended responses. These are cases where the respondent has given a response that does not answer the question at hand or provide any meaningful insight, such as meow, ksdhfkshgk, or idk. The model was fine-tuned on 3,000 labeled open-ended responses to web probes on questions relating to the COVID-19 pandemic. These responses were gathered from the Research and Development Survey (RANDS), conducted by the Division of Research and Methodology at the National Center for Health Statistics. Web probes are questions that apply probing techniques from cognitive interviewing to survey question design, and they differ from traditional open-ended survey questions. Our labeled responses focus narrowly on COVID-19 and health topics, so performance may drop on responses outside this scope.

The training responses also come from both web- and phone-based open-ended probes. The model may be less effective on more traditional open-ended survey questions, or on responses collected through other modes.

This model does not assess the factual accuracy of responses or filter out responses with different demographic biases. It was not trained to produce factual representations of people or events, so using it for such classification is out of scope.

We did not train the model to recognize non-response in any language other than English. Responses in other languages are out of scope, and the model will perform poorly on them. Any correct classifications of non-English responses come from the base SimCSE or BERT models.

Risks, Limitations, and Biases

As the model was fine-tuned from SimCSE, itself fine-tuned from BERT, it will reproduce all biases inherent in these base models. Due to tokenization, the model may incorrectly classify typos, especially in acronyms. For example: LGBTQ is valid, while LBGTQ is classified as gibberish.
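
BERT-family tokenizers split each word into subword pieces using WordPiece, which always takes the longest matching vocabulary entry first. The sketch below implements that greedy longest-match-first split. It uses a tiny hypothetical vocabulary (TOY_VOCAB, not the model's real vocabulary) to show how transposing two letters can turn a single known token into a sequence of fragments. The actual pieces produced for LGBTQ and LBGTQ by this model's vocabulary are not stated here.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first WordPiece split of a single word."""
    word = word.lower()  # uncased model: lowercase first
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation pieces carry a ## prefix
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no piece matches: the whole word becomes unknown
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary, NOT the model's real vocabulary.
TOY_VOCAB = {"lgbtq", "l", "##b", "##g", "##t", "##q", "##bg"}

print(wordpiece("LGBTQ", TOY_VOCAB))  # ['lgbtq']
print(wordpiece("LBGTQ", TOY_VOCAB))  # ['l', '##bg', '##t', '##q']
```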

Some refusal responses may also be classified as valid because similar responses did not occur in our limited training set. For example, "none of your business" is currently classified as valid, because it was not a response seen in the first two rounds of RANDS during COVID-19.

Training

Training Data

The model was fine-tuned on 3,000 labeled open-ended responses from Rounds 1 and 2 of RANDS during COVID-19. The BERT model underlying SimCSE was pretrained on BookCorpus and English Wikipedia.

Training procedure

Learning rate: 5e-5
Batch size: 16
Number of training epochs: 4
Base Model pooling dimension: 768
Number of labels: 5
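
The hyperparameters above imply a rough training budget and a small classification head. The arithmetic below rests on three assumptions. First, all 3,000 labeled responses were used for training; the card does not describe a train/test split. Second, there was no gradient accumulation. Third, the classification head is a single linear layer on the pooled 768-dimensional vector, as in the transformers library's BertForSequenceClassification.

```python
import math

# Values from the training procedure above.
NUM_EXAMPLES = 3_000  # assumes every labeled response was used for training
BATCH_SIZE = 16
NUM_EPOCHS = 4
HIDDEN_SIZE = 768     # "Base Model pooling dimension"
NUM_LABELS = 5

# The last batch of each epoch is partial, so round up.
steps_per_epoch = math.ceil(NUM_EXAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * NUM_EPOCHS  # assumes no gradient accumulation

# Assumed head: one linear layer mapping the pooled vector to 5 logits.
head_params = HIDDEN_SIZE * NUM_LABELS + NUM_LABELS  # weights + biases

print(steps_per_epoch, total_steps, head_params)  # 188 752 3845
```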