Model Card for robeczech-base-binary-cs-iib

This model is fine-tuned for binary text classification of Supportive Interactions in Czech Instant Messenger dialogs of adolescents.

Model Description

The model was fine-tuned on a Czech dataset of Instant Messenger dialogs of adolescents. The classification is binary: the model outputs probabilities for the labels {0,1}, which indicate whether Supportive Interactions are present or not.

  • Developed by: Anonymous
  • Language(s): cs
  • Finetuned from: ufal/robeczech-base
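The description above says the model returns one probability per label in {0,1}. As a minimal sketch of turning such a probability pair into a single label, the snippet below thresholds the probability in column 1. The function name `positive_label`, the threshold values, and the example numbers are all hypothetical; the card does not state which label means "present", so the code only assumes that column 1 holds the probability of label 1.

```python
import numpy as np

def positive_label(probs, threshold=0.5):
    """Return 1 where the probability of label 1 reaches the threshold, else 0."""
    probs = np.asarray(probs, dtype=float)
    # Assumption: the last axis is ordered as [P(label 0), P(label 1)].
    return (probs[..., 1] >= threshold).astype(int)

# Made-up probabilities for three inputs (illustrative only, not model output).
example_probs = np.array([[0.9, 0.1], [0.35, 0.65], [0.5, 0.5]])

labels_default = positive_label(example_probs)                # threshold 0.5
labels_strict = positive_label(example_probs, threshold=0.7)  # stricter cut-off
```

Raising the threshold makes label 1 harder to assign, which may be useful when false positives are more costly than misses.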

Model Sources

Usage

Here is how to use this model to classify a context window of a dialogue:

import numpy as np
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Prepare input texts. This model is pretrained and fine-tuned for Czech
test_texts = ['Utterance1;Utterance2;Utterance3']

# Use a GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained(
    'csocsci/robeczech-base-binary-cs-iib', num_labels=2).to(device)

# Left truncation keeps the end of an input that exceeds max_length
tokenizer = AutoTokenizer.from_pretrained(
    'csocsci/robeczech-base-binary-cs-iib',
    use_fast=False, truncation_side='left')
assert tokenizer.truncation_side == 'left'

# Define helper functions
def get_probs(text, tokenizer, model):
    inputs = tokenizer(text, padding=True, truncation=True, max_length=256,
                       return_tensors="pt").to(device)
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(**inputs)
    return outputs.logits.softmax(1)

def preds2class(probs, threshold=0.5):
    # Mark every probability at or above the threshold, then return the index
    # of the first marked class (class 0 if none or both are marked)
    pclasses = np.zeros(probs.shape)
    pclasses[np.where(probs >= threshold)] = 1
    return pclasses.argmax(-1)

def print_predictions(texts):
    # One probability vector per input text, moved to the CPU as a NumPy array
    probabilities = [get_probs(
        text, tokenizer, model).cpu().detach().numpy()[0]
                     for text in texts]
    predicted_classes = preds2class(np.array(probabilities))
    for c, p in zip(predicted_classes, probabilities):
        print(f'{c}: {p}')

# Run the prediction
print_predictions(test_texts)
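The placeholder input above joins several utterances with semicolons. As a rough sketch of how such context windows might be assembled from a running dialogue, the helper below joins the most recent utterances into one string. The function name `build_context_window`, the window size of 3, and the `;` separator are assumptions inferred from the placeholder text; the card does not document how windows were built for training.

```python
def build_context_window(utterances, window_size=3, separator=';'):
    """Join the last `window_size` non-empty utterances into one input string."""
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    # Drop empty utterances and surrounding whitespace before joining.
    cleaned = [u.strip() for u in utterances if u.strip()]
    return separator.join(cleaned[-window_size:])

# Hypothetical dialogue; one window is built after each new utterance.
dialogue = ['Utterance1', 'Utterance2', 'Utterance3', 'Utterance4']
windows = [build_context_window(dialogue[:i + 1]) for i in range(len(dialogue))]
```

The resulting `windows` list can then be passed to `print_predictions` in place of `test_texts`.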