	---
	# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
	# Doc / guide: https://huggingface.co/docs/hub/model-cards
	license: mit
	language:
	- cs
	---
	# Model Card for small-e-czech-2stage-online-risks-cs

	This model is fine-tuned for second-stage multi-label text classification of online risks in instant messenger dialogs of adolescents. As a second-stage model, it expects inputs in which at least one of the risk classes is present.
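
Because this model only covers the second stage, something upstream has to decide whether a text contains any risk at all. The sketch below shows one way such a gate could be wired. It is an illustration under assumptions, not the authors' pipeline: `first_stage` and `second_stage` are hypothetical callables, and this card does not specify which first-stage detector to use.

```python
from typing import Callable, List, Optional


def classify_two_stage(
    text: str,
    first_stage: Callable[[str], bool],
    second_stage: Callable[[str], List[int]],
) -> Optional[List[int]]:
    """Run the second stage only when the first stage flags the text.

    first_stage:  hypothetical detector, True if any risk is present.
    second_stage: hypothetical wrapper around this model, returning one
                  0/1 flag per label.
    """
    if not first_stage(text):
        # No risk detected: the second-stage model is never called.
        return None
    return second_stage(text)


# Toy stand-ins, for illustration only (not real models).
always_risky = lambda text: True
fixed_labels = lambda text: [1, 0, 0, 0, 0]

print(classify_two_stage("Utterance1;Utterance2", always_risky, fixed_labels))
```

Returning `None` for unflagged texts keeps "no risk detected" distinct from "flagged, but no label crossed the threshold".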

	## Model Description

	The model was fine-tuned on a dataset of instant messenger dialogs of adolescents. Classification is two-stage; this model handles the second stage and outputs probabilities for the labels {0,1,2,3,4}:

	0. Aggression, Harassing, Hate
	1. Mental Health Problems
	2. Alcohol, Drugs
	3. Weight Loss, Diets
	4. Sexual Content
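
To make the label indices above easier to work with, the following minimal sketch maps per-label logits to label names. It assumes the model's output order matches the list above and that each label gets an independent sigmoid with a 0.5 threshold, as in the usage example on this card. `LABELS`, `decode_labels`, and the example logits are hypothetical, introduced only for illustration.

```python
import math
from typing import List, Tuple

# Label names copied from the list above; index i is assumed to be output i.
LABELS = [
    "Aggression, Harassing, Hate",
    "Mental Health Problems",
    "Alcohol, Drugs",
    "Weight Loss, Diets",
    "Sexual Content",
]


def sigmoid(x: float) -> float:
    """Logistic function, applied to each label's logit independently."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_labels(logits: List[float], threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Return (label name, probability) for every label at or above the threshold."""
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    probs = [sigmoid(z) for z in logits]
    return [(LABELS[i], p) for i, p in enumerate(probs) if p >= threshold]


# Made-up logits for illustration; these are not real model output.
example_logits = [2.0, -1.5, 0.0, -3.0, 0.3]
for name, prob in decode_labels(example_logits):
    print(f"{name}: {prob:.4f}")
```

Since the task is multi-label, several labels can be active for the same input, and the probabilities do not need to sum to 1.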

	- Developed by: Anonymous
	- Language(s): cs
	- Finetuned from: small-e-czech

	## Model Sources

	- Repository: https://github.com/justtherightsize/supportive-interactions-and-risks
	- Paper: Stay tuned!

	## Usage
	Here is how to use this model to classify a context-window of a dialogue:

	```python
	import numpy as np
	import torch
	from transformers import AutoTokenizer, AutoModelForSequenceClassification

	# Prepare input texts. This model is fine-tuned from small-e-czech,
	# so the inputs are expected to be Czech dialog text.
	test_texts = ['Utterance1;Utterance2;Utterance3']

	# Load the model and tokenizer (use the GPU when one is available)
	device = "cuda" if torch.cuda.is_available() else "cpu"
	model = AutoModelForSequenceClassification.from_pretrained(
	    'justtherightsize/small-e-czech-2stage-online-risks-cs', num_labels=5).to(device)

	tokenizer = AutoTokenizer.from_pretrained(
	    'justtherightsize/small-e-czech-2stage-online-risks-cs',
	    use_fast=False, truncation_side='left')
	assert tokenizer.truncation_side == 'left'

	# Define helper functions
	def predict_one(text: str, tok, mod, threshold=0.5):
	    # Tokenize one context window; left truncation keeps the most recent tokens
	    encoding = tok(text, return_tensors="pt", truncation=True, padding=True,
	                   max_length=256)
	    encoding = {k: v.to(mod.device) for k, v in encoding.items()}
	    # Inference only, so skip gradient tracking
	    with torch.no_grad():
	        outputs = mod(**encoding)
	    logits = outputs.logits
	    # Multi-label: apply an independent sigmoid to each of the 5 logits
	    probs = torch.sigmoid(logits.squeeze().cpu())
	    predictions = np.zeros(probs.shape)
	    predictions[np.where(probs >= threshold)] = 1
	    return predictions, probs

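	def print_predictions(texts):
	    preds = [predict_one(tt, tokenizer, model) for tt in texts]
	    for c, p in preds:
	        # Format each probability on its own; a list does not accept ':.4f'
	        probs_str = ', '.join(f'{x:.4f}' for x in p.tolist())
	        print(f'{c}: [{probs_str}]')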

	# Run the prediction
	print_predictions(test_texts)
	```
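
The placeholder input above joins consecutive utterances with semicolons. The sketch below shows one way to build such context windows from a dialog. The `;` separator is inferred from that placeholder, and `build_context_windows`, the window size of 3, and the sample dialog are assumptions for illustration; check the linked repository for the exact preprocessing used in training.

```python
from typing import List


def build_context_windows(
    utterances: List[str], window_size: int = 3, sep: str = ";"
) -> List[str]:
    """Return one window per utterance: that utterance plus up to
    window_size - 1 preceding ones, joined with sep."""
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    windows = []
    for end in range(1, len(utterances) + 1):
        start = max(0, end - window_size)
        windows.append(sep.join(utterances[start:end]))
    return windows


# Hypothetical Czech dialog, for illustration only.
dialog = ["Ahoj", "Jak se máš?", "Dobře", "A ty?"]
for window in build_context_windows(dialog):
    print(window)
```

Each resulting string can be passed to `print_predictions` as one element of `test_texts`. Because the tokenizer is loaded with `truncation_side='left'`, an over-long window loses its oldest tokens first, so the most recent utterances are kept.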