---
# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
# Doc / guide: https://huggingface.co/docs/hub/model-cards
license: mit
language:
- cs
---
# Model Card for small-e-czech-binary-online-risks-cs

<!-- Provide a quick summary of what the model is/does. -->

This model is fine-tuned for binary classification of online risks in Czech instant-messenger dialogs of adolescents.

## Model Description

The model was fine-tuned on a dataset of Czech instant-messenger dialogs of adolescents. The classification is binary: the model outputs probabilities for the two labels {0, 1}, which indicate whether online risks are present or not.
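The usage example in this card turns the two logits of the classification head into label probabilities with a softmax. The NumPy sketch below reproduces that step outside PyTorch, for illustration only; the logit values are invented and are not real model outputs.

```python
import numpy as np

def softmax(logits):
    """Convert a (batch, 2) array of logits into probabilities that sum to 1 per row."""
    # Subtract the row maximum for numerical stability before exponentiating
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Invented logits for two inputs (illustration only, not model outputs)
logits = np.array([[2.0, -1.0], [-0.5, 1.5]])
probs = softmax(logits)
print(probs.round(3))
```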

- Developed by: Anonymous
- Language(s): Czech (cs)
- Fine-tuned from: Seznam/small-e-czech

## Model Sources

<!-- Provide the basic links for the model. -->

- Repository: https://github.com/justtherightsize/supportive-interactions-and-risks
- Paper: Stay tuned!

## Usage
Here is how to use this model to classify a context window of a dialog. Each input text is one context window, with its utterances separated by semicolons:

```python
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Prepare input texts. This model is fine-tuned for Czech.
# Each text is one context window of a dialog, with utterances separated by ';'.
test_texts = ['Utterance1;Utterance2;Utterance3']

# Run on the GPU if one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained(
    'justtherightsize/small-e-czech-binary-online-risks-cs', num_labels=2).to(device)

# Truncate from the left so that the end of a long context window is kept
tokenizer = AutoTokenizer.from_pretrained(
    'justtherightsize/small-e-czech-binary-online-risks-cs',
    use_fast=False, truncation_side='left')
assert tokenizer.truncation_side == 'left'

# Define helper functions
def get_probs(text, tokenizer, model):
    """Return a (1, 2) tensor of label probabilities for a single text."""
    inputs = tokenizer(text, padding=True, truncation=True, max_length=256,
                       return_tensors="pt").to(device)
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.logits.softmax(1)

def preds2class(probs, threshold=0.5):
    """Mark every label whose probability is >= threshold and return the
    index of the first marked label per row (0 if none is marked)."""
    pclasses = np.zeros(probs.shape)
    pclasses[np.where(probs >= threshold)] = 1
    return pclasses.argmax(-1)

def print_predictions(texts):
    """Print the predicted class and the label probabilities for each text."""
    probabilities = [get_probs(text, tokenizer, model).cpu().numpy()[0]
                     for text in texts]
    predicted_classes = preds2class(np.array(probabilities))
    for c, p in zip(predicted_classes, probabilities):
        print(f'{c}: {p}')

# Run the prediction
print_predictions(test_texts)
```
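The usage example expects each context window as a single string whose utterances are separated by `;`. If a dialog is stored as a list of utterances, a hypothetical helper such as `build_context_windows` below could produce one window ending at each utterance. The sliding-window scheme and the default window size of 3 are assumptions for illustration; this card does not state how the training windows were built.

```python
def build_context_windows(utterances, window_size=3, sep=';'):
    """Hypothetical helper (not part of this model's code): return one
    sep-joined window per utterance, covering at most the last
    `window_size` utterances up to and including it."""
    windows = []
    for end in range(1, len(utterances) + 1):
        start = max(0, end - window_size)
        windows.append(sep.join(utterances[start:end]))
    return windows

# Invented example dialog, for illustration only
dialog = ['Ahoj', 'Jak se máš?', 'Dobře, a ty?', 'Taky dobře']
for window in build_context_windows(dialog):
    print(window)
```

Utterances that themselves contain `;` would blur the window boundaries, so they may need cleaning before being joined.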