---
# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
# Doc / guide: https://huggingface.co/docs/hub/model-cards
license: mit
language:
- cs
---
# Model Card for small-e-czech-binary-online-risks-cs

<!-- Provide a quick summary of what the model is/does. -->

This model is fine-tuned for binary text classification of online risks in Czech instant-messenger dialogs of adolescents.

## Model Description

The model was fine-tuned on a dataset of Czech instant-messenger dialogs of adolescents. The classification is binary: the model outputs probabilities for the labels {0, 1}, i.e. whether online risks are present in the dialog or not.

- **Developed by:** Anonymous
- **Language(s):** cs
- **Fine-tuned from:** Seznam/small-e-czech

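As a quick sanity check before running inference, the minimal sketch below loads only the model configuration and prints its label inventory; note that the stored label names may be the generic `LABEL_0`/`LABEL_1` rather than human-readable names.

```python
from transformers import AutoConfig

# Load just the configuration to inspect the label set (no weights are downloaded).
config = AutoConfig.from_pretrained(
    "justtherightsize/small-e-czech-binary-online-risks-cs")
print(config.num_labels)  # expected: 2
print(config.id2label)    # may be the generic LABEL_0 / LABEL_1 names
```
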
## Model Sources

<!-- Provide the basic links for the model. -->

- **Repository:** https://github.com/justtherightsize/supportive-interactions-and-risks
- **Paper:** Stay tuned!

## Usage

Here is how to use this model to classify a context window of a dialog:

```python
import numpy as np
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Prepare input texts. This model is fine-tuned for Czech.
test_texts = ['Utterance1;Utterance2;Utterance3']

# Load the model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained(
    'justtherightsize/small-e-czech-binary-online-risks-cs', num_labels=2).to("cuda")

tokenizer = AutoTokenizer.from_pretrained(
    'justtherightsize/small-e-czech-binary-online-risks-cs',
    use_fast=False, truncation_side='left')
assert tokenizer.truncation_side == 'left'

# Define helper functions
def get_probs(text, tokenizer, model):
    inputs = tokenizer(text, padding=True, truncation=True, max_length=256,
                       return_tensors="pt").to("cuda")
    outputs = model(**inputs)
    return outputs[0].softmax(1)

def preds2class(probs, threshold=0.5):
    pclasses = np.zeros(probs.shape)
    pclasses[np.where(probs >= threshold)] = 1
    return pclasses.argmax(-1)

def print_predictions(texts):
    probabilities = [get_probs(
        texts[i], tokenizer, model).cpu().detach().numpy()[0]
        for i in range(len(texts))]
    predicted_classes = preds2class(np.array(probabilities))
    for c, p in zip(predicted_classes, probabilities):
        print(f'{c}: {p}')

# Run the prediction
print_predictions(test_texts)
```
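
The single item in `test_texts` above shows the expected input format: the utterances of one context window joined with `';'`. The sketch below builds such windows from a longer dialog and reuses `print_predictions` from the snippet above; the window size of three utterances is only an illustrative assumption, not a documented property of the model.

```python
# Illustrative helper: build ';'-joined context windows from a dialog.
# The ';' separator follows test_texts above; the window size of 3
# utterances is an assumption chosen for this example.
def to_context_windows(utterances, window=3):
    return [';'.join(utterances[max(0, i + 1 - window):i + 1])
            for i in range(len(utterances))]

dialog = ['Ahoj', 'Jak se máš?', 'Dobře, a ty?', 'Taky dobře.']
print_predictions(to_context_windows(dialog))  # reuses the functions defined above
```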