Commit c6c8e73 by IlyaGusev (1 parent: 5414232)

Create README.md

  1. README.md +91 -0
README.md ADDED
@@ -0,0 +1,91 @@
+ ---
+ language:
+ - ru
+ - en
+ - ru-RU
+ tags:
+ - xlm-roberta-large
+ datasets:
+ - IlyaGusev/headline_cause
+ license: apache-2.0
+ ---
+
+ # XLM-RoBERTa HeadlineCause Full
+
+ ## Model description
+
+ [More Information Needed]
+
+ ## Intended uses & limitations
+
+ #### How to use
+
+ ```python
+ from tqdm.notebook import tqdm
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
+
+ def get_batch(data, batch_size):
+     start_index = 0
+     while start_index < len(data):
+         end_index = start_index + batch_size
+         batch = data[start_index:end_index]
+         yield batch
+         start_index = end_index
+
+
+ def pipe_predict(data, pipe, batch_size=64):
+     raw_preds = []
+     for batch in tqdm(get_batch(data, batch_size)):
+         raw_preds += pipe(batch)
+     return raw_preds
+
+ MODEL_NAME = TOKENIZER_NAME = "IlyaGusev/xlm_roberta_large_headline_cause_full"
+ tokenizer = AutoTokenizer.from_pretrained(TOKENIZER_NAME, do_lower_case=False)
+ model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
+ model.eval()
+ pipe = pipeline("text-classification", model=model, tokenizer=tokenizer, framework="pt", return_all_scores=True)
+ texts = [
+     (
+         "Judge issues order to allow indoor worship in NC churches",
+         "Some local churches resume indoor services after judge lifted NC governor’s restriction"
+     ),
+     (
+         "Gov. Kevin Stitt defends $2 million purchase of malaria drug touted by Trump",
+         "Oklahoma spent $2 million on malaria drug touted by Trump"
+     ),
+     (
+         "Песков опроверг свой перевод на удаленку",
+         "Дмитрий Песков перешел на удаленку"
+     )
+ ]
+ pipe_predict(texts, pipe)
+ ```
+
+ #### Limitations and bias
+
+ [More Information Needed]
+
+ ## Training data
+
+ [More Information Needed]
+
+ ## Training procedure
+
+ [More Information Needed]
+
+ ## Eval results
+
+ [More Information Needed]
+
+ ### BibTeX entry and citation info
+
+ ```bibtex
+ @misc{gusev2021headlinecause,
+     title={HeadlineCause: A Dataset of News Headlines for Detecting Causalities},
+     author={Ilya Gusev and Alexey Tikhonov},
+     year={2021},
+     eprint={2108.12626},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
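
Note on the "How to use" snippet in the README added above: `pipe_predict` returns, for each headline pair, a list of `{"label", "score"}` dicts, because the pipeline is built with `return_all_scores=True`. The sketch below is not part of the commit. It shows one possible way to post-process that output and to wrap the pairs as `{"text", "text_pair"}` dicts, which recent `transformers` releases document as the input form for text pairs. The helper names `to_pair_inputs` and `top_labels` are hypothetical, and the label names in the mock data are placeholders, since the README does not list the model's classes.

```python
from typing import Dict, List, Sequence, Tuple, Union

Score = Dict[str, Union[str, float]]


def to_pair_inputs(pairs: Sequence[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Wrap (first, second) headline pairs as {"text", "text_pair"} dicts."""
    return [{"text": first, "text_pair": second} for first, second in pairs]


def top_labels(raw_preds: Sequence[Sequence[Score]]) -> List[Tuple[str, float]]:
    """Pick the highest-scoring label for each classified input."""
    results = []
    for scores in raw_preds:
        best = max(scores, key=lambda item: item["score"])
        results.append((best["label"], best["score"]))
    return results


# Mock output in the shape produced with return_all_scores=True.
# Label names and scores are placeholders, not real model predictions.
mock_preds = [
    [
        {"label": "LABEL_0", "score": 0.1},
        {"label": "LABEL_1", "score": 0.7},
        {"label": "LABEL_2", "score": 0.2},
    ],
    [
        {"label": "LABEL_0", "score": 0.8},
        {"label": "LABEL_1", "score": 0.1},
        {"label": "LABEL_2", "score": 0.1},
    ],
]
print(top_labels(mock_preds))
```

With the README's objects in scope, the same helpers could be combined as `top_labels(pipe_predict(to_pair_inputs(texts), pipe))`. Whether the dict wrapping is needed depends on the installed `transformers` version, so treat it as an assumption to verify against your environment.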