---
language:
- ru
- en
- ru-RU
tags:
- xlm-roberta-large
datasets:
- IlyaGusev/headline_cause
license: apache-2.0
---

# XLM-RoBERTa HeadlineCause Full

## Model description

[More Information Needed]

## Intended uses & limitations

#### How to use

```python
from tqdm.notebook import tqdm
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline


def get_batch(data, batch_size):
    """Yield consecutive slices of `data` with at most `batch_size` items."""
    start_index = 0
    while start_index < len(data):
        end_index = start_index + batch_size
        batch = data[start_index:end_index]
        yield batch
        start_index = end_index


def pipe_predict(data, pipe, batch_size=64):
    """Run the pipeline batch by batch and collect the raw predictions."""
    raw_preds = []
    for batch in tqdm(get_batch(data, batch_size)):
        raw_preds += pipe(batch)
    return raw_preds


MODEL_NAME = TOKENIZER_NAME = "IlyaGusev/xlm_roberta_large_headline_cause_full"
tokenizer = AutoTokenizer.from_pretrained(TOKENIZER_NAME, do_lower_case=False)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()  # inference mode: disables dropout
# return_all_scores=True returns a score for every class, not only the top one.
pipe = pipeline("text-classification", model=model, tokenizer=tokenizer, framework="pt", return_all_scores=True)

# Each item is a pair of headlines that is classified together.
texts = [
    (
        "Judge issues order to allow indoor worship in NC churches",
        "Some local churches resume indoor services after judge lifted NC governor’s restriction"
    ),
    (
        "Gov. Kevin Stitt defends $2 million purchase of malaria drug touted by Trump",
        "Oklahoma spent $2 million on malaria drug touted by Trump"
    ),
    (
        "Песков опроверг свой перевод на удаленку",
        "Дмитрий Песков перешел на удаленку"
    )
]
pipe_predict(texts, pipe)
```
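
The example above passes headline pairs as tuples and asks for every class score. The following is a minimal sketch of two optional helpers, under two assumptions that are not stated in this card: that a recent `transformers` release is in use, where the text-classification pipeline documents text pairs as dictionaries with `text` and `text_pair` keys, and that each prediction is a list of `{"label", "score"}` dictionaries. The helper names `to_pair_inputs` and `top_label` are hypothetical, and the label names in the sample data are placeholders, not the labels this model emits.

```python
from typing import Any, Dict, List, Sequence, Tuple


def to_pair_inputs(pairs: Sequence[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Turn (first, second) headline tuples into text/text_pair dictionaries."""
    return [{"text": first, "text_pair": second} for first, second in pairs]


def top_label(scores: Sequence[Dict[str, Any]]) -> Dict[str, Any]:
    """Return the highest-scoring entry from the score list of one pair."""
    return max(scores, key=lambda item: item["score"])


# Hypothetical scores for a single pair; the label names are placeholders.
example_scores = [
    {"label": "LABEL_0", "score": 0.10},
    {"label": "LABEL_1", "score": 0.85},
    {"label": "LABEL_2", "score": 0.05},
]

example_inputs = to_pair_inputs([("first headline", "second headline")])
best = top_label(example_scores)
```

Under those assumptions, `pipe_predict(to_pair_inputs(texts), pipe)` would be the dictionary-based equivalent of the call in the example above, and `top_label` could be applied to each element of its result.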

#### Limitations and bias

[More Information Needed]

## Training data

[More Information Needed]

## Training procedure

[More Information Needed]

## Eval results

[More Information Needed]

### BibTeX entry and citation info

```bibtex
@misc{gusev2021headlinecause,
    title={HeadlineCause: A Dataset of News Headlines for Detecting Causalities},
    author={Ilya Gusev and Alexey Tikhonov},
    year={2021},
    eprint={2108.12626},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```