metadata

language:
  - ru
  - en
  - ru-RU
tags:
  - xlm-roberta-large
datasets:
  - IlyaGusev/headline_cause
license: apache-2.0

XLM-RoBERTa HeadlineCause Simple

Model description

[More Information Needed]

Intended uses & limitations

How to use

from tqdm.auto import tqdm  # picks the notebook or terminal progress bar automatically
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline


def get_batch(data, batch_size):
    # Yield consecutive slices of `data`, each at most `batch_size` items long
    start_index = 0
    while start_index < len(data):
        end_index = start_index + batch_size
        yield data[start_index:end_index]
        start_index = end_index


def pipe_predict(data, pipe, batch_size=64):
    # Run the pipeline batch by batch and concatenate the per-example outputs
    raw_preds = []
    for batch in tqdm(get_batch(data, batch_size)):
        raw_preds += pipe(batch)
    return raw_preds


MODEL_NAME = TOKENIZER_NAME = "IlyaGusev/xlm_roberta_large_headline_cause_simple"
tokenizer = AutoTokenizer.from_pretrained(TOKENIZER_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()
# top_k=None returns the scores of all labels (it replaces the deprecated return_all_scores=True)
pipe = pipeline("text-classification", model=model, tokenizer=tokenizer, framework="pt", top_k=None)

# Each headline pair is passed as a {"text", "text_pair"} dict so that the tokenizer
# encodes the two headlines as one sentence pair rather than as two separate texts
texts = [
    {
        "text": "Judge issues order to allow indoor worship in NC churches",
        "text_pair": "Some local churches resume indoor services after judge lifted NC governor’s restriction",
    },
    {
        "text": "Gov. Kevin Stitt defends $2 million purchase of malaria drug touted by Trump",
        "text_pair": "Oklahoma spent $2 million on malaria drug touted by Trump",
    },
    {
        # "Peskov denied that he had been moved to remote work"
        "text": "Песков опроверг свой перевод на удаленку",
        # "Dmitry Peskov switched to remote work"
        "text_pair": "Дмитрий Песков перешел на удаленку",
    },
]
preds = pipe_predict(texts, pipe)
print(preds)
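With all label scores requested, the pipeline returns one list of {"label", "score"} dicts per headline pair. The following is a minimal sketch for reducing that output to a single predicted label per pair, assuming this output shape. The actual label strings come from the model config (model.config.id2label) and are not documented in this card, so the LABEL_* names and scores in the example are made-up placeholders, not model output.

```python
def top_labels(raw_preds):
    """Return a (label, score) tuple for the highest-scoring class of each example.

    Assumes `raw_preds` holds one entry per input pair, and that each entry is a
    list of {"label": str, "score": float} dicts covering all classes.
    """
    result = []
    for scores in raw_preds:
        best = max(scores, key=lambda item: item["score"])
        result.append((best["label"], best["score"]))
    return result


# Made-up scores in the all-labels format; the label names are placeholders
example = [
    [{"label": "LABEL_0", "score": 0.1}, {"label": "LABEL_1", "score": 0.7}, {"label": "LABEL_2", "score": 0.2}],
    [{"label": "LABEL_0", "score": 0.8}, {"label": "LABEL_1", "score": 0.15}, {"label": "LABEL_2", "score": 0.05}],
]
print(top_labels(example))  # [('LABEL_1', 0.7), ('LABEL_0', 0.8)]
```

Applied to real predictions, top_labels(preds) gives one label per headline pair. The meaning of each label should be checked against model.config.id2label and the HeadlineCause dataset description.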

Limitations and bias

[More Information Needed]

Training data

[More Information Needed]

Training procedure

[More Information Needed]

Eval results

[More Information Needed]

BibTeX entry and citation info

@misc{gusev2021headlinecause,
      title={HeadlineCause: A Dataset of News Headlines for Detecting Causalities},
      author={Ilya Gusev and Alexey Tikhonov},
      year={2021},
      eprint={2108.12626},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}