---
language:
- ru
- en
- ru-RU
tags:
- xlm-roberta-large
datasets:
- IlyaGusev/headline_cause
license: apache-2.0
---

# XLM-RoBERTa HeadlineCause Full

## Model description

[More Information Needed]

## Intended uses & limitations

#### How to use

```python
from tqdm.notebook import tqdm
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline


def get_batch(data, batch_size):
    # Yield consecutive slices of `data`, each with at most `batch_size` items.
    start_index = 0
    while start_index < len(data):
        end_index = start_index + batch_size
        batch = data[start_index:end_index]
        yield batch
        start_index = end_index


def pipe_predict(data, pipe, batch_size=64):
    # Run the pipeline batch by batch and collect the raw predictions.
    raw_preds = []
    for batch in tqdm(get_batch(data, batch_size)):
        raw_preds += pipe(batch)
    return raw_preds


MODEL_NAME = TOKENIZER_NAME = "IlyaGusev/xlm_roberta_large_headline_cause_full"
tokenizer = AutoTokenizer.from_pretrained(TOKENIZER_NAME, do_lower_case=False)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()
pipe = pipeline("text-classification", model=model, tokenizer=tokenizer, framework="pt", return_all_scores=True)

# Each item is a pair of headlines: (first headline, second headline).
texts = [
    (
        "Judge issues order to allow indoor worship in NC churches",
        "Some local churches resume indoor services after judge lifted NC governor’s restriction"
    ),
    (
        "Gov. Kevin Stitt defends $2 million purchase of malaria drug touted by Trump",
        "Oklahoma spent $2 million on malaria drug touted by Trump"
    ),
    (
        # Russian pair: "Peskov denied his switch to remote work" /
        # "Dmitry Peskov switched to remote work"
        "Песков опроверг свой перевод на удаленку",
        "Дмитрий Песков перешел на удаленку"
    )
]
pipe_predict(texts, pipe)
```

#### Limitations and bias

[More Information Needed]

## Training data

[More Information Needed]

## Training procedure

[More Information Needed]

## Eval results

[More Information Needed]

### BibTeX entry and citation info

```bibtex
@misc{gusev2021headlinecause,
    title={HeadlineCause: A Dataset of News Headlines for Detecting Causal Relations},
    author={Ilya Gusev and Alexey Tikhonov},
    year={2021},
    eprint={2108.12626},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```
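
With `return_all_scores=True`, the pipeline in the "How to use" snippet returns, for each input, a list of `{"label": ..., "score": ...}` dictionaries, one per class. The sketch below shows one way to reduce that output to a single top label per headline pair. The helper name `top_labels` and the label names `LABEL_0`, `LABEL_1`, `LABEL_2` are hypothetical placeholders: this card does not document the model's label set, so check `model.config.id2label` for the actual names.

```python
def top_labels(raw_preds):
    """Return the (label, score) pair with the highest score for each input.

    `raw_preds` is expected to be a list with one entry per input, where each
    entry is a list of {"label": str, "score": float} dictionaries.
    """
    results = []
    for scores in raw_preds:
        best = max(scores, key=lambda item: item["score"])
        results.append((best["label"], best["score"]))
    return results


# Hypothetical predictions in the shape produced with return_all_scores=True.
# The label names and scores are made up for illustration only.
example_preds = [
    [
        {"label": "LABEL_0", "score": 0.1},
        {"label": "LABEL_1", "score": 0.7},
        {"label": "LABEL_2", "score": 0.2},
    ],
    [
        {"label": "LABEL_0", "score": 0.9},
        {"label": "LABEL_1", "score": 0.05},
        {"label": "LABEL_2", "score": 0.05},
    ],
]

print(top_labels(example_preds))
```

In practice, `top_labels(pipe_predict(texts, pipe))` would apply the same reduction to real model output.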