---
license: cc-by-4.0
language:
- en
pipeline_tag: text-classification
tags:
- roberta-large
- topic
- news

widget:
- text: "Diplomatic efforts to deal with the world’s two wars — the civil war in Spain and the undeclared Chinese - Japanese conflict — received sharp setbacks today."
- text: "WASHINGTON. AP. A decisive development appeared in the offing in the tug-of-war between the federal government and the states over the financing of relief."
- text: "A frantic bride called the Rochester Gas and Electric corporation to complain that her new refrigerator “freezes ice cubes too fast.”"

---

# Fine-tuned RoBERTa-large for detecting obituaries in news

# Model Description

This model is a fine-tuned RoBERTa-large that classifies whether a news article is an obituary.

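# How to Use

```python
from transformers import pipeline

# Load the obituary classifier from the Hugging Face Hub
classifier = pipeline("text-classification", model="dell-research-harvard/topic-obits")

# Returns a list with one {"label": ..., "score": ...} dict per input text
print(classifier("John Smith died after a long illness"))
```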
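
To screen several articles at once, the pipeline can also be called on a list of texts. The helper below is a minimal sketch and not part of the original card: the function name `flag_obituaries`, the threshold `min_score`, and the positive label name `LABEL_1` are all assumptions. Check the `id2label` mapping in the model's `config.json` for the actual label names before relying on it.

```python
from typing import Callable, Dict, List


def flag_obituaries(
    texts: List[str],
    classifier: Callable[[List[str]], List[Dict]],
    positive_label: str = "LABEL_1",  # assumed label name, not confirmed by the card
    min_score: float = 0.5,  # assumed confidence threshold
) -> List[str]:
    """Return the texts that the classifier labels as obituaries."""
    # A text-classification pipeline called on a list returns one
    # {"label": ..., "score": ...} dict per input, in the same order.
    predictions = classifier(texts)
    return [
        text
        for text, pred in zip(texts, predictions)
        if pred["label"] == positive_label and pred["score"] >= min_score
    ]
```

With the `classifier` created in the snippet above, `flag_obituaries(articles, classifier)` would return the subset of `articles` predicted to be obituaries. Passing the classifier as an argument keeps the helper easy to test with a stand-in function.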

# Training data

The model was trained on a hand-labelled sample of data from the [NEWSWIRE dataset](https://huggingface.co/datasets/dell-research-harvard/newswire).

Split|Size
-|-
Train|272
Dev|57
Test|57

# Test set results

Metric|Result
-|-
F1|1.000
Accuracy|1.000
Precision|1.000
Recall|1.000
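
For readers who want to reproduce these metrics on their own labelled sample, the sketch below shows how the four numbers can be computed from gold and predicted labels. It assumes binary metrics with "obituary" as the positive class (encoded as `1`); the card does not state which averaging was used, and the function name `binary_metrics` and the label encoding are illustrative choices.

```python
from typing import Dict, Sequence


def binary_metrics(
    gold: Sequence[int], predicted: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    correct = sum(1 for g, p in pairs if g == p)

    # Guard against division by zero when a class is never predicted or present
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    accuracy = correct / len(pairs) if pairs else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

A classifier that matches every gold label, as reported in the table above, scores 1.0 on all four metrics under this definition.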

# Citation Information

You can cite this work using:

```
@misc{silcock2024newswirelargescalestructureddatabase,
      title={Newswire: A Large-Scale Structured Database of a Century of Historical News},
      author={Emily Silcock and Abhishek Arora and Luca D'Amico-Wong and Melissa Dell},
      year={2024},
      eprint={2406.09490},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2406.09490},
}
```

# Applications

We applied this model to a century of historical news articles. You can see all the classifications in the [NEWSWIRE dataset](https://huggingface.co/datasets/dell-research-harvard/newswire).