---
language:
- id
---
# headline_detector
[![Open in Spaces](https://huggingface.co/datasets/huggingface/badges/raw/main/open-in-hf-spaces-sm.svg)](https://huggingface.co/spaces/kaenova/headline_detector_space)
_Indonesian Headline Detection Model Repository_
The accompanying [Python library](https://github.com/kaenova/headline_detector) provides an API for detecting headlines in text, particularly social media posts such as tweets. Its models were trained on a dataset of Twitter posts containing both headline and non-headline texts, built with the assistance of journalism professionals to ensure data quality.
```sh
$ pip install headline-detector
```
## Available scenarios and their performance
| Model | Scenario 1 | Scenario 2 | Scenario 3 | Scenario 4 | Scenario 5 | Scenario 6 |
| ------------ | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- |
| Fasttext | 0.8766 | 0.8714 | 0.8793 | 0.8714 | 0.8714 | 0.8661 |
| CNN | 0.9081 | 0.9081 | 0.8950 | 0.8898 | 0.8950 | 0.8898 |
| IndoBERTweet | 0.9895 | 0.9921 | 0.9738 | 0.9580 | 0.9843 | 0.9685 |
All values are accuracy scores.
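To choose which scenario number to load for a given model, the accuracy table can be scanned programmatically. The following is a minimal sketch: the `ACCURACY` dictionary and the `best_scenario` helper are illustrative names that are not part of the library, and the numbers are copied from the table above.

```python
# Accuracy per scenario (1..6), copied from the table above.
# ACCURACY and best_scenario are illustrative names, not library APIs.
ACCURACY = {
    "Fasttext": [0.8766, 0.8714, 0.8793, 0.8714, 0.8714, 0.8661],
    "CNN": [0.9081, 0.9081, 0.8950, 0.8898, 0.8950, 0.8898],
    "IndoBERTweet": [0.9895, 0.9921, 0.9738, 0.9580, 0.9843, 0.9685],
}


def best_scenario(model: str) -> tuple:
    """Return (scenario_number, accuracy) of the top-scoring scenario.

    On ties, the lowest scenario number wins because max() keeps the
    first maximum it encounters.
    """
    scores = ACCURACY[model]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index + 1, scores[best_index]


for name in ACCURACY:
    scenario, accuracy = best_scenario(name)
    print(f"{name}: scenario {scenario} (accuracy {accuracy:.4f})")
```

Accuracy is only one criterion; inference speed also differs widely between the three models.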
### Model Throughput
| Model        | Throughput (≈ texts/second) |
| ------------ | --------------------------- |
| IndoBERTweet | ≈1.3                        |
| CNN          | ≈281.60                     |
| Fasttext     | ≈2048.41                    |
Tested on an Intel i7-6700K with 32 GB of RAM.
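The throughput figures translate directly into expected processing time for a batch. The sketch below assumes throughput stays constant regardless of batch size and text length, which will not hold exactly on other hardware; `THROUGHPUT` and `estimated_seconds` are illustrative names, not part of the library.

```python
# Approximate throughput in texts per second, copied from the table above.
# THROUGHPUT and estimated_seconds are illustrative names, not library APIs.
THROUGHPUT = {
    "IndoBERTweet": 1.3,
    "CNN": 281.60,
    "Fasttext": 2048.41,
}


def estimated_seconds(model: str, n_texts: int) -> float:
    """Estimate the time to classify n_texts, assuming constant throughput."""
    return n_texts / THROUGHPUT[model]


# Rough time to classify 10,000 texts with each model.
for name in THROUGHPUT:
    print(f"{name}: about {estimated_seconds(name, 10_000):,.1f} seconds")
```

Under these assumptions, IndoBERTweet needs on the order of hours for a workload that Fasttext finishes in seconds, so the accuracy gain has to justify the wait.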
## Usage
Each prediction is either 0 (non-headline) or 1 (headline), with one label returned per input text.
```python
from headline_detector import FasttextDetector, IndoBERTweetDetector, CNNDetector

# Fasttext model, scenario 1
detector = FasttextDetector.load_from_scenario(1)
data = detector.predict_text(
    [
        "nama kamu siapa?",
        "Kapolda Jatim Teddy Minahasa Dikabarkan Ditangkap Terkait Narkoba https://t.co/LD9X6VFaUR",
    ]
)
print(data)  # output: [0, 1]

# CNN model, scenario 3
detector = CNNDetector.load_from_scenario(3)
data = detector.predict_text(
    [
        "nama kamu siapa?",
        "Kapolda Jatim Teddy Minahasa Dikabarkan Ditangkap Terkait Narkoba https://t.co/LD9X6VFaUR",
    ]
)
print(data)  # output: [0, 1]

# IndoBERTweet model, scenario 5
detector = IndoBERTweetDetector.load_from_scenario(5)
data = detector.predict_text(
    [
        "nama kamu siapa?",
        "Kapolda Jatim Teddy Minahasa Dikabarkan Ditangkap Terkait Narkoba https://t.co/LD9X6VFaUR",
    ]
)
print(data)  # output: [0, 1]

# 0 is non-headline
# 1 is headline
```
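A common follow-up is to keep only the texts that were labeled as headlines. The sketch below relies solely on the `predict_text` call shown above; the `filter_headlines` helper and the `Detector` protocol are hypothetical names introduced for illustration and are not provided by the library.

```python
from typing import List, Protocol


class Detector(Protocol):
    """Anything exposing predict_text, as the detectors above do.

    Hypothetical typing helper; not part of headline_detector.
    """

    def predict_text(self, texts: List[str]) -> List[int]: ...


def filter_headlines(detector: Detector, texts: List[str]) -> List[str]:
    """Return only the texts the detector labels as 1 (headline)."""
    if not texts:
        return []
    labels = detector.predict_text(texts)
    # int() keeps the comparison valid if labels are NumPy integers.
    return [text for text, label in zip(texts, labels) if int(label) == 1]


# Usage with a real detector (requires the library to be installed):
#   from headline_detector import FasttextDetector
#   detector = FasttextDetector.load_from_scenario(1)
#   headlines = filter_headlines(detector, ["nama kamu siapa?", "..."])
```

Because the helper only depends on `predict_text`, the same function should work with any of the three detector classes.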