---
language:
- id
---
# headline_detector

[![Open in Spaces](https://huggingface.co/datasets/huggingface/badges/raw/main/open-in-hf-spaces-sm.svg)](https://huggingface.co/spaces/kaenova/headline_detector_space)

_Indonesian Headline Detection Model Repository_

The [headline_detector Python library](https://github.com/kaenova/headline_detector) provides APIs for detecting headlines in text, especially social media posts such as tweets. It uses models trained on a dataset of Twitter posts containing both headline and non-headline texts, prepared with the assistance of journalism professionals to ensure data quality.

```sh
$ pip install headline-detector
```

## Available scenarios and performance

| Model        | Scenario 1 | Scenario 2 | Scenario 3 | Scenario 4 | Scenario 5 | Scenario 6 |
| ------------ | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- |
| Fasttext     | 0.8766     | 0.8714     | 0.8793     | 0.8714     | 0.8714     | 0.8661     |
| CNN          | 0.9081     | 0.9081     | 0.8950     | 0.8898     | 0.8950     | 0.8898     |
| IndoBERTweet | 0.9895     | 0.9921     | 0.9738     | 0.9580     | 0.9843     | 0.9685     |

All values are accuracy scores.

### Model Throughput

| Model        | Throughput (texts/second, approx.) |
| ------------ | ---------------------------------- |
| IndoBERTweet | 1.3                                |
| CNN          | 281.60                             |
| Fasttext     | 2048.41                            |

Tested on an Intel i7-6700K with 32 GB of RAM.
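To get a feel for what these numbers mean in practice, the sketch below turns the throughput figures into a rough processing-time estimate. The `THROUGHPUT` dictionary and the `estimated_seconds` helper are illustrative names, not part of the library, and the estimate assumes throughput stays constant on comparable hardware.

```python
# Approximate throughput (texts per second), copied from the table above.
THROUGHPUT = {
    "IndoBERTweet": 1.3,
    "CNN": 281.60,
    "Fasttext": 2048.41,
}


def estimated_seconds(model: str, n_texts: int) -> float:
    """Rough time in seconds to classify n_texts with the given model.

    Hypothetical helper: assumes constant throughput on similar hardware.
    """
    return n_texts / THROUGHPUT[model]


if __name__ == "__main__":
    for name in THROUGHPUT:
        seconds = estimated_seconds(name, 10_000)
        print(f"{name}: about {seconds:.0f} s for 10,000 texts")
```

On this basis, IndoBERTweet is the most accurate option but takes hours for workloads that Fasttext handles in seconds, so the choice depends on how many texts need to be classified.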

## Usage

Each prediction is either 0 (non-headline) or 1 (headline).

```python
from headline_detector import FasttextDetector, IndoBERTweetDetector, CNNDetector

detector = FasttextDetector.load_from_scenario(1)
data = detector.predict_text(
    [
        "nama kamu siapa?",
        "Kapolda Jatim Teddy Minahasa Dikabarkan Ditangkap Terkait Narkoba  https://t.co/LD9X6VFaUR",
    ]
)
print(data)  # output: [0, 1]

detector = CNNDetector.load_from_scenario(3)
data = detector.predict_text(
    [
        "nama kamu siapa?",
        "Kapolda Jatim Teddy Minahasa Dikabarkan Ditangkap Terkait Narkoba  https://t.co/LD9X6VFaUR",
    ]
)
print(data)  # output: [0, 1]

detector = IndoBERTweetDetector.load_from_scenario(5)
data = detector.predict_text(
    [
        "nama kamu siapa?",
        "Kapolda Jatim Teddy Minahasa Dikabarkan Ditangkap Terkait Narkoba  https://t.co/LD9X6VFaUR",
    ]
)
print(data)  # output: [0, 1]

# 0 is non-headline
# 1 is headline
```
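A common follow-up is keeping only the texts classified as headlines. The sketch below is one way to do that, assuming only what the example above shows: `predict_text` takes a list of strings and returns a list of 0/1 labels in the same order. The `Detector` protocol and the `keep_headlines` helper are hypothetical names introduced for illustration, not part of the library.

```python
from typing import List, Protocol


class Detector(Protocol):
    """Anything exposing predict_text as used in the example above."""

    def predict_text(self, texts: List[str]) -> List[int]: ...


def keep_headlines(detector: Detector, texts: List[str]) -> List[str]:
    """Return only the texts that the detector labels 1 (headline).

    Hypothetical helper: assumes the labels align one-to-one with the inputs.
    """
    labels = detector.predict_text(texts)
    return [text for text, label in zip(texts, labels) if label == 1]
```

With the library installed, it could be called as `keep_headlines(FasttextDetector.load_from_scenario(1), tweets)`, where `tweets` is a list of strings.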