Create README.md
README.md
ADDED
@@ -0,0 +1,71 @@
---
language:
- id
---
# [headline_detector](https://github.com/kaenova/headline_detector)

_Indonesian Headline Detection Model Repository_

This Python library provides APIs for detecting headlines in text, especially posts from social media platforms such as Twitter. Its models were trained on a dataset of Twitter posts containing both headline and non-headline texts, built with the assistance of journalism professionals to ensure data quality.

```sh
$ pip install headline-detector
```

## Available scenarios and performance

| Model        | Scenario 1 | Scenario 2 | Scenario 3 | Scenario 4 | Scenario 5 | Scenario 6 |
| ------------ | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- |
| Fasttext     | 0.8766     | 0.8714     | 0.8793     | 0.8714     | 0.8714     | 0.8661     |
| CNN          | 0.9081     | 0.9081     | 0.8950     | 0.8898     | 0.8950     | 0.8898     |
| IndoBERTweet | 0.9895     | 0.9921     | 0.9738     | 0.9580     | 0.9843     | 0.9685     |

All values are accuracy scores.
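
As a rough aid for choosing a scenario, the sketch below picks the highest-accuracy scenario for each model. The numbers are copied from the table above; the names `ACCURACY` and `best_scenario` are illustrative and are not part of the library.

```python
# Accuracy values copied from the table above; list index 0 is Scenario 1.
ACCURACY = {
    "Fasttext":     [0.8766, 0.8714, 0.8793, 0.8714, 0.8714, 0.8661],
    "CNN":          [0.9081, 0.9081, 0.8950, 0.8898, 0.8950, 0.8898],
    "IndoBERTweet": [0.9895, 0.9921, 0.9738, 0.9580, 0.9843, 0.9685],
}


def best_scenario(model):
    """Return (scenario number, accuracy) for the model's best scenario.

    Ties resolve to the lowest scenario number.
    """
    scores = ACCURACY[model]
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return best_index + 1, scores[best_index]


for model in ACCURACY:
    scenario, accuracy = best_scenario(model)
    print(f"{model}: scenario {scenario} ({accuracy:.4f})")
```

Accuracy is only one criterion; the throughput of each model may matter just as much for large batches.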

### Model Throughput

| Model        | Throughput (≈ texts/second) |
| ------------ | --------------------------- |
| IndoBERTweet | ≈1.3                        |
| CNN          | ≈281.60                     |
| Fasttext     | ≈2048.41                    |

Tested on an Intel i7-6700K with 32 GB of RAM.
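
To put these figures in perspective, the following back-of-the-envelope sketch estimates how long each model would need for a batch of texts. It assumes throughput stays constant at the values in the table, which only holds approximately and only on comparable hardware; `THROUGHPUT` and `estimated_seconds` are illustrative names, not library APIs.

```python
# Approximate throughput in texts per second, copied from the table above.
THROUGHPUT = {"IndoBERTweet": 1.3, "CNN": 281.60, "Fasttext": 2048.41}


def estimated_seconds(model, n_texts):
    """Estimate processing time, assuming constant throughput."""
    return n_texts / THROUGHPUT[model]


for model in THROUGHPUT:
    print(f"{model}: {estimated_seconds(model, 10_000):,.1f} s")
```

Under these assumptions, 10,000 texts would take IndoBERTweet roughly two hours, while Fasttext would finish in about five seconds.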

## Usage

For each input text, the output is either 0 (non-headline) or 1 (headline).

```python
from headline_detector import FasttextDetector, IndoBERTweetDetector, CNNDetector

# Fasttext model, scenario 1
detector = FasttextDetector.load_from_scenario(1)
data = detector.predict_text(
    [
        "nama kamu siapa?",
        "Kapolda Jatim Teddy Minahasa Dikabarkan Ditangkap Terkait Narkoba https://t.co/LD9X6VFaUR",
    ]
)
print(data)  # output: [0, 1]

# CNN model, scenario 3
detector = CNNDetector.load_from_scenario(3)
data = detector.predict_text(
    [
        "nama kamu siapa?",
        "Kapolda Jatim Teddy Minahasa Dikabarkan Ditangkap Terkait Narkoba https://t.co/LD9X6VFaUR",
    ]
)
print(data)  # output: [0, 1]

# IndoBERTweet model, scenario 5
detector = IndoBERTweetDetector.load_from_scenario(5)
data = detector.predict_text(
    [
        "nama kamu siapa?",
        "Kapolda Jatim Teddy Minahasa Dikabarkan Ditangkap Terkait Narkoba https://t.co/LD9X6VFaUR",
    ]
)
print(data)  # output: [0, 1]

# 0 is non-headline
# 1 is headline
```
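
A common follow-up is to keep only the texts labeled as headlines. The sketch below assumes, based on the example above, that `predict_text` takes a list of strings and returns one 0/1 label per text in the same order. `filter_headlines` and `UrlStubDetector` are hypothetical names that are not part of `headline_detector`; the stub exists only so the sketch runs without downloading a model.

```python
def filter_headlines(detector, texts):
    """Keep only the texts that the detector labels as headlines (1)."""
    labels = detector.predict_text(list(texts))
    return [text for text, label in zip(texts, labels) if label == 1]


class UrlStubDetector:
    """Hypothetical stand-in, not part of headline_detector.

    Labels a text 1 when it contains a URL, purely for demonstration.
    """

    def predict_text(self, texts):
        return [1 if "https://" in text else 0 for text in texts]


texts = [
    "nama kamu siapa?",
    "Kapolda Jatim Teddy Minahasa Dikabarkan Ditangkap Terkait Narkoba https://t.co/LD9X6VFaUR",
]
headlines = filter_headlines(UrlStubDetector(), texts)
print(headlines)
```

With the real library, a detector such as `FasttextDetector.load_from_scenario(1)` would be passed in place of the stub.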