Commit dbb46cc by kaenova · 1 Parent(s): b79b1f7

Create README.md

  1. README.md +71 -0
README.md ADDED
@@ -0,0 +1,71 @@
---
language:
- id
---
# [headline_detector](https://github.com/kaenova/headline_detector)

_Indonesian Headline Detection Model Repository_

This is a Python library that provides APIs for detecting headlines in text, especially posts from social media platforms such as Twitter. The library's models were developed and trained on a dataset of Twitter posts containing both headline and non-headline texts, prepared with the assistance of journalism professionals to ensure data quality.

```sh
$ pip install headline-detector
```

## Available scenarios and performance

| Model        | Scenario 1 | Scenario 2 | Scenario 3 | Scenario 4 | Scenario 5 | Scenario 6 |
| ------------ | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- |
| Fasttext     | 0.8766     | 0.8714     | 0.8793     | 0.8714     | 0.8714     | 0.8661     |
| CNN          | 0.9081     | 0.9081     | 0.8950     | 0.8898     | 0.8950     | 0.8898     |
| IndoBERTweet | 0.9895     | 0.9921     | 0.9738     | 0.9580     | 0.9843     | 0.9685     |

All values are accuracy scores.

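As a quick way to read the accuracy table, the sketch below picks the highest-scoring scenario for each model. The numbers are copied from the table; the `ACCURACY` dictionary and the `best_scenario` helper are illustrative names, not part of the library.

```python
from __future__ import annotations

# Accuracy per scenario, copied from the table above (index 0 = Scenario 1).
ACCURACY = {
    "Fasttext": [0.8766, 0.8714, 0.8793, 0.8714, 0.8714, 0.8661],
    "CNN": [0.9081, 0.9081, 0.8950, 0.8898, 0.8950, 0.8898],
    "IndoBERTweet": [0.9895, 0.9921, 0.9738, 0.9580, 0.9843, 0.9685],
}


def best_scenario(model: str) -> tuple[int, float]:
    """Return (scenario number, accuracy) of the highest-scoring scenario.

    Ties resolve to the lowest scenario number.
    """
    scores = ACCURACY[model]
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return best_index + 1, scores[best_index]


for model in ACCURACY:
    scenario, accuracy = best_scenario(model)
    print(f"{model}: scenario {scenario} ({accuracy:.4f})")
```

Accuracy alone may not settle the choice; the throughput figures for each model matter as well when many texts have to be classified.
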
### Model Throughput

| Model        | Throughput (texts/second, approx.) |
| ------------ | ---------------------------------- |
| IndoBERTweet | 1.3                                |
| CNN          | 281.60                             |
| Fasttext     | 2048.41                            |

Tested on an Intel i7-6700K with 32 GB of RAM.

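To put these figures in perspective, the following sketch converts throughput into a rough processing time for a batch. It assumes throughput stays constant with batch size and that your hardware is comparable to the test machine, neither of which is guaranteed. `THROUGHPUT` and `estimated_seconds` are illustrative names, not part of the library.

```python
# Approximate throughput in texts per second, copied from the table above.
THROUGHPUT = {
    "IndoBERTweet": 1.3,
    "CNN": 281.60,
    "Fasttext": 2048.41,
}


def estimated_seconds(model: str, n_texts: int) -> float:
    """Estimate seconds needed to classify n_texts at the listed throughput."""
    return n_texts / THROUGHPUT[model]


for model in THROUGHPUT:
    seconds = estimated_seconds(model, 10_000)
    print(f"{model}: ~{seconds:,.1f} s for 10,000 texts")
```

Under these assumptions, 10,000 texts would take about two hours with IndoBERTweet, roughly 36 seconds with CNN, and about 5 seconds with Fasttext.
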
## Usage

`predict_text` takes a list of texts and returns one label per text: 0 (non-headline) or 1 (headline).

```python
from headline_detector import FasttextDetector, IndoBERTweetDetector, CNNDetector

# Example inputs:
#   "nama kamu siapa?" means "what is your name?" (not a headline)
#   The second text is a news headline about a police chief's reported arrest.

# Fasttext model, scenario 1
detector = FasttextDetector.load_from_scenario(1)
data = detector.predict_text(
    [
        "nama kamu siapa?",
        "Kapolda Jatim Teddy Minahasa Dikabarkan Ditangkap Terkait Narkoba https://t.co/LD9X6VFaUR",
    ]
)
print(data)  # output: [0, 1]

# CNN model, scenario 3
detector = CNNDetector.load_from_scenario(3)
data = detector.predict_text(
    [
        "nama kamu siapa?",
        "Kapolda Jatim Teddy Minahasa Dikabarkan Ditangkap Terkait Narkoba https://t.co/LD9X6VFaUR",
    ]
)
print(data)  # output: [0, 1]

# IndoBERTweet model, scenario 5
detector = IndoBERTweetDetector.load_from_scenario(5)
data = detector.predict_text(
    [
        "nama kamu siapa?",
        "Kapolda Jatim Teddy Minahasa Dikabarkan Ditangkap Terkait Narkoba https://t.co/LD9X6VFaUR",
    ]
)
print(data)  # output: [0, 1]

# 0 is non-headline
# 1 is headline
```
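
A common follow-up is to keep only the texts labeled as headlines. The sketch below shows one way to do that, assuming only what the usage example shows: a detector exposes `predict_text` and returns one 0/1 label per input. `Detector`, `keep_headlines`, and `UrlStubDetector` are hypothetical names introduced here; the stub exists only so the example runs without downloading a model, and it does not reflect how the library's models decide.

```python
from typing import List, Protocol, Sequence


class Detector(Protocol):
    """Anything exposing predict_text as in the usage example above."""

    def predict_text(self, texts: Sequence[str]) -> List[int]: ...


def keep_headlines(detector: Detector, texts: Sequence[str]) -> List[str]:
    """Return only the texts the detector labels as 1 (headline)."""
    labels = detector.predict_text(list(texts))
    return [text for text, label in zip(texts, labels) if label == 1]


class UrlStubDetector:
    """Hypothetical stand-in for a real detector.

    It labels a text as a headline when it contains a link, which is
    NOT how the library's models work.
    """

    def predict_text(self, texts: Sequence[str]) -> List[int]:
        return [1 if "https://" in text else 0 for text in texts]


texts = [
    "nama kamu siapa?",
    "Kapolda Jatim Teddy Minahasa Dikabarkan Ditangkap Terkait Narkoba https://t.co/LD9X6VFaUR",
]
print(keep_headlines(UrlStubDetector(), texts))
```

In real use, one would presumably pass a loaded detector such as `FasttextDetector.load_from_scenario(1)` in place of the stub.
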