Tolerblanc committed on
Commit
584dd9a
1 Parent(s): 0bf8dc4

Update README.md

Files changed (1)
  1. README.md +39 -2
README.md CHANGED
@@ -5,5 +5,42 @@ datasets:
  language:
  - ko
  ---
- - klue-bert base finetuned model
- - **downstream task** : korean curse detection (binary classification)
+ # K-urse_Detection_with_BERT
+
+ ![TensorFlow](https://img.shields.io/badge/TensorFlow-%23FF6F00.svg?style=for-the-badge&logo=TensorFlow&logoColor=white) ![Keras](https://img.shields.io/badge/Keras-%23D00000.svg?style=for-the-badge&logo=Keras&logoColor=white)
+
+ ## Overview
+ **K-urse_Detection_with_BERT**: Korean curse-expression detection (binary classification) with a fine-tuned klue-BERT base model.
+
+ This is the project output for the KWU "Text Mining" course in the first semester of 2023.
+
+ Project overview: [Notion (Korean)](https://www.notion.so/tolerblanc/4d70c776b3f74dbe8e03a38ccda27fbb?pvs=4)
+
+ Source code on GitHub: [Link](https://github.com/Tolerblanc/K-urse_Detection_with_BERT)
+
+ ## Evaluation
+ - Comparison model: [JminJ's Bad Text Classifier](https://github.com/JminJ/Bad_text_classifier)
+ - Evaluated on [2runo's Curse-detection-data](https://github.com/2runo/Curse-detection-data)
+
+ | Model/Metric | Accuracy | Precision | Recall | F1 Score |
+ | --- | --- | --- | --- | --- |
+ | Comparison (Electra base) | 0.81 | 0.69 | **0.87** | **0.77** |
+ | klue-BERT base (our best result) | **0.83** | **0.76** | 0.75 | 0.75 |
+
+ - Evaluated on YouTube comments
+
+ | Model/Metric | Accuracy | Precision | Recall | F1 Score |
+ | --- | --- | --- | --- | --- |
+ | Comparison (Electra base) | 0.77 | 0.52 | **0.90** | 0.66 |
+ | klue-BERT base (our best result) | **0.89** | **0.75** | 0.80 | **0.78** |
+
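The F1 scores in the two tables above can be cross-checked against the reported precision and recall, since F1 is their harmonic mean. The sketch below is not part of the original README or the project's code; the helper names `f1_score` and `max_f1_gap` are hypothetical, and the numbers are simply copied from the tables. Because the tables show values rounded to two decimals, a small gap between the recomputed and reported F1 is expected.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1), copied from the two tables above.
reported = {
    ("Curse-detection-data", "Electra base"): (0.69, 0.87, 0.77),
    ("Curse-detection-data", "klue-BERT base"): (0.76, 0.75, 0.75),
    ("YouTube comments", "Electra base"): (0.52, 0.90, 0.66),
    ("YouTube comments", "klue-BERT base"): (0.75, 0.80, 0.78),
}


def max_f1_gap(rows: dict) -> float:
    """Largest absolute difference between recomputed and reported F1."""
    return max(abs(f1_score(p, r) - f1) for p, r, f1 in rows.values())


for (dataset, model), (p, r, f1) in reported.items():
    print(f"{dataset:22s} {model:15s} recomputed F1 = {f1_score(p, r):.3f} (reported {f1:.2f})")
```

All four recomputed values fall within 0.01 of the reported F1 scores, which is consistent with two-decimal rounding of the inputs.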
+ ## Demo with HuggingFace's Space 🤗
+ Try the demo here: [Go to HuggingFace Space](https://huggingface.co/datasets/Tolerblanc/Demo_Kurse_detection)
+
+ ## Reference
+ - Smilegate-AI's Korean Unsmile Dataset : [Link](https://huggingface.co/datasets/smilegate-ai/kor_unsmile)
+ - JeanLee's K-MHaS Dataset and Paper : [Link](https://huggingface.co/datasets/jeanlee/kmhas_korean_hate_speech)
+ - KLUE(Korean Language Understanding Evaluation) BERT : [Link](https://github.com/KLUE-benchmark/KLUE)
+ - BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding : [Link](https://arxiv.org/abs/1810.04805)
+ - 2runo's Curse Detection Dataset : [Link](https://github.com/2runo/Curse-detection-data)
+ - JminJ's Bad Text Classifier : [Link](https://github.com/JminJ/Bad_text_classifier)