Korcen
korcen-ml is a project that uses deep learning to raise accuracy further and to overcome the main weakness of the existing keyword-based korcen: it is easy to bypass.
Only some of the models are public; the model files can be found here.
If you would like to download more model files or the training data, please get in touch.
Model | Number of training sentences |
---|---|
VDCNN(23.4.30) | 200,000 |
VDCNN_KOGPT2(23.5.28) | 2,000,000 |
VDCNN_LLAMA2(23.9.30) | 5,000,000 |
VDCNN_LLAMA2_V2(24.1.29) | 10,000,000 |
Existing keyword-based library: py version, ts version
Model validation
Each dataset applies a different standard for what counts as profanity, so please allow for some error when reading these results.
Model | korean-malicious-comments-dataset | Curse-detection-data | kmhas_korean_hate_speech | Korean Extremist Website Womad Hate Speech Data |
---|---|---|---|---|
korcen(v0.3.5) | 0.7121 | 0.8415 | 0.6800 | 0.6305 |
VDCNN(23.4.30) | 0.6900 | 0.4885 | 0.4885 | |
VDCNN_KOGPT2(23.6.15) | 0.7545 | 0.7824 | 0.7055 | |
VDCNN_LLAMA2(23.9.30) | 0.7762 | 0.8104 | 0.7296 | Replaced by V2 |
VDCNN_LLAMA2_V2(24.1.29) | 0.8322 | 0.8410 | 0.7837 | 0.7120 |
badword_check(23.10.1) | 0.5829 | 0.6761 | | |
CurseDetector(24.1.10) | 0.5679 | Not tested (takes too long) | 0.5785 | |
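The source does not say which metric the scores above represent. As a minimal sketch, assuming they are plain accuracy at the 0.5 threshold used in the example, a comparison against a labeled dataset could look like the following. The function name `accuracy_at_threshold` and the sample scores and labels are hypothetical and made up for illustration.

```python
def accuracy_at_threshold(scores, labels, threshold=0.5):
    """Fraction of samples whose thresholded score matches the label.

    scores: model outputs in [0, 1]; labels: 1 = abusive, 0 = normal.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if len(scores) == 0:
        raise ValueError("at least one sample is required")
    # A score at or above the threshold counts as "abusive".
    predicted = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predicted, labels))
    return correct / len(labels)


# Hypothetical scores and labels, not taken from any of the datasets above.
example_scores = [0.91, 0.12, 0.55, 0.40]
example_labels = [1, 0, 0, 1]
example_accuracy = accuracy_at_threshold(example_scores, example_labels)
```

Because each dataset labels profanity differently, the same threshold may not suit all of them; passing a different `threshold` is one way to probe that.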
example
# py: 3.10, tf: 2.10
import tensorflow as tf
import numpy as np
import pickle
from tensorflow.keras.preprocessing.sequence import pad_sequences

maxlen = 1000  # every input is padded or truncated to this many tokens

model_path = 'vdcnn_model.h5'
tokenizer_path = "tokenizer.pickle"

# Load the trained model and the pickled tokenizer.
model = tf.keras.models.load_model(model_path)
with open(tokenizer_path, "rb") as f:
    tokenizer = pickle.load(f)


def preprocess_text(text):
    # Lowercasing is the only normalization applied.
    text = text.lower()
    return text


def predict_text(text):
    sentence = preprocess_text(text)
    # The tokenizer must provide encode_plus, as Hugging Face tokenizers do.
    encoded_sentence = tokenizer.encode_plus(sentence,
                                             max_length=maxlen,
                                             padding="max_length",
                                             truncation=True)['input_ids']
    sentence_seq = pad_sequences([encoded_sentence], maxlen=maxlen, truncating="post")
    # The model returns one score per input; take the single value.
    prediction = model.predict(sentence_seq)[0][0]
    return prediction


# Interactive loop; stop it with Ctrl+C.
while True:
    text = input("Enter the sentence you want to test: ")
    result = predict_text(text)
    if result >= 0.5:
        print("This sentence contains abusive language.")
    else:
        print("It's a normal sentence.")
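For non-interactive use, the thresholding step in the loop above can be wrapped in a small helper. This is a sketch, not part of korcen-ml: `classify_texts` and `fake_predict` are hypothetical names, and `fake_predict` is only a stand-in so the snippet runs without the model files. In practice `predict_text` from the example would be passed instead.

```python
def classify_texts(texts, predict_fn, threshold=0.5):
    """Return a (text, score, is_abusive) tuple for each input text."""
    results = []
    for text in texts:
        score = float(predict_fn(text))
        results.append((text, score, score >= threshold))
    return results


def fake_predict(text):
    # Hypothetical stand-in for predict_text: flags text containing "bad".
    return 0.9 if "bad" in text.lower() else 0.1


for text, score, is_abusive in classify_texts(["Hello", "BAD word"], fake_predict):
    print(f"{score:.2f} {'abusive' if is_abusive else 'normal'}: {text}")
```

Taking the prediction function as an argument keeps the helper independent of TensorFlow, so it can be exercised without loading the model.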
Maker
Tanat
github: Tanat05
discord: Tanat05
email: tanat@tanat.kr