Korcen

korcen-ml is a project that uses deep learning to push accuracy further, in order to overcome the main weakness of the existing keyword-based korcen: it is easy to bypass.
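To illustrate why keyword matching is easy to bypass, here is a minimal sketch. The blocklist, the `keyword_filter` function, and the sample sentences are hypothetical and do not reflect korcen's actual rules; the point is only that small spelling changes defeat a plain substring check.

```python
# Hypothetical blocklist, for illustration only (not korcen's real word list).
BLOCKLIST = {"darn"}

def keyword_filter(text: str) -> bool:
    """Return True if any blocklisted word appears as a substring."""
    lowered = text.lower()
    return any(word in lowered for word in BLOCKLIST)

plain = keyword_filter("well, darn it")       # exact spelling is caught
spaced = keyword_filter("well, d a r n it")   # inserted spaces slip through
swapped = keyword_filter("well, d4rn it")     # character substitution slips through

print(plain, spaced, swapped)  # True False False
```

A learned classifier scores the whole sentence instead of matching fixed strings, which is the motivation the project gives for moving to deep learning.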

Only some of the models are public, and the model files are available here.

If you would like to download more model files or the training data, please get in touch.

Number of data sentences
VDCNN(23.4.30) 200,000
VDCNN_KOGPT2(23.5.28) 2,000,000
VDCNN_LLAMA2(23.9.30) 5,000,000
VDCNN_LLAMA2_V2(24.1.29) 10,000,000

Existing keyword-based library: py version, ts version

Support Discord server

Model validation

Each dataset uses a different standard for what counts as profanity, so please keep in mind that the figures below carry some error.

korean-malicious-comments-dataset Curse-detection-data kmhas_korean_hate_speech Korean Extremist Website Womad Hate Speech Data
korcen(v0.3.5) 0.7121 0.8415 0.6800 0.6305
VDCNN(23.4.30) 0.6900 0.4885 0.4885
VDCNN_KOGPT2(23.6.15) 0.7545 0.7824 0.7055
VDCNN_LLAMA2(23.9.30) 0.7762 0.8104 0.7296 (replaced by V2)
VDCNN_LLAMA2_V2(24.1.29) 0.8322 0.8410 0.7837 0.7120
badword_check(23.10.1) 0.5829 0.6761
CurseDetector(24.1.10) 0.5679 (not testable: takes too long) 0.5785
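The card does not say which metric the figures above are or how they were computed. As a rough sketch, assuming they are plain accuracy with a 0.5 decision threshold, a comparison over a labeled dataset could look like the following. The `accuracy` helper, `dummy_predict`, and the toy samples are hypothetical stand-ins; in practice `predict_fn` would be a function that returns the model's score for a sentence.

```python
from typing import Callable, Iterable, Tuple

def accuracy(predict_fn: Callable[[str], float],
             samples: Iterable[Tuple[str, int]],
             threshold: float = 0.5) -> float:
    """Fraction of samples whose thresholded score matches the 0/1 label."""
    correct = 0
    total = 0
    for text, label in samples:
        predicted = 1 if predict_fn(text) >= threshold else 0
        correct += int(predicted == label)
        total += 1
    if total == 0:
        raise ValueError("no samples to evaluate")
    return correct / total

# Hypothetical stand-in for a real model: returns a score in [0, 1].
def dummy_predict(text: str) -> float:
    return 0.9 if "darn" in text.lower() else 0.1

# Toy (sentence, label) pairs; label 1 means abusive, 0 means normal.
toy_samples = [
    ("what a darn mess", 1),
    ("good morning", 0),
    ("nice weather", 1),  # labeled abusive under this toy dataset's standard
]

score = accuracy(dummy_predict, toy_samples)
print(f"accuracy: {score:.4f}")  # accuracy: 0.6667
```

The third sample shows the caveat stated above: when a dataset's labeling standard differs from what the model learned, the measured score drops even if the model behaves consistently.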

Example

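# Python 3.10, TensorFlow 2.10
import pickle

import tensorflow as tf
from tensorflow.keras.preprocessing.sequence import pad_sequences

maxlen = 1000  # sequence length used for padding and truncation

model_path = "vdcnn_model.h5"
tokenizer_path = "tokenizer.pickle"

model = tf.keras.models.load_model(model_path)
# Only unpickle tokenizer files from a source you trust.
# encode_plus below is a Hugging Face tokenizer method, so this assumes the
# pickle holds such a tokenizer and that transformers is installed.
with open(tokenizer_path, "rb") as f:
    tokenizer = pickle.load(f)

def preprocess_text(text):
    # Lowercase the input; no other normalization is applied.
    return text.lower()

def predict_text(text):
    sentence = preprocess_text(text)
    # Encode to token ids, padded or truncated to exactly maxlen.
    encoded_sentence = tokenizer.encode_plus(sentence,
                                             max_length=maxlen,
                                             padding="max_length",
                                             truncation=True)["input_ids"]
    # The ids are already maxlen long, so this mainly builds a (1, maxlen) array.
    sentence_seq = pad_sequences([encoded_sentence], maxlen=maxlen, truncating="post")
    # The first output of the first (only) row is the abuse score.
    prediction = model.predict(sentence_seq)[0][0]
    return prediction

while True:
    text = input("Enter the sentence you want to test (blank to quit): ")
    if not text.strip():
        break
    result = predict_text(text)
    # Scores at or above 0.5 are treated as abusive.
    if result >= 0.5:
        print("This sentence contains abusive language.")
    else:
        print("It's a normal sentence.")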

Maker

Tanat

github:   Tanat05
discord:  Tanat05
email:    tanat@tanat.kr