
ChatGPT Informational (Wikipedia) Text Detector

Description

Model that detects whether a Korean Wikipedia-style text was written by ChatGPT.

A KoBigBird model fine-tuned on approximately 30,000 samples of human-written Wikipedia summaries and ChatGPT-generated Wikipedia summaries.

Our classifier based on this checkpoint reached a validation accuracy of 99.2% after 10 epochs of training.
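
The training code is not included in this card. The snippet below is only a minimal sketch of how such a binary classifier could be fine-tuned with the Hugging Face Trainer; the base checkpoint name (monologg/kobigbird-bert-base), the CSV file wiki_summaries.csv, and all hyperparameters other than the 10 epochs mentioned above are assumptions for illustration.

from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Assumed KoBigBird base checkpoint; the exact base model used for this card is not stated.
base_checkpoint = "monologg/kobigbird-bert-base"
tokenizer = AutoTokenizer.from_pretrained(base_checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(base_checkpoint, num_labels=2)

# Hypothetical CSV with "text" and "label" columns (0 = human-written, 1 = ChatGPT-generated).
dataset = load_dataset("csv", data_files="wiki_summaries.csv")["train"].train_test_split(test_size=0.1)

def tokenize(batch):
    return tokenizer(batch["text"], max_length=512, truncation=True, padding="max_length")

tokenized = dataset.map(tokenize, batched=True)

training_args = TrainingArguments(
    output_dir="ko-wiki-chatgpt-detector",
    num_train_epochs=10,              # the card reports 10 epochs of training
    per_device_train_batch_size=8,    # assumed batch size
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
)
trainer.train()
print(trainer.evaluate())  # prints eval loss; add a compute_metrics function to report accuracy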

Example Usage

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch
import torch.nn as nn

# Load the fine-tuned detector and its tokenizer from the Hugging Face Hub.
model = AutoModelForSequenceClassification.from_pretrained("LDKSolutions/Ko-Wiki-ChatGPT-Detector-v1", num_labels=2)
model.eval()
tokenizer = AutoTokenizer.from_pretrained("LDKSolutions/Ko-Wiki-ChatGPT-Detector-v1")

# Example text generated by ChatGPT (GPT-3.5)
text = '''์Šค๋งˆํŠธ ์ปจํŠธ๋ž™ํŠธ๋Š” ๋ธ”๋ก์ฒด์ธ ๊ธฐ์ˆ ์—์„œ ์‚ฌ์šฉ๋˜๋Š” ํ”„๋กœ๊ทธ๋žจ ์ฝ”๋“œ์˜ ์ผ์ข…์œผ๋กœ, ๊ณ„์•ฝ ์กฐ๊ฑด์„ ์ž๋™์œผ๋กœ ๊ฒ€์ฆํ•˜๊ณ  ์‹คํ–‰ํ•˜๋Š” ํ”„๋กœ๊ทธ๋žจ์ž…๋‹ˆ๋‹ค. ์Šค๋งˆํŠธ ์ปจํŠธ๋ž™ํŠธ๋Š” ๋ธ”๋ก์ฒด์ธ ๋„คํŠธ์›Œํฌ ์ƒ์—์„œ ์‹คํ–‰๋˜๋ฉฐ, ๋ธ”๋ก์ฒด์ธ์—์„œ ๋ฐœ์ƒํ•˜๋Š” ๋ชจ๋“  ํŠธ๋žœ์žญ์…˜์€ ์Šค๋งˆํŠธ ์ปจํŠธ๋ž™ํŠธ๋ฅผ ํ†ตํ•ด ์ฒ˜๋ฆฌ๋ฉ๋‹ˆ๋‹ค.

์Šค๋งˆํŠธ ์ปจํŠธ๋ž™ํŠธ๋Š” ์กฐ๊ฑด๊ณผ ์‹คํ–‰ ์ฝ”๋“œ๋กœ ์ด๋ฃจ์–ด์ ธ ์žˆ์Šต๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด, A๊ฐ€ B์—๊ฒŒ 1,000๋‹ฌ๋Ÿฌ๋ฅผ ์ง€๋ถˆํ•ด์•ผํ•˜๋Š” ๊ณ„์•ฝ์ด ์žˆ๋‹ค๋ฉด, ์ด ๊ณ„์•ฝ์˜ ์กฐ๊ฑด์„ ์Šค๋งˆํŠธ ์ปจํŠธ๋ž™ํŠธ๋กœ ์ž‘์„ฑํ•˜์—ฌ ๊ณ„์•ฝ์ด ์ž๋™์œผ๋กœ ์‹คํ–‰๋˜๋„๋ก ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ์Šค๋งˆํŠธ ์ปจํŠธ๋ž™ํŠธ๋Š” ์ž๋™์œผ๋กœ ์กฐ๊ฑด์„ ๊ฒ€์ฆํ•˜๊ณ , ์ง€์ •๋œ ์กฐ๊ฑด์ด ์ถฉ์กฑ๋˜์—ˆ์„ ๋•Œ ๊ณ„์•ฝ์˜ ์‹คํ–‰ ์ฝ”๋“œ๋ฅผ ์‹คํ–‰ํ•˜์—ฌ ๊ณ„์•ฝ์„ ์ดํ–‰ํ•ฉ๋‹ˆ๋‹ค. ์ด๋ฅผ ํ†ตํ•ด ๊ณ„์•ฝ ๋‹น์‚ฌ์ž๋Š” ์„œ๋กœ๋ฅผ ์‹ ๋ขฐํ•˜์ง€ ์•Š์•„๋„ ์•ˆ์ „ํ•˜๊ฒŒ ๊ฑฐ๋ž˜๋ฅผ ์ง„ํ–‰ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

์Šค๋งˆํŠธ ์ปจํŠธ๋ž™ํŠธ๋Š” ๋ธ”๋ก์ฒด์ธ์—์„œ ์‹คํ–‰๋˜๊ธฐ ๋•Œ๋ฌธ์— ๋ชจ๋“  ๊ฑฐ๋ž˜ ๋‚ด์—ญ์ด ํˆฌ๋ช…ํ•˜๊ฒŒ ๊ธฐ๋ก๋˜๋ฉฐ, ์ค‘๊ฐœ์ธ์ด๋‚˜ ์ค‘์•™ ๊ธฐ๊ด€์˜ ๊ฐœ์ž…์ด ์—†๊ธฐ ๋•Œ๋ฌธ์— ๊ฑฐ๋ž˜ ๋น„์šฉ์ด ์ค„์–ด๋“ญ๋‹ˆ๋‹ค. ๋˜ํ•œ ์Šค๋งˆํŠธ ์ปจํŠธ๋ž™ํŠธ๋Š” ์ฝ”๋“œ๋กœ ์ž‘์„ฑ๋˜๊ธฐ ๋•Œ๋ฌธ์— ์ž๋™ํ™”๊ฐ€ ๊ฐ€๋Šฅํ•˜๋ฉฐ, ํ”„๋กœ๊ทธ๋žจ์— ๋”ฐ๋ฅธ ๋น„์ฆˆ๋‹ˆ์Šค ๋กœ์ง์„ ์‹คํ–‰ํ•˜๋Š” ๊ฒƒ์ด ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ์Šค๋งˆํŠธ ์ปจํŠธ๋ž™ํŠธ๋Š” ๋ธ”๋ก์ฒด์ธ ๊ธฐ์ˆ ์˜ ํ•ต์‹ฌ ๊ธฐ๋Šฅ ์ค‘ ํ•˜๋‚˜๋กœ, ๋ถ„์‚ฐํ˜• ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜(DApp) ๊ฐœ๋ฐœ ๋“ฑ์— ํ™œ์šฉ๋˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.'''

# Tokenize the input, truncating/padding to 512 tokens.
encoded_inputs = tokenizer(text, max_length=512, truncation=True, padding="max_length", return_tensors="pt")

with torch.no_grad():
    outputs = model(**encoded_inputs).logits
    # Convert logits to class probabilities; class 1 = ChatGPT-generated, class 0 = human-written.
    probability = nn.Softmax(dim=-1)(outputs)
    predicted_class = torch.argmax(probability, dim=-1).item()
    if predicted_class == 1:
        print("ChatGPT๊ฐ€ ์ƒ์„ฑํ–ˆ์„ ํ™•๋ฅ ์ด ๋†’์Šต๋‹ˆ๋‹ค!")  # "Most likely generated by ChatGPT!"
    else:
        print("์ธ๊ฐ„์ด ์ž‘์„ฑํ–ˆ์„ ํ™•๋ฅ ์ด ๋†’์Šต๋‹ˆ๋‹ค!")  # "Most likely written by a human!"