Model Card

Model Description

This model is a text classification pipeline built on the classla/xlm-roberta-base-multilingual-text-genre-classifier model.

Base model: XLM-RoBERTa (xlm-roberta-base)
Purpose: automatic classification of text genre
Type: multilingual text classification

Intended Use

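This pipeline takes text in a variety of languages and assigns each input to one of a predefined set of genres. For example, it can distinguish news articles, opinion pieces, and informational texts.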

How to Use (example usage from this notebook)

This pipeline was created with the pipeline function of the transformers library. Pass a list of texts, as shown below, to predict the genre of each one.

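from transformers import pipeline, AutoModelForSequenceClassification, AutoTokenizer

# Load the model and tokenizer
model_name = "classla/xlm-roberta-base-multilingual-text-genre-classifier"
model = AutoModelForSequenceClassification.from_pretrained(model_name, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True, use_fast=False)

# Create the pipeline. It is bound to a separate name so that the imported
# pipeline() function is not overwritten and the cell can be re-run.
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Example inputs (Korean)
texts = [
    # "I am a biomedical engineering student in the first semester of my fourth year."
    "나는 의공학과 학생으로, 4학년 1학기를 진행중이다.",
    # "The difference between gemini and chatgpt is whether they are encoder-based or decoder-based."
    "gemini와 chatgpt의 다른 점은 encoder 기반이냐 decoder 기반이냐이다.",
]

# Predict one genre label (with its score) per input text
output = classifier(texts)
print(output)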

Example output: [{'label': 'Opinion/Argumentation', 'score': 0.5527392029762268}, {'label': 'Information/Explanation', 'score': 0.9898243546485901}]
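
The first prediction in the example output has a score of only about 0.55, so it may be useful to separate confident predictions from uncertain ones before acting on them. The following is a minimal sketch of one way to do that. It assumes the output format shown above (a list of dicts with 'label' and 'score' keys). The helper name split_by_confidence and the 0.8 threshold are illustrative assumptions, not part of the original model or of the transformers library.

```python
from typing import Dict, List, Tuple, Union

Prediction = Dict[str, Union[str, float]]
Scored = Tuple[str, str, float]  # (text, label, score)


def split_by_confidence(
    texts: List[str],
    predictions: List[Prediction],
    threshold: float = 0.8,  # assumption: arbitrary cutoff, tune for your data
) -> Tuple[List[Scored], List[Scored]]:
    """Pair each text with its prediction and split the pairs by score."""
    if len(texts) != len(predictions):
        raise ValueError("texts and predictions must have the same length")

    confident: List[Scored] = []
    uncertain: List[Scored] = []
    for text, pred in zip(texts, predictions):
        item = (text, str(pred["label"]), float(pred["score"]))
        if item[2] >= threshold:
            confident.append(item)
        else:
            uncertain.append(item)
    return confident, uncertain


# Inputs and predictions copied from the example above, so that this
# sketch runs without downloading the model.
texts = [
    "나는 의공학과 학생으로, 4학년 1학기를 진행중이다.",
    "gemini와 chatgpt의 다른 점은 encoder 기반이냐 decoder 기반이냐이다.",
]
predictions = [
    {"label": "Opinion/Argumentation", "score": 0.5527392029762268},
    {"label": "Information/Explanation", "score": 0.9898243546485901},
]

confident, uncertain = split_by_confidence(texts, predictions, threshold=0.8)
print("confident:", [(label, round(score, 2)) for _, label, score in confident])
print("uncertain:", [(label, round(score, 2)) for _, label, score in uncertain])
```

In practice, the predictions list would come from the classifier call rather than being hard-coded, and uncertain items could be routed to manual review.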

Training Data & Evaluation (optional: original model information)

(This section can summarize the training data and evaluation metrics of the original model, classla/xlm-roberta-base-multilingual-text-genre-classifier. See the Hugging Face model page.)

Limitations and Bias

(Describe the model's limitations, biases, and caveats here, for example performance differences across languages or problems caused by bias in the training data.)

License

(State the model's license here, for example Apache 2.0 or MIT.)


Model size: 0.3B params (Safetensors, tensor type F32)
