# RoBERTa-base Korean
|
|
|
## Model Description
|
|
This RoBERTa model was pre-trained at the **syllable** level on a variety of Korean text datasets.

It uses a custom-built Korean syllable-level vocabulary.
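
As a rough illustration of what syllable-level tokenization means in practice, the sketch below shows how a sentence would break into per-syllable tokens. `your_model_name` is the same placeholder repo id used in the usage example further down, and the printed tokens are only indicative; the exact output depends on the released tokenizer.

```python
from transformers import AutoTokenizer

# Placeholder repo id -- substitute the actual model path.
tokenizer = AutoTokenizer.from_pretrained("your_model_name")

# With a syllable-level vocab, each Hangul syllable is expected to map
# to its own token rather than to multi-character subword pieces.
print(tokenizer.tokenize("한국어 텍스트"))
# e.g. ['한', '국', '어', '텍', '스', '트'] (illustrative, not verified output)
```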
|
|
|
## Architecture
|
- **Model type**: RoBERTa
|
- **Architecture**: RobertaForMaskedLM
|
- **Model size**: 256 hidden size, 8 hidden layers, 8 attention heads
|
- **max_position_embeddings**: 514

- **intermediate_size**: 2048

- **vocab_size**: 1428 (see the config sketch below)
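
For reference, the hyperparameters listed above correspond roughly to the following `transformers` configuration. This is a sketch reconstructed from the list, not the config file shipped with the model.

```python
from transformers import RobertaConfig, RobertaForMaskedLM

# Approximate configuration reconstructed from the numbers above.
config = RobertaConfig(
    vocab_size=1428,
    hidden_size=256,
    num_hidden_layers=8,
    num_attention_heads=8,
    intermediate_size=2048,
    max_position_embeddings=514,
)

# Randomly initialized model with this shape, useful only as a size check.
model = RobertaForMaskedLM(config)
print(f"{model.num_parameters():,} parameters")
```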
|
|
|
## Training Data
|
The datasets used are as follows:
|
- **Modu Corpus (모두의말뭉치)**: chat, online forum posts, everyday conversation, news, broadcast scripts, books, etc.
|
- **AIHUB**: SNS, YouTube comments, book sentences
|
- **Other**: Namuwiki, Korean Wikipedia
|
The combined data amounts to roughly 11GB.
|
|
|
## Training Details
|
- **BATCH_SIZE**: 112 (per GPU)
|
- **ACCUMULATE**: 36

- **MAX_STEPS**: 12,500

- **Train Steps × Batch Size**: **~100M** (12,500 steps × 112 per-GPU batch × 36 accumulation steps × 2 GPUs ≈ 100.8M sequences)

- **WARMUP_STEPS**: 2,400
|
- **Optimizer**: AdamW, LR 1e-3, betas (0.9, 0.98), eps 1e-6
|
- **LR schedule**: linear decay
|
- **Hardware**: 2x RTX 8000 GPUs
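
A minimal sketch of how the optimizer and schedule listed above could be set up with `torch` and `transformers`. The original training script is not part of this card, so the surrounding training loop is omitted and `model` is assumed to be the RoBERTa model from the architecture sketch earlier.

```python
import torch
from transformers import get_linear_schedule_with_warmup

MAX_STEPS = 12_500
WARMUP_STEPS = 2_400

# AdamW with the hyperparameters listed above.
optimizer = torch.optim.AdamW(
    model.parameters(),  # `model` from the architecture sketch above
    lr=1e-3,
    betas=(0.9, 0.98),
    eps=1e-6,
)

# Linear decay after a 2,400-step warmup.
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=WARMUP_STEPS,
    num_training_steps=MAX_STEPS,
)
```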
|
|
|
|
|
![Evaluation Loss Graph](https://cdn-uploads.huggingface.co/production/uploads/64a0fd6fd3149e05bc5260dd/-64jKdcJAavwgUREwaywe.png)

![Evaluation Accuracy Graph](https://cdn-uploads.huggingface.co/production/uploads/64a0fd6fd3149e05bc5260dd/LPq5M6S8LTwkFSCepD33S.png)
|
|
|
## Usage
|
```python
from transformers import AutoModel, AutoTokenizer

# Load the model and tokenizer
model = AutoModel.from_pretrained("your_model_name")
tokenizer = AutoTokenizer.from_pretrained("your_tokenizer_name")

# Tokenize the text and run a forward pass
inputs = tokenizer("여기에 한국어 텍스트 입력", return_tensors="pt")
outputs = model(**inputs)
```
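
Since the architecture is `RobertaForMaskedLM`, masked-token prediction can also be run through the `fill-mask` pipeline. The snippet below is a sketch with the same placeholder repo id; the mask token is read from the tokenizer rather than hard-coded, since it may differ from `<mask>`.

```python
from transformers import pipeline

# Masked-token prediction (placeholder repo id, as above).
fill_mask = pipeline("fill-mask", model="your_model_name")

# Query the tokenizer for its mask token instead of assuming "<mask>".
mask = fill_mask.tokenizer.mask_token
print(fill_mask(f"한국어는 배우기 {mask} 언어입니다."))
```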