
MPTK-1B

MPTK-1B is a 1.3B-parameter decoder-only transformer language model trained on Korean, English, and code datasets.

์ด ๋ชจ๋ธ์€ ๊ตฌ๊ธ€์˜ TPU Research Cloud(TRC)๋ฅผ ํ†ตํ•ด ์ง€์›๋ฐ›์€ Cloud TPU๋กœ ํ•™์Šต๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

Model Details

Model Description

The model is based on MPT, an architecture that modifies the standard decoder-only transformer in several ways.

Hyperparameter     Value
n_parameters       1.3B
n_layers           24
n_heads            16
d_model            2048
vocab size         50432
sequence length    2048
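
These hyperparameters can be read back from the published checkpoint itself. A minimal sanity check, assuming the checkpoint ships an MPT-style config (the attribute names below are assumptions and may differ):

from transformers import AutoConfig

# Attribute names assume an MPT-style config; adjust if the checkpoint
# exposes different names.
config = AutoConfig.from_pretrained("team-lucid/mptk-1b")
print(config.n_layers, config.n_heads, config.d_model, config.vocab_size)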

Uses

How to Get Started with the Model

Running the model in fp16 can produce NaNs, so we recommend running it in fp32 or bf16 instead.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

# Load the tokenizer and model (weights load in fp32 by default).
tokenizer = AutoTokenizer.from_pretrained("team-lucid/mptk-1b")
model = AutoModelForCausalLM.from_pretrained("team-lucid/mptk-1b")

pipe = pipeline('text-generation', model=model, tokenizer=tokenizer, device='cuda:0')

# Run the forward pass in bfloat16 to avoid the fp16 NaN issue noted above.
with torch.autocast('cuda', dtype=torch.bfloat16):
    print(
        pipe(
            '๋Œ€ํ•œ๋ฏผ๊ตญ์˜ ์ˆ˜๋„๋Š”',
            max_new_tokens=100,
            do_sample=True,
        )
    )
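
Because the weights load in fp32 by default, the same pipeline also runs on the CPU without autocast, which is slow but sidesteps the NaN issue entirely. A minimal variant, reusing model and tokenizer from the snippet above:

# Reuses `model` and `tokenizer` from above; device=-1 selects the CPU.
# Weights remain in fp32, so no autocast context is needed.
pipe_cpu = pipeline('text-generation', model=model, tokenizer=tokenizer, device=-1)
print(pipe_cpu('๋Œ€ํ•œ๋ฏผ๊ตญ์˜ ์ˆ˜๋„๋Š”', max_new_tokens=100, do_sample=True))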

Training Details

Training Data

The model was trained on Korean data such as OSCAR, mC4, Wikipedia, and Namuwiki, supplemented with subsets of RefinedWeb and The Stack.

Training Hyperparameters

Hyperparameter    Value
Precision         bfloat16
Optimizer         Lion
Learning rate     2e-4
Batch size        1024
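
For reference, the Lion optimizer is available in PyTorch through the third-party lion-pytorch package. The sketch below is illustrative only: training was done on TPUs, so the actual implementation almost certainly differed, and every argument other than the learning rate is a package default rather than a reported value.

from lion_pytorch import Lion  # pip install lion-pytorch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("team-lucid/mptk-1b")
# lr comes from the table above; betas and weight decay are the
# package defaults, not values reported for this model.
optimizer = Lion(model.parameters(), lr=2e-4)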