How to use Marcoson320/codeparrot-gpt2-mi50 with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="Marcoson320/codeparrot-gpt2-mi50")
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("Marcoson320/codeparrot-gpt2-mi50")
model = AutoModelForCausalLM.from_pretrained("Marcoson320/codeparrot-gpt2-mi50")
How to use Marcoson320/codeparrot-gpt2-mi50 with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Marcoson320/codeparrot-gpt2-mi50"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "Marcoson320/codeparrot-gpt2-mi50",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
docker model run hf.co/Marcoson320/codeparrot-gpt2-mi50
How to use Marcoson320/codeparrot-gpt2-mi50 with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "Marcoson320/codeparrot-gpt2-mi50" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "Marcoson320/codeparrot-gpt2-mi50",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
# Or start the SGLang server with Docker:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "Marcoson320/codeparrot-gpt2-mi50" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "Marcoson320/codeparrot-gpt2-mi50",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
How to use Marcoson320/codeparrot-gpt2-mi50 with Docker Model Runner:
docker model run hf.co/Marcoson320/codeparrot-gpt2-mi50
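A GPT-2 small (124M parameters) causal language model, trained with the method described in HuggingFace LLM Course Chapter 7.6. The weights start from random initialization and are trained for 1 epoch on the codeparrot-ds dataset. The run serves as a reproducible record of the from-scratch training workflow on the AMD MI50 platform.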
| Item | Value |
|---|---|
| Model architecture | GPT-2 (n_layer=12, n_head=12, n_embd=768) |
| Parameter count | 124,242,432 |
| Tokenizer | huggingface-course/code-search-net-tokenizer (BPE, vocab=50,000) |
| Context length | 128 tokens |
| Training set | huggingface-course/codeparrot-ds-train (16,702,061 length-128 chunks) |
| Validation set | huggingface-course/codeparrot-ds-valid |
| Optimizer | AdamW (β₁=0.9, β₂=0.999, weight_decay=0.1) |
| Learning rate | 5×10⁻⁴, cosine schedule, warmup 1,000 steps |
| Effective batch size | 256 (per_device_bs=64 × grad_accum=2 × world_size=2) |
| Precision | fp16 |
| Parallelism | DistributedDataParallel (DDP), NCCL/RCCL backend |
| Total steps | 65,243 (1 epoch) |
| Eval / save interval | every 5,000 steps |
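
The derived figures in the table (effective batch size, total steps, parameter count) can be cross-checked with plain arithmetic. The sketch below is not the training script. It assumes the GPT-2 default of 1,024 learned position embeddings and an output head tied to the token embedding, neither of which is stated in the table, and it assumes the step count is the ceiling of chunks divided by effective batch size.

```python
import math

# Values copied from the configuration table.
N_LAYER, N_EMBD = 12, 768
VOCAB_SIZE = 50_000
CONTEXT_LENGTH = 128
NUM_CHUNKS = 16_702_061
PER_DEVICE_BS, GRAD_ACCUM, WORLD_SIZE = 64, 2, 2

# Assumption (not in the table): GPT-2's default number of learned positions.
N_POSITIONS = 1024


def gpt2_param_count(n_layer, n_embd, vocab_size, n_positions):
    """Count GPT-2 parameters, assuming the LM head shares the token embedding."""
    embeddings = vocab_size * n_embd + n_positions * n_embd
    layer_norm = 2 * n_embd  # weight + bias
    # Fused QKV projection plus the attention output projection.
    attention = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)
    # Two-layer MLP with a 4x hidden expansion.
    mlp = (n_embd * 4 * n_embd + 4 * n_embd) + (4 * n_embd * n_embd + n_embd)
    block = 2 * layer_norm + attention + mlp
    return embeddings + n_layer * block + layer_norm  # plus the final layer norm


effective_batch = PER_DEVICE_BS * GRAD_ACCUM * WORLD_SIZE
total_steps = math.ceil(NUM_CHUNKS / effective_batch)
tokens_per_step = effective_batch * CONTEXT_LENGTH
n_params = gpt2_param_count(N_LAYER, N_EMBD, VOCAB_SIZE, N_POSITIONS)

print(effective_batch)   # 256
print(total_steps)       # 65243
print(tokens_per_step)   # 32768
print(n_params)          # 124242432
```

Under these assumptions the arithmetic reproduces the table's 256, 65,243, and 124,242,432 exactly, which suggests the position table kept its default size even though training used 128-token chunks.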
Training and eval metrics are recorded once every 5,000 steps.
| step | epoch | learning_rate | train_loss | grad_norm | eval_loss |
|---|---|---|---|---|---|
| 5,000 | 0.077 | 4.952×10⁻⁴ | 2.677 | 0.180 | 1.752 |
| 10,000 | 0.153 | 4.762×10⁻⁴ | 1.685 | 0.152 | 1.520 |
| 15,000 | 0.230 | 4.437×10⁻⁴ | 1.529 | 0.153 | 1.415 |
| 20,000 | 0.307 | 3.996×10⁻⁴ | 1.447 | 0.145 | 1.347 |
| 25,000 | 0.383 | 3.467×10⁻⁴ | 1.386 | 0.154 | 1.295 |
| 30,000 | 0.460 | 2.880×10⁻⁴ | 1.334 | 0.160 | 1.247 |
| 35,000 | 0.537 | 2.271×10⁻⁴ | 1.288 | 0.160 | 1.204 |
| 40,000 | 0.613 | 1.675×10⁻⁴ | 1.241 | 0.170 | 1.160 |
| 45,000 | 0.690 | 1.128×10⁻⁴ | 1.200 | 0.175 | 1.123 |
| 50,000 | 0.766 | 6.631×10⁻⁵ | 1.162 | 0.174 | 1.090 |
| 55,000 | 0.843 | 3.072×10⁻⁵ | 1.135 | 0.180 | 1.066 |
| 60,000 | 0.920 | 8.175×10⁻⁶ | 1.113 | 0.191 | 1.054 |
| 65,000 | 0.996 | 1.78×10⁻⁸ | 1.106 | 0.180 | 1.051 |
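
The eval_loss column can be read as perplexity by exponentiating it. This is a minimal sketch that assumes eval_loss is the mean per-token cross-entropy in nats, which the card does not state explicitly. The resulting perplexity is per BPE token of this model's tokenizer, so it is not directly comparable across tokenizers.

```python
import math

# (step, eval_loss) pairs copied from the first and last rows of the metrics table.
EVAL_LOSS = {5_000: 1.752, 65_000: 1.051}


def perplexity(loss_nats):
    """Convert a mean cross-entropy loss (in nats) to perplexity."""
    return math.exp(loss_nats)


for step, loss in EVAL_LOSS.items():
    print(f"step {step:>6}: eval_loss={loss:.3f}  perplexity={perplexity(loss):.2f}")
```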
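No further epochs or hyperparameter search were run. In the second half of training, cosine decay drives the lr toward zero. The gradient norm stays between 0.145 and 0.191 at the logged steps, with no sign of divergence or instability.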
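
The learning_rate column can be approximated from the hyperparameters alone. The sketch below assumes linear warmup followed by a half-cosine decay to zero over the remaining steps, which is the usual shape of a cosine schedule with warmup. The function name `lr_at` is illustrative and not taken from the training code.

```python
import math

# Hyperparameters copied from the configuration table.
PEAK_LR = 5e-4
WARMUP_STEPS = 1_000
TOTAL_STEPS = 65_243


def lr_at(step):
    """Learning rate at an optimizer step: linear warmup, then half-cosine decay."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Compare against a few rows of the metrics table.
LOGGED = {5_000: 4.952e-4, 30_000: 2.880e-4, 60_000: 8.175e-6, 65_000: 1.78e-8}
for step, logged in LOGGED.items():
    print(f"step {step:>6}: computed={lr_at(step):.4e}  logged={logged:.4e}")
```

The computed values agree with the logged ones to within about 1%. The small gap in the last rows is consistent with the logged value being read one scheduler step earlier, though that is an assumption about the logging and is not confirmed by the card.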
from transformers import pipeline
pipe = pipeline(
"text-generation",
model="Marcoson320/codeparrot-gpt2-mi50",
device=0,
)
print(pipe("# scatter plot of x, y\n", max_new_tokens=64)[0]["generated_text"])
Repetition in generated output can be mitigated with repetition_penalty>1.0 or no_repeat_ngram_size.
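
As a sketch of how those two options could be passed to the pipeline: the values 1.2 and 4 below are illustrative assumptions, not settings recommended or tested by the model author, and `generate` is a hypothetical helper. The model is only downloaded when `generate` is called.

```python
# Illustrative decoding settings (assumed values, not tuned for this model).
GEN_KWARGS = {
    "max_new_tokens": 64,
    "repetition_penalty": 1.2,    # values > 1.0 penalize tokens already generated
    "no_repeat_ngram_size": 4,    # forbid repeating any 4-gram verbatim
}


def generate(prompt, model_id="Marcoson320/codeparrot-gpt2-mi50", device=-1):
    """Generate a completion with the repetition controls above (device=-1 is CPU)."""
    # Imported lazily so the settings can be inspected without transformers installed.
    from transformers import pipeline

    pipe = pipeline("text-generation", model=model_id, device=device)
    return pipe(prompt, **GEN_KWARGS)[0]["generated_text"]


# Example call (downloads the model on first use):
# print(generate("# scatter plot of x, y\n"))
```

Source code legitimately repeats short token sequences (indentation, common call patterns), so a small no_repeat_ngram_size may degrade otherwise valid completions. Starting with repetition_penalty alone and adding the n-gram constraint only if needed is one reasonable order to try.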