Instructions to use jadechoi/wizl_base_32b-rfg with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use jadechoi/wizl_base_32b-rfg with PEFT:
from peft import PeftModel from transformers import AutoModelForCausalLM base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-32B-Instruct") model = PeftModel.from_pretrained(base_model, "jadechoi/wizl_base_32b-rfg") - Notebooks
- Google Colab
- Kaggle
See axolotl config
axolotl version: 0.10.0
base_model: Qwen/Qwen2.5-Coder-32B-Instruct
load_in_8bit: false
load_in_4bit: false
datasets:
- path: train.jsonl
type: chat_template
dataset_prepared_path: last_run_prepared
val_set_size: 0.01
output_dir: ./outputs/out
# --- LoRA ์ค์ ์ถ๊ฐ (32B ํ์ต ์ฑ๊ณต์ ํต์ฌ) ---
adapter: lora
lora_r: 64 # 16 โ 64 (์ต์ 4๋ฐฐ)
lora_alpha: 128 # alpha = 2 ร rank ์ ์ง
lora_dropout: 0.05 # ์ ์ง
lora_target_modules:
- q_proj
- k_proj
- v_proj
- o_proj
- gate_proj
- up_proj
- down_proj
# ------------------------------------------
sequence_len: 8192
sample_packing: false
eval_sample_packing: false
pad_to_sequence_len: false
plugins:
- axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_swiglu: true
liger_fused_linear_cross_entropy: true
wandb_project: wizl-base-m
wandb_entity:
wandb_watch:
wandb_name: 32b-base-rfg
wandb_log_model:
hub_model_id: jadechoi/wizl_base_32b-rfg
push_to_hub: true
hub_private_repo: false # ์ด๊ฑธ false๋ก ํ๋ฉด public
# --- ์๋์ ์์ ์ฑ์ ๋ชจ๋ ์ก์ ๋ฐฐ์น ์ค์ ---
micro_batch_size: 4 # VRAM ์ฌ์ ์์ผ๋ 2 โ 4๋ก ์ฌ๋ ค์ ํ์ต ์๋ ํฅ์
gradient_accumulation_steps: 8 # batch size ์ฌ๋ ธ์ผ๋ 16 โ 8๋ก ์กฐ์
optimizer: paged_adamw_8bit
gradient_checkpointing: true # LoRA ํ๊ฒฝ์์๋ ๋ค์ true๋ก ์ค์ ํ์ฌ ์์ ์ฑ ํ๋ณด
# ------------------------------------------
num_epochs: 3
lr_scheduler: cosine
learning_rate: 2e-4 # LoRA๋ Full FT๋ณด๋ค ์กฐ๊ธ ๋ ๋์ ํ์ต๋ฅ ์ด ํจ๊ณผ์ ์
๋๋ค
bf16: true
fp16:
tf32: false
logging_steps: 1
flash_attention: true
eager_attention:
warmup_ratio: 0.05
evals_per_epoch: 0
saves_per_epoch: 1
weight_decay: 0.01
# LoRA ์ฌ์ฉ ์ FSDP๋ ํ์ ์๊ฑฐ๋ ์ต์ํํ ์ ์์ต๋๋ค.
# ์ฌ๊ธฐ์๋ ๊ฐ์ฅ ํธํ์ฑ ์ข์ ๊ธฐ๋ณธ DDP ๋ฐฉ์์ ์ฌ์ฉํ๊ฒ ๋ฉ๋๋ค.
fsdp:
fsdp_config:
wizl_base_32b-rfg
This model is a fine-tuned version of Qwen/Qwen2.5-Coder-32B-Instruct on the train.jsonl dataset.
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- total_eval_batch_size: 8
- optimizer: Use OptimizerNames.PAGED_ADAMW_8BIT with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 33
- training_steps: 673
Training results
Framework versions
- PEFT 0.15.2
- Transformers 4.52.3
- Pytorch 2.6.0+cu124
- Datasets 3.6.0
- Tokenizers 0.21.4
- Downloads last month
- -
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐ Ask for provider support
Model tree for jadechoi/wizl_base_32b-rfg
Base model
Qwen/Qwen2.5-32B Finetuned
Qwen/Qwen2.5-Coder-32B Finetuned
Qwen/Qwen2.5-Coder-32B-Instruct