Built with Axolotl

See axolotl config

axolotl version: 0.13.2

adapter: lora
base_model: Qwen/Qwen3-8B
bf16: true
bnb_4bit_compute_dtype: bfloat16
bnb_4bit_quant_type: nf4
bnb_4bit_use_double_quant: true
dataset_prepared_path: out/prepared_dataset_stateless
datasets:
- message_field_content: content
  message_field_role: role
  path: /e/project1/reformo/salgarkar1/agents_learn/pythonformer-workshop/paired/train/out/paired_data/stateless/rule_diagnosis/traces.jsonl
  roles_to_train:
  - assistant
  type: chat_template
eval_steps: 5
flash_attention: true
gradient_accumulation_steps: 16
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
learning_rate: 0.0001
load_in_4bit: true
load_in_8bit: false
logging_steps: 1
lora_alpha: 128
lora_dropout: 0.05
lora_r: 64
lora_target_linear: false
lora_target_modules:
- q_proj
- k_proj
- v_proj
- o_proj
- gate_proj
- up_proj
- down_proj
lr_scheduler: cosine
micro_batch_size: 1
model_type: AutoModelForCausalLM
num_epochs: 3.0
optimizer: adamw_torch
output_dir: out/qwen3-8b-stateless-rule_diagnosis-20260525_123626
pad_to_sequence_len: true
sample_packing: false
save_strategy: epoch
save_total_limit: 3
seed: 3407
sequence_len: 16384
strict: false
tf32: true
tokenizer_type: AutoTokenizer
trust_remote_code: true
val_set_size: 0.04
wandb_log_model: null
wandb_project: pythonformer
wandb_watch: null
warmup_ratio: 0.03
weight_decay: 0.01
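
As a rough illustration of what the LoRA settings above imply, the sketch below computes the standard LoRA scaling factor (lora_alpha / lora_r) and estimates the number of trainable adapter parameters. The layer shapes (hidden size 4096, MLP size 12288, 32 query heads and 8 key/value heads of dimension 128, 36 layers) are assumptions about the Qwen/Qwen3-8B architecture and are not stated in this card; check them against the base model's config.json before relying on the estimate.

```python
# Values taken from the axolotl config above.
LORA_R = 64
LORA_ALPHA = 128
TARGET_MODULES = [
    "q_proj", "k_proj", "v_proj", "o_proj",
    "gate_proj", "up_proj", "down_proj",
]

# ASSUMPTION: architecture dimensions of Qwen/Qwen3-8B (not given in this card).
HIDDEN = 4096
INTERMEDIATE = 12288
NUM_HEADS = 32
NUM_KV_HEADS = 8
HEAD_DIM = 128
NUM_LAYERS = 36

# (in_features, out_features) of each targeted linear layer.
SHAPES = {
    "q_proj": (HIDDEN, NUM_HEADS * HEAD_DIM),
    "k_proj": (HIDDEN, NUM_KV_HEADS * HEAD_DIM),
    "v_proj": (HIDDEN, NUM_KV_HEADS * HEAD_DIM),
    "o_proj": (NUM_HEADS * HEAD_DIM, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_scaling(alpha: int, r: int) -> float:
    """Standard LoRA scaling applied to the low-rank update B @ A."""
    return alpha / r


def lora_params_per_layer(shapes, modules, r: int) -> int:
    """Each adapted layer adds A (r x in) and B (out x r) matrices."""
    return sum(r * (shapes[m][0] + shapes[m][1]) for m in modules)


scaling = lora_scaling(LORA_ALPHA, LORA_R)
per_layer = lora_params_per_layer(SHAPES, TARGET_MODULES, LORA_R)
total_trainable = per_layer * NUM_LAYERS

print(f"LoRA scaling: {scaling}")
print(f"Estimated trainable parameters: {total_trainable:,}")
```

The estimate counts only the LoRA A and B matrices; it ignores any bias or embedding parameters, which this config does not train.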

out/qwen3-8b-stateless-rule_diagnosis-20260525_123626

This model is a fine-tuned version of Qwen/Qwen3-8B on the /e/project1/reformo/salgarkar1/agents_learn/pythonformer-workshop/paired/train/out/paired_data/stateless/rule_diagnosis/traces.jsonl dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1784
  • Ppl: 1.1952
  • Memory/max Active (GiB): 54.54
  • Memory/max Allocated (GiB): 54.54
  • Memory/device Reserved (GiB): 66.97
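
The reported Ppl is consistent with perplexity computed as the exponential of the evaluation loss. The minimal sketch below assumes the loss is a mean cross-entropy in nats; because the card rounds the loss to four decimals, the recomputed value can differ from the reported one in the last digit.

```python
import math


def perplexity(mean_cross_entropy: float) -> float:
    """Perplexity is exp of the mean per-token cross-entropy (in nats)."""
    return math.exp(mean_cross_entropy)


eval_loss = 0.1784      # final evaluation loss from this card
reported_ppl = 1.1952   # final Ppl from this card

ppl = perplexity(eval_loss)
print(f"recomputed: {ppl:.4f}, reported: {reported_ppl:.4f}")
```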

Model description

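A LoRA adapter (rank 64, alpha 128, dropout 0.05) for Qwen/Qwen3-8B, trained with Axolotl 0.13.2 on a 4-bit NF4-quantized copy of the base model. Adapters are attached to the attention and MLP projection layers listed in the config above, and only assistant turns are trained on. More information needed.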

Intended uses & limitations

More information needed

Training and evaluation data

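The training data is a single JSONL file of chat-formatted traces (the path given in the config above), rendered with the chat template. 4% of the examples are held out as the evaluation set (val_set_size: 0.04), and sequences are padded to 16384 tokens. More information needed.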

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 3407
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 64
  • total_eval_batch_size: 4
  • optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 2
  • training_steps: 45
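
The derived values in the list above follow from the config by simple arithmetic. The sketch below reproduces the total train batch size and the warmup step count, and evaluates a linear-warmup cosine schedule at a few steps. Rounding the warmup count up and the exact schedule formula are assumptions modeled on the usual Transformers behavior, not something this card states.

```python
import math

# Values from the config and the hyperparameter list above.
MICRO_BATCH_SIZE = 1
GRADIENT_ACCUMULATION_STEPS = 16
NUM_DEVICES = 4
PEAK_LR = 1e-4
WARMUP_RATIO = 0.03
TRAINING_STEPS = 45

# Samples consumed per optimizer step across all devices.
total_train_batch_size = (
    MICRO_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * NUM_DEVICES
)

# ASSUMPTION: the warmup step count is the ratio times total steps, rounded up.
warmup_steps = math.ceil(WARMUP_RATIO * TRAINING_STEPS)


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero (assumed form)."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, TRAINING_STEPS - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print("total_train_batch_size:", total_train_batch_size)
print("warmup_steps:", warmup_steps)
for step in (0, 1, 2, 15, 30, 45):
    print(f"step {step:2d}: lr = {lr_at(step):.2e}")
```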

Training results

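| Training Loss | Epoch  | Step | Validation Loss | Ppl    | Active (GiB) | Allocated (GiB) | Reserved (GiB) |
|:-------------:|:------:|:----:|:---------------:|:------:|:------------:|:---------------:|:--------------:|
| No log        | 0      | 0    | 0.4882          | 1.6293 | 53.19        | 53.19           | 56.52          |
| 0.3402        | 0.3333 | 5    | 0.3146          | 1.3697 | 54.54        | 54.54           | 66.97          |
| 0.2775        | 0.6667 | 10   | 0.2673          | 1.3064 | 54.54        | 54.54           | 66.97          |
| 0.2385        | 1.0    | 15   | 0.2318          | 1.2609 | 54.54        | 54.54           | 66.97          |
| 0.2271        | 1.3333 | 20   | 0.2099          | 1.2335 | 54.54        | 54.54           | 66.97          |
| 0.2024        | 1.6667 | 25   | 0.1946          | 1.2148 | 54.54        | 54.54           | 66.97          |
| 0.1813        | 2.0    | 30   | 0.1855          | 1.2039 | 54.54        | 54.54           | 66.97          |
| 0.1706        | 2.3333 | 35   | 0.1808          | 1.1981 | 54.54        | 54.54           | 66.97          |
| 0.185         | 2.6667 | 40   | 0.1787          | 1.1957 | 54.54        | 54.54           | 66.97          |
| 0.1725        | 3.0    | 45   | 0.1784          | 1.1952 | 54.54        | 54.54           | 66.97          |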
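
To make the convergence trend in the results above easier to read, the sketch below computes the drop in validation loss between consecutive evaluations and the overall relative reduction. The numbers are copied from the table; the steps-per-epoch value of 15 is inferred from step 15 coinciding with epoch 1.0.

```python
# Validation loss at each evaluation (every 5 steps), copied from the table.
steps = list(range(0, 50, 5))  # 0, 5, ..., 45
val_loss = [0.4882, 0.3146, 0.2673, 0.2318, 0.2099,
            0.1946, 0.1855, 0.1808, 0.1787, 0.1784]

STEPS_PER_EPOCH = 15  # inferred: step 15 corresponds to epoch 1.0

# Improvement between consecutive evaluations.
deltas = [round(prev - cur, 4) for prev, cur in zip(val_loss, val_loss[1:])]

# Overall relative reduction from the untrained adapter to the final step.
relative_reduction = (val_loss[0] - val_loss[-1]) / val_loss[0]

for step, loss, delta in zip(steps[1:], val_loss[1:], deltas):
    epoch = step / STEPS_PER_EPOCH
    print(f"step {step:2d} (epoch {epoch:.2f}): "
          f"val_loss={loss:.4f}, improvement={delta:.4f}")
print(f"relative reduction: {relative_reduction:.1%}")
```

Each evaluation improves on the previous one, but the gain shrinks to 0.0003 by the last interval, which suggests the validation loss has largely plateaued by the end of epoch 3.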

Framework versions

  • PEFT 0.18.1
  • Transformers 4.57.6
  • Pytorch 2.10.0+cu128
  • Datasets 4.5.0
  • Tokenizers 0.22.2

Model tree for AutomatedScientist/qwen3-8b-stateless-rule_diagnosis-lora

  • Base model: Qwen/Qwen3-8B
  • Relation: adapter (this model)