How to use clarkkitchen22/qwen3-4b-risk-sft-lora with PEFT:

    from peft import PeftModel
    from transformers import AutoModelForCausalLM

    # Load the 4-bit base model, then attach the LoRA adapter on top of it.
    base_model = AutoModelForCausalLM.from_pretrained("unsloth/qwen3-4b-instruct-2507-unsloth-bnb-4bit")
    model = PeftModel.from_pretrained(base_model, "clarkkitchen22/qwen3-4b-risk-sft-lora")

Risk RL Lab Qwen3 4B SFT LoRA Adapter
This repository contains a PEFT LoRA/QLoRA adapter trained for action selection in a deterministic Python Risk-compatible environment. The adapter is intended to emit one strict JSON object that selects from the prompt's candidate actions.
Output Contract
The model-facing contract is exactly:
{"action_index": 0}
action_index refers to the compact prompt-action list, not directly to a raw environment action. Runtime code maps that index back to the full legal action, and the environment then validates it. Invalid JSON, a missing action_index, an out-of-range index, or an illegal action must be logged, and the action must then be chosen by FallbackSafeAgent instead.
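The contract above can be enforced with a small parsing step before the environment sees anything. The following is a minimal sketch, not the repository's runtime code: `parse_action_index`, `select_action`, and the `fallback` callable are hypothetical names, and `fallback` merely stands in for FallbackSafeAgent, whose real interface is not shown here. Legality checking by the environment would still follow as a separate step.

```python
import json
import logging
from typing import Callable, Optional, Sequence, TypeVar

Action = TypeVar("Action")


def parse_action_index(raw_output: str, num_candidates: int) -> Optional[int]:
    """Return a valid index into the prompt's candidate list, or None."""
    try:
        payload = json.loads(raw_output)
    except (json.JSONDecodeError, TypeError):
        return None  # output is not JSON at all
    # The object must have exactly one key, "action_index".
    if not isinstance(payload, dict) or set(payload) != {"action_index"}:
        return None
    index = payload["action_index"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(index, bool) or not isinstance(index, int):
        return None
    if not 0 <= index < num_candidates:
        return None  # out-of-range index
    return index


def select_action(
    raw_output: str,
    candidates: Sequence[Action],
    fallback: Callable[[Sequence[Action]], Action],
) -> Action:
    """Map model output to a candidate, or log and defer to the fallback."""
    index = parse_action_index(raw_output, len(candidates))
    if index is None:
        logging.warning("Rejected model output: %r", raw_output)
        return fallback(candidates)  # hypothetical stand-in for FallbackSafeAgent
    return candidates[index]
```

Note that `json.loads` tolerates surrounding whitespace; a stricter check could compare the raw string against a re-serialized form if byte-exact output matters.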
Training Summary
- Base model: Qwen/Qwen3-4B-Instruct-2507
- Method: Unsloth 4-bit LoRA/QLoRA SFT, adapter-only save
- Training rows: 49,000
- Held-out validation rows: 1,000
- Epochs: 1.0
- Sequence length: 2048
- LoRA rank/alpha: 16 / 16
- Train loss: 0.1535
- Held-out eval loss: 0.1226
- Hardware: NVIDIA RTX A5000
Training command:
HF_HOME=/workspace/.hf_home python -m training.train_sft_unsloth --dataset data/sft/risk_sft_stratified_50k.jsonl --model Qwen/Qwen3-4B-Instruct-2507 --out models/adapters/qwen3_4b_risk_sft --max-steps -1 --num-train-epochs 1 --limit-rows 0 --validation-split 0.02 --split-seed 3407 --eval-steps 1000 --logging-steps 50
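The row counts in the training summary line up with `--validation-split 0.02`, assuming the dataset file holds 50,000 rows as its name suggests. How the training script actually shuffles and rounds is not shown here, so the `split_rows` helper below is a hypothetical illustration of the arithmetic only.

```python
import random


def split_rows(rows, validation_split, seed):
    """Shuffle with a fixed seed and hold out a fraction for validation."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_val = int(round(len(shuffled) * validation_split))
    return shuffled[n_val:], shuffled[:n_val]


# Assumption: 50,000 total rows; 3407 is the --split-seed from the command.
train, val = split_rows(range(50_000), 0.02, 3407)
print(len(train), len(val))  # 49000 1000
```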
Benchmark Results
| Evaluation | Rows | Strict JSON | Valid Index | Teacher Match | Invalid |
|---|---|---|---|---|---|
| Base model fixed prompt set | 100 | 0.000 | 0.000 | 0.000 | 100 |
| Adapter fixed prompt set | 100 | 0.850 | 0.850 | 0.820 | 15 |
| Adapter held-out validation set | 1000 | 0.779 | 0.779 | 0.722 | 221 |
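The column definitions are not spelled out, so the sketch below rests on assumptions: Strict JSON is the fraction of outputs that parse as exactly one `action_index` key with an integer value, Valid Index additionally requires the index to fall inside the candidate list, Teacher Match requires it to equal the teacher's index, and Invalid counts rows without a valid index. Under that reading the table is internally consistent (100 - 85 = 15 and 1000 - 779 = 221). The `score` function is hypothetical and is not the repository's benchmark code.

```python
import json


def score(outputs, num_candidates, teacher_indices):
    """Compute table-style metrics from raw model outputs (assumed definitions)."""
    rows = len(outputs)
    strict = valid = match = 0
    for raw, n, teacher in zip(outputs, num_candidates, teacher_indices):
        try:
            payload = json.loads(raw)
        except json.JSONDecodeError:
            continue
        if not (isinstance(payload, dict) and set(payload) == {"action_index"}):
            continue
        index = payload["action_index"]
        if isinstance(index, bool) or not isinstance(index, int):
            continue
        strict += 1  # well-formed contract object
        if 0 <= index < n:
            valid += 1  # index points at a real candidate
            match += int(index == teacher)
    return {
        "strict_json": strict / rows,
        "valid_index": valid / rows,
        "teacher_match": match / rows,
        "invalid": rows - valid,
    }


metrics = score(
    ['{"action_index": 0}', '{"action_index": 5}', "hello", '{"action_index": 1}'],
    [2, 3, 2, 2],
    [0, 1, 0, 0],
)
print(metrics)
```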
Evaluation command:
HF_HOME=/workspace/.hf_home python -m training.benchmark_policy --dataset data/sft/risk_sft_stratified_50k_val.jsonl --model models/adapters/qwen3_4b_risk_sft --out data/prefs/benchmark_heldout.json --limit 1000 --seed 3407
Linked Artifacts
- Training dataset: https://huggingface.co/datasets/clarkkitchen22/risk-rl-lab-sft
- Benchmark data and metrics: https://huggingface.co/datasets/clarkkitchen22/risk-rl-lab-benchmark
Limitations
This is a research adapter for a simplified Risk-compatible environment. It is not a standalone base model and it is not a guarantee of optimal play. The environment should remain authoritative and validate every selected action.