Qwen3 4B Thinking 2507 Heretic CodeFeedback

This is a merged code-focused fine-tune based on:

JoaoZaokk/Qwen3-4B-Thinking-2507-MiniMax-M2.1-Distill-heretic

The model was trained with QLoRA/LoRA on Python and code instruction datasets, then merged back into the base model.

This repository contains the full merged safetensors model, not only a LoRA adapter.

Base model

Item Value
Base model JoaoZaokk/Qwen3-4B-Thinking-2507-MiniMax-M2.1-Distill-heretic
Architecture family Qwen3
Parameter count 4B
Format Hugging Face Transformers / safetensors
Tensor type F16
Fine-tuning method QLoRA / LoRA
Final state Merged model

Training datasets

Dataset Samples used Notes
iamtarun/python_code_instructions_18k_alpaca 5,000 Python instruction/code examples
m-a-p/CodeFeedback-Filtered-Instruction 5,000 Code instruction and feedback examples

A SWE-smith trajectory experiment was tested separately, but it was not used in this final merged version.

LoRA configuration

Parameter Value
LoRA rank 16
LoRA alpha 32
LoRA dropout 0.05
Sequence length 2048
Epochs per stage 1
Quantized loading 4-bit NF4
Trainable parameters ~33M
Trainable percentage ~0.81%

Target modules:

  • q_proj
  • k_proj
  • v_proj
  • o_proj
  • gate_proj
  • up_proj
  • down_proj

Training stages

Stage Input adapter Dataset Output adapter
1 Base model Python instructions 5k heretic_F_lora_python_5000
2 heretic_F_lora_python_5000 CodeFeedback 5k heretic_F_lora_python5000_codefeedback5000
Final Base model + final adapter Merge Full safetensors model

Training environment

Component Version
Python 3.11
PyTorch 2.11.0+cu128
CUDA 12.8
Transformers 5.10.2
Datasets 5.0.0
Accelerate 1.13.0
PEFT 0.19.1
bitsandbytes 0.49.2
sentencepiece 0.2.1
tiktoken 0.13.0
protobuf 7.35.0
pandas 3.0.3
pyarrow 24.0.0

Training GPU:

  • NVIDIA GeForce RTX 3080 Ti 12 GB

Intended use

This model is intended for local experimentation with:

  • Python code generation
  • code explanation
  • simple debugging
  • instruction-following tests
  • downstream conversion to GGUF, AWQ, GPTQ, or OpenVINO formats

Notes

This is an experimental model. It may produce incorrect code, unsafe suggestions, or hallucinated explanations. Outputs should be reviewed before use in production or security-sensitive environments.

Hardware compatibility estimate

This table is an approximate guide for the current merged F16 safetensors version.

Hardware / VRAM Status Notes
6 GB VRAM 🔴 Unlikely F16 weights are too large without heavy offload or quantization.
8 GB VRAM 🔴 Very tight May fail or require CPU offload. Use GGUF/AWQ/INT4 instead.
10 GB VRAM 🟡 Possible May run with low context and careful memory settings.
12 GB VRAM 🟢 Likely Tested training/inference workflow on RTX 3080 Ti 12 GB with 4-bit loading.
16 GB VRAM 🟢 Good Comfortable for normal local inference.
24 GB VRAM 🟢 Very good Recommended for larger context, conversion, quantization, and experiments.
32 GB+ RAM CPU-only 🟡 Possible Slow. Better with GGUF quantized versions.

Quantized versions

Planned/recommended export formats:

Format Status Expected use
F16 safetensors 🟢 Current Full merged model, best source for conversion.
AWQ 4-bit 🟡 Planned Better for GPU/server inference, mainly CUDA/Linux or compatible runtimes.
OpenVINO INT4 / AWQ-style compression 🟢 Planned for Intel Arc Recommended path for Intel Arc/OpenVINO.
GGUF Q5_K_M / Q6_K / Q8_0 🟡 Planned Recommended for LM Studio, llama.cpp, Ollama, CPU/GPU mixed inference.

Practical recommendation

For this repository, use the current F16 safetensors model as the master model.

For actual local use:

  • RTX 3080 Ti 12 GB or better: F16 may work, but quantized versions are preferred.
  • RTX 3090 24 GB: F16 and quantization workflows are much more comfortable.
  • Intel Arc: convert this model to OpenVINO INT4 instead of using CUDA-focused AWQ.
  • Low VRAM systems: wait for GGUF or INT4 builds.
Downloads last month
20
Safetensors
Model size
4B params
Tensor type
F16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for JoaoZaokk/Qwen3-4B-Thinking-2507-Heretic-CodeFeedback

Datasets used to train JoaoZaokk/Qwen3-4B-Thinking-2507-Heretic-CodeFeedback