Qwen3 4B Thinking 2507 Heretic CodeFeedback — OpenCodeInstruct Learning LoRA

This repository contains an experimental LoRA adapter trained on top of:

JoaoZaokk/Qwen3-4B-Thinking-2507-Heretic-CodeFeedback

This adapter was trained as an additional OpenCodeInstruct continuation experiment.

It is not recommended as the main agentic coding version.

The main observation from testing is that this adapter appears to be more useful for learning, explanation, reasoning through code problems, and understanding programming tasks, but it became worse at strict “return only executable code” benchmark tasks.

Intended purpose

This LoRA is kept and published as an experimental branch for:

  • code explanation
  • learning-oriented coding assistance
  • understanding programming problems
  • step-by-step reasoning around code
  • comparing OpenCodeInstruct-style behavior against a stricter code-output model

It is not ideal for:

  • agentic coding
  • test-driven code generation
  • benchmark-style exact function output
  • tools that require the model to return only executable code
  • coding agents that must avoid prose/explanation unless asked

Why this is not the main version

A small local before/after Python code benchmark showed that this OpenCodeInstruct continuation reduced exact-code benchmark performance.

Model Adapter Passed Pass rate Avg tokens/s
Before heretic_F_lora_python5000_codefeedback5000 9/10 90.00% 8.38
After SAFE_OPENCODE_5000_1024_20260607_153327 6/10 60.00% 8.41

Delta:

Metric Value
Passes -3
Pass rate -30.00%
Avg tokens/s +0.03

The post-training adapter was worse on strict executable-code tasks, especially when the expected output was a compact Python function or class.

However, this does not mean the adapter is useless. It likely shifted behavior toward a more explanatory, learning-oriented style. That can be useful for users who want to understand code, reason through tasks, or receive more guided programming explanations.

Training configuration

Item Value
Base model JoaoZaokk/Qwen3-4B-Thinking-2507-Heretic-CodeFeedback
Adapter input heretic_F_lora_python5000_codefeedback5000
Dataset nvidia/OpenCodeInstruct
Samples used 5,000
Sequence length 1024
Epochs 1
Learning rate 5e-6
Training method QLoRA / LoRA
Quantized loading during training 4-bit NF4

Training result

Metric Value
Train runtime 6258 seconds
Runtime 1h 44m 18s
Samples/second 0.799
Steps/second 0.1
Final train loss 0.3913
First logged loss 0.6957
Last logged loss 0.3623
Minimum logged loss 0.3441

The training run completed successfully after reducing sequence length and using a more conservative GPU configuration.

Benchmark files

The local benchmark artifacts are included in this repository under:

benchmark/

Files:

benchmark/before_summary.md
benchmark/after_summary.md
benchmark/COMPARISON.md
benchmark/comparison.json
benchmark/before_results.jsonl
benchmark/after_results.jsonl

Recommended usage

Use this adapter when you want a model that may be more comfortable explaining code and reasoning through programming tasks.

For stricter agentic coding or benchmark-style executable output, prefer the original merged CodeFeedback model:

JoaoZaokk/Qwen3-4B-Thinking-2507-Heretic-CodeFeedback

Loading example

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

base_model = "JoaoZaokk/Qwen3-4B-Thinking-2507-Heretic-CodeFeedback"
adapter = "JoaoZaokk/Qwen3-4B-Thinking-2507-Heretic-CodeFeedback-OpenCodeInstruct-Learning-LoRA"

tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

model = PeftModel.from_pretrained(model, adapter)
model.eval()

Important notes

This is an experimental LoRA adapter.

It should not be treated as a universal improvement over the previous CodeFeedback model. It is published for transparency, comparison, and reproducibility.

The benchmark results suggest that it is worse for strict agentic coding, but potentially useful for learning-oriented coding assistance.

Downloads last month
14
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for JoaoZaokk/Qwen3-4B-Thinking-2507-Heretic-CodeFeedback-OpenCodeInstruct-Learning-LoRA

Dataset used to train JoaoZaokk/Qwen3-4B-Thinking-2507-Heretic-CodeFeedback-OpenCodeInstruct-Learning-LoRA