Qwen3 4B Thinking 2507 Heretic CodeFeedback â€” OpenCodeInstruct Learning LoRA

This repository contains an experimental LoRA adapter trained on top of:

JoaoZaokk/Qwen3-4B-Thinking-2507-Heretic-CodeFeedback

This adapter was trained as an additional OpenCodeInstruct continuation experiment.

It is not recommended as the main agentic coding version.

The main observation from testing is that this adapter appears to be more useful for learning, explanation, reasoning through code problems, and understanding programming tasks, but it became worse at strict â€œreturn only executable codeâ€ benchmark tasks.

Intended purpose

This LoRA is kept and published as an experimental branch for:

code explanation
learning-oriented coding assistance
understanding programming problems
step-by-step reasoning around code
comparing OpenCodeInstruct-style behavior against a stricter code-output model

It is not ideal for:

agentic coding
test-driven code generation
benchmark-style exact function output
tools that require the model to return only executable code
coding agents that must avoid prose/explanation unless asked

Why this is not the main version

A small local before/after Python code benchmark showed that this OpenCodeInstruct continuation reduced exact-code benchmark performance.

Model	Adapter	Passed	Pass rate	Avg tokens/s
Before	`heretic_F_lora_python5000_codefeedback5000`	9/10	90.00%	8.38
After	`SAFE_OPENCODE_5000_1024_20260607_153327`	6/10	60.00%	8.41

Delta:

Metric	Value
Passes	-3
Pass rate	-30.00%
Avg tokens/s	+0.03

The post-training adapter was worse on strict executable-code tasks, especially when the expected output was a compact Python function or class.

However, this does not mean the adapter is useless. It likely shifted behavior toward a more explanatory, learning-oriented style. That can be useful for users who want to understand code, reason through tasks, or receive more guided programming explanations.

Training configuration

Item	Value
Base model	`JoaoZaokk/Qwen3-4B-Thinking-2507-Heretic-CodeFeedback`
Adapter input	`heretic_F_lora_python5000_codefeedback5000`
Dataset	`nvidia/OpenCodeInstruct`
Samples used	5,000
Sequence length	1024
Epochs	1
Learning rate	5e-6
Training method	QLoRA / LoRA
Quantized loading during training	4-bit NF4

Training result

Metric	Value
Train runtime	6258 seconds
Runtime	1h 44m 18s
Samples/second	0.799
Steps/second	0.1
Final train loss	0.3913
First logged loss	0.6957
Last logged loss	0.3623
Minimum logged loss	0.3441

The training run completed successfully after reducing sequence length and using a more conservative GPU configuration.

Benchmark files

The local benchmark artifacts are included in this repository under:

benchmark/

Files:

benchmark/before_summary.md
benchmark/after_summary.md
benchmark/COMPARISON.md
benchmark/comparison.json
benchmark/before_results.jsonl
benchmark/after_results.jsonl

Recommended usage

Use this adapter when you want a model that may be more comfortable explaining code and reasoning through programming tasks.

For stricter agentic coding or benchmark-style executable output, prefer the original merged CodeFeedback model:

JoaoZaokk/Qwen3-4B-Thinking-2507-Heretic-CodeFeedback

Loading example

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

base_model = "JoaoZaokk/Qwen3-4B-Thinking-2507-Heretic-CodeFeedback"
adapter = "JoaoZaokk/Qwen3-4B-Thinking-2507-Heretic-CodeFeedback-OpenCodeInstruct-Learning-LoRA"

tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

model = PeftModel.from_pretrained(model, adapter)
model.eval()

Important notes

This is an experimental LoRA adapter.

It should not be treated as a universal improvement over the previous CodeFeedback model. It is published for transparency, comparison, and reproducibility.

The benchmark results suggest that it is worse for strict agentic coding, but potentially useful for learning-oriented coding assistance.

Downloads last month: 14

Model tree for JoaoZaokk/Qwen3-4B-Thinking-2507-Heretic-CodeFeedback-OpenCodeInstruct-Learning-LoRA

Base model

Qwen/Qwen3-4B-Thinking-2507

Finetuned

unsloth/Qwen3-4B-Thinking-2507

Finetuned

JoaoZaokk/Qwen3-4B-Thinking-2507-MiniMax-M2.1-Distill-heretic

Finetuned

JoaoZaokk/Qwen3-4B-Thinking-2507-Heretic-CodeFeedback

Adapter

(2)

this model

JoaoZaokk
/

Qwen3-4B-Thinking-2507-Heretic-CodeFeedback-OpenCodeInstruct-Learning-LoRA

Qwen3 4B Thinking 2507 Heretic CodeFeedback â€” OpenCodeInstruct Learning LoRA

Intended purpose

Why this is not the main version

Training configuration

Training result

Benchmark files

Recommended usage

Loading example

Important notes

Model tree for JoaoZaokk/Qwen3-4B-Thinking-2507-Heretic-CodeFeedback-OpenCodeInstruct-Learning-LoRA

Dataset used to train JoaoZaokk/Qwen3-4B-Thinking-2507-Heretic-CodeFeedback-OpenCodeInstruct-Learning-LoRA