Qwen3-4B CoT Compression — Level 3 (Variable chain)

LoRA adapter for Qwen/Qwen3-4B-Instruct-2507 that writes its reasoning in Level-3 (variable chain) style: a chain of variable assignments of roughly 100 characters, e.g. A=48;M=A/2;T=A+M.
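To make the format concrete, the sketch below evaluates such a chain outside the model. It is an illustration only: it assumes statements are separated by `;`, each statement has the form NAME=expression, and expressions use only numbers, earlier variable names, and + - * /. The helper `eval_chain` is hypothetical and not part of this repository, and real model output may deviate from this grammar.

```python
import ast
import operator

# Arithmetic operators allowed inside a chain expression (an assumption).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def _eval(node, env):
    """Evaluate a restricted arithmetic AST node against known variables."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.Name):
        return env[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left, env), _eval(node.right, env))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand, env)
    raise ValueError(f"unsupported expression: {ast.dump(node)}")

def eval_chain(chain):
    """Evaluate a chain such as 'A=48;M=A/2;T=A+M' left to right."""
    env = {}
    for stmt in chain.split(";"):
        stmt = stmt.strip()
        if not stmt:
            continue
        name, expr = stmt.split("=", 1)
        env[name.strip()] = _eval(ast.parse(expr.strip(), mode="eval").body, env)
    return env

values = eval_chain("A=48;M=A/2;T=A+M")
print(values["T"])  # 72.0
```

Parsing with `ast` instead of calling `eval` keeps the sketch from executing arbitrary text that a model might emit.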

Part of a 5-level Pareto study on chain-of-thought compression for math reasoning. See the collection and the eval artifacts at ssurface/qwen3-4b-cot-compress-eval.

GSM8K-test results (single seed = 42)

metric               value
accuracy             0.823351
n_correct / n_total  1086 / 1319
mean think tokens    70.99
median think tokens  65.00

For comparison, the baseline (stock Qwen/Qwen3-4B-Instruct-2507) reaches ~0.896 accuracy at ~294 completion tokens.
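The trade-off implied by these figures can be worked out directly. The snippet below uses only numbers quoted in this card and comes out at roughly a 7-point accuracy drop for about 4x fewer tokens. Treat the token ratio as indicative only: the Level-3 figure counts think tokens while the baseline figure counts completion tokens, and the card does not say whether the two are measured identically.

```python
# Figures copied from this card.
n_correct, n_total = 1086, 1319
l3_tokens = 70.99    # mean think tokens, Level 3
base_acc = 0.896     # approximate baseline accuracy
base_tokens = 294    # approximate baseline completion tokens

l3_acc = n_correct / n_total
acc_drop_pts = (base_acc - l3_acc) * 100   # percentage points lost
token_ratio = base_tokens / l3_tokens      # how many times shorter

print(f"accuracy: {l3_acc:.4f}")
print(f"accuracy drop: {acc_drop_pts:.1f} points")
print(f"token ratio: {token_ratio:.1f}x")
```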

Usage

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model in fp16, then attach the Level-3 LoRA adapter.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B-Instruct-2507", torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "ssurface/qwen3-4b-cot-compress-l3")
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B-Instruct-2507")

# Request the Level-3 style explicitly in the user message.
prompt = tok.apply_chat_template(
    [{"role": "user",
       "content": "Solve this using Level 3 (Variable chain).\n"
                  "Problem: Natalia sold clips to 48 friends in April, "
                  "then half as many in May. How many in total?"}],
    tokenize=False, add_generation_prompt=True,
)
inputs = tok(prompt, return_tensors="pt").to(model.device)

# Stop at the end-of-turn token of the chat template.
out = model.generate(
    **inputs, max_new_tokens=256,
    eos_token_id=tok.convert_tokens_to_ids("<|im_end|>"),
)
# Prints the prompt followed by the completion, special tokens included.
print(tok.decode(out[0], skip_special_tokens=False))

Training

  • Base: Qwen/Qwen3-4B-Instruct-2507, 4-bit NF4 via bitsandbytes
  • Adapter: LoRA (PEFT) via TRL SFTTrainer
  • Hardware: 4× RTX 2080 Ti, fp16 (Turing — no bf16, no FlashAttention 2)
  • Data: GSM8K train, filtered to rows whose teacher-generated Level-3 reasoning matches the GSM8K ground-truth answer.
  • Loss: full-sequence (no completion-only masking in this TRL version).
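The data filter described above can be sketched as follows. This is an assumption-laden illustration, not the project's actual script: it relies on the GSM8K convention that the gold answer follows `####` in the `answer` field, and the `teacher_answer` field and the helper names are hypothetical.

```python
import re

def gold_answer(gsm8k_answer):
    """Return the number after '####' in a GSM8K answer string, or None."""
    match = re.search(r"####\s*(-?[\d,]*\.?\d+)", gsm8k_answer)
    return match.group(1).replace(",", "") if match else None

def same_number(a, b, tol=1e-6):
    """Compare two numeric strings by value, so '72' matches '72.0'."""
    try:
        return abs(float(a.replace(",", "")) - float(b.replace(",", ""))) <= tol
    except (TypeError, ValueError, AttributeError):
        return False

def keep_row(row):
    """Keep a row only if the teacher's final answer equals the gold answer."""
    gold = gold_answer(row["answer"])
    return gold is not None and same_number(row["teacher_answer"], gold)

# Toy rows: 'teacher_answer' is a hypothetical field holding the final value
# of the teacher-generated Level-3 chain.
rows = [
    {"answer": "48/2 = 24\n48+24 = 72\n#### 72", "teacher_answer": "72.0"},
    {"answer": "10*120 = 1200\n#### 1,200", "teacher_answer": "1250"},
]
kept = [r for r in rows if keep_row(r)]
print(len(kept))  # 1
```

Comparing by numeric value instead of by string avoids discarding rows where the teacher writes 72.0 and the gold answer is 72.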

Caveats

  • Single seed (42). Error bars not yet computed.
  • Levels 4 and 5 trade accuracy for token count; Level 2 is the accuracy-preserving sweet spot. If you don't need extreme compression, prefer L1 or L2.
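Until multi-seed runs are available, a rough sense of the uncertainty can be had from the test-set size alone. The sketch below computes a Wilson score interval for 1086 correct out of 1319. It reflects only sampling noise from the finite test set, not seed-to-seed training variance, so it does not replace proper error bars; the function name is illustrative.

```python
import math

def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half

low, high = wilson_interval(1086, 1319)
print(f"95% interval: [{low:.3f}, {high:.3f}]")
```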

Citation

Paper draft in progress.
