Scenesmith — Qwen3-4B LoRA Adapter (Manim CE Animation Generator)

A LoRA adapter that turns Qwen3-4B-Instruct-2507 into Scenesmith: give it a natural-language description of a concept ("animate binary search narrowing on a sorted array") and it replies with one complete, runnable Manim Community Edition Python file in a consistent dark house style — no markdown fences, no commentary, no LaTeX dependency. Trained and evaluated entirely on a 16GB Apple Silicon machine via MLX.

The defining property of the project: outputs are mechanically verifiable. Manim code either renders to an MP4 or it doesn't, so a single render gate (manim -ql or reject) guards the training data, the harvested data, and the eval. Every example this adapter was trained on actually rendered.

Code, data pipeline, and eval harness: https://github.com/albertobarnabo/scenesmith

Results

43 eval prompts (31 held-out in-distribution, 12 novel concepts), greedy decoding. "Render pass" means the generated file produced a real MP4 through the same gate as the training data.

metric	base Qwen3-4B	+ this adapter
render pass, overall	6/43 (14%)	27/43 (63%)
render pass, held-out in-distribution	3/31 (10%)	25/31 (81%)
render pass, novel concepts	3/12 (25%)	2/12 (17%)
house style adherence (palette/bg/font/subtitles)	0/43	39–43/43
markdown fences around output	43/43	0/43

Usage

pip install mlx-lm
python -m mlx_lm generate \
  --model mlx-community/Qwen3-4B-Instruct-2507-4bit \
  --adapter-path <this repo> \
  --system-prompt "$(cat system.txt)" \
  --max-tokens 3072 \
  --prompt "Animate two pointers finding a pair that sums to 20 in [2, 5, 8, 11, 14, 19]"

Save the reply to scene.py and render with manim -ql scene.py.

Two integration notes:

Strip the think prefix. The Qwen3 chat template wraps assistant turns in an (empty) <think>\n\n</think> block and the adapter reproduces it. Remove it before compiling: re.sub(r"<think>.*?</think>\s*", "", out, flags=re.S).
Use the bundled system.txt — it is the system prompt the adapter was trained against; the house style is conditioned on it.

House style

Dark slate background (#0e1116), GitHub-dark accent palette declared as constants, Menlo, Pango Text only (no LaTeX required on the render machine), subtitles via add_subcaption (Manim emits an .srt), one descriptive Scene class per file, fade-out ending.

Training

Base: mlx-community/Qwen3-4B-Instruct-2507-4bit, LoRA rank 16 on 16 layers (14.7M trainable params, 0.365%), mask_prompt: true, 650 iterations, effective batch 4, cosine decay 5e-5 → 5e-6. Peak memory 5.5GB.
Data: 851 train / 74 valid examples, all render-verified.
- 60% synthetic: 12 parameterized algorithm/math scene families in the house style (binary search, two pointers, sliding window, bubble sort, stack bracket-matching, BFS grid, linked-list reversal, hash-map two-sum, prefix sums, Kadane, decimal→binary, function plots).
- 40% wild, render-gate-filtered: bespokelabs/bespoke-manim, the official ManimCE docs examples (back-captioned), and SuienR/ManimBench-v1. Wild examples were trained under a separate plain system prompt so the house prompt stays bound to house-style outputs.
Loss: val 1.117 → ~0.22 (loss.csv in this repo).

Limitations

Novel concept types outside the trained families can hallucinate plausible-but- fake API (Text.add_cell) — always render-check generated code (it's cheap).
No Tex/MathTex: the house style is deliberately LaTeX-free.
Tuned for ~10–25 s explainer scenes at 480p/1080p, not long-form videos.

Downloads last month: -; Downloads are not tracked for this model. How to track

MLX

Hardware compatibility

Quantized

Model tree for albertobarnabo/scenesmith-qwen3-4b

Base model

Qwen/Qwen3-4B-Instruct-2507

Quantized

mlx-community/Qwen3-4B-Instruct-2507-4bit

Adapter

(5)

this model