Scenesmith: Qwen3-4B LoRA Adapter (Manim CE Animation Generator)

A LoRA adapter that turns Qwen3-4B-Instruct-2507 into Scenesmith. Give it a natural-language description of a concept ("animate binary search narrowing on a sorted array") and it replies with one complete, runnable Manim Community Edition Python file in a consistent dark house style, with no markdown fences, no commentary, and no LaTeX dependency. Trained and evaluated entirely on a 16 GB Apple Silicon machine via MLX.

The defining property of the project: outputs are mechanically verifiable. Manim code either renders to an MP4 or it doesn't, so a single render gate (manim -ql or reject) guards the training data, the harvested data, and the eval. Every example this adapter was trained on actually rendered.
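A render gate of this kind fits in a few lines of Python. This is a minimal sketch, not the project's actual harness (that lives in the repository). The function name render_gate, the 300 s timeout, and the per-file media directory are assumptions. A candidate passes only if manim -ql exits with status 0 and leaves a final MP4 in Manim CE's default videos/<file stem>/<quality>/ layout. Run each candidate in a fresh directory, because a leftover MP4 from an earlier run would otherwise count as a pass.

```python
import subprocess
from pathlib import Path


def render_gate(scene_file: Path, timeout: int = 300) -> bool:
    """Hypothetical gate: True only if `manim -ql` succeeds and leaves a final MP4.

    Assumes Manim CE's default layout:
    <media_dir>/videos/<file stem>/<quality>/<SceneName>.mp4
    """
    scene_file = Path(scene_file)
    media_dir = scene_file.parent / "media"
    try:
        proc = subprocess.run(
            ["manim", "-ql", "--media_dir", str(media_dir), str(scene_file)],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # manim is not installed, or the scene hung: both count as a failed render
        return False
    if proc.returncode != 0:
        return False
    # Look exactly one level below the file's video dir, so partial movie files
    # (nested deeper under partial_movie_files/) never count as a pass.
    video_dir = media_dir / "videos" / scene_file.stem
    return any(video_dir.glob("*/*.mp4"))
```

Checking for the output file in addition to the exit code matters. A file that parses but defines no Scene subclass produces no video, and it should be rejected.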

Code, data pipeline, and eval harness: https://github.com/albertobarnabo/scenesmith

Results

43 eval prompts (31 held-out in-distribution, 12 novel concepts), greedy decoding. "Render pass" means the generated file produced a real MP4 through the same gate as the training data.

| metric | base Qwen3-4B | + this adapter |
|---|---|---|
| render pass, overall | 6/43 (14%) | 27/43 (63%) |
| render pass, held-out in-distribution | 3/31 (10%) | 25/31 (81%) |
| render pass, novel concepts | 3/12 (25%) | 2/12 (17%) |
| house style adherence (palette/bg/font/subtitles) | 0/43 | 39–43/43 |
| markdown fences around output | 43/43 | 0/43 |
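The pass-rate cells are plain counts over per-prompt outcomes. The sketch below assumes the harness records one (split, passed) pair per prompt. The split names and the pass_rates helper are hypothetical. The usage feeds in dummy outcomes that only reproduce the adapter column's counts, not real per-prompt results.

```python
from collections import Counter


def pass_rates(results):
    """Turn (split, passed) pairs into 'passed/total (pct%)' strings per split.

    Hypothetical helper: split names are whatever the harness uses; an
    'overall' bucket is added automatically.
    """
    totals, passes = Counter(), Counter()
    for split, passed in results:
        for key in (split, "overall"):
            totals[key] += 1
            passes[key] += bool(passed)
    return {
        key: f"{passes[key]}/{n} ({round(100 * passes[key] / n)}%)"
        for key, n in totals.items()
    }


# Dummy outcomes that only reproduce the adapter column's counts.
adapter = [("held_out", i < 25) for i in range(31)] + [("novel", i < 2) for i in range(12)]
print(pass_rates(adapter))
# {'held_out': '25/31 (81%)', 'overall': '27/43 (63%)', 'novel': '2/12 (17%)'}
```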

[Figure: training loss]

Usage

```bash
pip install mlx-lm
python -m mlx_lm generate \
  --model mlx-community/Qwen3-4B-Instruct-2507-4bit \
  --adapter-path <this repo> \
  --system-prompt "$(cat system.txt)" \
  --max-tokens 3072 \
  --prompt "Animate two pointers finding a pair that sums to 20 in [2, 5, 8, 11, 14, 19]"
```

Save the reply to scene.py and render with manim -ql scene.py.

Two integration notes:

  1. Strip the think prefix. The Qwen3 chat template wraps assistant turns in an (empty) <think>\n\n</think> block, and the adapter reproduces it. Remove it before saving and rendering the file: re.sub(r"<think>.*?</think>\s*", "", out, flags=re.S).
  2. Use the bundled system.txt. It is the system prompt the adapter was trained against, and the house style is conditioned on it.
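Post-processing a reply might look like the sketch below. clean_reply applies the regex from note 1. The fence-unwrapping step is an extra assumption: the adapter should not emit fences, but the base model wraps 43/43 replies in them, so the step helps when comparing the two. Both helper names are hypothetical.

```python
import re
from pathlib import Path

# Note 1's regex: drop the (empty) Qwen3 think block the adapter reproduces.
THINK_RE = re.compile(r"<think>.*?</think>\s*", flags=re.S)
# Extra, assumed step: unwrap a markdown code fence if the whole reply is fenced
# (the base model does this; the adapter should not).
FENCE_RE = re.compile(r"^`{3}[\w+-]*\n(.*?)\n`{3}\s*$", flags=re.S)


def clean_reply(out: str) -> str:
    """Return the generated reply as plain Python source ending in one newline."""
    code = THINK_RE.sub("", out).strip()
    match = FENCE_RE.match(code)
    if match:
        code = match.group(1)
    return code + "\n"


def save_scene(out: str, path: str = "scene.py") -> Path:
    """Write the cleaned reply to disk, ready for `manim -ql scene.py`."""
    target = Path(path)
    target.write_text(clean_reply(out), encoding="utf-8")
    return target
```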

House style

Dark slate background (#0e1116), a GitHub-dark accent palette declared as constants, the Menlo font, Pango-rendered Text only (no LaTeX required on the render machine), subtitles via add_subcaption (Manim emits an .srt file), one descriptive Scene class per file, and a fade-out ending.

Training

  • Base: mlx-community/Qwen3-4B-Instruct-2507-4bit, LoRA rank 16 on 16 layers (14.7M trainable params, 0.365%), mask_prompt: true, 650 iterations, effective batch 4, cosine decay 5e-5 → 5e-6. Peak memory 5.5 GB.
  • Data: 851 train / 74 valid examples, all render-verified.
    • 60% synthetic: 12 parameterized algorithm/math scene families in the house style (binary search, two pointers, sliding window, bubble sort, stack bracket-matching, BFS grid, linked-list reversal, hash-map two-sum, prefix sums, Kadane, decimal→binary, function plots).
    • 40% wild, render-gate-filtered: bespokelabs/bespoke-manim, the official ManimCE docs examples (back-captioned), and SuienR/ManimBench-v1. Wild examples were trained under a separate plain system prompt so the house prompt stays bound to house-style outputs.
  • Loss: val 1.117 → ~0.22 (loss.csv in this repo).
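The headline training numbers can be sanity-checked with some arithmetic. This sketch assumes values that are not stated in this card: Qwen3-4B's published config (hidden size 2560, MLP width 9728, 32 query heads and 8 KV heads of dimension 128), and LoRA wrapping all seven linear projections in each adapted block. That second assumption happens to reproduce the 14.7M figure above. The cosine_lr helper assumes plain cosine decay over all 650 steps with no warmup, which the card also does not state.

```python
import math

# Assumed Qwen3-4B dimensions (from its published config, not from this card).
HIDDEN, INTERMEDIATE = 2560, 9728
N_HEADS, N_KV_HEADS, HEAD_DIM = 32, 8, 128
RANK, LORA_LAYERS = 16, 16

# (in_features, out_features) of the seven linear projections in one decoder block.
PROJECTIONS = {
    "q_proj": (HIDDEN, N_HEADS * HEAD_DIM),
    "k_proj": (HIDDEN, N_KV_HEADS * HEAD_DIM),
    "v_proj": (HIDDEN, N_KV_HEADS * HEAD_DIM),
    "o_proj": (N_HEADS * HEAD_DIM, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(rank: int, layers: int) -> int:
    # Each adapted Linear adds A (rank x in) and B (out x rank): rank * (in + out) weights.
    per_layer = sum(rank * (i + o) for i, o in PROJECTIONS.values())
    return per_layer * layers


def cosine_lr(step: int, total: int = 650, peak: float = 5e-5, floor: float = 5e-6) -> float:
    # Cosine decay from `peak` to `floor` over `total` steps, no warmup (assumed).
    t = min(step, total) / total
    return floor + 0.5 * (peak - floor) * (1 + math.cos(math.pi * t))


trainable = lora_params(RANK, LORA_LAYERS)
epochs = 650 * 4 / 851  # iterations * effective batch / training examples
print(f"{trainable:,} trainable params")  # 14,680,064
print(f"~{epochs:.2f} epochs")             # ~3.06
```

Under these assumptions, 650 iterations at effective batch 4 amount to roughly three passes over the 851 training examples.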

Limitations

  • Novel concept types outside the trained families can hallucinate plausible but nonexistent API calls (e.g. Text.add_cell), so always render-check generated code (it's cheap).
  • No Tex/MathTex: the house style is deliberately LaTeX-free.
  • Tuned for ~10–25 s explainer scenes at 480p/1080p, not long-form videos.