# training-lab
Experiments in voice dictation to programming syntax. Teaching small models to understand spoken code.
## Domain
Converting spoken dictation like "git space push space dash u space origin space main" into actual syntax: `git push -u origin main`.
The challenge: users don't always speak in perfect protocol format. They use synonyms ("minus" for "dash"), skip separator words, add conversational filler ("okay so the command is..."), and make mid-sentence corrections ("no wait, actually...").
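A few hypothetical input/output pairs illustrate the range (these examples are illustrative, not drawn from the eval sets):

```python
# Hypothetical spoken inputs vs. the target output. The clean form maps
# 1:1 onto the protocol; the messy forms do not.
EXAMPLES = [
    # Already-clean protocol dictation
    ("git space push space dash u space origin space main",
     "git push -u origin main"),
    # Synonym: "minus" instead of "dash", separator words skipped
    ("git push minus u origin main",
     "git push -u origin main"),
    # Conversational filler plus a mid-sentence correction
    ("okay so the command is git push dash v no wait dash u origin main",
     "git push -u origin main"),
]

for spoken, target in EXAMPLES:
    print(f"{spoken!r} -> {target!r}")
```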
## Architecture
```
Raw speech transcript
  → Protocol detector (is it already clean?)
      → IF clean: bypass LLM → procedural processor
      → IF messy: LLM normalizer → procedural processor
  → Final syntax output
```
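The routing step can be sketched as follows. The names `is_messy` and `route` are hypothetical, and the filler-word heuristic is an assumption; the repo's actual detector in `pipeline/` may use different signals.

```python
# Words that suggest conversational filler or a mid-sentence correction
# (assumed signal set; the real detector may differ).
FILLER = {"okay", "so", "um", "uh", "like", "wait", "actually", "no"}

def is_messy(transcript: str) -> bool:
    """Cheap heuristic: any filler or correction marker means the
    transcript needs LLM normalization before procedural processing."""
    return any(t in FILLER for t in transcript.lower().split())

def route(transcript, llm_normalize, procedural_process):
    """Clean input bypasses the LLM entirely; messy input is first
    rewritten into protocol words, then handed to the same processor."""
    if is_messy(transcript):
        transcript = llm_normalize(transcript)
    return procedural_process(transcript)
```

The payoff of this split: clean input never touches the LLM, so the common path stays deterministic and fast.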
Procedural processor: a deterministic token scanner with a symbol vocabulary, number words, and casing directives. 92% on clean input, zero hallucination, instant.
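A minimal sketch of such a token scanner (illustrative, not the repo's implementation; the vocabulary here is a small subset):

```python
# Left-to-right scan: protocol words become symbols/digits, "space"
# becomes a literal space, everything else passes through verbatim.
SYMBOLS = {"dash": "-", "dot": ".", "slash": "/", "pipe": "|",
           "colon": ":", "quote": '"'}
NUMBERS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
           "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9"}

def process(transcript: str) -> str:
    out = []
    tokens = transcript.split()
    i = 0
    while i < len(tokens):
        t = tokens[i].lower()
        if t == "space":
            out.append(" ")
        elif t == "capital" and i + 1 < len(tokens):
            # Casing directive consumes the following token
            out.append(tokens[i + 1].capitalize())
            i += 1
        elif t in SYMBOLS:
            out.append(SYMBOLS[t])
        elif t in NUMBERS:
            out.append(NUMBERS[t])
        else:
            out.append(t)
        i += 1
    return "".join(out)

print(process("git space push space dash u space origin space main"))
# git push -u origin main
```

Because every rule is a table lookup, the scanner cannot hallucinate: unknown tokens pass through unchanged.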
LLM normalizer: rewrites messy dictation into clean protocol format. It strips filler, resolves corrections, and inserts spacing keywords. The LLM never outputs actual symbols; it only outputs protocol words.
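The symbols-forbidden constraint can be expressed directly in the prompt. This is a hypothetical prompt and wrapper, not the repo's actual one in `pipeline/normalizer.py`; `generate` stands in for any chat-completion callable (e.g. an mlx-lm wrapper).

```python
# Assumed prompt wording -- the key constraint is that the model rewrites
# into protocol words only and never emits literal symbols like "-" or ".".
SYSTEM_PROMPT = """\
Rewrite the user's spoken dictation into clean protocol form.
Rules:
- Output only protocol words: "space", "dash", "dot", "slash", etc.
- Never output literal symbols such as "-", ".", "/".
- Drop filler ("okay", "so", "um") and apply corrections ("no wait ...").
"""

def normalize(transcript: str, generate) -> str:
    """`generate(system, user)` is any text-generation callable."""
    return generate(SYSTEM_PROMPT, transcript).strip()
```

Keeping symbols out of the LLM's output space means the deterministic scanner remains the only component that ever writes syntax, so symbol-level hallucination is structurally impossible.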
## Structure
```
processor/      Deterministic symbol/number/casing processor
pipeline/       LLM + processor pipeline (zero-training normalizer)
eval/           Evaluation datasets (fuzzy + independent)
training/
  data/         Training data (syntax-reconstruction, dictation-to-bash)
  converters/   Scripts to generate training data from NL2Bash
adapters/       Fine-tuned model adapters (LoRA/DoRA)
scripts/        Evaluation and benchmarking scripts
blog/           Writeup drafts and notes
```
## Quick start
```bash
# Run the procedural processor on clean protocol input
python3 processor/procedural.py eval/independent.json

# Run the normalizer pipeline (requires mlx-lm)
pip install mlx mlx-lm
python3 pipeline/normalizer.py eval/fuzzy.json --model mlx-community/Qwen2.5-1.5B-Instruct-4bit
```
## Results (zero-training, prompted only)
| Model | Clean | Fuzzy | Natural | Chaotic | Overall |
|---|---|---|---|---|---|
| Processor only | 92% | 0% | 0% | 2% | 23.5% |
| Qwen 2.5 1.5B | 90% | 20% | 54% | 24% | 47% |
| Qwen 2.5 0.5B | 90% | 12% | 44% | 20% | 41.5% |
| Llama 3.2 1B | 92% | 14% | 34% | 10% | 37.5% |
## Protocol format
The "space-as-a-word" protocol eliminates spacing ambiguity:
"space"β literal space between tokens- Symbol words:
dash dot slash pipe colon quoteetc. - Casing:
camel case,snake case,pascal case,kebab case - Numbers:
zerothroughnineteen,twenty...ninety,hundred,thousand - Capitalization:
capital X,all caps WORD
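Number words compose the way English does, which a small decoder can handle. This is an illustrative sketch under assumed semantics (units/teens and tens add, `hundred`/`thousand` multiply), not the repo's implementation:

```python
# Word tables for the protocol's number vocabulary.
UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}
SCALES = {"hundred": 100, "thousand": 1000}

def words_to_number(words: list[str]) -> int:
    total, current = 0, 0
    for w in words:
        if w in UNITS:
            current += UNITS[w]
        elif w in TENS:
            current += TENS[w]
        elif w in SCALES:
            # "hundred"/"thousand" multiply the running value
            current = max(current, 1) * SCALES[w]
            if SCALES[w] == 1000:
                total += current
                current = 0
    return total + current

print(words_to_number("twenty five".split()))    # 25
print(words_to_number("three hundred".split()))  # 300
```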