Lolaby — Llama 3.2 3B, fine-tuned to write singable lullabies

A small (3B) instruction-tuned model that writes original, personalised lullabies with chord markers and a tempo/meter header, designed for use with a downstream singing/synthesis pipeline.

This is the lyric-generation model used by Lolaby, a small AI-powered app that turns a child's drawing into a personalised lullaby. It was built for the Build Small Hackathon 2026 (Backyard AI track).

Write a lullaby for: Mia, age 3
Loves: a stuffed fox named after a color
Fears: the dark when the light goes out

[C] Mia, your fox keeps [Am] watch tonight,
[F] curled up close where it's [G] warm and bright,
[C] the dark outside is [Am] soft and small,
[F] and your fox can hear me [G] sing through it all...


Training data — built from scratch with anti-boilerplate gates

1,500 lullabies, distilled from Claude Haiku 4.5 with several hard gates that reject boilerplate at generation time:

| Gate | What it does |
| --- | --- |
| Concrete love steer | Each prompt samples a specific sub-topic from a pool (e.g. "a hot-air balloon drifting up", "the corner bakery at dawn") instead of a vague category, so the teacher can't default to its favourite cliché. |
| N-gram dedup | Every generated lyric line is normalised and checked against all previously accepted lines via 4-gram Jaccard overlap. Examples with > 0.4 overlap on too many lines are rejected. |
| Theme cap | No single love category (animal, nature, comfort object, etc.) may exceed 16% of the dataset, which forces broad coverage. |
| Opener dedup | The first 4 words of each lullaby (with the name normalised) are tracked. Over-used opening shapes are rejected after 3 uses, which kills "<name> watches..." / "...eyes are growing soft". |
| Format gate | Every example must parse (tempo header, valid chord markers from the declared progression, sensible line count) or it's regenerated. |
| Safety re-check | Even though the teacher is told to stay wholesome, the synthesised LOVE, FEAR, and lyric body are each re-screened with a safety filter before acceptance. |
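
As a rough illustration of the n-gram dedup gate, the sketch below compares each candidate line against previously accepted lines using Jaccard overlap of 4-grams. It assumes character 4-grams (the real gate may equally use word 4-grams). `LineDedupGate`, `normalise` and the `max_overlapping_lines` budget are hypothetical names and values, since the exact normalisation and the "too many lines" cut-off are not stated here.

```python
import re


def normalise(line):
    """Drop chord markers and punctuation, lowercase, collapse whitespace."""
    line = re.sub(r"\[[^\]]*\]", " ", line)            # remove [C], [Am], ...
    line = re.sub(r"[^a-z0-9\s]", " ", line.lower())   # remove punctuation
    return " ".join(line.split())


def char_ngrams(text, n=4):
    """Set of character n-grams (assumption: character-level, not word-level)."""
    if len(text) < n:
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard(a, b):
    """Jaccard overlap |a & b| / |a | b|, defined as 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


class LineDedupGate:
    """Rejects an example when too many of its lines resemble accepted lines."""

    def __init__(self, threshold=0.4, max_overlapping_lines=1):
        self.threshold = threshold                          # 0.4 as in the table
        self.max_overlapping_lines = max_overlapping_lines  # hypothetical budget
        self.accepted = []                                  # n-gram sets of accepted lines

    def _overlaps(self, grams):
        return any(jaccard(grams, seen) > self.threshold for seen in self.accepted)

    def check_and_add(self, lyric_lines):
        """Return True and remember the lines if the example passes the gate."""
        grams = [char_ngrams(normalise(line)) for line in lyric_lines]
        n_overlapping = sum(self._overlaps(g) for g in grams)
        if n_overlapping > self.max_overlapping_lines:
            return False
        self.accepted.extend(grams)
        return True
```

This version scans every accepted line for each candidate, which is quadratic but should be manageable at the scale of roughly 12,000 lines; an inverted index over n-grams would be the obvious next step if it became slow.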

Result: 99.4% line uniqueness (11,925 unique lines out of 12,001 total lyric lines across 1,500 examples).
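
The uniqueness figure is simply unique lines divided by total lines. A minimal sketch of that calculation follows; `line_uniqueness` is a hypothetical helper, and the light normalisation it applies (lowercasing, whitespace collapse) is an assumption about how lines were compared.

```python
def line_uniqueness(lines):
    """Fraction of distinct lines after lowercasing and collapsing whitespace."""
    normalised = [" ".join(line.lower().split()) for line in lines]
    return len(set(normalised)) / len(normalised)


# Counts reported above for the 1,500-example dataset.
unique_lines, total_lines, n_examples = 11_925, 12_001, 1_500

print(f"{unique_lines / total_lines:.1%}")   # 99.4%
print(round(total_lines / n_examples, 1))    # 8.0 lines per example on average
```

The average of about 8 lyric lines per example is consistent with the default of two 4-line verses.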

Diversity dimensions sampled per example

  • 49 names across multiple naming traditions (Mia, Aiko, Mateo, Tariq, Saoirse, ...).
  • 12 ages (1-7, weighted toward 2-5).
  • 10 love categories × ~10-17 specific sub-topics each (≈ 120 concrete loves).
  • 15 fear topics (and ~25% of examples have no fear, the natural "just comfort" case).
  • 12 moods (cosy and content / restless but settling / clingy and needing reassurance / ...).
  • 12 keys, 3 meters (6/8, 3/4, 4/4), 2-4 diatonic progressions per key.

The training-prompt format mirrors what the downstream app sends at inference, so the prompt format does not shift between training and serving:

Write a lullaby for: <name>, age <n>
Loves: <concrete love>
Fears: <concrete fear>      # omitted if no fear
Mood: <mood>
Key: <key>
Meter: <meter>
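
A small helper along these lines could assemble that prompt, dropping the Fears line when no fear is given. `build_prompt` is a hypothetical name for illustration, not a function shipped with the repo.

```python
from typing import Optional


def build_prompt(name: str, age: int, love: str, mood: str, key: str,
                 meter: str, fear: Optional[str] = None) -> str:
    """Assemble the user prompt in the field order shown above."""
    lines = [
        f"Write a lullaby for: {name}, age {age}",
        f"Loves: {love}",
    ]
    if fear:  # the Fears line is omitted entirely when there is no fear
        lines.append(f"Fears: {fear}")
    lines += [f"Mood: {mood}", f"Key: {key}", f"Meter: {meter}"]
    return "\n".join(lines)


print(build_prompt("Mia", 3, "a stuffed fox named after a color",
                   "sleepy and comforted", "C major", "6/8",
                   fear="the dark when the light goes out"))
```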

Training setup

  • Base model: unsloth/Llama-3.2-3B-Instruct (4-bit loaded via Unsloth)
  • Method: LoRA via PEFT
  • LoRA config: r=16, alpha=32, dropout 0, applied to all 7 attention + MLP projections (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj)
  • Sequence length: 1024
  • Optimiser: AdamW 8-bit, learning rate 2e-4, cosine schedule, 20 warmup steps, weight decay 0.01
  • Batch: per-device 4, gradient accumulation 4 (effective batch 16)
  • Epochs: 3, with load_best_model_at_end on eval loss
  • Precision: bf16 where supported, fp16 otherwise
  • Eval/save strategy: per-epoch; the best checkpoint by eval_loss was kept (epoch 2 in our run)
  • Seed: 42

Trained on a single Colab T4 in ~45 minutes via the included train_lullaby.ipynb notebook.
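
For readers who want to reproduce the setup outside the notebook, the settings above map roughly onto the keyword arguments below. This is a sketch, not the notebook's actual code: the dictionaries are plain Python so they run without any dependencies, the key names follow `peft.LoraConfig` and `transformers.TrainingArguments` as I understand them, and the dictionary names themselves are hypothetical.

```python
# Would be passed as peft.LoraConfig(**lora_kwargs).
lora_kwargs = dict(
    r=16,
    lora_alpha=32,
    lora_dropout=0.0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
        "gate_proj", "up_proj", "down_proj",      # MLP projections
    ],
    task_type="CAUSAL_LM",
)

# Would be passed as transformers.TrainingArguments(output_dir=..., **training_kwargs).
training_kwargs = dict(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    num_train_epochs=3,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_steps=20,
    weight_decay=0.01,
    optim="adamw_8bit",            # 8-bit AdamW, needs bitsandbytes installed
    eval_strategy="epoch",         # named evaluation_strategy in older releases
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,       # lower eval loss is better
    seed=42,
)

effective_batch = (training_kwargs["per_device_train_batch_size"]
                   * training_kwargs["gradient_accumulation_steps"])
print(effective_batch)  # 16
```

With 1,425 training examples and an effective batch of 16, one epoch is roughly 89 optimiser steps, so the 20 warmup steps cover under a quarter of the first epoch.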

Files in this repo

The repo ships in three forms so it's usable across runtimes:

  • LoRA adapter (adapter_model.safetensors) — smallest, requires the base model at inference.
  • Merged FP16 weights — standalone HF transformers checkpoint.
  • GGUF Q4_K_M — for llama.cpp and downstream tooling. The Lolaby app uses these on CPU.

How to use

With llama-cpp-python (CPU, what Lolaby itself uses)

from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="build-small-hackathon/lolaby-llama-3b",
    filename="*Q4_K_M.gguf",
    n_ctx=2048,
    verbose=False,
)

prompt = (
    "Write a lullaby for: Mia, age 3\n"
    "Loves: a stuffed fox named after a color\n"
    "Fears: the dark when the light goes out\n"
    "Mood: sleepy and comforted\n"
    "Key: C major\n"
    "Meter: 6/8"
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system",
         "content": "You write personalized lullabies for small children, "
                    "with chord markers and a tempo/meter header so a guitar "
                    "accompaniment can be rendered. Output only the lullaby — "
                    "no preamble."},
        {"role": "user", "content": prompt},
    ],
    max_tokens=512,
    temperature=0.85,
)
print(out["choices"][0]["message"]["content"])

With transformers (GPU)

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "build-small-hackathon/lolaby-llama-3b"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "system",
     "content": "You write personalized lullabies for small children, "
                "with chord markers and a tempo/meter header so a guitar "
                "accompaniment can be rendered. Output only the lullaby — "
                "no preamble."},
    {"role": "user", "content":
        "Write a lullaby for: Mia, age 3\n"
        "Loves: a stuffed fox named after a color\n"
        "Mood: sleepy and comforted\n"
        "Key: C major\n"
        "Meter: 6/8"},
]
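# return_dict=True returns input_ids plus attention_mask, which generate()
# accepts as keyword arguments.
inputs = tok.apply_chat_template(messages, return_tensors="pt",
                                  add_generation_prompt=True,
                                  return_dict=True).to(model.device)
out = model.generate(**inputs, max_new_tokens=512, do_sample=True,
                     temperature=0.85, top_p=0.95)
# Decode only the newly generated tokens, skipping the prompt.
prompt_len = inputs["input_ids"].shape[-1]
print(tok.decode(out[0][prompt_len:], skip_special_tokens=True))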

Output format

Generations begin with a two-line header, followed by a blank line and then the lullaby itself. Each lyric line carries inline chord markers in [brackets], drawn from the requested progression. The default is two 4-line verses.

Tempo: 54 bpm, 6/8
Progression: C - Am - F - G

[C] First line of [Am] first verse,
[F] continues here [G] in rhyme,
[C] third line of [Am] the verse,
[F] gentle close in [G] time...

[C] Second verse [Am] line one,
[F] line two [G] continues,
[C] third line of [Am] verse,
[F] final line [G] ending...

This format is consumed downstream by the Lolaby app's synth and TTS pipeline — the chord markers drive instrument rendering, the tempo header drives playback timing.
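
A consumer of this format might parse and validate it along the following lines. This is a minimal sketch rather than the app's actual parser: `parse_lullaby` and the regular expressions are hypothetical, and they assume the header is written exactly as in the example above.

```python
import re

TEMPO_RE = re.compile(r"^Tempo:\s*(\d+)\s*bpm,\s*(\d+/\d+)$")
PROGRESSION_RE = re.compile(r"^Progression:\s*(.+)$")
CHORD_RE = re.compile(r"\[([^\]]+)\]")


def parse_lullaby(text):
    """Parse the header and verses; raise ValueError if the format is violated."""
    lines = [line.strip() for line in text.strip().splitlines()]
    if len(lines) < 3:
        raise ValueError("expected a two-line header followed by lyrics")
    tempo = TEMPO_RE.match(lines[0])
    progression_match = PROGRESSION_RE.match(lines[1])
    if not tempo or not progression_match:
        raise ValueError("missing or malformed Tempo/Progression header")
    progression = re.split(r"\s+-\s+", progression_match.group(1).strip())

    verses, current = [], []
    for line in lines[2:]:
        if not line:                      # a blank line closes the current verse
            if current:
                verses.append(current)
                current = []
            continue
        chords = CHORD_RE.findall(line)
        if not chords:
            raise ValueError(f"lyric line without chord marker: {line!r}")
        unknown = [c for c in chords if c not in progression]
        if unknown:
            raise ValueError(f"chords outside the progression: {unknown}")
        lyric = " ".join(CHORD_RE.sub(" ", line).split())
        current.append((chords, lyric))
    if current:
        verses.append(current)

    return {
        "bpm": int(tempo.group(1)),
        "meter": tempo.group(2),
        "progression": progression,
        "verses": verses,
    }


sample = """Tempo: 54 bpm, 6/8
Progression: C - Am - F - G

[C] First line of [Am] first verse,
[F] continues here [G] in rhyme,
[C] third line of [Am] the verse,
[F] gentle close in [G] time...

[C] Second verse [Am] line one,
[F] line two [G] continues,
[C] third line of [Am] verse,
[F] final line [G] ending...
"""

parsed = parse_lullaby(sample)
print(parsed["bpm"], parsed["meter"], parsed["progression"], len(parsed["verses"]))
```

Because sampling is stochastic, a caller would probably want to catch the `ValueError` and regenerate, in the same spirit as the format gate used when building the dataset.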

Safety

Every training example was safety-screened during generation: the synthesised LOVE, FEAR, and lyric body were each passed through a content filter before acceptance, with anything that slipped through the teacher's own guidance rejected and regenerated. This is a deliberate choice for a model intended for children — the training distribution itself is wholesome by construction, not retrofitted at inference time.

The teacher model was also explicitly instructed to keep all invented content age-appropriate (no death, violence, weapons, horror, substances, or anything frightening beyond a gentle, easily-soothed childhood worry).

Evaluation

Trained on 1,425 examples / held out 75 (5%) for eval. The best checkpoint by held-out loss was kept (epoch 2 of 3).

Beyond eval loss, the model was assessed on a deliberately varied 10-spec sanity battery that stresses diversity (names from different cultures, different love categories, edge-case loves the previous model failed on, and fears that need actual soothing). Each generation was scored for:

  • Coherence — lines parse as English, no garbled tokens.
  • Faithfulness — the love and fear from the prompt appear in the lyric in recognisable form.
  • Variety — no two generations from the battery share opening or refrain shapes.
  • Format compliance — tempo header present, chord markers from the declared progression only.

The model passes all four on the held-out battery. The previous boilerplate-trained model passed only format compliance.
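
The opening-shape half of the variety criterion lends itself to a mechanical check. The sketch below is one way it might be done, not the scoring code actually used: `opening_shape` and `repeated_openings` are hypothetical helpers, the 4-word window is borrowed from the opener dedup gate, and refrain shapes are not covered.

```python
import re
from collections import Counter


def opening_shape(lyric, name, n_words=4):
    """First n words of the first lyric line, chords removed, name normalised."""
    first_line = next(
        (line for line in lyric.splitlines() if line.strip().startswith("[")), ""
    )
    text = re.sub(r"\[[^\]]*\]", " ", first_line).lower()
    text = re.sub(rf"\b{re.escape(name.lower())}\b", "<name>", text)
    words = re.findall(r"<name>|[\w']+", text)
    return " ".join(words[:n_words])


def repeated_openings(generations):
    """Map each opening shape used more than once to its count.

    `generations` is an iterable of (child_name, generated_text) pairs.
    """
    counts = Counter(opening_shape(text, name) for name, text in generations)
    return {shape: n for shape, n in counts.items() if n > 1}


battery = [
    ("Mia", "Tempo: 54 bpm, 6/8\nProgression: C - Am - F - G\n\n"
            "[C] Mia, your fox keeps [Am] watch tonight,"),
    ("Aiko", "Tempo: 56 bpm, 3/4\nProgression: G - Em - C - D\n\n"
             "[G] Aiko, your fox keeps [Em] watch tonight,"),
    ("Tariq", "Tempo: 52 bpm, 4/4\nProgression: G - Em - C - D\n\n"
              "[G] Close your eyes now, [Em] Tariq dear,"),
]
print(repeated_openings(battery))  # {'<name> your fox keeps': 2}
```

An empty result means no two generations in the battery share an opening shape under this definition.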

Handling edge cases

When given a love or fear the model has never seen at training time, it generalises gracefully — finding the nearest comforting concept and weaving it in. A specific fictional character or unusual object will typically be rendered through a familiar lens (e.g. "your little friend", "a quiet companion") so the lullaby stays singable and warm rather than breaking or refusing. This is the right behaviour for a bedtime song: soft landing, not literal lookup.

Limitations

  • English only. Training data is English. The model will attempt other languages but quality drops fast and chord markers may be inserted at awkward positions.
  • Dataset provenance. Training data was generated by Claude Haiku 4.5 under Anthropic's API. The dataset is not distributed with this repo. Anthropic's usage policy restricts redistribution of Claude-generated outputs, so the dataset is kept private to stay compliant.
  • Not a general LLM. This model is narrow-purpose. Asked off-topic questions, it will sometimes attempt to answer in lullaby form.

Intended use and out-of-scope use

Intended:

  • Generating personalised bedtime lullabies for use in apps, devices, or printed/digital lyric sheets.
  • A teaching example of dataset-quality-first fine-tuning.

Not intended:

  • General-purpose conversation or instruction following (the base Llama-3.2-3B-Instruct is better suited for that).
  • Content involving minors that goes beyond gentle bedtime themes.
  • Anything safety-critical.

Citation

If you use this model or the dataset-building approach, a link back to the Lolaby Space or this model is enough. If you want a BibTeX entry:

@software{lolaby_2026,
  author  = {André Oliveira and Vasco Oliveira},
  title   = {Lolaby: a small-AI lullaby generator},
  year    = {2026},
  url     = {https://huggingface.co/build-small-hackathon/lolaby-llama-3b},
  note    = {Built for the Hugging Face Build Small Hackathon 2026.}
}

License

This model inherits the Llama 3.2 Community License from its base. You agree to that license by using these weights — see https://www.llama.com/llama3_2/license/.

Acknowledgements

  • Meta for releasing Llama 3.2 3B Instruct as the base.
  • Unsloth for the 4-bit + LoRA training stack that made this trainable on a free Colab T4.
  • Anthropic for the Claude Haiku 4.5 API used as the teacher model during dataset distillation.
  • Hugging Face & Gradio for hosting the Build Small Hackathon 2026 and shaping a thoughtful "small AI" prompt.