nano-dates

Converts a natural date phrase ("next friday", "the 3rd of July 2025") to an ISO-8601 date. It is small enough to run on a CPU in milliseconds and was trained entirely on code-generated data: no scraping, no labelling, no distillation from a larger model.

2024-03-10 | the 3rd of July 2025 => 2025-07-03
2024-03-10 | Jun 12 2023          => 2023-06-12
2024-03-10 | next week            => 2024-03-17
2024-03-10 | in 3 months          => 2024-06-10

The model is given a reference date (today) at the start of the prompt, so relative phrases ("tomorrow", "in 3 weeks") are computable from the input alone; it never needs a wall clock.
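To make "computable from the input alone" concrete, here is a minimal sketch of a reference resolver for a few relative phrases. The names `resolve_relative` and `add_months`, and the rule of clamping the day at month end, are assumptions for illustration; they are not taken from the repo.

```python
import calendar
from datetime import date, timedelta


def add_months(d: date, n: int) -> date:
    """Shift d by n months, clamping the day to the target month's length (assumed rule)."""
    year, month0 = divmod(d.year * 12 + (d.month - 1) + n, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def resolve_relative(reference_iso: str, phrase: str) -> str:
    """Resolve a small set of relative phrases from the reference date only."""
    ref = date.fromisoformat(reference_iso)
    fixed_offsets = {"today": 0, "tomorrow": 1, "yesterday": -1,
                     "next week": 7, "last week": -7}
    if phrase in fixed_offsets:
        return (ref + timedelta(days=fixed_offsets[phrase])).isoformat()
    if phrase == "next month":
        return add_months(ref, 1).isoformat()
    words = phrase.split()
    if len(words) == 3 and words[0] == "in" and words[1].isdigit():
        n, unit = int(words[1]), words[2].rstrip("s")
        if unit == "day":
            return (ref + timedelta(days=n)).isoformat()
        if unit == "week":
            return (ref + timedelta(weeks=n)).isoformat()
        if unit == "month":
            return add_months(ref, n).isoformat()
    raise ValueError(f"unsupported phrase: {phrase!r}")


print(resolve_relative("2024-03-10", "next week"))    # -> 2024-03-17
print(resolve_relative("2024-03-10", "in 3 months"))  # -> 2024-06-10
```

No call to `date.today()` appears anywhere: the reference date in the input is the only source of "now".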

Code, training, and reproduction: https://github.com/vukrosic/nano-dates (self-contained train.py / eval.py / data generator / tests). Technical report (PDF): nano-dates-report.pdf, covering the recipe, the data-leak bug, and where a 1M model's reasoning breaks.

[Figure: accuracy by category]

Why this exists

A 1M-parameter model can't be a general assistant, but it can do well on a task that is narrow and formally specified. Date→ISO is exactly that: the answer has a known structure, so you can sample the answer first and render it in many natural forms, producing perfectly labelled training data for free and in unlimited quantity. For a task like this, that beats asking a big model to generate data: the label is ground truth by construction, not a guess.
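The "sample the answer first, then render it" idea can be sketched in a few lines. This is an illustrative sketch, not the repo's generator: it covers only five absolute surface forms, and the helper names (`ordinal`, `RENDERERS`, `random_date`, `make_example`) and the sampled date range are assumptions.

```python
import random
from datetime import date, timedelta

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def ordinal(n: int) -> str:
    """1 -> '1st', 2 -> '2nd', 11 -> '11th', 23 -> '23rd'."""
    if 11 <= n % 100 <= 13:
        return f"{n}th"
    return f"{n}{ {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th') }"


# Each renderer turns a known answer into one surface form.
RENDERERS = [
    lambda d: d.isoformat(),                                        # 2023-06-12
    lambda d: f"{MONTHS[d.month - 1]} {d.day}, {d.year}",           # June 12, 2023
    lambda d: f"{MONTHS[d.month - 1][:3]} {d.day} {d.year}",        # Jun 12 2023
    lambda d: f"{d.day} {MONTHS[d.month - 1]} {d.year}",            # 12 June 2023
    lambda d: f"the {ordinal(d.day)} of {MONTHS[d.month - 1]} {d.year}",
]


def random_date(rng: random.Random,
                start: date = date(2015, 1, 1),
                end: date = date(2035, 12, 31)) -> date:
    """Uniform date in [start, end]; the range here is an assumption."""
    return start + timedelta(days=rng.randrange((end - start).days + 1))


def make_example(rng: random.Random) -> str:
    answer = random_date(rng)      # the label comes first, so it is exact
    reference = random_date(rng)   # drawn independently of the answer
    phrase = rng.choice(RENDERERS)(answer)
    return f"{reference.isoformat()} | {phrase} => {answer.isoformat()}"


rng = random.Random(0)
for _ in range(3):
    print(make_example(rng))
```

Because the phrase is derived from the label rather than the other way round, a rendered example can never be mislabelled; the only way to get it wrong is a bug in a renderer.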

This model is a worked demonstration of that recipe, and an honest map of where a nano model's capability ends (see below).

What it can and can't do

Held-out exact-match accuracy (2,000 unseen examples, greedy decode):

capability | category | accuracy
Parse absolute dates | 2023-06-12, June 12, 2023, Jun 12 2023, 12 June 2023, the 12th of June 2023 | 100%
Resolve simple relatives | today, tomorrow, yesterday, next/last week, next month, in N months | 98–100%
Variable-N day/week arithmetic | in N days, N days ago, in N weeks | 77–81%
Weekday resolution | next/last <weekday> | ~12% ❌
Overall | mixed | 85.4%
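For readers who want to reproduce a breakdown like this on their own examples, a per-category exact-match scorer might look like the sketch below. The function name, the tuple layout, and the `predict` callable are assumptions; the repo's eval.py may be organised differently.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple

# (category, reference ISO date, phrase, expected ISO date)
Example = Tuple[str, str, str, str]


def exact_match_by_category(
    examples: Iterable[Example],
    predict: Callable[[str, str], str],
) -> Dict[str, float]:
    """Fraction of examples whose prediction equals the label exactly."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for category, reference, phrase, expected in examples:
        totals[category] += 1
        hits[category] += int(predict(reference, phrase) == expected)
    report = {c: hits[c] / totals[c] for c in totals}
    report["overall"] = sum(hits.values()) / sum(totals.values())
    return report


# Toy data and a toy predictor that always answers "tomorrow".
toy_examples = [
    ("absolute", "2024-03-10", "Jun 12 2023", "2023-06-12"),
    ("relative", "2024-03-10", "tomorrow", "2024-03-11"),
    ("relative", "2024-03-10", "yesterday", "2024-03-09"),
]
print(exact_match_by_category(toy_examples, lambda ref, phrase: "2024-03-11"))
```

With the real model, `predict` would wrap the card's own function, for example `lambda ref, phrase: parse(model, ref, phrase)`.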

The clean limitation: weekday resolution ("next friday") is unsolved at this size. It requires mapping an arbitrary date to its weekday and then doing modular arithmetic, the hardest computation in the set, and a 1M model doesn't get there. Everything else it handles: absolute forms and simple relatives reliably, variable-N day/week arithmetic about four times in five (77–81%). The accuracy numbers reflect genuine parsing: absolute phrases are trained with a reference date independent of the answer, so the model cannot cheat by copying the prompt.
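The two steps that weekday phrases require (date to weekday, then modular arithmetic) are short when written out. The sketch below assumes that "next friday" means the first Friday strictly after the reference date and "last friday" the most recent one strictly before it; the card does not state which convention the training data uses.

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def next_weekday(reference_iso: str, weekday_name: str) -> str:
    """First matching weekday strictly after the reference (assumed convention)."""
    ref = date.fromisoformat(reference_iso)
    target = WEEKDAYS.index(weekday_name.lower())
    # ref.weekday() is 0 for Monday; the offset lands in 1..7, never 0.
    days_ahead = (target - ref.weekday() - 1) % 7 + 1
    return (ref + timedelta(days=days_ahead)).isoformat()


def last_weekday(reference_iso: str, weekday_name: str) -> str:
    """Most recent matching weekday strictly before the reference (assumed convention)."""
    ref = date.fromisoformat(reference_iso)
    target = WEEKDAYS.index(weekday_name.lower())
    days_back = (ref.weekday() - target - 1) % 7 + 1
    return (ref - timedelta(days=days_back)).isoformat()


# 2024-03-10 is a Sunday.
print(next_weekday("2024-03-10", "friday"))  # -> 2024-03-15
print(last_weekday("2024-03-10", "friday"))  # -> 2024-03-08
```

The library gets the weekday from `date.weekday()`; the model has to derive it from the digits of the date, which is the step it fails to learn at this size.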

Usage

The repo includes a self-contained model definition (modeling_nano_dates.py): no training framework required, just torch and safetensors.

Download the three files you need (modeling_nano_dates.py, model.safetensors, config.json) and run:

pip install torch safetensors huggingface_hub
python - <<'PY'
from huggingface_hub import hf_hub_download
for f in ["modeling_nano_dates.py", "model.safetensors", "config.json"]:
    hf_hub_download("vukrosic/nano-dates", f, local_dir=".")
PY
python -c "from modeling_nano_dates import load, parse; m=load(); print(parse(m,'2024-03-10','next month'))"
# -> 2024-04-10

Or from Python:

from modeling_nano_dates import load, parse

model = load("model.safetensors", "config.json")
print(parse(model, "2024-03-10", "the 3rd of July 2025"))  # -> 2025-07-03
print(parse(model, "2024-03-10", "next month"))            # -> 2024-04-10

Or just run the file for a demo: python modeling_nano_dates.py.

Prompt format the model was trained on (byte-for-byte):

<reference ISO date> | <phrase> => <answer ISO date>

parse() builds that prompt and greedily decodes exactly 10 characters.
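As a rough picture of what that involves, the sketch below builds the byte prompt and runs a greedy loop over a byte vocabulary. It is not the code in modeling_nano_dates.py: `build_prompt`, `greedy_decode`, and the toy scorer are hypothetical names, and the trailing space after `=>` is inferred from the format line above.

```python
from typing import Callable, List

ANSWER_LEN = 10  # an ISO date "YYYY-MM-DD" is exactly 10 bytes


def build_prompt(reference_iso: str, phrase: str) -> bytes:
    """Everything up to, but not including, the answer (trailing space assumed)."""
    return f"{reference_iso} | {phrase} => ".encode("utf-8")


def greedy_decode(prompt: bytes,
                  next_byte_logits: Callable[[bytes], List[float]],
                  n: int = ANSWER_LEN) -> str:
    """Append the highest-scoring byte n times and return only the new bytes."""
    seq = bytearray(prompt)
    for _ in range(n):
        logits = next_byte_logits(bytes(seq))
        seq.append(max(range(256), key=lambda b: logits[b]))
    return bytes(seq[len(prompt):]).decode("utf-8", errors="replace")


def copy_reference_logits(seq: bytes) -> List[float]:
    """Toy stand-in for the model: it always answers with the reference date."""
    generated = len(seq) - (seq.rindex(b"=> ") + 3)  # answer bytes so far
    logits = [0.0] * 256
    logits[seq[generated]] = 1.0  # favour the matching byte of the reference date
    return logits


prompt = build_prompt("2024-03-10", "today")
print(greedy_decode(prompt, copy_reference_logits))  # -> 2024-03-10
```

Decoding a fixed 10 bytes means no stop token is needed; the output always has the length of an ISO date, though nothing in the loop guarantees it is a valid one.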

Set it up with an AI agent

Paste this into Claude Code, Cursor, or any coding agent and it will fetch and run the model for you:

Set up the nano-dates model from Hugging Face (vukrosic/nano-dates) and run inference.

1. pip install torch safetensors huggingface_hub
2. Download three files with huggingface_hub.hf_hub_download("vukrosic/nano-dates", f)
   for f in ["modeling_nano_dates.py", "model.safetensors", "config.json"].
3. The model is a single self-contained file exposing load() and
   parse(model, today_iso, phrase) -> ISO-8601 string.
4. Run:
     from modeling_nano_dates import load, parse
     m = load()
     for p in ["the 3rd of July 2025", "next month", "Jun 12 2023", "yesterday"]:
         print(p, "->", parse(m, "2024-03-10", p))
5. Report outputs. Known limits: absolute dates + simple relatives ~100%,
   variable-N day/week math ~77-81%, weekday phrases ("next friday") ~12% (a
   1M-param capacity ceiling, not a bug). This is a capability demo, NOT a production
   date parser; for production use dateutil/chrono.

Model details

Parameters | 1,016,960
Architecture | decoder-only transformer (pre-norm)
Tokenizer | raw UTF-8 bytes (vocab 256, no vocab file)
dim / layers / heads | 128 / 4 / 4 (2 KV heads, GQA)
Norm / position / FFN | RMSNorm / RoPE / SwiGLU
Context | 64 bytes
Training | SFT, prompt-masked cross-entropy, 12k steps, AdamW, cosine LR 3e-3
Data | 100k code-generated pairs, 17 surface renderers
Final val loss | 0.036
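As a sanity check on the parameter count, the arithmetic below reproduces 1,016,960 from the listed dimensions under assumptions this card does not state: tied input/output embeddings, a SwiGLU hidden size of 512, no bias terms, and two RMSNorm weight vectors per layer plus a final one. The actual layout is defined in modeling_nano_dates.py and may differ.

```python
def count_params(vocab: int = 256, dim: int = 128, layers: int = 4,
                 heads: int = 4, kv_heads: int = 2,
                 ffn_hidden: int = 512,         # assumption: 4 * dim
                 tied_embeddings: bool = True   # assumption
                 ) -> int:
    head_dim = dim // heads                     # 32
    kv_dim = kv_heads * head_dim                # 64: GQA shrinks the K and V projections
    embed = vocab * dim                         # byte embedding table
    attn = dim * dim + 2 * dim * kv_dim + dim * dim   # Q, K, V, output projections
    ffn = 3 * dim * ffn_hidden                  # SwiGLU: gate, up, down
    norms = 2 * dim                             # two RMSNorm weights per layer
    total = embed + layers * (attn + ffn + norms) + dim  # + final RMSNorm
    if not tied_embeddings:
        total += vocab * dim                    # separate output head
    return total                                # RoPE adds no parameters


print(f"{count_params():,}")  # -> 1,016,960
```

Under these assumptions the SwiGLU blocks hold about 77% of the weights (786,432 of 1,016,960), and the byte-level embedding table costs only 32,768.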

Limitations & honest scope

  • Not a production date library. For real software, dateutil/chrono are exact and free. This model's value is as a method demonstration and a study of what a nano model can learn from synthetic data, not as a dependency.
  • Weekday phrases ("next friday") are unreliable (~12%). Don't use them.
  • English only, and only the 17 surface forms it was trained on. It has not seen "12/06/2023"-style numeric forms (deliberately: they're ambiguous).
  • Reference dates were drawn from 2015–2035; far outside that, behaviour is untested.
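If you do experiment with the model, a thin guard can keep calls inside the range the limitations above describe and reject outputs that are not real dates. This is a hedged sketch: `checked_parse` is a hypothetical wrapper, the exact range boundaries (1 January 2015 to 31 December 2035) are an assumption, and `parse_fn` stands for any callable taking (reference ISO date, phrase).

```python
from datetime import date
from typing import Callable

MIN_REF, MAX_REF = date(2015, 1, 1), date(2035, 12, 31)  # assumed boundaries
MAX_CONTEXT = 64   # bytes, from the model details
ANSWER_LEN = 10    # bytes decoded per answer


def checked_parse(parse_fn: Callable[[str, str], str],
                  reference_iso: str, phrase: str) -> str:
    ref = date.fromisoformat(reference_iso)  # ValueError on a malformed date
    if not MIN_REF <= ref <= MAX_REF:
        raise ValueError(f"reference {reference_iso} is outside the trained range")
    prompt_len = len(f"{reference_iso} | {phrase} => ".encode("utf-8"))
    if prompt_len + ANSWER_LEN > MAX_CONTEXT:
        raise ValueError("prompt plus answer would exceed the 64-byte context")
    answer = parse_fn(reference_iso, phrase)
    if len(answer) != ANSWER_LEN:
        raise ValueError(f"unexpected answer length: {answer!r}")
    date.fromisoformat(answer)  # ValueError if not a valid calendar date
    return answer


# Toy parse function standing in for the real model.
print(checked_parse(lambda ref, phrase: "2024-04-10", "2024-03-10", "next month"))
```

The guard checks only that an answer is well-formed and in range; it cannot tell whether the date is the right one, which is why weekday phrases stay off limits.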

What should this method point at next?

The interesting question isn't this model; it's the recipe. If you have a narrow, formal, annoying task you wish a tiny reliable model could do (parse, normalize, validate, convert), that's exactly the shape this approach fits. Tell me what it is: open a discussion on this repo.


Built from scratch with voidlab. Trained on a single GPU in ~30 seconds.
