How to use ray0rf1re/Nano-Nano_v5.1 with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ray0rf1re/Nano-Nano_v5.1")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ray0rf1re/Nano-Nano_v5.1")
model = AutoModelForCausalLM.from_pretrained("ray0rf1re/Nano-Nano_v5.1")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

How to use ray0rf1re/Nano-Nano_v5.1 with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ray0rf1re/Nano-Nano_v5.1"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "ray0rf1re/Nano-Nano_v5.1",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
# Or run the model with Docker:
docker model run hf.co/ray0rf1re/Nano-Nano_v5.1
How to use ray0rf1re/Nano-Nano_v5.1 with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "ray0rf1re/Nano-Nano_v5.1" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "ray0rf1re/Nano-Nano_v5.1",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
# Or start the SGLang server with Docker:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "ray0rf1re/Nano-Nano_v5.1" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "ray0rf1re/Nano-Nano_v5.1",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
How to use ray0rf1re/Nano-Nano_v5.1 with Docker Model Runner:
docker model run hf.co/ray0rf1re/Nano-Nano_v5.1
Nano-Nano v5.1 is a fully redesigned successor to Nano-Nano v4.5.
It is a ~298M-parameter Qwen3-style model trained with sequence packing on a quality-tiered 34-dataset mix.
It features a loss-boost system that automatically extends training while loss > 4.95 (up to 3 rounds of 75 steps).
Goal: loss < 2.5 through compute efficiency, not raw scale.
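The loss-boost rule described above can be sketched as a small control loop. This is only an illustration of the stated rule (threshold 4.95, at most 3 rounds of 75 steps); `loss_boost` and the `train_more` callback are hypothetical names, not the author's training code.

```python
from typing import Callable, Tuple


def loss_boost(
    current_loss: float,
    train_more: Callable[[int], float],
    threshold: float = 4.95,
    max_rounds: int = 3,
    steps_per_round: int = 75,
) -> Tuple[float, int]:
    """Extend training in fixed-size rounds while the loss stays above the threshold.

    `train_more(n)` is a hypothetical callback that runs n extra optimizer
    steps and returns the loss measured afterwards.
    """
    rounds = 0
    while current_loss > threshold and rounds < max_rounds:
        current_loss = train_more(steps_per_round)
        rounds += 1
    return current_loss, rounds


# Demo with made-up loss values standing in for real training rounds.
fake_losses = iter([5.20, 4.90, 4.70])
final_loss, rounds_used = loss_boost(5.40, lambda steps: next(fake_losses))
print(final_loss, rounds_used)
```

In the demo the loop stops after the second round because the loss has dropped below 4.95; a run whose loss never recovers would stop after the third round regardless.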
| Property | Value |
|---|---|
| Architecture | LLaMA decoder-only |
| Parameters | ~1218.3 M |
| Context | 2 048 tokens |
| Vocabulary | 50,304 tokens |
| Training loss | 2.0444 |
| Eval score | 16.7% |
| Tokens trained | 0.01 B (sequence-packed) |
| Hardware | GTX 1080 8 GB (Pascal) |
| Hyperparameter | v4 | v4.5 | v5.1 |
|---|---|---|---|
| Parameters | ~236 M | ~256 M | ~1218.3 M (~1.218 B) |
| `hidden_size` | 896 | 896 | 1 024 |
| `intermediate_size` | 2 688 | 2 912 | 2 730 (8/3×hidden) |
| `num_hidden_layers` | 14 | 15 | 16 |
| `num_attention_heads` | 14 | 14 | 16 |
| `num_key_value_heads` | 14 | 14 | 16 |
| `head_dim` | 64 | 64 | 64 |
| `vocab_size` | 50 264 | 50 264 | 50 304 |
| `max_position_embeddings` | 1 024 | 2 048 | 2 048 |
| `rms_norm_eps` | 1e-6 | 1e-6 | 1e-5 |
| `rope_theta` | 10 000 | 10 000 | 10 000 |
| `rope_scaling` | — | linear 2× | linear 2× |
| `tie_word_embeddings` | False | False | False |
| Sequence packing | ❌ | ❌ | ✅ 1× packed |
| Architecture | LLaMA | LLaMA | Qwen3 |
| GQA (KV heads) | 14 full | 16 full | 8 (16Q/8KV) |
| QK-Norm | ❌ | ❌ | ✅ |
| rope_theta | 10k | 10k | 1M |
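As a rough cross-check, the v5.1 hyperparameters in the table can be turned into a parameter estimate. The sketch below assumes a standard bias-free decoder layer (four attention projections, a three-matrix gated MLP, two RMSNorm weight vectors, per-head QK-Norm weights) and untied input/output embeddings; `estimate_params` is a hypothetical helper, and the real checkpoint may differ in details.

```python
def estimate_params(
    hidden: int = 1024,
    intermediate: int = 2730,
    layers: int = 16,
    q_heads: int = 16,
    kv_heads: int = 8,
    head_dim: int = 64,
    vocab: int = 50304,
    tied_embeddings: bool = False,
    qk_norm: bool = True,
) -> int:
    """Estimate the parameter count of a bias-free LLaMA/Qwen3-style decoder."""
    q_proj = hidden * q_heads * head_dim
    kv_proj = 2 * hidden * kv_heads * head_dim   # key and value projections
    o_proj = q_heads * head_dim * hidden
    mlp = 3 * hidden * intermediate              # gate, up and down projections
    norms = 2 * hidden                           # pre-attention and pre-MLP RMSNorm
    if qk_norm:
        norms += 2 * head_dim                    # assumed per-head q/k norm weights
    per_layer = q_proj + kv_proj + o_proj + mlp + norms
    embeddings = vocab * hidden * (1 if tied_embeddings else 2)
    return layers * per_layer + embeddings + hidden  # + final RMSNorm


gqa_estimate = estimate_params(kv_heads=8)    # 16Q/8KV variant from the table
mha_estimate = estimate_params(kv_heads=16)   # full multi-head variant
print(f"{gqa_estimate / 1e6:.1f} M, {mha_estimate / 1e6:.1f} M")
```

Under these assumptions the estimate comes out at about 288 M with 8 KV heads and about 304 M with 16, both in the neighborhood of the ~298M figure quoted in the summary.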
| Category | Hits | Score |
|---|---|---|
| Knowledge | 0/5 | 🔴 0% |
| Reasoning | 0/4 | 🔴 0% |
| Hallucination | 0/4 | 🔴 0% |
| Instruction | 2/4 | 🟡 50% |
| Coherence | 1/3 | 🔴 33% |
| Overall | — | 🔴 17% |
Hallucination resistance tests whether the model correctly declines or hedges on unanswerable questions (future events, fictional entities, impossible premises).
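The overall score appears to be the unweighted mean of the five category scores rather than total hits over total questions; that is an inference from the numbers in the table, not something the card states. A small sketch comparing the two ways of aggregating (function names are illustrative):

```python
# (hits, total) per category, copied from the evaluation table.
results = {
    "Knowledge": (0, 5),
    "Reasoning": (0, 4),
    "Hallucination": (0, 4),
    "Instruction": (2, 4),
    "Coherence": (1, 3),
}


def macro_average(scores: dict) -> float:
    """Mean of per-category accuracies (each category counts equally)."""
    return sum(hits / total for hits, total in scores.values()) / len(scores)


def micro_average(scores: dict) -> float:
    """Total hits divided by total questions (each question counts equally)."""
    hits = sum(h for h, _ in scores.values())
    total = sum(t for _, t in scores.values())
    return hits / total


print(f"macro: {macro_average(results):.1%}")
print(f"micro: {micro_average(results):.1%}")
```

The macro average reproduces the 16.7% eval score reported on the card, while the micro average over all 20 questions is 15%.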
| Change | v4.5 | v5.1 | Why |
|---|---|---|---|
| Sequence packing | ❌ padding waste | ✅ 100% tokens | ~3× more signal per step |
| Dataset quality | mixed web+instruction | GPT-4 quality-tiered | Faster loss reduction |
| Parameters | ~256 M | ~1218.3 M (~1.218 B) | Better capacity |
| Datasets | 15 | 21 | More diversity |
| LR | 1e-4 | 2e-4 | 1e-4 was too conservative |
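Sequence packing, the first change in the table, can be illustrated with a minimal sketch: tokenized samples are concatenated with a separator token and cut into fixed-length blocks, so no positions are spent on padding. The block size of 512 is taken from the effective-batch figure on this card; the `eos_id` value and the `pack_sequences` name are assumptions, and the author's streaming implementation may differ.

```python
from typing import Iterable, Iterator, List


def pack_sequences(
    token_streams: Iterable[List[int]],
    block_size: int = 512,
    eos_id: int = 0,  # assumed separator id, for illustration only
) -> Iterator[List[int]]:
    """Concatenate tokenized samples and yield full blocks of block_size tokens."""
    buffer: List[int] = []
    for tokens in token_streams:
        buffer.extend(tokens)
        buffer.append(eos_id)  # mark the boundary between samples
        while len(buffer) >= block_size:
            yield buffer[:block_size]
            buffer = buffer[block_size:]
    # A trailing partial block is dropped so every yielded block is full.


samples = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
blocks = list(pack_sequences(samples, block_size=4, eos_id=0))
print(blocks)
```

Every yielded block contains only real tokens and separators, which is where the "100% tokens" entry in the table comes from; samples may be split across block boundaries as a side effect.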
| Setting | Value |
|---|---|
| Hardware | GTX 1080 8 GB · Pascal · CUDA compute capability 6.1 |
| Precision | fp32 weights / fp16 AMP (GradScaler) |
| Optimizer | StovetopCooker (HyperNix, pre-Volta) + cosine |
| LR | 0.0002 cosine |
| Warmup | 8% |
| Embedding freeze | First 20% of steps |
| Effective batch | 8 × 512 = 4,096 tokens/step |
| Loss boost | ≤3 rounds of 75 steps if loss > 4.95 |
| Sequence packing | ✅ streaming, 1 epoch, capped at 150,000 chunks |
| Grad clipping | 5.0 |
| Grad checkpointing | ✅ |
| Peak VRAM | 5.44 GB |
| Final loss | 2.0444 |
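The learning-rate and embedding-freeze settings in the table can be sketched as two step-dependent functions. The peak rate (2e-4), warmup share (8%) and freeze share (20%) come from the table; the linear warmup shape, the decay to zero, and the function names are assumptions, not the author's code.

```python
import math

PEAK_LR = 2e-4          # "LR" row
WARMUP_FRACTION = 0.08  # "Warmup" row
FREEZE_FRACTION = 0.20  # "Embedding freeze" row


def lr_at(step: int, total_steps: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero (assumed shapes)."""
    warmup_steps = max(1, round(total_steps * WARMUP_FRACTION))
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return 0.5 * PEAK_LR * (1.0 + math.cos(math.pi * progress))


def embeddings_frozen(step: int, total_steps: int) -> bool:
    """True while the step falls in the first FREEZE_FRACTION of training."""
    return step < round(total_steps * FREEZE_FRACTION)


for step in (0, 40, 80, 540, 1000):
    print(step, f"{lr_at(step, 1000):.2e}", embeddings_frozen(step, 1000))
```

In a PyTorch training loop, the freeze would typically be applied by toggling `requires_grad` on the embedding weights once `embeddings_frozen` flips to `False`.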
| Tier | Dataset | Samples | Weight | Category |
|---|---|---|---|---|
| 1 | Open-Orca/OpenOrca | 40 k | 3.0× | GPT-4 reasoning |
| 1 | meta-math/MetaMathQA | 30 k | 2.8× | Math augmentation |
| 1 | Roman1111111/claude-opus-4.6-10000x | 10 k | 2.5× | Claude conversations |
| 1 | WizardLM/WizardLM_evol_instruct_V2_196k | 25 k | 2.5× | Evolved instruction |
| 1 | WithinUsAI/GPT5.5_thinking_max_distill_god_seed_25K | 25 k | 2.5× | Reasoning traces |
| 2 | microsoft/orca-math-word-problems-200k | 20 k | 2.2× | Math word problems |
| 2 | lighteval/MATH-Hard | 10 k | 2.2× | Hard math |
| 2 | HuggingFaceH4/MATH-500 | 500 | 2.2× | Competition math |
| 2 | garage-bAInd/Open-Platypus | 25 k | 2.0× | Reasoning instruction |
| 2 | teknium/OpenHermes-2.5 | 30 k | 2.0× | GPT-4 instruction |
| 3 | ise-uiuc/Magicoder-OSS-Instruct-75K | 20 k | 1.8× | Code instruction |
| 3 | m-a-p/CodeFeedback-Filtered-Instruction | 15 k | 1.8× | Code + feedback |
| 3 | iamtarun/python_code_instructions_18k_alpaca | 8 k | 1.6× | Python code |
| 3 | nvidia/OpenCodeInstruct | 20 k | 1.5× | Code instruction |
| 3 | b-mc2/sql-create-context | 6 k | 1.4× | SQL generation |
| 4 | HuggingFaceH4/ultrachat_200k | 30 k | 1.5× | Multi-turn chat |
| 4 | databricks/databricks-dolly-15k | 15 k | 1.2× | Instruction following |
| 4 | Amod/mental_health_counseling_conversations | 5 k | 1.0× | Counseling chat |
| 4 | mlabonne/guanaco-llama2-1k | 1 k | 1.0× | General QA |
| 5 | ray0rf1re/FineWeb-Nano | 20 k | 0.8× | Web text |
| 5 | ray0rf1re/hyper-pip | 85 | 3.0× | HyperNix pip data |
| 3 | flytech/python-codes-25k | 20 k | 1.7× | Python code solutions |
| 3 | ByteDance-Seed/Code-Contests-Plus | 15 k | 1.6× | Competitive coding |
| 1 | open-thoughts/OpenThoughts-TB-dev | 20 k | 2.3× | Verified thinking traces |
| 6 | Nix-ai/cat-math-v1 | 5 k | 0.3× | Cat math (niche) |
| 6 | Nix-ai/Cat-v2.8XXXL-plus | 5 k | 0.3× | Cat general (niche) |
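The card does not say how the Weight column is applied. One plausible reading, shown here purely as an assumption, is that each dataset contributes in proportion to samples × weight. The sketch below computes the resulting shares for three rows of the table; `sampling_shares` is a hypothetical helper.

```python
from typing import Dict, Tuple


def sampling_shares(mix: Dict[str, Tuple[int, float]]) -> Dict[str, float]:
    """Share of each dataset if its contribution is samples * weight (assumed)."""
    effective = {name: samples * weight for name, (samples, weight) in mix.items()}
    total = sum(effective.values())
    return {name: value / total for name, value in effective.items()}


# Three rows from the table: (samples, weight).
mix = {
    "Open-Orca/OpenOrca": (40_000, 3.0),
    "ray0rf1re/FineWeb-Nano": (20_000, 0.8),
    "Nix-ai/cat-math-v1": (5_000, 0.3),
}
shares = sampling_shares(mix)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

Under this reading, a tier-1 dataset with a high weight dominates the mix, while the niche tier-6 data contributes only a small fraction.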
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "ray0rf1re/Nano-Nano_v5.1", torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("ray0rf1re/Nano-Nano_v5.1")


def chat(prompt: str, max_new_tokens: int = 256) -> str:
    # <think> opens the reasoning block; the model emits its reasoning,
    # then </think>, then the answer.
    text = (
        "<|im_start|>user\n" + prompt + "<|im_end|>\n"
        "<|im_start|>assistant\n<think>\n"
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    out = model.generate(
        **inputs, max_new_tokens=max_new_tokens,
        do_sample=True, temperature=0.7, top_p=0.9,
        repetition_penalty=1.1,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Decode only the newly generated tokens, skipping the prompt
    return tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:],
                            skip_special_tokens=True).strip()


print(chat("Write a Python function to merge two sorted lists."))
print(chat("Solve: if 3x + 7 = 22, what is x?"))
print(chat("Explain transformer attention in simple terms."))
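Because the prompt above ends with an opening `<think>` tag, the generated text is expected to contain the reasoning, then `</think>`, then the answer. A small helper for separating the two is sketched below; `split_reasoning` is a hypothetical name, and it assumes `</think>` survives decoding as plain text (if the tokenizer treats it as a special token, decode with `skip_special_tokens=False` instead).

```python
from typing import Tuple


def split_reasoning(generated: str, close_tag: str = "</think>") -> Tuple[str, str]:
    """Split generated text into (reasoning, answer) at the first closing tag.

    If the tag is missing (for example, generation hit max_new_tokens while
    still reasoning), the whole text is returned as reasoning with an empty answer.
    """
    reasoning, tag, answer = generated.partition(close_tag)
    if not tag:
        return generated.strip(), ""
    return reasoning.strip(), answer.strip()


example = "3x = 22 - 7 = 15, so x = 15 / 3.\n</think>\nx = 5"
reasoning, answer = split_reasoning(example)
print("reasoning:", reasoning)
print("answer:", answer)
```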
@misc{nano-nano-v5,
author = {ray0rf1re},
title = {Nano-Nano v5.1: 300M LLaMA with Sequence Packing},
year = {2026},
publisher = {HuggingFace},
howpublished = {https://huggingface.co/ray0rf1re/Nano-Nano_v5.1},
}