HinoMoto-1B v1 — Phase 2 Full (50,000-step from-scratch)

A 955M-parameter Japanese decoder-only LM trained from scratch (random init) on a single RTX 3090 in bf16 mixed precision, as a full 50,000-step pretraining run.

This is the successor to the 5,000-step smoke release. Instantaneous perplexity fell below 7 at its best, improving on the smoke run's best of 9.76.

Architecture (Llama-style)

Item Value
Params (total) 955,221,504 (~955M)
Params (excl. embed) 906,069,504
d_model 1536
n_layers 32
n_heads 16 (MHA, no GQA)
d_head 96
d_ff (SwiGLU) 4096
max_position_embeddings 1024
rope_theta 10000
tie_word_embeddings true
norm_eps 1e-6
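The parameter totals above can be reproduced from the listed dimensions. The sketch below assumes a standard Llama-style layout (bias-free q/k/v/o and SwiGLU projections, two RMSNorm weight vectors per layer plus one final norm) and a vocabulary of exactly 32,000 entries; the card only says "32k", so `vocab_size` is an assumption, and `llama_param_count` is a hypothetical helper, not part of the release.

```python
def llama_param_count(vocab_size, d_model, n_layers, d_ff, tie_embeddings=True):
    """Count parameters for a bias-free Llama-style decoder (assumed layout)."""
    embed = vocab_size * d_model              # token embedding matrix
    attn = 4 * d_model * d_model              # q, k, v, o projections (MHA, no GQA)
    mlp = 3 * d_model * d_ff                  # SwiGLU: gate, up, down
    norms = 2 * d_model                       # two RMSNorm weights per layer
    per_layer = attn + mlp + norms
    non_embed = n_layers * per_layer + d_model  # + final RMSNorm
    # With tied embeddings the LM head reuses the embedding matrix.
    lm_head = 0 if tie_embeddings else vocab_size * d_model
    return {
        "embed": embed,
        "non_embed": non_embed,
        "total": embed + non_embed + lm_head,
    }


# vocab_size=32_000 is an assumption; the other values come from the table.
counts = llama_param_count(vocab_size=32_000, d_model=1536, n_layers=32, d_ff=4096)
d_head = 1536 // 16  # d_model / n_heads
print(counts, d_head)
```

Under these assumptions the totals agree with the table, which suggests the embedding matrix is counted once because `tie_word_embeddings` is true.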

Training summary

Item Value
Tokens consumed ~205 M (50000 step × batch 1 × grad_accum 8 × seq_len 512)
Corpus all_v8_200mb_jp.txt (200MB JP-rich subsample: Aozora + jawiki + Diet)
Tokenizer 32k-vocab byte-level BPE (trained with the Hugging Face Rust tokenizers library; ~2.0 JP chars/token)
Optimizer AdamW (β₁=0.9, β₂=0.95, eps=1e-8, wd=0.1)
LR schedule WSD (warmup 500 → stable 2e-4 → decay last 20% to 2e-5)
Effective batch 8 (= 1 × 8 grad_accum)
Seq len 512
z-loss coef 1e-4
EMA decay 0.999 (CPU-side shadow)
dtype bf16 mixed
grad_clip 1.0
Hardware RTX 3090 24GB
Wall-clock ~30-50 hr (1B Phase 2 full, incl. multiple WSL crash recoveries)
Final loss / ppl (smoke best) (see logs) / (see logs)
Best ppl (instant) 1.41 at step 30000
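The token budget and the WSD schedule in the table can be written out as a short sketch. The card gives the warmup length, the stable rate, the decay fraction, and the final rate, but not the shape of the ramps; the linear warmup and linear decay below are assumptions, and `wsd_lr` is a hypothetical helper rather than the training script's function.

```python
def wsd_lr(step, total_steps=50_000, warmup_steps=500,
           peak_lr=2e-4, final_lr=2e-5, decay_frac=0.2):
    """Warmup-stable-decay learning rate for a 0-indexed step.

    Linear warmup and linear decay are assumed; the card does not
    state the ramp shapes.
    """
    decay_steps = round(total_steps * decay_frac)
    decay_start = total_steps - decay_steps
    if step < warmup_steps:
        # Linear warmup, reaching peak_lr on the last warmup step.
        return peak_lr * (step + 1) / warmup_steps
    if step < decay_start:
        # Stable phase at the peak rate.
        return peak_lr
    # Linear decay from peak_lr to final_lr over the last decay_steps.
    progress = min((step - decay_start) / decay_steps, 1.0)
    return peak_lr + (final_lr - peak_lr) * progress


# Token budget from the table: steps x micro-batch x grad_accum x seq_len.
tokens_consumed = 50_000 * 1 * 8 * 512

for s in (0, 499, 20_000, 40_000, 45_000, 50_000):
    print(s, wsd_lr(s))
print(tokens_consumed)
```

With these values the decay phase covers steps 40,000 to 50,000, and the token count comes to 204.8M, matching the ~205 M figure.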

Performance vs smoke (5,000 step) (measured 2026-05-18)

Metric smoke 1B (5k) Phase 2 full 1B (50k) Δ
Bench v0.6 family - n=110
Bench v0.6 keigo - n=70
Bench v0.6 silence - n=50
Best ppl (instant) 9.76 1.41 see compare

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "FiShota/hinomoto-1b-v1-phase2-full",
    dtype=torch.bfloat16,
).to("cuda")
tok = AutoTokenizer.from_pretrained("FiShota/hinomoto-1b-v1-phase2-full")

prompt = "むかしむかし、あるところに"
inputs = tok(prompt, return_tensors="pt").to("cuda")
out = model.generate(**inputs, max_new_tokens=60, do_sample=True, top_p=0.9, temperature=0.8)
print(tok.decode(out[0], skip_special_tokens=True))
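Because `max_position_embeddings` is 1024 and training used a sequence length of 512, it may be worth keeping the prompt plus generated tokens inside that window. The helper below is a hypothetical sketch (`clamp_new_tokens` is not part of the model or of transformers) that caps `max_new_tokens` for a given prompt length.

```python
def clamp_new_tokens(prompt_len, requested_new_tokens, max_positions=1024):
    """Cap the generation length so prompt + output fits in the position window.

    max_positions defaults to the card's max_position_embeddings (1024).
    """
    if prompt_len >= max_positions:
        raise ValueError(
            f"prompt has {prompt_len} tokens; the window is {max_positions}"
        )
    return min(requested_new_tokens, max_positions - prompt_len)


# Example: a 1000-token prompt leaves room for only 24 new tokens.
budget = clamp_new_tokens(prompt_len=1000, requested_new_tokens=60)
print(budget)
```

In the usage snippet, `prompt_len` would be `inputs["input_ids"].shape[1]`, and the result would be passed as `max_new_tokens`. Since the model was trained at 512 tokens, passing `max_positions=512` is the more conservative choice.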

Training optimization (May 2026 batch)

The Phase 2 full run benefited from five optimizations over the smoke recipe, mostly GPU-side:

Improvement Effect Verification
F.scaled_dot_product_attention (Flash Attention 2) forward+backward 2-3x faster 5 numerical equivalence tests
EMA torch._foreach_* batched fewer per-param syncs 4 bit-exact equivalence tests
torch.cuda.empty_cache() after resume fixes the 3-4x slowdown after resume empirical: 14-21 → 3 sec/step
z-loss in bf16 (no fp32 cast) lower cast cost existing 6 tests pass
swappiness=10 preserves OS cache sysctl

Combined: steady-state step time went from 14-21 sec/step to 2.4 sec/step (back to smoke-run speed, and then faster).
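To put the step times in context, the arithmetic below converts seconds per step into throughput and an idealized wall-clock time for the full 50,000 steps. It is a rough sketch: it assumes a constant step time for the whole run and ignores crash recoveries, evaluation, and checkpointing, so it is not a reconstruction of the actual run.

```python
TOTAL_STEPS = 50_000
TOKENS_PER_STEP = 1 * 8 * 512  # micro-batch x grad_accum x seq_len


def run_hours(sec_per_step, steps=TOTAL_STEPS):
    """Idealized wall-clock hours at a constant step time."""
    return steps * sec_per_step / 3600


def tokens_per_second(sec_per_step, tokens_per_step=TOKENS_PER_STEP):
    """Training throughput implied by one optimizer step per sec_per_step."""
    return tokens_per_step / sec_per_step


for label, sec in (("optimized", 2.4), ("slow, low end", 14.0), ("slow, high end", 21.0)):
    print(f"{label}: {run_hours(sec):.1f} hr, {tokens_per_second(sec):.0f} tok/s")
```

At 2.4 sec/step the idealized total is about 33 hours, which sits inside the reported ~30-50 hr range; at 14-21 sec/step the same run would have taken roughly 190-290 hours.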

License

CC BY 4.0. Use for any purpose with attribution.

Intended Uses & Limitations

Intended uses:

  • Research on JP from-scratch small-LM trade-offs
  • Educational reference for consumer-GPU pretraining
  • Base for derivative fine-tunes (LoRA, SFT, distillation)

Out-of-scope:

  • Production assistant (no safety alignment, 200MB corpus = narrow knowledge)
  • Code, math, factual QA
  • Languages other than Japanese (vocab is JP-focused)

Bias, Risks, Limitations

  • n=1 run (seed=0)
  • No instruction tuning: this is a pure base model. For chat / instructions, see hinomoto-350m-cultural-sft-v1.
  • No safety alignment.
  • Limited corpus (200MB): expect factual errors and out-of-distribution failures.
  • Japanese cultural perspective: limited to corpus coverage (mainly factual + literary).

Transparency: Bench / Corpus Leak Audit

Same as hinomoto-350m-cultural-sft-v1. See the bench leak audit (verified).

Citation

@misc{hinomoto-1b-v1-phase2-full-2026,
  author = {ryu (FiShota)},
  title = {HinoMoto-1B v1 Phase 2 Full (from-scratch JP, 50,000-step pretrain)},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/FiShota/hinomoto-1b-v1-phase2-full},
}

Generated: 2026-05-18 (HinoMoto/ryu + Claude)
