HinoMoto-1B v1 — Phase 2 Full (50,000-step from-scratch)
A 955M-parameter Japanese decoder-only LM, trained from scratch (random init) on a single RTX 3090 in bf16 mixed precision for a full 50,000-step pretraining run.
This is the successor to the 5,000-step smoke release. Peak instantaneous perplexity fell below 7, improving on the smoke run's best of 9.76.
Architecture (Llama-style)
| Item | Value |
|---|---|
| Params (total) | 955,221,504 (~955M) |
| Params (excl. embed) | 906,069,504 |
| `d_model` | 1536 |
| `n_layers` | 32 |
| `n_heads` | 16 (MHA, no GQA) |
| `d_head` | 96 |
| `d_ff` (SwiGLU) | 4096 |
| `max_position_embeddings` | 1024 |
| `rope_theta` | 10000 |
| `tie_word_embeddings` | true |
| `norm_eps` | 1e-6 |
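The two parameter totals in the table can be reproduced from the listed dimensions. The sketch below assumes a standard bias-free Llama-style layout (four attention projections, a three-matrix SwiGLU MLP, two RMSNorm weights per layer plus a final norm) and assumes the "32k" vocabulary is exactly 32,000 entries. Neither is stated outright in this card, and `llama_param_count` is a hypothetical helper name.

```python
def llama_param_count(d_model, n_layers, d_ff, vocab_size, tie_word_embeddings=True):
    """Count parameters of a bias-free Llama-style decoder (assumed layout)."""
    attn = 4 * d_model * d_model   # Q, K, V, O projections (MHA: K/V are full width)
    mlp = 3 * d_model * d_ff       # SwiGLU: gate, up and down projections
    norms = 2 * d_model            # two RMSNorm weight vectors per layer
    non_embed = n_layers * (attn + mlp + norms) + d_model  # + final RMSNorm
    embed = vocab_size * d_model   # input embedding, shared with the output head when tied
    if not tie_word_embeddings:
        embed *= 2                 # separate output projection
    return non_embed, non_embed + embed


# vocab_size=32_000 is an assumption ("32k" in the training summary).
non_embed, total = llama_param_count(d_model=1536, n_layers=32, d_ff=4096, vocab_size=32_000)
print(non_embed, total)  # 906069504 955221504
```

Under these assumptions both numbers match the table exactly, and `d_head` follows as 1536 / 16 = 96.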
Training summary
| Item | Value |
|---|---|
| Tokens consumed | ~205M (50,000 steps × batch 1 × grad_accum 8 × seq_len 512) |
| Corpus | all_v8_200mb_jp.txt (200MB JP-rich subsample: Aozora + jawiki + Diet) |
| Tokenizer | 32k-vocab byte-level BPE (trained with the Hugging Face Rust tokenizers library; ~2.0 characters per token on Japanese) |
| Optimizer | AdamW (β₁=0.9, β₂=0.95, eps=1e-8, wd=0.1) |
| LR schedule | WSD (warmup 500 → stable 2e-4 → decay last 20% to 2e-5) |
| Effective batch | 8 (= 1 × 8 grad_accum) |
| Seq len | 512 |
| z-loss coef | 1e-4 |
| EMA decay | 0.999 (CPU-side shadow) |
| dtype | bf16 mixed |
| grad_clip | 1.0 |
| Hardware | RTX 3090 24GB |
| Wall-clock | ~30-50 hr (1B Phase 2 full, incl. multiple WSL crash recoveries) |
| Final loss / ppl (smoke best) | (see logs) / (see logs) |
| Best ppl (instant) | 1.41 at step 30000 |
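The token budget and the WSD (warmup, stable, decay) schedule from the table can be sketched in a few lines. The table gives only the endpoints, so the linear warmup from zero and the linear decay are assumptions, and `wsd_lr` is a hypothetical name rather than a function from the training repo.

```python
TOTAL_STEPS = 50_000
MICRO_BATCH = 1
GRAD_ACCUM = 8
SEQ_LEN = 512

# Tokens seen over the whole run: one optimizer step covers 8 micro-batches of 512 tokens.
tokens_consumed = TOTAL_STEPS * MICRO_BATCH * GRAD_ACCUM * SEQ_LEN
print(tokens_consumed)  # 204800000


def wsd_lr(step, total_steps=TOTAL_STEPS, warmup=500, peak=2e-4, floor=2e-5, decay_frac=0.2):
    """Learning rate at an optimizer step under a warmup-stable-decay schedule.

    Assumed shapes: linear warmup from 0, linear decay from peak to floor.
    """
    decay_steps = round(total_steps * decay_frac)
    decay_start = total_steps - decay_steps
    if step < warmup:
        return peak * step / warmup
    if step < decay_start:
        return peak
    progress = min((step - decay_start) / decay_steps, 1.0)
    return peak + (floor - peak) * progress


for step in (0, 250, 500, 40_000, 45_000, 50_000):
    print(step, wsd_lr(step))
```

With these settings the decay phase starts at step 40,000, so the reported best perplexity at step 30,000 falls inside the stable phase.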
Performance vs smoke (5,000 steps), measured 2026-05-18
| Metric | smoke 1B (5k) | Phase 2 full 1B (50k) | Δ |
|---|---|---|---|
| Bench v0.6 family | - | n=110 | — |
| Bench v0.6 keigo | - | n=70 | — |
| Bench v0.6 silence | - | n=50 | — |
| Best ppl (instant) | 9.76 | 1.41 | see compare |
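To compare the perplexity row with training-log losses, it helps to convert between the two. The sketch below assumes the usual definition, perplexity = exp(mean per-token cross-entropy in nats). The card does not spell out how "ppl (instant)" is computed, so treat this as an illustration, not the training script's logic.

```python
import math


def perplexity(mean_nll_nats):
    """Perplexity from a mean per-token negative log-likelihood in nats."""
    return math.exp(mean_nll_nats)


def loss_from_perplexity(ppl):
    """Inverse mapping: the mean per-token loss that yields a given perplexity."""
    return math.log(ppl)


for label, ppl in [("smoke 1B (5k)", 9.76), ("Phase 2 full 1B (50k)", 1.41)]:
    print(f"{label}: ppl {ppl} <-> loss {loss_from_perplexity(ppl):.3f} nats/token")
```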
Usage
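```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "FiShota/hinomoto-1b-v1-phase2-full"

# Load the weights in bf16 (the training dtype) and move them to the GPU.
# Older transformers releases spell this argument `torch_dtype`.
model = AutoModelForCausalLM.from_pretrained(model_id, dtype=torch.bfloat16).to("cuda")
tok = AutoTokenizer.from_pretrained(model_id)

# This is a base model: give it text to continue, not an instruction.
prompt = "むかしむかし、あるところに"
inputs = tok(prompt, return_tensors="pt").to("cuda")

# Nucleus sampling; prompt plus 60 new tokens stays well within the 1024-position limit.
out = model.generate(**inputs, max_new_tokens=60, do_sample=True, top_p=0.9, temperature=0.8)
print(tok.decode(out[0], skip_special_tokens=True))
```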
Training optimization (May 2026 batch)
The Phase 2 full run benefited from 5 GPU-side optimizations (vs the smoke recipe):
| Improvement | Effect | Verification |
|---|---|---|
| `F.scaled_dot_product_attention` (Flash Attention 2) | forward+backward 2-3x faster | 5 numerical equivalence tests |
| EMA batched with `torch._foreach_*` | fewer per-parameter syncs | 4 bit-exact equivalence tests |
| `torch.cuda.empty_cache()` after resume | removes the 3-4x slowdown after resume | empirical: 14-21 → 3 sec/step |
| z-loss in bf16 (no fp32 cast) | lower cast cost | existing 6 tests pass |
| `swappiness=10` | keeps the OS cache resident | sysctl |
Combined: steady-state step time went from 14-21 sec/step to 2.4 sec/step (back to smoke-run speed, and slightly better).
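A numerical equivalence check of the kind listed for the attention change might look like the sketch below. It compares `F.scaled_dot_product_attention` against a hand-written causal attention on random tensors. This is an illustration, not one of the repo's five tests: the tensor shapes reuse this card's head layout (16 heads, `d_head` 96) with a short sequence, `naive_causal_attention` is a hypothetical helper, and on CPU PyTorch picks a non-Flash backend, so the check covers only the math.

```python
import math

import torch
import torch.nn.functional as F


def naive_causal_attention(q, k, v):
    """Reference: softmax(Q K^T / sqrt(d_head)) V with a causal mask."""
    seq_len, d_head = q.shape[-2], q.shape[-1]
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_head)
    # True above the diagonal marks future positions that must not be attended to.
    future = torch.triu(torch.ones(seq_len, seq_len, device=q.device), diagonal=1).bool()
    scores = scores.masked_fill(future, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v


torch.manual_seed(0)
# (batch, n_heads, seq_len, d_head); fp32 on CPU so the comparison is tight.
q, k, v = (torch.randn(1, 16, 32, 96) for _ in range(3))

fused = F.scaled_dot_product_attention(q, k, v, is_causal=True)
reference = naive_causal_attention(q, k, v)

max_abs_diff = (fused - reference).abs().max().item()
print(f"max abs diff: {max_abs_diff:.2e}")
```

In bf16 on a GPU the same comparison needs a looser tolerance, since the fused kernel and the reference accumulate in a different order.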
License
CC BY 4.0. Use for any purpose with attribution.
Intended Uses & Limitations
Intended uses:
- Research on JP from-scratch small-LM trade-offs
- Educational reference for consumer-GPU pretraining
- Base for derivative fine-tunes (LoRA, SFT, distillation)
Out-of-scope:
- Production assistant (no safety alignment, 200MB corpus = narrow knowledge)
- Code, math, factual QA
- Languages other than Japanese (vocab is JP-focused)
Bias, Risks, Limitations
- n=1 run (seed=0)
- No instruction tuning: this is a pure base model. For chat / instructions, see hinomoto-350m-cultural-sft-v1.
- No safety alignment.
- Limited corpus (200MB): expect factual errors and out-of-distribution failures.
- Japanese cultural perspective: limited to corpus coverage (mainly factual + literary).
Transparency: Bench / Corpus Leak Audit
Same as hinomoto-350m-cultural-sft-v1. See the bench leak audit (verified).
Related
- bench: https://github.com/FIshota/hinomoto-bench-ja (HinoMoto-Bench-ja)
- training repo: https://github.com/FIshota/hinomoto-model
- previous smoke: https://huggingface.co/FiShota/hinomoto-1b-v1-smoke-llama
- 350M base: https://huggingface.co/FiShota/hinomoto-350m-v1-llama
- 350M cultural SFT: https://huggingface.co/FiShota/hinomoto-350m-cultural-sft-v1
Citation
@misc{hinomoto-1b-v1-phase2-full-2026,
author = {ryu (FIshota)},
title = {HinoMoto-1B v1 Phase 2 Full (from-scratch JP, 50,000-step pretrain)},
year = {2026},
publisher = {Hugging Face},
url = {https://huggingface.co/FiShota/hinomoto-1b-v1-phase2-full},
}
Generated: 2026-05-18 (HinoMoto/ryu + Claude)