Instructions for using FiShota/hinomoto-1b-v1-phase2-full with libraries and local apps.

Libraries

How to use FiShota/hinomoto-1b-v1-phase2-full with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="FiShota/hinomoto-1b-v1-phase2-full")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("FiShota/hinomoto-1b-v1-phase2-full")
model = AutoModelForCausalLM.from_pretrained("FiShota/hinomoto-1b-v1-phase2-full")

Local Apps

vLLM

How to use FiShota/hinomoto-1b-v1-phase2-full with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "FiShota/hinomoto-1b-v1-phase2-full"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "FiShota/hinomoto-1b-v1-phase2-full",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
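
The same request can be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_completion_request` and `complete` are hypothetical, and the response layout (`choices[0].text`) is assumed to follow the OpenAI completions format that the server advertises. Pass `base_url="http://localhost:30000"` to target the SGLang server instead.

```python
import json
import urllib.request

MODEL_ID = "FiShota/hinomoto-1b-v1-phase2-full"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build (but do not send) a POST request for the /v1/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the first generated text (needs a running server)."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumption: OpenAI-style completions response layout.
    return body["choices"][0]["text"]


# Building the request does not touch the network, so this runs without a server.
request = build_completion_request("Once upon a time,")
print(request.full_url)
# With a server running: print(complete("Once upon a time,"))
```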

Use Docker

docker model run hf.co/FiShota/hinomoto-1b-v1-phase2-full

SGLang

How to use FiShota/hinomoto-1b-v1-phase2-full with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "FiShota/hinomoto-1b-v1-phase2-full" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "FiShota/hinomoto-1b-v1-phase2-full",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "FiShota/hinomoto-1b-v1-phase2-full" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "FiShota/hinomoto-1b-v1-phase2-full",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use FiShota/hinomoto-1b-v1-phase2-full with Docker Model Runner:
```
docker model run hf.co/FiShota/hinomoto-1b-v1-phase2-full
```

HinoMoto-1B v1 — Phase 2 Full (50,000-step from-scratch)

A 955M-parameter Japanese decoder-only LM trained from scratch. It was trained from random initialization on a single RTX 3090 in bf16 mixed precision, as a full 50,000-step pretraining run.

This is the successor to the 5,000-step smoke release. At its peak, instantaneous perplexity (ppl) fell below 7, improving on the smoke run's best of 9.76.

Architecture (Llama-style)

| Item | Value |
| --- | --- |
| Params (total) | 955,221,504 (~955M) |
| Params (excl. embed) | 906,069,504 |
| `d_model` | 1536 |
| `n_layers` | 32 |
| `n_heads` | 16 (MHA, no GQA) |
| `d_head` | 96 |
| `d_ff` (SwiGLU) | 4096 |
| `max_position_embeddings` | 1024 |
| `rope_theta` | 10000 |
| `tie_word_embeddings` | true |
| `norm_eps` | 1e-6 |
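
The two parameter totals can be reproduced from the other rows. The sketch below assumes a bias-free Llama-style block (four attention projections, three SwiGLU projections, two RMSNorm weight vectors per layer, one final RMSNorm) and assumes that "32k vocab" means exactly 32,000 entries; neither assumption is stated in the card, but together they match both reported figures.

```python
# Hyperparameters copied from the architecture table.
D_MODEL = 1536
N_LAYERS = 32
D_FF = 4096
# Assumption: the "32k vocab" tokenizer has exactly 32,000 entries.
VOCAB_SIZE = 32_000


def count_params(d_model, n_layers, d_ff, vocab_size):
    """Return (total, non_embedding) counts for a bias-free Llama-style stack."""
    attention = 4 * d_model * d_model  # Q, K, V, O projections (MHA: K/V are full width)
    mlp = 3 * d_model * d_ff           # SwiGLU: gate, up, and down projections
    norms = 2 * d_model                # two RMSNorm weight vectors per layer
    per_layer = attention + mlp + norms
    non_embedding = n_layers * per_layer + d_model  # plus the final RMSNorm
    embedding = vocab_size * d_model   # counted once because embeddings are tied
    return non_embedding + embedding, non_embedding


total, non_embedding = count_params(D_MODEL, N_LAYERS, D_FF, VOCAB_SIZE)
print(f"total={total:,} non_embedding={non_embedding:,}")
# prints: total=955,221,504 non_embedding=906,069,504
```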

Training summary

| Item | Value |
| --- | --- |
| Tokens consumed | ~205 M (50,000 steps × batch 1 × grad_accum 8 × seq_len 512) |
| Corpus | `all_v8_200mb_jp.txt` (200MB JP-rich subsample: Aozora + jawiki + Diet) |
| Tokenizer | 32k vocab byte-BPE (HF Rust trained, JP c/t ~2.0) |
| Optimizer | AdamW (β₁=0.9, β₂=0.95, eps=1e-8, wd=0.1) |
| LR schedule | WSD (warmup 500 → stable 2e-4 → decay last 20% to 2e-5) |
| Effective batch | 8 (= 1 × 8 grad_accum) |
| Seq len | 512 |
| z-loss coef | 1e-4 |
| EMA decay | 0.999 (CPU-side shadow) |
| dtype | bf16 mixed |
| grad_clip | 1.0 |
| Hardware | RTX 3090 24GB |
| Wall-clock | ~30-50 hr (including multiple WSL crash recoveries) |
| Final loss / ppl (smoke best) | (see logs) / (see logs) |
| Best ppl (instant) | 1.41 at step 30000 |
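
The token budget and the WSD (warmup, stable, decay) schedule in the table can be sketched as follows. The card gives only the three phases and their endpoints; the linear warmup from near zero and the linear decay shape are assumptions made here for illustration, and the function names are hypothetical.

```python
def tokens_consumed(steps=50_000, micro_batch=1, grad_accum=8, seq_len=512):
    """Tokens seen during training: one optimizer step covers micro_batch * grad_accum sequences."""
    return steps * micro_batch * grad_accum * seq_len


def wsd_lr(step, total_steps=50_000, warmup_steps=500,
           peak_lr=2e-4, final_lr=2e-5, decay_fraction=0.2):
    """Warmup-stable-decay learning rate for a 0-indexed optimizer step."""
    decay_steps = round(total_steps * decay_fraction)
    decay_start = total_steps - decay_steps
    if step < warmup_steps:
        # Assumption: linear warmup that reaches peak_lr on the last warmup step.
        return peak_lr * (step + 1) / warmup_steps
    if step < decay_start:
        return peak_lr
    # Assumption: linear decay from peak_lr to final_lr over the last 20% of steps.
    progress = min((step - decay_start) / decay_steps, 1.0)
    return peak_lr + (final_lr - peak_lr) * progress


print(f"tokens: {tokens_consumed():,}")  # prints: tokens: 204,800,000
for step in (0, 499, 20_000, 40_000, 45_000, 50_000):
    print(step, f"{wsd_lr(step):.2e}")
```

With these settings the stable phase runs from step 500 to step 40,000, and the decay occupies the final 10,000 steps.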

Performance vs smoke (5,000-step), measured 2026-05-18

| Metric | smoke 1B (5k) | Phase 2 full 1B (50k) | Δ |
| --- | --- | --- | --- |
| Bench v0.6 family | - | n=110 | - |
| Bench v0.6 keigo | - | n=70 | - |
| Bench v0.6 silence | - | n=50 | - |
| Best ppl (instant) | 9.76 | 1.41 | see compare |
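
To compare the two perplexity figures on the loss scale, the sketch below converts them back to cross-entropy. It assumes the reported perplexity is the exponential of the mean per-token cross-entropy in nats, which is the usual convention but is not stated in the card; the helper names are hypothetical.

```python
import math


def loss_from_ppl(ppl):
    """Mean per-token cross-entropy (nats), assuming ppl = exp(loss)."""
    return math.log(ppl)


def ppl_from_loss(loss):
    """Inverse of loss_from_ppl."""
    return math.exp(loss)


for label, ppl in [("smoke 1B (5k)", 9.76), ("Phase 2 full 1B (50k)", 1.41)]:
    print(f"{label}: ppl={ppl} -> loss={loss_from_ppl(ppl):.3f} nats/token")
```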

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "FiShota/hinomoto-1b-v1-phase2-full",
    dtype=torch.bfloat16,
).to("cuda")
tok = AutoTokenizer.from_pretrained("FiShota/hinomoto-1b-v1-phase2-full")

prompt = "むかしむかし、あるところに"
inputs = tok(prompt, return_tensors="pt").to("cuda")
out = model.generate(**inputs, max_new_tokens=60, do_sample=True, top_p=0.9, temperature=0.8)
print(tok.decode(out[0], skip_special_tokens=True))

Training optimization (May 2026 batch)

The Phase 2 full run benefited from five throughput optimizations (vs the smoke recipe):

| Improvement | Effect | Verification |
| --- | --- | --- |
| `F.scaled_dot_product_attention` (Flash Attention 2) | forward+backward 2-3x faster | 5 numerical equivalence tests |
| EMA `torch._foreach_*` batched | fewer per-parameter syncs | 4 bit-exact equivalence tests |
| `torch.cuda.empty_cache()` after resume | removes the 3-4x slowdown after resume | empirical: 14-21 → 3 sec/step |
| z-loss bf16 (no fp32 cast) | lower cast cost | existing 6 tests pass |
| swappiness=10 | keeps the OS cache resident | sysctl |
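
As an illustration of the batched EMA row, here is a minimal sketch of an EMA update written with `torch._foreach_*` calls next to the per-tensor loop it replaces. It is not the training repo's code: the function names are hypothetical, and everything stays on one device in fp32, whereas the card describes a CPU-side shadow of a model trained in bf16.

```python
import torch


@torch.no_grad()
def ema_update_foreach(shadow, params, decay=0.999):
    """shadow <- decay * shadow + (1 - decay) * param, batched over the whole list."""
    torch._foreach_mul_(shadow, decay)
    torch._foreach_add_(shadow, params, alpha=1.0 - decay)


@torch.no_grad()
def ema_update_loop(shadow, params, decay=0.999):
    """Reference version: the same update applied one tensor at a time."""
    for s, p in zip(shadow, params):
        s.mul_(decay).add_(p, alpha=1.0 - decay)


torch.manual_seed(0)
params = [torch.randn(4, 4), torch.randn(8)]
shadow_a = [p.clone() for p in params]
shadow_b = [p.clone() for p in params]
new_params = [p + 0.1 for p in params]  # stand-in for weights after an optimizer step

ema_update_foreach(shadow_a, new_params)
ema_update_loop(shadow_b, new_params)
print(all(torch.allclose(a, b) for a, b in zip(shadow_a, shadow_b)))
```

The batched form issues two calls for the whole parameter list instead of two per tensor, which is where the reduction in per-parameter overhead comes from.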

Combined: steady-state step time improved from 14-21 sec/step to 2.4 sec/step (back to smoke-run speed, and then some).
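
A quick arithmetic check relates these step times to the reported wall-clock. The sketch assumes every one of the 50,000 steps runs at the quoted steady-state rate, so it ignores crash recoveries and the slower periods; the function name is hypothetical.

```python
TOTAL_STEPS = 50_000


def wall_clock_hours(sec_per_step, steps=TOTAL_STEPS):
    """Idealized training time in hours at a constant step time."""
    return sec_per_step * steps / 3600


fast = wall_clock_hours(2.4)
slow_low, slow_high = wall_clock_hours(14), wall_clock_hours(21)
print(f"optimized:   {fast:.1f} hr")
print(f"unoptimized: {slow_low:.0f}-{slow_high:.0f} hr")
print(f"speedup:     {14 / 2.4:.1f}x-{21 / 2.4:.1f}x")
```

At 2.4 sec/step the full run takes roughly 33 hours, which sits inside the ~30-50 hr wall-clock reported in the training summary.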

License

CC BY 4.0. Use for any purpose with attribution.

Intended Uses & Limitations

Intended uses:

Research on JP from-scratch small-LM trade-offs
Educational reference for consumer-GPU pretraining
Base for derivative fine-tunes (LoRA, SFT, distillation)

Out-of-scope:

Production assistant (no safety alignment, 200MB corpus = narrow knowledge)
Code, math, factual QA
Languages other than Japanese (vocab is JP-focused)

Bias, Risks, Limitations

Single run (n=1, seed=0): no variance across seeds is reported.
No instruction tuning: this is a pure base model. For chat / instructions, see hinomoto-350m-cultural-sft-v1.
No safety alignment.
Limited corpus (200MB): expect factual errors and out-of-distribution failures.
Japanese cultural perspective: limited to corpus coverage (mainly factual + literary).

Transparency: Bench / Corpus Leak Audit

The audit is the same as for hinomoto-350m-cultural-sft-v1; see that model's bench leak audit (verified).

bench: https://github.com/FIshota/hinomoto-bench-ja (HinoMoto-Bench-ja)
training repo: https://github.com/FIshota/hinomoto-model
previous smoke: https://huggingface.co/FiShota/hinomoto-1b-v1-smoke-llama
350M base: https://huggingface.co/FiShota/hinomoto-350m-v1-llama
350M cultural SFT: https://huggingface.co/FiShota/hinomoto-350m-cultural-sft-v1

Citation

@misc{hinomoto-1b-v1-phase2-full-2026,
  author = {ryu (FIshota)},
  title = {HinoMoto-1B v1 Phase 2 Full (from-scratch JP, 50,000-step pretrain)},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/FiShota/hinomoto-1b-v1-phase2-full},
}

Generated: 2026-05-18 (HinoMoto/ryu + Claude)

Safetensors: model size 1.0B params, tensor type BF16

Model tree for FiShota/hinomoto-1b-v1-phase2-full: 2 finetunes