OLMo3-190M-zh-nano

The L04 NANO model for the zero-background AI large-model R&D bootcamp (llm001): 26M parameters, trained for 1 epoch on a local RTX 3090.

Model configuration

hidden_size: 192
num_layers: 6
num_heads: 3
intermediate_size: 768
vocab_size: 48000
sliding_window: 4096
QK-Norm, RoPE (base=500000), SiLU FFN
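
As a rough sanity check on model size, the configuration above can be turned into a parameter estimate. This is only a sketch under stated assumptions that the card does not confirm: standard multi-head attention with four bias-free hidden_size × hidden_size projections, a gated SiLU FFN with three weight matrices (gate, up, down), and an output head that is not tied to the input embedding. Norm weights are ignored because they are negligible at this scale. The function name `estimate_params` is hypothetical.

```python
def estimate_params(hidden_size=192, num_layers=6, intermediate_size=768,
                    vocab_size=48000, tie_embeddings=False):
    """Approximate parameter count for a decoder-only transformer."""
    # Token embedding table.
    embed = vocab_size * hidden_size
    # Output projection; assumed separate from the embedding unless tied.
    lm_head = 0 if tie_embeddings else vocab_size * hidden_size
    # Attention: q, k, v, o projections, assumed bias-free and without GQA.
    attn = 4 * hidden_size * hidden_size
    # Gated FFN (assumption): gate, up and down matrices.
    mlp = 3 * hidden_size * intermediate_size
    return embed + lm_head + num_layers * (attn + mlp)


if __name__ == "__main__":
    total = estimate_params()
    print(f"untied: {total:,} (~{total / 1e6:.1f}M)")
    tied = estimate_params(tie_embeddings=True)
    print(f"tied:   {tied:,} (~{tied / 1e6:.1f}M)")
```

Under these assumptions the embedding and output head dominate the total, which is typical for a very small model with a 48000-token vocabulary. The exact figure depends on details not listed here.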

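Training configuration

Data: cmz1024/llm101-olmo3-zh-demo-data
Training: RTX 3090 (24GB), bf16, SDPA attention (attn_implementation=sdpa)
1 epoch, batch size 8 × gradient accumulation 16 = 128 effective batch size
lr=0.001, cosine schedule, warmup=2%
Repository: woohello/olmo3-190m-zh-nano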
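
The learning-rate settings above (peak 0.001, cosine decay, 2% warmup) can be illustrated with a small schedule function. This is a minimal sketch, not the training script: linear warmup, decay to zero, and the example `total_steps=1000` are assumptions, and `lr_at` is a hypothetical helper name.

```python
import math


def lr_at(step, total_steps, peak_lr=1e-3, warmup_ratio=0.02, min_lr=0.0):
    """Learning rate at an optimizer step: linear warmup, then cosine decay."""
    warmup_steps = max(1, int(total_steps * warmup_ratio))
    if step < warmup_steps:
        # Linear ramp that reaches peak_lr on the last warmup step (assumption).
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))


# Effective batch size: per-device batch times gradient-accumulation steps.
effective_batch = 8 * 16

if __name__ == "__main__":
    total_steps = 1000  # hypothetical; the real step count depends on the dataset size
    for s in (0, 19, 20, 510, 999):
        print(s, lr_at(s, total_steps))
    print("effective batch size:", effective_batch)
```

Each optimizer step here corresponds to 16 forward and backward passes of 8 sequences, so the schedule advances once per 128 sequences.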

Usage

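from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the checkpoint with PyTorch's scaled-dot-product attention backend.
model = AutoModelForCausalLM.from_pretrained("woohello/olmo3-190m-zh-nano", attn_implementation="sdpa")
tok = AutoTokenizer.from_pretrained("woohello/olmo3-190m-zh-nano")

# Chinese prompt: "Once upon a time there was a mountain, and in the mountain there was a temple,"
inputs = tok("从前有座山，山里有座庙，", return_tensors="pt")
# Sample up to 100 new tokens; passing the attention mask explicitly avoids a generate() warning.
output = model.generate(inputs.input_ids, attention_mask=inputs.attention_mask, max_new_tokens=100, do_sample=True, temperature=0.8)
print(tok.decode(output[0], skip_special_tokens=True))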
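
To show what `do_sample=True, temperature=0.8` does in the call above, here is a small standard-library sketch of temperature sampling over a list of logits. It is an illustration of the general technique, not the code path inside `generate`, and the function names and example logits are hypothetical.

```python
import math
import random


def softmax_with_temperature(logits, temperature=0.8):
    """Turn logits into probabilities; temperature < 1 sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature=0.8, rng=random):
    """Draw one token index according to the temperature-scaled probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for a 4-token vocabulary
    print(softmax_with_temperature(logits, 1.0))
    print(softmax_with_temperature(logits, 0.8))
    print(sample_token(logits))
```

Values below 1.0 concentrate probability on the highest-scoring tokens, so 0.8 gives output that is somewhat less random than sampling at temperature 1.0.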

Safetensors: 22M params, tensor type F32
