OLMo3-190M-zh-full

The L04 Full model (190M parameters) from the Beginner-Level AI Large Model R&D Training Camp (llm001).

Model configuration

  • hidden_size: 768, num_layers: 12, num_heads: 12, intermediate_size: 3072
  • vocab_size: 48000, sliding_window: 4096
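
As a rough sanity check on the 190M figure, the configuration above can be turned into a parameter estimate. This is only a sketch under assumptions the card does not state: full multi-head attention with four 768×768 projections, a SwiGLU feed-forward block (three weight matrices), untied input and output embeddings, and no bias terms. Normalization weights are ignored.

```python
# Values taken from the model configuration above.
hidden_size = 768
num_layers = 12
intermediate_size = 3072
vocab_size = 48000

# Assumption: Q, K, V and output projections, each hidden x hidden, no bias.
attn_params = 4 * hidden_size * hidden_size
# Assumption: SwiGLU MLP with gate, up and down projections.
mlp_params = 3 * hidden_size * intermediate_size
# Assumption: input embedding and output head are separate (untied) matrices.
embed_params = 2 * vocab_size * hidden_size

total_params = num_layers * (attn_params + mlp_params) + embed_params
print(f"{total_params / 1e6:.1f}M")  # 187.0M
```

Under these assumptions the estimate comes out near 187M, which is in line with the 190M in the model name. A different layout (for example tied embeddings) would give a noticeably smaller number.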

Training configuration

  • Data: cmz1024/llm101-olmo3-zh-demo-data (tokenized.bin, 6.37 GB, about 3.18B tokens)
  • Training: H100, max_steps=3000 (about 23% of an epoch), bs=16×8=128, lr=5e-4, bf16
  • eval_loss (step 3000): 4.148
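
The figures above allow a few quick back-of-the-envelope checks. The sketch below assumes that GB means 10^9 bytes and that eval_loss is a per-token cross-entropy in nats (the usual convention, but not confirmed by the card), so the derived numbers are approximate.

```python
import math

# Figures quoted in the training configuration above.
data_bytes = 6.37e9      # tokenized.bin size; assumes 1 GB = 10**9 bytes
total_tokens = 3.18e9    # approximate token count of the dataset
epoch_fraction = 0.23    # share of one epoch covered by 3000 steps
eval_loss = 4.148        # assumed to be mean cross-entropy in nats per token

# About 2 bytes per token, consistent with 16-bit token IDs
# (a 48000-entry vocabulary fits in an unsigned 16-bit integer).
bytes_per_token = data_bytes / total_tokens

# Approximate number of tokens the model saw during training.
tokens_seen = epoch_fraction * total_tokens

# Perplexity implied by the eval loss under the nats assumption.
perplexity = math.exp(eval_loss)

print(f"bytes per token: {bytes_per_token:.2f}")
print(f"tokens seen: {tokens_seen / 1e9:.2f}B")
print(f"perplexity: {perplexity:.1f}")
```

With these inputs the model has seen roughly 0.73B tokens, and an eval loss of 4.148 corresponds to a perplexity of about 63.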

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("pusu26/olmo3-190m-zh-full")
tok = AutoTokenizer.from_pretrained("pusu26/olmo3-190m-zh-full")
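
Once the model and tokenizer are loaded, text can be generated with the standard `generate` method. The helper below is a minimal sketch, not part of the original card: the name `generate_text`, the prompt, and the `max_new_tokens` value are illustrative assumptions, and greedy decoding is chosen only for reproducibility.

```python
def generate_text(model, tok, prompt, max_new_tokens=50):
    """Greedy-decode a continuation of `prompt` (hypothetical helper)."""
    # Tokenize the prompt into PyTorch tensors expected by the model.
    inputs = tok(prompt, return_tensors="pt")
    # do_sample=False selects greedy decoding.
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )
    # Decode the first (only) sequence back to text.
    return tok.decode(output_ids[0], skip_special_tokens=True)


# Example call, using the `model` and `tok` objects loaded above
# (the prompt is an arbitrary illustration):
# print(generate_text(model, tok, "今天天气"))
```

Because this is a small base model trained for about a quarter of an epoch, continuations are likely to be rough. Sampling options such as `do_sample=True` with `temperature` or `top_p` can be passed to `generate` to get more varied output.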