woohello/olmo3-190m-zh-nano-continue

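Continued-pretraining version: training resumes from woohello/olmo3-190m-zh-nano on new corpora such as Wikipedia-zh, extending the model's capabilities while retaining what it already learned.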

Training configuration

Base model: woohello/olmo3-190m-zh-nano (26M, OLMo3 architecture, SDPA)
Data: 42ailab/llm101-v3.1-data/tokenized/full_v31.bin (continuing on Wikipedia-zh)
LR: 2e-4 (5x lower than the pretraining LR of 1e-3, to prevent catastrophic forgetting)
Warmup: 10% (5x longer than the 2% used in pretraining, for a smooth transition)
Training: RTX 3090 (24GB), bf16, attn_implementation=sdpa
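
To make the LR and warmup settings above concrete, here is a minimal sketch of a schedule that uses them. Only the peak LR (2e-4) and the warmup fraction (10%) come from the configuration. The linear warmup shape, the cosine decay, the `min_lr` floor, and the function name `lr_at` are assumptions for illustration, not the actual training script.

```python
import math

PEAK_LR = 2e-4       # from the training configuration above
WARMUP_FRAC = 0.10   # from the training configuration above


def lr_at(step: int, total_steps: int, peak_lr: float = PEAK_LR,
          warmup_frac: float = WARMUP_FRAC, min_lr: float = 0.0) -> float:
    """Learning rate at a given step.

    Assumed shape: linear warmup, then cosine decay down to min_lr.
    The model card does not state the decay schedule.
    """
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        # Ramp linearly so the last warmup step reaches peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    # Cosine decay over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = 1000  # hypothetical step count, chosen only for the demo
    for s in (0, 50, 99, 100, 550, 999):
        print(s, f"{lr_at(s, total):.2e}")
```

With a hypothetical 1000-step run, the first 100 steps ramp up to 2e-4. A longer warmup at a lower peak keeps early updates small, which fits the stated goal of limiting catastrophic forgetting.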

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the continued-pretraining checkpoint with PyTorch SDPA attention
model = AutoModelForCausalLM.from_pretrained("woohello/olmo3-190m-zh-nano-continue", attn_implementation="sdpa")
tok = AutoTokenizer.from_pretrained("woohello/olmo3-190m-zh-nano-continue")

# Chinese prompt: "Once upon a time there was a mountain, and in the mountain there was a temple,"
input_ids = tok("从前有座山，山里有座庙，", return_tensors="pt").input_ids
# Sample up to 100 new tokens at temperature 0.8
out = model.generate(input_ids, max_new_tokens=100, do_sample=True, temperature=0.8)
print(tok.decode(out[0], skip_special_tokens=True))

Safetensors: model size 22M params, tensor type BF16
