Open LM 3B — Mid-Trained (Knowledge Cutoff May 2013)
Mid-training continuation of the Apple Open LM 3B oracle model with knowledge cutoff May 2013, from the TiC-LM (Time-Continual Language Modeling) / Chrononauts project.
The mid-training stage re-exposes the model to pre-cutoff facts drawn from peS2o, Wikipedia, and DCLM to consolidate (rather than extend) the model's knowledge. No post-cutoff text is included.
Trained with LLaMA-Factory (finetuning_type: full, DeepSpeed ZeRO-2).
Model Details
| Property | Value |
|---|---|
| Base model | dogtooth/open-lm-3b-201305 |
| Architecture | LLaMA-style with QK norm (OpenLMForCausalLM, custom code) |
| Parameters | ~2.8B |
| Knowledge cutoff | May 2013 |
| Vocab size | 50,432 |
| Context length | 2,048 |
| Mid-train framework | LLaMA-Factory (full FT, DeepSpeed ZeRO-2) |
| Mid-train data | peS2o + Wikipedia + DCLM, pre-cutoff only |
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# trust_remote_code=True is needed because OpenLMForCausalLM is custom code shipped with the repo.
model = AutoModelForCausalLM.from_pretrained(
    "dogtooth/open-lm-3b-201305-midtrain",
    dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "dogtooth/open-lm-3b-201305-midtrain", trust_remote_code=True
)
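The model was mid-trained with a pretraining objective (stage: pt, empty template), so plain text completion is the natural way to try it. The following is a minimal sketch, assuming the custom OpenLMForCausalLM code supports the standard `generate` API. The helper names `clamp_new_tokens` and `complete` are hypothetical and not part of the repository.

```python
MODEL_ID = "dogtooth/open-lm-3b-201305-midtrain"
CONTEXT_LEN = 2048  # context length from the Model Details table


def clamp_new_tokens(prompt_len: int, requested: int, context_len: int = CONTEXT_LEN) -> int:
    """Limit generation so that prompt + new tokens fit in the context window."""
    return max(0, min(requested, context_len - prompt_len))


def complete(prompt: str, max_new_tokens: int = 64) -> str:
    """Greedy completion of `prompt` (hypothetical helper; downloads the weights)."""
    # Heavy dependencies are imported lazily so clamp_new_tokens stays importable without them.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    budget = clamp_new_tokens(inputs["input_ids"].shape[1], max_new_tokens)
    if budget == 0:
        raise ValueError("prompt already fills the 2,048-token context window")
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=budget, do_sample=False)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

A call such as `complete("The Higgs boson was discovered in")` would return the prompt followed by the model's continuation. Prompts about events after May 2013 are outside the model's knowledge cutoff.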
Repository Contents
- Final model weights at the repo root (model-*.safetensors)
- Intermediate checkpoints in checkpoint-14000/, checkpoint-16000/, checkpoint-16034/ (HF-format weights only; DeepSpeed optimizer shards omitted)
- trainer_state.json, trainer_log.jsonl, all_results.json, train_results.json
Citation
@article{jain2024ticlm,
title={Time-Continual Learning from a Streaming Language Model},
author={Jain, Ameya and Ramesh, Aakanksha and Li, Tianjian and others},
journal={arXiv preprint arXiv:2410.14660},
year={2024}
}
Mid-Training Data Recipe (201305 cutoff)
Three pre-cutoff text sources are concatenated (no upsampling) and packed into 2,048-token sequences; the model is trained on the result for one epoch.
| Source | Time filter | Documents | Est. tokens |
|---|---|---|---|
| peS2o (academic abstracts/full text) | published before May 2013 | 1,859,534 | ~1.0 B |
| Wikipedia (English) | first-revision date before May 2013 | 3,966,112 | ~3.5 B |
| DCLM (Common Crawl, filtered) | none (assumed pre-cutoff web text) | 3,218,997 | ~4.5 B |
| Total | | ~9.0 M | ~9.0 B |
Token estimates assume ~4 characters per token. Measured ratios with the OpenLM tokenizer are ~0.21–0.23 tokens/char (roughly 4.3–4.8 characters per token), so the table, which reports the 4-character approximation, slightly overstates the true token counts. See the project repo for the per-cutoff data prep code (prepare_midtrain_data.py) and the slice statistics (stats.json).
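To see how much the 4-character approximation matters, the table totals can be rescaled to the measured tokens/char range. This is a rough sketch that uses only the rounded figures from the table, so the character counts it implies are estimates rather than values taken from stats.json.

```python
CHARS_PER_TOKEN_APPROX = 4.0             # approximation used in the table
MEASURED_TOKENS_PER_CHAR = (0.21, 0.23)  # measured range quoted in the text

# Rounded table estimates, in billions of tokens.
TABLE_ESTIMATES_B = {"peS2o": 1.0, "Wikipedia": 3.5, "DCLM": 4.5}


def rescale(est_tokens_b: float, tokens_per_char: float) -> float:
    """Convert a 4-chars-per-token estimate to a given tokens/char ratio."""
    chars_b = est_tokens_b * CHARS_PER_TOKEN_APPROX  # implied character count
    return chars_b * tokens_per_char


total_b = sum(TABLE_ESTIMATES_B.values())
low, high = (rescale(total_b, r) for r in MEASURED_TOKENS_PER_CHAR)

print(f"table total: {total_b:.1f} B tokens")          # 9.0 B
print(f"rescaled:    {low:.2f}-{high:.2f} B tokens")   # 7.56-8.28 B
```

The rescaled range is in the same ballpark as the ~8.4 B tokens reported under the training hyperparameters; the remaining gap is within the rounding of the table entries.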
LLaMA-Factory dataset wiring
dataset: midtrain_pes2o_pre201305,midtrain_wiki_pre201305,midtrain_dclm
template: empty
cutoff_len: 2048
mix_strategy: concat
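The effect of concatenating documents and cutting them into `cutoff_len`-sized sequences can be illustrated with a small sketch. This is a simplified illustration and not LLaMA-Factory's actual implementation: the function name `pack_concat`, the placeholder `eos_id=0`, and the choice to drop the trailing partial block are all assumptions made for the example.

```python
from typing import Iterable, Iterator, List


def pack_concat(
    docs: Iterable[List[int]], block_len: int = 2048, eos_id: int = 0
) -> Iterator[List[int]]:
    """Concatenate tokenized docs (each followed by EOS) and yield full blocks.

    The trailing partial block is dropped. `eos_id=0` is a placeholder,
    not the real EOS id of the OpenLM tokenizer.
    """
    buffer: List[int] = []
    for tokens in docs:
        buffer.extend(tokens)
        buffer.append(eos_id)
        # Emit as many full blocks as the buffer currently holds.
        while len(buffer) >= block_len:
            yield buffer[:block_len]
            buffer = buffer[block_len:]


# Toy example with a tiny block length so the behaviour is visible.
toy_docs = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
blocks = list(pack_concat(toy_docs, block_len=4, eos_id=0))
print(blocks)  # [[1, 2, 3, 0], [4, 5, 0, 6], [7, 8, 9, 0]]
```

Note how the second block spans a document boundary: packing trades clean document separation for sequences that are always full.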
Per-source files (relative to the dataset root):
- midtrain/pes2o_slices/pes2o_pre201305_1b.jsonl
- midtrain/wiki_slices/wiki_pre201305.jsonl
- midtrain/dclm_4_5b.jsonl
All three are JSONL files whose records have a single text column.
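A quick way to sanity-check a file in this format is to stream it line by line and confirm that each record carries a string `text` field. The sketch below assumes the column is literally named `text`; the helper `iter_texts` is hypothetical and not part of the project's data prep code.

```python
import json
from typing import Iterable, Iterator


def iter_texts(lines: Iterable[str]) -> Iterator[str]:
    """Yield the `text` field of each JSONL record, validating the schema."""
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        if not isinstance(record, dict) or not isinstance(record.get("text"), str):
            raise ValueError(f"line {lineno}: expected an object with a string 'text' field")
        yield record["text"]


# In-memory sample standing in for a real file.
sample = ['{"text": "First document."}', "", '{"text": "Second document."}']
texts = list(iter_texts(sample))
print(texts)  # ['First document.', 'Second document.']
```

For a real slice, pass an open file handle instead of the sample list, for example `with open(path, encoding="utf-8") as f: n_docs = sum(1 for _ in iter_texts(f))`.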
Training hyperparameters
| Hyperparameter | Value |
|---|---|
| Framework | LLaMA-Factory stage: pt, finetuning_type: full |
| Distributed training | DeepSpeed ZeRO-2 |
| Precision | bf16 |
| GPUs | 4 × H200 |
| Per-device batch | 64 |
| Gradient accumulation | 1 |
| Effective batch (tokens) | 4 × 64 × 2048 = 524,288 / step |
| Learning rate | 5.0e-5, cosine schedule, 3% warmup |
| Epochs | 1.0 |
| Total optimizer steps | 16,034 |
| Tokens consumed | ~8.4 B (≈ 1 pass over the corpus) |
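The batch and token figures in the table follow directly from the other hyperparameters, and the learning-rate schedule can be sketched the same way. The arithmetic below uses only values from the table. The `lr_at` function is an assumption: it implements a common linear-warmup plus cosine-decay formula, and the warmup step count (3% of total steps, rounded down) may differ slightly from what the trainer actually used.

```python
import math

NUM_GPUS = 4
PER_DEVICE_BATCH = 64
GRAD_ACCUM = 1
SEQ_LEN = 2048
TOTAL_STEPS = 16_034
PEAK_LR = 5.0e-5
WARMUP_STEPS = int(0.03 * TOTAL_STEPS)  # assumption: 3% warmup, rounded down

tokens_per_step = NUM_GPUS * PER_DEVICE_BATCH * GRAD_ACCUM * SEQ_LEN
tokens_consumed = tokens_per_step * TOTAL_STEPS


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero (assumed schedule shape)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(f"tokens per step: {tokens_per_step:,}")             # 524,288
print(f"tokens consumed: {tokens_consumed / 1e9:.2f} B")   # 8.41 B
print(f"warmup steps:    {WARMUP_STEPS}")                  # 481
```

Under this assumed schedule, `lr_at(WARMUP_STEPS)` equals the peak rate of 5.0e-5 and `lr_at(TOTAL_STEPS)` decays to zero.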
Why mid-train?
Mid-training consolidates the model's knowledge rather than extending it. The model sees pre-cutoff facts from peS2o, Wikipedia, and DCLM a second time, and because no post-cutoff text is included, the knowledge cutoff date is preserved while the representation of pre-cutoff knowledge is strengthened.