---
license: other
language:
- en
tags:
- causal-lm
---
# Stable LM 2 1.6B (`global_step420000`)
## Description
Stable LM 2 1.6B is a 1.6 billion parameter decoder-only language model pre-trained on 2 trillion tokens of diverse multilingual and code datasets for two epochs.
## Usage
This branch contains the training checkpoint for Stable LM 2 1.6B
at step 420,000. It is the final checkpoint taken before cooldown.
We provide the following contents in the `global_step420000` directory:

- `bf16_zero_pp_mp_rank_00_optim_states.pt`: The Adam states and FP32 weights for each parameter. You will need to port these to your optimizer format when importing them into your training process.
- `mp_rank_00_model_states.pt`: The model weights, following the GPT-NeoX convention.
- `config.yml`: The pre-training configuration file for this checkpoint. Linear learning rate cooldown should be taken from `lr=0.0002529` to `lr=0.0` (see the sketch after this list).
The model weights are also converted to the Hugging Face `transformers` format and can be loaded with the following code:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("stabilityai/stablelm-2-1_6b", trust_remote_code=True)
# The `revision` argument selects this checkpoint branch.
model = AutoModelForCausalLM.from_pretrained(
    "stabilityai/stablelm-2-1_6b",
    trust_remote_code=True,
    torch_dtype="auto",
    revision="global_step420000",
)
model.cuda()
```
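Once loaded, text can be generated with the standard `transformers` `generate` API; the prompt and sampling settings below are illustrative only:

```python
# Hypothetical prompt and sampling settings for a quick smoke test.
inputs = tokenizer("The weather is always wonderful", return_tensors="pt").to(model.device)
tokens = model.generate(
    **inputs,
    max_new_tokens=64,
    temperature=0.70,
    top_p=0.95,
    do_sample=True,
)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))
```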
## License
- License: Stability AI Non-Commercial Research Community License. If you'd like to use this model for commercial products or purposes, please contact us here to learn more.
## Acknowledgements
- Dakota Mahan for creating the ZeRO optimizer state merging script.
## Citation
```bibtex
@misc{StableLM-2-1.6B,
  url={https://huggingface.co/stabilityai/stablelm-2-1_6b},
  title={Stable LM 2 1.6B},
  author={Stability AI Language Team}
}
```