---
license: other
language:
  - en
tags:
  - causal-lm
---

Stable LM 2 1.6B (global_step420000)

Description

Stable LM 2 1.6B is a 1.6 billion parameter decoder-only language model pre-trained on 2 trillion tokens of diverse multilingual and code datasets for two epochs.

Usage

This branch contains the training checkpoint for Stable LM 2 1.6B at step 420,000. It is the final checkpoint taken before cooldown. We provide the following contents in the global_step420000 directory:

  • bf16_zero_pp_mp_rank_00_optim_states.pt: The Adam optimizer states and FP32 master weights for each parameter. You will need to port these to your own optimizer's format when importing the checkpoint into your training setup.

  • mp_rank_00_model_states.pt: The model weights following the GPT-NeoX convention.

  • config.yml: The pre-training configuration file for this checkpoint. For the cooldown phase, the learning rate should be annealed linearly from lr=0.0002529 down to lr=0.0 (a minimal schedule sketch follows this list).
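
For reference, the linear cooldown can be expressed as a simple step-wise schedule. The sketch below uses PyTorch's LambdaLR; the peak learning rate (2.529e-4) and the linear decay to zero come from the description above, while the model, optimizer hyperparameters, and cooldown length are placeholders you would replace with the values from config.yml and your own training code.

import torch
from torch.optim.lr_scheduler import LambdaLR

peak_lr = 2.529e-4        # learning rate at step 420,000 (see config.yml)
cooldown_steps = 10_000   # placeholder; set to your actual cooldown length

model = torch.nn.Linear(8, 8)  # stand-in for the ported Stable LM 2 model
optimizer = torch.optim.AdamW(model.parameters(), lr=peak_lr)

# Anneal the learning rate linearly from peak_lr down to 0 over cooldown_steps.
scheduler = LambdaLR(
  optimizer,
  lr_lambda=lambda step: max(0.0, 1.0 - step / cooldown_steps),
)

for step in range(cooldown_steps):
  # ... forward pass, loss.backward(), optimizer.step() ...
  scheduler.step()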

The model weights have also been converted to the Hugging Face Transformers format and can be loaded with the following code:

from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("stabilityai/stablelm-2-1_6b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
  "stabilityai/stablelm-2-1_6b",
  trust_remote_code=True,
  torch_dtype="auto",
  revision="global_step420000"
)
model.cuda()
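
As a quick sanity check after loading, you can run a short generation with the tokenizer and model created above. The prompt and sampling settings below are illustrative and not part of the original card:

inputs = tokenizer("The weather is always wonderful in", return_tensors="pt").to(model.device)
tokens = model.generate(
  **inputs,
  max_new_tokens=64,
  do_sample=True,
  temperature=0.7,
  top_p=0.95,
)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))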

License

Acknowledgements

  • Dakota Mahan for creating the ZeRO optimizer state merging script.

Citation

@misc{StableLM-2-1.6B,
      url={https://huggingface.co/stabilityai/stablelm-2-1_6b},
      title={Stable LM 2 1.6B},
      author={Stability AI Language Team}
}