1B-parameter models trained on Python-only datasets. Each branch of this repository holds a model trained on a different version of The Stack:

  • stack v1
  • stack v2 - permissive
  • stack v2 - permissive and unlicensed

Each model has 24 layers, a hidden size of 2048, and 16 attention heads (multi-query attention). The learning rate reaches $4\times10^{-4}$ after a warmup of $1000$ steps and then follows a cosine decay to $4\times10^{-5}$ at the end of training. Training uses a batch size of 128 samples of 8192 tokens each for $100$k iterations, so the model sees roughly $100$B tokens by the end of training ($128 \times 8192 \times 10^{5} \approx 1.05\times10^{11}$). We use a FIM (fill-in-the-middle) rate of $0.5$, the same tokenizer as StarCoder (except in the tokenizer ablations), and learned absolute positional embeddings.
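The schedule and token budget above can be sketched as follows. This is a minimal illustration, not the training code used for these models: it assumes a linear warmup starting from 0 and a cosine decay that spans only the post-warmup steps, since the card does not specify either detail. The function name `learning_rate` and the constant names are hypothetical.

```python
import math

# Hyperparameters stated in the model card.
PEAK_LR = 4e-4
MIN_LR = 4e-5
WARMUP_STEPS = 1_000
TOTAL_STEPS = 100_000
BATCH_SIZE = 128      # sequences per optimizer step
SEQ_LEN = 8_192       # tokens per sequence


def learning_rate(step: int) -> float:
    """Learning rate at a given optimizer step (hypothetical helper).

    Assumption: linear warmup from 0 to PEAK_LR, then a cosine decay
    from PEAK_LR to MIN_LR over the remaining steps.
    """
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    # Fraction of the post-warmup decay phase completed, clamped to [0, 1].
    progress = min(1.0, (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return MIN_LR + (PEAK_LR - MIN_LR) * cosine


tokens_per_step = BATCH_SIZE * SEQ_LEN          # 1,048,576 tokens per step
total_tokens = tokens_per_step * TOTAL_STEPS    # about 1.05e11 tokens

if __name__ == "__main__":
    for step in (0, 500, 1_000, 50_500, 100_000):
        print(f"step {step:>7}: lr = {learning_rate(step):.2e}")
    print(f"tokens per step: {tokens_per_step:,}")
    print(f"total tokens:    {total_tokens / 1e9:.1f}B")
```

With these numbers the warmup covers 1% of training, the learning rate sits halfway between peak and floor at the midpoint of the decay phase, and the total comes to about 104.9B tokens, consistent with the card's figure of roughly 100B.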
