RaymondLi committed on
Commit
1ac955a
1 Parent(s): 5fe27bb

Create README.md

Files changed (1)
  1. README.md +9 -0
README.md ADDED
@@ -0,0 +1,9 @@
+ This repository contains 1B-parameter models trained on Python-only datasets. Each branch holds a model trained on a different version of The Stack:
+ - stack v1
+ - stack v2 - permissive
+ - stack v2 - permissive and unlicensed
+
+ The architecture has 24 layers, a hidden size of 2048, and 16 attention heads with multi-query attention.
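
As a rough sanity check on the 1B-parameter figure, the sketch below counts weights for the stated shape. The 4x MLP expansion, the 49,152-entry vocabulary, the 8192-entry position table, and the omission of biases and layer norms are all assumptions made for illustration; none of them is stated in this README. `estimate_params` is a hypothetical helper.

```python
def estimate_params(n_layers=24, hidden=2048, n_heads=16,
                    vocab_size=49_152, n_positions=8_192, mlp_ratio=4):
    """Approximate weight count for a multi-query decoder (assumed layout)."""
    head_dim = hidden // n_heads
    # Multi-query attention: full-width query and output projections,
    # but a single key head and a single value head shared by all query heads.
    attn = hidden * hidden            # query projection
    attn += hidden * 2 * head_dim     # shared key + value projections
    attn += hidden * hidden           # output projection
    # Two-matrix MLP with an assumed 4x expansion.
    mlp = 2 * hidden * (mlp_ratio * hidden)
    # Token embeddings plus learned absolute position embeddings.
    embeddings = vocab_size * hidden + n_positions * hidden
    return n_layers * (attn + mlp) + embeddings


total = estimate_params()
print(f"{total / 1e9:.2f}B parameters (approximate)")
```

Under these assumptions the estimate lands a little above 1.1B, which is in line with the "1B" label.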
+ The learning rate warms up over $1000$ steps to a peak of $4\times10^{-4}$, then follows a cosine decay to $4\times10^{-5}$ at the end of training.
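
The schedule described above can be sketched as a plain function of the step index. The README does not state the warmup shape, so the linear ramp here is an assumption, as is the hypothetical function name `learning_rate`; the 100k total steps come from the iteration count given elsewhere in this README.

```python
import math


def learning_rate(step, peak_lr=4e-4, final_lr=4e-5,
                  warmup_steps=1000, total_steps=100_000):
    """Warmup to peak_lr, then cosine decay to final_lr (sketch)."""
    if step < warmup_steps:
        # Assumed linear warmup from 0 to the peak learning rate.
        return peak_lr * step / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min((step - warmup_steps) / (total_steps - warmup_steps), 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return final_lr + (peak_lr - final_lr) * cosine


for step in (0, 500, 1000, 50_000, 100_000):
    print(step, learning_rate(step))
```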
+ Training uses a batch size of 128 samples of 8192 tokens each for $100$k iterations, so the model sees roughly $100$B tokens by the end of training ($128 \times 8192 \times 10^{5} \approx 1.05\times10^{11}$).
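
The token budget follows directly from the three numbers above. A short check of the arithmetic (variable names are illustrative):

```python
batch_size = 128        # samples per step
seq_len = 8192          # tokens per sample
iterations = 100_000    # training steps

tokens_per_step = batch_size * seq_len
total_tokens = tokens_per_step * iterations

print(f"tokens per step: {tokens_per_step:,}")
print(f"total tokens:    {total_tokens:,} (~{total_tokens / 1e9:.0f}B)")
```

The exact product is slightly above 100B because each step holds $2^{20}$ tokens rather than an even million.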
+ We use a fill-in-the-middle (FIM) rate of $0.5$, the same tokenizer as StarCoder (except in the tokenizer ablations), and learned absolute positional embeddings.
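
To illustrate what a FIM rate of 0.5 means, the sketch below rewrites a training document into prefix/suffix/middle order with probability `fim_rate` and leaves it unchanged otherwise. The sentinel strings match the special tokens of the StarCoder tokenizer; the character-level split, the prefix-suffix-middle ordering, and the helper name `maybe_fim` are assumptions for illustration, and the actual training pipeline may differ (for example by splitting at the token level).

```python
import random

FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def maybe_fim(text, fim_rate=0.5, rng=random):
    """Apply a fill-in-the-middle rewrite with probability fim_rate."""
    if rng.random() >= fim_rate:
        return text  # keep the document in ordinary left-to-right order
    # Pick two cut points to split the document into prefix, middle, suffix.
    a, b = sorted(rng.randint(0, len(text)) for _ in range(2))
    prefix, middle, suffix = text[:a], text[a:b], text[b:]
    # The middle is moved to the end so the model learns to infill it.
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"


sample = "def add(a, b):\n    return a + b\n"
print(maybe_fim(sample, fim_rate=1.0, rng=random.Random(0)))
```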