ncoop57 committed on
Commit 5c62545
1 Parent(s): 40f4aad

Update README.md

Files changed (1)
  1. README.md +20 -1
README.md CHANGED
@@ -97,7 +97,6 @@ model-index:
 | Replit Code V1.5 | 3B | 23.0% | 25.9%| 26.2% | 23.6%| 23.2%| 21.5%|
 | Deci Coder | 1B | 19.1% | 6.8% | 18.4% | 16.7%| 2.1% | 1.7% |
 
-
 **Key Features**
 * Fill in Middle Capability (FIM)
 * Supports Long Context, trained with sequences of up to 16,384 tokens
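The FIM item in the key-features list above is a prompting convention rather than a separate API, so a short sketch may help. The snippet below is a minimal, illustrative example that assumes the tokenizer ships StarCoder-style FIM sentinel tokens (`<fim_prefix>`, `<fim_suffix>`, `<fim_middle>`) and that the checkpoint lives at `stabilityai/stable-code-3b`; neither detail is stated in this diff, so treat both as assumptions.

```python
# Minimal fill-in-middle (FIM) sketch -- assumptions: StarCoder-style sentinel
# tokens and the `stabilityai/stable-code-3b` repo id (not confirmed by this diff).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stabilityai/stable-code-3b"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, trust_remote_code=True
)

# The model is asked to produce the code that belongs between the prefix
# (everything before the cursor) and the suffix (everything after it).
prompt = (
    "<fim_prefix>def fibonacci(n):\n"
    "<fim_suffix>\n"
    "    return fibonacci(n - 2) + fibonacci(n - 1)<fim_middle>"
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=48, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```

If the tokenizer exposes different sentinel tokens, only the prompt string changes; the rest of the call is ordinary causal-LM generation.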
@@ -207,6 +206,26 @@ The model is a decoder-only transformer similar to the LLaMA ([Touvron et al., 2
 
 The dataset comprises a filtered mixture of open-source large-scale datasets available on the [HuggingFace Hub](https://huggingface.co/datasets): Falcon RefinedWeb extract ([Penedo et al., 2023](https://huggingface.co/datasets/tiiuae/falcon-refinedweb)), along with [CommitPackFT](https://huggingface.co/datasets/bigcode/commitpackft) and [GitHub Issues](https://huggingface.co/datasets/bigcode/the-stack-github-issues) (BigCode, 2023), and StarCoder ([Li et al., 2023](https://arxiv.org/abs/2305.06161)). We further supplement our training with data from mathematical domains ([Azerbayev et al., 2023](https://arxiv.org/abs/2310.10631) and [Yu et al., 2023](https://arxiv.org/abs/2309.12284)).
 
+ Top 18 programming languages trained on:
+ - C
+ - CPP
+ - Java
+ - JavaScript
+ - CSS
+ - Go
+ - HTML
+ - Ruby
+ - Rust
+ - Markdown
+ - Shell
+ - Php
+ - Sql
+ - R
+ - Typescript
+ - Python
+ - Jupyter-Clean
+ - RestructuredText
+
 ### Training Procedure
 
 The model is pre-trained on the aforementioned datasets in `bfloat16` precision, optimized with AdamW.
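To make the training-procedure line above concrete, here is a minimal PyTorch sketch of a `bfloat16` causal-LM training step with AdamW. It is illustrative only: the repo id, learning rate, weight decay, and the toy batch are placeholders rather than this model's actual training configuration, and real pre-training would add mixed-precision and distributed machinery.

```python
# Illustrative bfloat16 + AdamW training step; all ids and hyperparameters
# below are placeholders, not this model's actual training configuration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stabilityai/stable-code-3b"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.train()

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.1)

# Toy batch; pre-training would stream tokenized documents instead.
batch = tokenizer(["def add(a, b):\n    return a + b"], return_tensors="pt")
labels = batch["input_ids"].clone()

outputs = model(**batch, labels=labels)  # causal-LM loss; labels shifted internally
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```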