pirroh committed
Commit 8090ce5
1 Parent(s): 5d6823a

Update README.md

Files changed (1)
  1. README.md +9 -1
README.md CHANGED
@@ -54,9 +54,17 @@ Developed by: Replit, Inc.
 The training mixture includes **20 different languages**, listed here in descending order of number of tokens:
 <br/>
 `Markdown`, `Java`, `JavaScript`, `Python`, `TypeScript`, `PHP`, `SQL`, `JSX`, `reStructuredText`, `Rust`, `C`, `CSS`, `Go`, `C++`, `HTML`, `Vue`, `Ruby`, `Jupyter Notebook`, `R`, `Shell`
-
+<br/>
 In total, the training dataset contains 175B tokens, which were repeated over 3 epochs -- in total, `replit-code-v1-3b` has been trained on **525B** tokens (~195 tokens per parameter).
 
+The model has been trained on the [MosaicML](https://www.mosaicml.com/) platform with 256 x A100-40GB GPUs, leveraging their latest [LLM examples repo](https://github.com/mosaicml/examples/tree/release/v0.0.4/examples/llm).
+<br/>
+`replit-code-v1-3b` is powered by state-of-the-art LLM techniques, such as:
+[Flash Attention](https://arxiv.org/abs/2205.14135) for fast training and inference,
+[AliBi positional embeddings](https://arxiv.org/abs/2108.12409) to support variable context length at inference time,
+[LionW optimizer](https://arxiv.org/abs/2302.06675),
+etc.
+
 ## Intended Use
 Replit intends this model be used by anyone as a foundational model for application-specific fine-tuning without strict limitations on commercial use.
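
The added paragraph cites ALiBi as the reason the model can handle longer contexts at inference time than it saw during training: instead of learned positional embeddings, each attention head applies a fixed linear penalty to attention scores that grows with query-key distance. The sketch below follows the ALiBi paper's recipe rather than this model's actual implementation; the head count and sequence length are arbitrary illustrative values.

```python
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    """Per-head linear attention bias, added to attention scores before softmax."""
    # Head-specific slopes form a geometric sequence (ALiBi paper recipe for
    # power-of-two head counts): 1/2, 1/4, ..., 1/2**n_heads when n_heads == 8.
    start = 2.0 ** (-8.0 / n_heads)
    slopes = torch.tensor([start ** (h + 1) for h in range(n_heads)])
    # Distance between query position i and key position j, clamped so only
    # past (causal) positions receive a penalty.
    pos = torch.arange(seq_len)
    distance = (pos.view(-1, 1) - pos.view(1, -1)).clamp(min=0)  # (seq_len, seq_len)
    # More distant keys get a more negative bias, i.e. less attention weight.
    return -slopes.view(-1, 1, 1) * distance                     # (n_heads, seq_len, seq_len)

# The bias depends only on relative distance, so the same formula extends to
# sequence lengths longer than those used during training.
bias = alibi_bias(n_heads=8, seq_len=16)
```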
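LionW, also listed among the added techniques, is the Lion optimizer with decoupled (AdamW-style) weight decay: the update direction is simply the sign of an interpolation between the momentum buffer and the current gradient. Below is a rough sketch of a single step; the hyperparameter values are illustrative defaults and are not taken from this model's training configuration.

```python
import torch

@torch.no_grad()
def lion_step(param, grad, momentum, lr=1e-4, beta1=0.9, beta2=0.99, weight_decay=0.1):
    """One Lion update with decoupled weight decay (sketch, not the training code)."""
    # Update direction: sign of an interpolation between momentum and the new gradient.
    update = (beta1 * momentum + (1 - beta1) * grad).sign()
    # Decoupled weight decay, applied directly to the weights as in AdamW.
    param.mul_(1 - lr * weight_decay)
    param.add_(update, alpha=-lr)
    # The momentum buffer is refreshed with a second, slower interpolation factor.
    momentum.mul_(beta2).add_(grad, alpha=1 - beta2)
    return param, momentum

# Example: one step on a toy parameter tensor.
p = torch.zeros(4)
g = torch.randn(4)
m = torch.zeros(4)
lion_step(p, g, m)
```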