pirroh committed
Commit 8090ce5
1 Parent(s): 5d6823a

Update README.md

Files changed (1)
  1. README.md +9 -1
README.md CHANGED
@@ -54,9 +54,17 @@ Developed by: Replit, Inc.
 The training mixture includes **20 different languages**, listed here in descending order of number of tokens:
 <br/>
 `Markdown`, `Java`, `JavaScript`, `Python`, `TypeScript`, `PHP`, `SQL`, `JSX`, `reStructuredText`, `Rust`, `C`, `CSS`, `Go`, `C++`, `HTML`, `Vue`, `Ruby`, `Jupyter Notebook`, `R`, `Shell`
-
+<br/>
 In total, the training dataset contains 175B tokens, which were repeated over 3 epochs -- in total, `replit-code-v1-3b` has been trained on **525B** tokens (~195 tokens per parameter).
 
+The model has been trained on the [MosaicML](https://www.mosaicml.com/) platform with 256 x A100-40GB GPUs, leveraging their latest [LLM examples repo](https://github.com/mosaicml/examples/tree/release/v0.0.4/examples/llm).
+<br/>
+`replit-code-v1-3b` is powered by state-of-the-art LLM techniques, such as:
+[Flash Attention](https://arxiv.org/abs/2205.14135) for fast training and inference,
+[AliBi positional embeddings](https://arxiv.org/abs/2108.12409) to support variable context length at inference time,
+[LionW optimizer](https://arxiv.org/abs/2302.06675),
+etc.
+
 ## Intended Use
 Replit intends this model be used by anyone as a foundational model for application-specific fine-tuning without strict limitations on commercial use.
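
The added paragraph cites ALiBi as the reason the model can handle longer contexts at inference time than it saw during training: instead of learned positional embeddings, each attention head applies a fixed linear penalty to attention scores that grows with query-key distance. The sketch below follows the ALiBi paper's recipe rather than this model's actual implementation; the head count and sequence length are arbitrary illustrative values.

```python
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    """Per-head linear attention bias, added to attention scores before softmax."""
    # Head-specific slopes form a geometric sequence (ALiBi paper recipe for
    # power-of-two head counts): 1/2, 1/4, ..., 1/2**n_heads when n_heads == 8.
    start = 2.0 ** (-8.0 / n_heads)
    slopes = torch.tensor([start ** (h + 1) for h in range(n_heads)])
    # Distance between query position i and key position j, clamped so only
    # past (causal) positions receive a penalty.
    pos = torch.arange(seq_len)
    distance = (pos.view(-1, 1) - pos.view(1, -1)).clamp(min=0)  # (seq_len, seq_len)
    # More distant keys get a more negative bias, i.e. less attention weight.
    return -slopes.view(-1, 1, 1) * distance                     # (n_heads, seq_len, seq_len)

# The bias depends only on relative distance, so the same formula extends to
# sequence lengths longer than those used during training.
bias = alibi_bias(n_heads=8, seq_len=16)
```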
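LionW, also listed among the added techniques, is the Lion optimizer with decoupled (AdamW-style) weight decay: the update direction is simply the sign of an interpolation between the momentum buffer and the current gradient. Below is a rough sketch of a single step; the hyperparameter values are illustrative defaults and are not taken from this model's training configuration.

```python
import torch

@torch.no_grad()
def lion_step(param, grad, momentum, lr=1e-4, beta1=0.9, beta2=0.99, weight_decay=0.1):
    """One Lion update with decoupled weight decay (sketch, not the training code)."""
    # Update direction: sign of an interpolation between momentum and the new gradient.
    update = (beta1 * momentum + (1 - beta1) * grad).sign()
    # Decoupled weight decay, applied directly to the weights as in AdamW.
    param.mul_(1 - lr * weight_decay)
    param.add_(update, alpha=-lr)
    # The momentum buffer is refreshed with a second, slower interpolation factor.
    momentum.mul_(beta2).add_(grad, alpha=1 - beta2)
    return param, momentum

# Example: one step on a toy parameter tensor.
p = torch.zeros(4)
g = torch.randn(4)
m = torch.zeros(4)
lion_step(p, g, m)
```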