Tags: Text Generation · Transformers · PyTorch · mpt · Composer · MosaicML · llm-foundry · conversational · custom_code · text-generation-inference
sam-mosaic committed
Commit 7155548
1 Parent(s): ebebfbf

Update README.md

Files changed (1)
  1. README.md +15 -0
README.md CHANGED
@@ -173,6 +173,21 @@ The model has been modified from a standard transformer in the following ways:
  | vocab size | 50432 |
  | sequence length | 8192 |
 
+ ## Data Mix
+
+ The model was trained on the following data mix:
+
+ | Data Source | Number of Tokens in Source | Proportion |
+ |-------------|----------------------------|------------|
+ | Airoboros/GPT4 | 26.4M | 1.71% |
+ | Baize | 55.0M | 3.57% |
+ | Camel | 301M | 19.54% |
+ | GPTeacher | 7.56M | 0.49% |
+ | Guanaco | 15.6M | 1.02% |
+ | LongConversations | 18.4M | 1.19% |
+ | ShareGPT | 821M | 53.24% |
+ | WizardLM | 297M | 19.23% |
+
  ### Training Configuration
 
  This model was trained on 64 H100s for about 7.6 hours using the [MosaicML Platform](https://www.mosaicml.com/platform).
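
As a quick sanity check on the added table: each Proportion is that source's token count divided by the total across all eight sources (~1.54B tokens). Below is a minimal Python sketch that recomputes the column, with the counts hardcoded from the table above; small rounding differences against the published figures are expected, since the per-source counts are themselves rounded.

```python
# Illustrative sketch: recompute the "Proportion" column of the Data Mix
# table from the per-source token counts listed in the diff above.
token_counts_m = {  # millions of tokens, copied from the table
    "Airoboros/GPT4": 26.4,
    "Baize": 55.0,
    "Camel": 301.0,
    "GPTeacher": 7.56,
    "Guanaco": 15.6,
    "LongConversations": 18.4,
    "ShareGPT": 821.0,
    "WizardLM": 297.0,
}

total_m = sum(token_counts_m.values())  # ~1542M, i.e. ~1.54B tokens overall
for source, tokens_m in token_counts_m.items():
    print(f"{source}: {tokens_m / total_m:.2%}")  # e.g. ShareGPT: 53.24%
```

ShareGPT alone accounts for just over half of the mix; for reference, the quoted training run works out to roughly 64 × 7.6 ≈ 486 H100-hours.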