victormiller committed on
Commit
36ae419
1 Parent(s): 913e970

Update README.md

Files changed (1)
  1. README.md +17 -0
README.md CHANGED
@@ -99,6 +99,23 @@ gen_tokens = model.generate(input_ids, do_sample=True, max_length=400)
  print("-"*20 + "Output for model" + 20 * '-')
  print(tokenizer.batch_decode(gen_tokens)[0])
  ```
+ ## CrystalChat DataMix
+ | Subset | Tokens (Billion) |
+ | ----------- | ----------- |
+ | OASST1-guanaco | 4.46 |
+ | SlimOrca | 225.63 |
+ | ShareGPT | 112.91 |
+ | Evol-ShareGPT | 85.95 |
+ | ChatLogs | 29.34 |
+ | CodeAlpaca | 2.62 |
+ | Rosetta Code | 7.99 |
+ | Evol-CodeAlpaca 1 | 73.80 |
+ | Evol-CodeAlpaca 2 | 34.91 |
+ | HTML Instruction | 43.67 |
+ | General Textbooks | 85.59 |
+ | Programming Books | 395.63 |
+ | Total | 1102.52 |
+
  # Evaluation
 
  Coming Soon!
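As a quick consistency check on the DataMix table added in this commit, the sketch below recomputes the total from the per-subset rows and reports each subset's share of the mix. The names `DATAMIX`, `LISTED_TOTAL` and `summarize` are hypothetical, introduced only for illustration; the numbers are copied from the table, and the unit is taken from its header as written.

```python
# Per-subset token counts copied from the CrystalChat DataMix table in this commit.
# Units follow the table header as written ("Tokens (Billion)").
DATAMIX = {
    "OASST1-guanaco": 4.46,
    "SlimOrca": 225.63,
    "ShareGPT": 112.91,
    "Evol-ShareGPT": 85.95,
    "ChatLogs": 29.34,
    "CodeAlpaca": 2.62,
    "Rosetta Code": 7.99,
    "Evol-CodeAlpaca 1": 73.80,
    "Evol-CodeAlpaca 2": 34.91,
    "HTML Instruction": 43.67,
    "General Textbooks": 85.59,
    "Programming Books": 395.63,
}

# The "Total" row as listed in the table.
LISTED_TOTAL = 1102.52


def summarize(mix):
    """Return the recomputed total and each subset's share of it in percent."""
    total = sum(mix.values())
    shares = {name: 100.0 * tokens / total for name, tokens in mix.items()}
    return total, shares


if __name__ == "__main__":
    total, shares = summarize(DATAMIX)
    print(f"recomputed total: {total:.2f} (listed: {LISTED_TOTAL})")
    # Largest subsets first.
    for name, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:20s} {pct:5.1f}%")
```

The recomputed sum of the rows is 1102.50, slightly below the listed total of 1102.52, most likely because each row is rounded to two decimals.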