Commit 36ae419 (parent: 913e970), committed by victormiller
Update README.md

README.md CHANGED
@@ -99,6 +99,23 @@ gen_tokens = model.generate(input_ids, do_sample=True, max_length=400)
 print("-"*20 + "Output for model" + 20 * '-')
 print(tokenizer.batch_decode(gen_tokens)[0])
 ```
+## CrystalChat DataMix
+| Subset | Tokens (Billion) |
+| ----------- | ----------- |
+| OASST1-guanaco | 4.46 |
+| SlimOrca | 225.63 |
+| ShareGPT | 112.91 |
+| Evol-ShareGPT | 85.95 |
+| ChatLogs | 29.34 |
+| CodeAlpaca | 2.62 |
+| Rosetta Code | 7.99 |
+| Evol-CodeAlpaca 1 | 73.80 |
+| Evol-CodeAlpaca 2 | 34.91 |
+| HTML Instruction | 43.67 |
+| General Textbooks | 85.59 |
+| Programming Books | 395.63 |
+| Total | 1102.52 |
+
 # Evaluation
 
 Coming Soon!
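
Review note: as a quick sanity check on the DataMix table added in this commit, the sketch below sums the per-subset token counts and derives each subset's share of the mix. The values are copied from the table as written (unit as labeled in its header). The names `DATAMIX`, `LISTED_TOTAL`, and `mix_shares` are hypothetical helpers introduced only for this illustration and are not part of the README.

```python
# Token counts copied verbatim from the "CrystalChat DataMix" table in this diff.
# The unit is whatever the table header states; this sketch does not reinterpret it.
DATAMIX = {
    "OASST1-guanaco": 4.46,
    "SlimOrca": 225.63,
    "ShareGPT": 112.91,
    "Evol-ShareGPT": 85.95,
    "ChatLogs": 29.34,
    "CodeAlpaca": 2.62,
    "Rosetta Code": 7.99,
    "Evol-CodeAlpaca 1": 73.80,
    "Evol-CodeAlpaca 2": 34.91,
    "HTML Instruction": 43.67,
    "General Textbooks": 85.59,
    "Programming Books": 395.63,
}

# The "Total" row as listed in the table.
LISTED_TOTAL = 1102.52


def mix_shares(datamix):
    """Return each subset's fraction of the summed token count."""
    total = sum(datamix.values())
    return {name: tokens / total for name, tokens in datamix.items()}


if __name__ == "__main__":
    computed_total = sum(DATAMIX.values())
    print(f"sum of rows: {computed_total:.2f} (listed total: {LISTED_TOTAL})")
    # Print subsets from largest to smallest share.
    for name, share in sorted(mix_shares(DATAMIX).items(), key=lambda kv: -kv[1]):
        print(f"{name:<20} {share:6.1%}")
```

Summing the twelve rows gives 1102.50, which is 0.02 below the listed total of 1102.52. That gap would be consistent with the rows having been rounded independently, though the commit does not say. On these figures, Programming Books is the largest subset at roughly 36% of the mix.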