Update README.md
README.md
CHANGED
@@ -41,8 +41,6 @@ The instruction tuning data is a mix of publicly available language and code dat
 The summary of the instruction tuning data is as follows:
 
 <!-- <center><img src="data_table.jpg" alt="Instruction Data"/></center> -->
-
-## CrystalChat DataMix
 | Subset | Tokens (Million) |
 | ----------- | ----------- |
 | [OASST1-guanaco](https://huggingface.co/datasets/openaccess-ai-collective/oasst1-guanaco-extended-sharegpt) | 4.46 |
@@ -61,6 +59,8 @@ The summary of the instruction tuning data is as follows:
 
 The HTML Instruction dataset was curated by LLM360 and will be made available shortly.
 
+For more details, check out the [data table](https://huggingface.co/LLM360/CrystalChat/blob/main/data_table.jpg).
+
 # Instruction Format
 
 We've added some new special tokens to the CrystalCoder tokenizer to support the instruction tuning.