Tianhua committed on
Commit 0cf2c90 · verified · 1 parent: ea365a8

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -41,8 +41,6 @@ The instruction tuning data is a mix of publicly available language and code dat
 The summary of the instruction tuning data is as follows:
 
 <!-- <center><img src="data_table.jpg" alt="Instruction Data"/></center> -->
-
-## CrystalChat DataMix
 | Subset | Tokens (Million) |
 | ----------- | ----------- |
 | [OASST1-guanaco](https://huggingface.co/datasets/openaccess-ai-collective/oasst1-guanaco-extended-sharegpt) | 4.46 |
@@ -61,6 +59,8 @@ The summary of the instruction tuning data is as follows:
 
 The HTML Instruction dataset was curated by LLM360 and will be made available shortly.
 
+For more details, check out the [data table](https://huggingface.co/LLM360/CrystalChat/blob/main/data_table.jpg).
+
 # Instruction Format
 
 We've added some new special tokens to the CrystalCoder tokenizer to support the instruction tuning.