LLM360/CrystalChat

Tianhua committed on Jan 11

Commit f692d0d • 1 Parent(s): 36ae419

Update README.md

Files changed (1)

README.md +2 -3

README.md CHANGED

@@ -99,7 +99,7 @@ gen_tokens = model.generate(input_ids, do_sample=True, max_length=400)
 print("-"*20 + "Output for model"  + 20 * '-')
 print(tokenizer.batch_decode(gen_tokens)[0])
 ```
-## CrystalChat DataMix
+<!-- ## CrystalChat DataMix
 | Subset      | Tokens (Billion) |
 | ----------- | ----------- |
 | OASST1-guanaco      | 4.46       |
@@ -114,13 +114,12 @@ print(tokenizer.batch_decode(gen_tokens)[0])
 | HTML Instruction   | 43.67        |
 | General Textbooks   | 85.59        |
 | Programming Books   | 395.63        |
-| Total | 1102.52 |
+| Total | 1102.52 | -->
 # Evaluation
 Coming Soon!
 # Bias, Risks, and Limitations
 CrystalChat has not been aligned to human preferences for safety within the RLHF phase or deployed with in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so). The training data is known and made available [here](https://huggingface.co/datasets/LLM360/CrystalCoderDatasets). It primarily consists of SlimPajama, StarCoder, and WebCrawl dataset.