victormiller committed on
Commit 887ace8
1 Parent(s): 8f9abff

Update README.md

Files changed (1):
README.md +33 -5
README.md CHANGED
@@ -11,7 +11,24 @@ tags:

# CrystalChat

- We present CrystalChat, an instruction following model finetuned from [LLM360/CrystalCoder](https://huggingface.co/LLM360/CrystalCoder). Here's a comparison table for some popular chat models.
+ We present CrystalChat, an instruction-following model fine-tuned from [LLM360/CrystalCoder](https://huggingface.co/LLM360/CrystalCoder). Following the release of [LLM360/AmberChat](https://huggingface.co/LLM360/AmberChat) and [LLM360/AmberSafe](https://huggingface.co/LLM360/AmberSafe) in December 2023, CrystalChat is the next and most performant chat model released under LLM360. CrystalChat is trained on a carefully selected mix of publicly available language and code datasets.
+
+ As always, the training data, training code, and metrics are publicly available.
+
+ ## About LLM360
+ LLM360 is an initiative for comprehensive and fully open-sourced LLMs, where all training details, model checkpoints, intermediate results, and additional analyses are made available to the community. Our goal is to advance the field by inviting the community to deepen the understanding of LLMs together. As the first step of the LLM360 project, we release all intermediate model checkpoints, our fully prepared pre-training dataset, and all source code, configurations, and training details. We are committed to continually pushing the boundaries of LLMs through this open-source effort.
+
+ Get access now at the [LLM360 site](https://www.llm360.ai/).
+
+ # CrystalChat Performance

| Model | Trained Tokens | Avg. of Avg. | Language Avg. | Coding Avg. | ARC | HellaSwag | MMLU (5-shot) | GSM8K | Winogrande (5-shot) | TruthfulQA | HumanEval (pass@1) | MBPP (pass@1) |
|:------------------------:|:--------------:|:------------:|:-------------:|:-----------:|:-----:|:---------:|:-------------:|:-----:|:------------------:|:----------:|:------------------:|:-------------:|
@@ -21,11 +38,19 @@ We present CrystalChat, an instruction following model finetuned from [LLM360/Cr
| Llama-2-7b-Chat | 2T | 34.11 | 52.86 | 15.35 | 53.07 | 78.39 | 48.42 | 18.88 | 73.09 | 45.30 | 13.26 | 17.43 |
| AmberChat 7B | 1.25T | - | 44.76 | - | 42.83 | 74.03 | 38.88 | 5.31 | 66.77 | 40.72 | - | - |

- |||
- <img src="CC-Compare.png" alt="arc" width="400"/>
- |:--|:--|
- |<img src="cc-eval-std-benchmarks.png" alt="arc" width="400"/> |<img src="cc-eval-lang-compare.png" alt="arc" width="400"/>
+
+ | Combined Language and Coding Ability |
+ |--------------------------------------|
+ <img src="CC-Compare.jpg" alt="combined language and coding ability" width="800"/>
+
+ | Performance on Standard Benchmarks |
+ |------------------------------------|
+ <img src="cc-eval-std-benchmarks.png" alt="standard benchmarks" width="600"/>
+
+ | Performance on Language Benchmarks |
+ |------------------------------------|
+ <img src="cc-eval-lang-compare.png" alt="language benchmarks" width="600"/>

## Model Description
@@ -57,6 +82,9 @@ print("-"*20 + "Output for model" + 20 * '-')
print(tokenizer.batch_decode(gen_tokens)[0])
```

+ # Bias, Risks, and Limitations
+ CrystalChat has not been aligned to human preferences for safety within an RLHF phase or deployed with in-the-loop filtering of responses the way ChatGPT is, so the model can produce problematic outputs (especially when prompted to do so). The training data is known and made available [here](https://huggingface.co/datasets/LLM360/CrystalCoderDatasets). It primarily consists of the SlimPajama, StarCoder, and WebCrawl datasets.
+
# Citation

**BibTeX:**
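The hunk above shows only the tail of the model card's Python usage example (two print lines and the closing fence). For orientation, here is a minimal sketch of the kind of transformers snippet those lines typically conclude; the model id comes from the card, while the prompt, generation settings, and the trust_remote_code flag are assumptions rather than the card's exact code.

```python
# Minimal sketch, not the card's exact snippet: a standard transformers
# load-and-generate flow that ends in the two print lines visible in the hunk.
from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code is an assumption; CrystalCoder-family repos may ship custom model code.
tokenizer = AutoTokenizer.from_pretrained("LLM360/CrystalChat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("LLM360/CrystalChat", trust_remote_code=True)

prompt = "Write a Python function that merges two sorted lists."  # placeholder prompt
inputs = tokenizer(prompt, return_tensors="pt")
gen_tokens = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)

# The two lines shown in the diff hunk above:
print("-"*20 + "Output for model" + 20 * '-')
print(tokenizer.batch_decode(gen_tokens)[0])
```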
 
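As a quick sanity check on the performance table's derived columns, the Llama-2-7b-Chat row is consistent with Language Avg. being the mean of the six language benchmarks (ARC, HellaSwag, MMLU, GSM8K, Winogrande, TruthfulQA), Coding Avg. being the mean of HumanEval and MBPP, and Avg. of Avg. being the mean of those two averages:

```python
# Recompute the derived columns from the Llama-2-7b-Chat row of the table.
language = [53.07, 78.39, 48.42, 18.88, 73.09, 45.30]  # ARC, HellaSwag, MMLU, GSM8K, Winogrande, TruthfulQA
coding = [13.26, 17.43]                                # HumanEval, MBPP

language_avg = sum(language) / len(language)  # 52.858... -> 52.86 in the table
coding_avg = sum(coding) / len(coding)        # 15.345   -> 15.35 in the table
avg_of_avg = (language_avg + coding_avg) / 2  # 34.10...; the table's 34.11 matches
                                              # averaging the already-rounded values,
                                              # (52.86 + 15.35) / 2 = 34.105 -> 34.11

# All three agree with the published numbers to within rounding.
assert abs(language_avg - 52.86) < 0.01
assert abs(coding_avg - 15.35) < 0.01
assert abs(avg_of_avg - 34.11) < 0.01
print(language_avg, coding_avg, avg_of_avg)
```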