victormiller committed · Commit 887ace8 · Parent: 8f9abff

Update README.md

README.md CHANGED
@@ -11,7 +11,24 @@ tags:
 
 # CrystalChat
 
-We present CrystalChat, an instruction following model finetuned from [LLM360/CrystalCoder](https://huggingface.co/LLM360/CrystalCoder).
+We present CrystalChat, an instruction-following model finetuned from [LLM360/CrystalCoder](https://huggingface.co/LLM360/CrystalCoder). Following the release of [LLM360/AmberChat](https://huggingface.co/LLM360/AmberChat) and [LLM360/AmberSafe](https://huggingface.co/LLM360/AmberSafe) in December 2023, CrystalChat is the next and most performant chat model released under LLM360. CrystalChat is trained on a carefully selected mix of publicly available language and code datasets.
+
+As always, the training data, training code, and metrics are publicly available.
+
+## About LLM360
+LLM360 is an initiative for comprehensive and fully open-sourced LLMs,
+where all training details, model checkpoints, intermediate results, and
+additional analyses are made available to the community. Our goal is to advance
+the field by inviting the community to deepen the understanding of LLMs
+together. As the first step of the LLM360 project, we release all intermediate
+model checkpoints, our fully prepared pre-training dataset, all source code and
+configurations, and training details. We are
+committed to continually pushing the boundaries of LLMs through this open-source
+effort.
+
+Get access now at the [LLM360 site](https://www.llm360.ai/).
+
+# CrystalChat Performance
 
 | Model | Trained Tokens | Avg. of Avg. | Language Avg. | Coding Avg. | ARC | HellaSwag | MMLU (5-shot) | GSM8K | Winogrande (5-shot) | TruthfulQA | HumanEval (pass@1) | MBPP (pass@1) |
 |:------------------------:|:--------------:|:------------:|:-------------:|:-----------:|:-----:|:---------:|:-------------:|:-----:|:------------------:|:----------:|:------------------:|:-------------:|
@@ -21,11 +38,19 @@ We present CrystalChat, an instruction following model finetuned from [LLM360/Cr
 | Llama-2-7b-Chat | 2T | 34.11 | 52.86 | 15.35 | 53.07 | 78.39 | 48.42 | 18.88 | 73.09 | 45.30 | 13.26 | 17.43 |
 | AmberChat 7B | 1.25T | - | 44.76 | - | 42.83 | 74.03 | 38.88 | 5.31 | 66.77 | 40.72 | - | - |
 
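The derived columns are not defined in the table itself; from the Llama-2-7b-Chat row they appear to be plain means: Language Avg. over the six language benchmarks (ARC through TruthfulQA), Coding Avg. over HumanEval and MBPP, and Avg. of Avg. over those two averages. A minimal sketch of that inference (grouping is inferred, not stated; values reproduce the row above up to rounding):

```python
# Reproduce the derived columns from the Llama-2-7b-Chat row
# (benchmark grouping inferred, not stated in the README; matches up to rounding).
lang_scores = [53.07, 78.39, 48.42, 18.88, 73.09, 45.30]  # ARC .. TruthfulQA
code_scores = [13.26, 17.43]                              # HumanEval, MBPP

lang_avg = sum(lang_scores) / len(lang_scores)  # ~52.86 -> "Language Avg."
code_avg = sum(code_scores) / len(code_scores)  # ~15.35 -> "Coding Avg."
avg_of_avg = (lang_avg + code_avg) / 2          # ~34.11 -> "Avg. of Avg."
print(f"{lang_avg:.2f} {code_avg:.2f} {avg_of_avg:.2f}")
```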
-|||
-<img src="CC-Compare.png" alt="arc" width="400"/>
 
-
-
+
+| Combined Language and Coding Ability |
+|------------------------------------------------|
+<img src="CC-Compare.jpg" alt="arc" width="800"/>
+
+| Performance on Standard Benchmarks |
+|------------------------------------------------|
+<img src="cc-eval-std-benchmarks.png" alt="std-bench" width="600"/>
+
+| Performance on Language Benchmarks |
+|---------------------------------------------------------|
+<img src="cc-eval-lang-compare.png" alt="arc" width="600"/>
 
 ## Model Description
 
@@ -57,6 +82,9 @@ print("-"*20 + "Output for model" + 20 * '-')
 print(tokenizer.batch_decode(gen_tokens)[0])
 ```
 
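Only the tail of the README's quick-start snippet falls inside this hunk. For orientation, a minimal sketch of the full flow it implies; the model id matches this repo, but `trust_remote_code=True`, the dtype, and the prompt are assumptions (the CrystalCoder family ships custom modeling code; see the model card for the actual chat format):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical reconstruction of the quick-start block whose tail is shown above.
tokenizer = AutoTokenizer.from_pretrained("LLM360/CrystalChat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "LLM360/CrystalChat", torch_dtype=torch.bfloat16, trust_remote_code=True
)

# Placeholder prompt; consult the model card for CrystalChat's chat template.
inputs = tokenizer("def reverse_string(s):", return_tensors="pt")
gen_tokens = model.generate(**inputs, max_new_tokens=64)

print("-" * 20 + "Output for model" + 20 * "-")
print(tokenizer.batch_decode(gen_tokens)[0])
```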
+# Bias, Risks, and Limitations
+CrystalChat has not been aligned to human preferences for safety within the RLHF phase or deployed with in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so). The training data is known and made available [here](https://huggingface.co/datasets/LLM360/CrystalCoderDatasets). It primarily consists of the SlimPajama, StarCoder, and WebCrawl datasets.
+
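Since the paragraph above points readers at the released training data, a minimal sketch of inspecting a few records with the `datasets` library; streaming avoids downloading the full corpus, and the `train` split name is an assumption (check the dataset card for the actual configs and splits):

```python
from itertools import islice

from datasets import load_dataset

# Stream the released pre-training data rather than downloading all of it.
# The "train" split name is an assumption; see the dataset card for details.
ds = load_dataset("LLM360/CrystalCoderDatasets", split="train", streaming=True)

for example in islice(ds, 3):
    print(example)  # inspect a few raw records
```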
 # Citation
 
 **BibTeX:**