Update README.md

README.md CHANGED

Removed (previous README):

---
base_model: jetmoe/jetmoe-8b-sft
tags:
- alignment-handbook
- generated_from_trainer
datasets:
- HuggingFaceH4/ultrafeedback_binarized
model-index:
- name: jetmoe-8b-chat
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# jetmoe-8b-chat

This model is a fine-tuned version of [jetmoe/jetmoe-8b-sft](https://huggingface.co/jetmoe/jetmoe-8b-sft) on the HuggingFaceH4/ultrafeedback_binarized dataset.
It achieves the following results on the evaluation set:
- Rewards/chosen: -0.0901
- Rewards/rejected: -0.2250
- Rewards/accuracies: 0.7148
- Rewards/margins: 0.1349
- Logps/rejected: -289.3396
- Logps/chosen: -286.2378
- Logits/rejected: -2.9020
- Logits/chosen: -2.9443
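
The Rewards/* values above follow the usual DPO logging convention: each completion gets an implicit reward equal to beta times the difference between its log-probability under the trained policy and under the reference model, margins are the chosen-minus-rejected gap, and accuracies are the fraction of pairs where the chosen completion scores higher. The sketch below only illustrates that convention; the beta value and the helper function are assumptions, not taken from this card.

```python
# Sketch of the standard DPO reward bookkeeping behind the Rewards/* columns.
# beta and the function signature are assumptions; this is not the original logging code.
import torch

def dpo_reward_stats(policy_chosen_logps: torch.Tensor,
                     policy_rejected_logps: torch.Tensor,
                     ref_chosen_logps: torch.Tensor,
                     ref_rejected_logps: torch.Tensor,
                     beta: float = 0.1) -> dict:
    """Compute DPO implicit rewards from summed sequence log-probabilities."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    return {
        "rewards/chosen": chosen_rewards.mean().item(),
        "rewards/rejected": rejected_rewards.mean().item(),
        "rewards/margins": (chosen_rewards - rejected_rewards).mean().item(),
        "rewards/accuracies": (chosen_rewards > rejected_rewards).float().mean().item(),
    }
```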

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
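
As a rough illustration of how these values map onto code, here is a sketch using `transformers.TrainingArguments`. It is not the original alignment-handbook training script; `output_dir` and `bf16` are assumptions. The totals are consistent with the list above: 4 per-device train batch x 8 GPUs x 4 accumulation steps = 128, and 8 per-device eval batch x 8 GPUs = 64.

```python
# Illustrative mapping of the listed hyperparameters onto TrainingArguments.
# This is a sketch, not the original alignment-handbook training script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="jetmoe-8b-chat-dpo",  # placeholder path (assumption)
    learning_rate=5e-07,
    per_device_train_batch_size=4,    # train_batch_size: 4 (per GPU)
    per_device_eval_batch_size=8,     # eval_batch_size: 8 (per GPU)
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,                   # optimizer: Adam with betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    bf16=True,                        # assumed mixed-precision setting
)
# With 8 GPUs: 4 x 8 x 4 = 128 total train batch, 8 x 8 = 64 total eval batch.
```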

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6664        | 0.42  | 200  | 0.6622          | -0.0185        | -0.0869          | 0.6997             | 0.0684          | -275.5274      | -279.0778    | -2.9127         | -2.9572       |
| 0.6428        | 0.84  | 400  | 0.6372          | -0.0901        | -0.2250          | 0.7148             | 0.1349          | -289.3396      | -286.2378    | -2.9020         | -2.9443       |

### Framework versions

- Transformers 4.39.0.dev0
- Pytorch 2.1.2
- Datasets 2.14.6
- Tokenizers 0.15.2

Added (new README):

# JetMoE-8B-chat: Efficient and High-Performance LLM

Welcome to the official repository of JetMoE-8B-chat, a language model that combines cost-efficiency with high performance, making state-of-the-art language modeling accessible to a broader audience, including academia and small-scale industry players.

## Key Highlights

- **Cost-Effective Training**: Trained for less than $0.1 million, JetMoE-8B significantly lowers the barrier to entry for training large language models (LLMs), demonstrating that high-quality LLM training can be far more economical than widely assumed.
- **Academia-Friendly**: By relying exclusively on public datasets and open-sourcing our code, JetMoE-8B is highly accessible for educational and research purposes. It is designed to be fine-tuned even on consumer-grade GPUs, making it feasible for most academic labs.
- **Efficiency at Scale**: With only 2.2B active parameters during inference, JetMoE-8B provides an optimal balance between computational cost and performance, outperforming similarly sized models such as Gemma-2B across various benchmarks.
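
To give a concrete starting point, here is a minimal inference sketch using the Hugging Face `transformers` library. The repo id `jetmoe/jetmoe-8b-chat`, the dtype, and the device settings are assumptions; depending on your `transformers` version you may also need `trust_remote_code=True` for the JetMoE architecture.

```python
# Minimal inference sketch. The repo id, dtype, and device settings are assumptions;
# older transformers releases may additionally need trust_remote_code=True.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jetmoe/jetmoe-8b-chat"  # assumed Hub repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # keep memory use modest
    device_map="auto",           # spread layers over available devices
)

# Format a single-turn conversation with the tokenizer's chat template.
messages = [{"role": "user", "content": "Explain mixture-of-experts in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Because only around 2.2B of the 8B parameters are active per token, generation cost stays close to that of a roughly 2B dense model even though the full checkpoint must be loaded.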

JetMoE-8B-chat has been evaluated with MT-Bench. Here is how JetMoE-8B-chat compares with other models:

| Model              | Score     |
|--------------------|-----------|
| GPT-4              | 9.014     |
| GPT-3.5-turbo      | 7.995     |
| Claude-v1          | 7.923     |
| **JetMoE-8B-chat** | **6.681** |
| Llama-2-13b-chat   | 6.650     |
| Vicuna-13b-v1.3    | 6.413     |
| Wizardlm-13b       | 6.353     |
| Llama-2-7b-chat    | 6.269     |