zguo0525 committed on
Commit 2c07cbf
1 Parent(s): aaad630

Update README.md

Files changed (1)
  1. README.md +17 -69
README.md CHANGED
@@ -1,74 +1,22 @@
- ---
- base_model: jetmoe/jetmoe-8b-sft
- tags:
- - alignment-handbook
- - generated_from_trainer
- datasets:
- - HuggingFaceH4/ultrafeedback_binarized
- model-index:
- - name: jetmoe-8b-chat
-   results: []
- ---
-
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- # jetmoe-8b-chat
-
- This model is a fine-tuned version of [jetmoe-8b-sft](https://huggingface.co/jetmoe/jetmoe-8b-sft) on the HuggingFaceH4/ultrafeedback_binarized dataset.
- It achieves the following results on the evaluation set:
- - Loss: 0.6372
- - Rewards/chosen: -0.0901
- - Rewards/rejected: -0.2250
- - Rewards/accuracies: 0.7148
- - Rewards/margins: 0.1349
- - Logps/rejected: -289.3396
- - Logps/chosen: -286.2378
- - Logits/rejected: -2.9020
- - Logits/chosen: -2.9443
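The metric names above match the quantities logged by DPO-style preference training (the alignment-handbook tag and the ultrafeedback_binarized preference pairs point in that direction, though the card does not say so explicitly). Under that assumption, with policy π_θ, frozen reference π_ref, and temperature β, the logged values would read as follows:

```latex
% Hedged reading of the metrics, assuming the standard DPO formulation (an inference, not stated in the card).
% Rewards/chosen and Rewards/rejected: implicit rewards of the chosen (y_w) and rejected (y_l) responses.
r(x, y) = \beta \bigl( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \bigr)
% Rewards/margins is r(x, y_w) - r(x, y_l); Rewards/accuracies is the fraction of pairs with a positive margin.
% Loss is the DPO objective averaged over the evaluation pairs:
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma \bigl( r(x, y_w) - r(x, y_l) \bigr)
% Logps/chosen and Logps/rejected are the policy log-likelihoods \log \pi_\theta(y_w \mid x) and \log \pi_\theta(y_l \mid x).
```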
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 5e-07
- - train_batch_size: 4
- - eval_batch_size: 8
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 8
- - gradient_accumulation_steps: 4
- - total_train_batch_size: 128
- - total_eval_batch_size: 64
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_ratio: 0.1
- - num_epochs: 1
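The effective batch size is consistent: 4 per device × 8 GPUs × 4 accumulation steps = 128, the total_train_batch_size listed above. A minimal sketch of how these values would map onto `transformers.TrainingArguments` (the alignment-handbook recipe wraps its own config around these; the output path and the bf16 setting below are assumptions):

```python
from transformers import TrainingArguments

# Hedged sketch mirroring the hyperparameters listed in the removed card.
# Launching across 8 GPUs (distributed_type: multi-GPU) is handled by the launcher
# (e.g. accelerate/torchrun), not by these arguments.
args = TrainingArguments(
    output_dir="jetmoe-8b-chat",        # hypothetical output path
    learning_rate=5e-7,
    per_device_train_batch_size=4,      # 4 x 8 GPUs x 4 accumulation steps = 128 effective
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    optim="adamw_torch",                # Adam with betas=(0.9, 0.999) and eps=1e-8 are the defaults
    bf16=True,                          # assumption; precision is not stated in the card
)
```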
-
- ### Training results
-
- | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
- |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
- | 0.6664 | 0.42 | 200 | 0.6622 | -0.0185 | -0.0869 | 0.6997 | 0.0684 | -275.5274 | -279.0778 | -2.9127 | -2.9572 |
- | 0.6428 | 0.84 | 400 | 0.6372 | -0.0901 | -0.2250 | 0.7148 | 0.1349 | -289.3396 | -286.2378 | -2.9020 | -2.9443 |
-
- ### Framework versions
-
- - Transformers 4.39.0.dev0
- - Pytorch 2.1.2
- - Datasets 2.14.6
- - Tokenizers 0.15.2
+ # JetMoE-8B-chat: Efficient and High-Performance LLM
+
+ Welcome to the official repository of JetMoE-8B-chat, a language model that combines cost-efficiency with high performance, making state-of-the-art language modeling accessible to a broader audience, including academia and small-scale industry players.
+
+ ## Key Highlights
+
+ - **Cost-Effective Training**: Trained for less than $0.1 million, JetMoE-8B significantly lowers the barrier to entry for training large language models (LLMs), demonstrating that high-quality LLM training can be far more economical than widely assumed.
+ - **Academia-Friendly**: By relying exclusively on public datasets and open-sourcing our code, JetMoE-8B is highly accessible for educational and research purposes. It is designed to be fine-tuned even on consumer-grade GPUs, making it feasible for most academic labs.
+ - **Efficiency at Scale**: With only 2.2B active parameters during inference, JetMoE-8B provides an optimal balance between computational cost and performance, outperforming similarly sized models such as Gemma-2B across various benchmarks.
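A minimal loading sketch for the highlights above, assuming the checkpoint is published as `jetmoe/jetmoe-8b-chat` on the Hugging Face Hub and that `trust_remote_code=True` is needed on Transformers versions that do not yet include the JetMoE architecture (both are assumptions, not stated in this card):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jetmoe/jetmoe-8b-chat"  # assumed repository name

# bfloat16 weights for an 8B-parameter model take roughly 16 GB, which fits a single
# 24 GB consumer GPU; only ~2.2B parameters are active per token at inference time.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",           # requires `accelerate` to be installed
    trust_remote_code=True,      # only needed if your Transformers build lacks JetMoE support
)
```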
 
+ - JetMoE-8B-chat has been evaluated on MT-Bench. Here is how JetMoE-8B-chat compares with other models:
+ | Model               | Score     |
+ |---------------------|-----------|
+ | GPT-4               | 9.014     |
+ | GPT-3.5-turbo       | 7.995     |
+ | Claude-v1           | 7.923     |
+ | **JetMoE-8B-chat**  | **6.681** |
+ | Llama-2-13b-chat    | 6.650     |
+ | Vicuna-13b-v1.3     | 6.413     |
+ | Wizardlm-13b        | 6.353     |
+ | Llama-2-7b-chat     | 6.269     |
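To try the kind of single-turn exchange MT-Bench scores, here is a hedged generation sketch that reuses the `model` and `tokenizer` from the loading example above and assumes the tokenizer ships a chat template (the sampling settings are illustrative, not the MT-Bench configuration):

```python
# Reuses `model` and `tokenizer` from the loading sketch earlier in this card.
messages = [
    {"role": "user", "content": "Explain mixture-of-experts routing in two sentences."},
]

# apply_chat_template formats the conversation the way the chat model expects.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```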