JunxiongWang committed on
Commit 0838de0
1 Parent(s): 55eafdf

Update README.md

  1. README.md +14 -11
README.md CHANGED
@@ -4,17 +4,20 @@ license: apache-2.0

  Zero-shot results when using the [Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct) as the teacher model, and the [Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) as the initialized model

- | Task | Llama-3.2-3B-Instruct | Llama3.2-Mamba-3B-distill |
- |---------------|------------------------|--------------------------|
- | arc_challenge | 0.459 | 0.4838 |
- | arc_easy | 0.7407 | 0.7765 |
- | hellaswag | 0.7043 | 0.7037 |
- | mmlu | 0.6043 | 0.5448 |
- | openbookqa | 0.36 | 0.394 |
- | piqa | 0.7568 | 0.7731 |
- | pubmedqa | 0.696 | 0.664 |
- | race | 0.4067 | 0.4029 |
- | winogrande | 0.6748 | 0.6732 |

  ```
 

  Zero-shot results when using the [Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct) as the teacher model, and the [Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) as the initialized model

+ | Model | [Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) | [Llama3.2-Mamba-3B-distill](https://huggingface.co/JunxiongWang/Llama3.2-Mamba-3B-distill) | [Llama3.2-Mamba-3B-dpo](https://huggingface.co/JunxiongWang/Llama3.2-Mamba-3B-dpo) | [Llama3.2-Mamba2-3B-distill](https://huggingface.co/JunxiongWang/Llama3.2-Mamba2-3B-distill) | [Llama3.2-Mamba2-3B-dpo](https://huggingface.co/JunxiongWang/Llama3.2-Mamba2-3B-dpo) |
+ |---------------|---------------------------------------------------------------------------------|-----------------------------------|-----------------------------------|-----------------------------------|-----------------------------------|
+ | Initialization Model | N/A | Llama-3.2-3B-Instruct | Llama-3.2-3B-Instruct | Llama-3.2-3B-Instruct | Llama-3.2-3B-Instruct |
+ | Teacher Model | N/A | Llama-3.1-70B-Instruct | Llama-3.1-70B-Instruct | Llama-3.1-70B-Instruct | Llama-3.1-70B-Instruct |
+ | arc_challenge | 0.459 | 0.4838 | 0.5265 | 0.4667 | 0.541 |
+ | arc_easy | 0.7407 | 0.7765 | 0.7997 | 0.7668 | 0.8026 |
+ | hellaswag | 0.7043 | 0.7037 | 0.7256 | 0.6913 | 0.7445 |
+ | mmlu | 0.6043 | 0.5448 | 0.5509 | 0.5312 | 0.5247 |
+ | openbookqa | 0.36 | 0.394 | 0.416 | 0.388 | 0.424 |
+ | piqa | 0.7568 | 0.7731 | 0.7731 | 0.7601 | 0.7769 |
+ | pubmedqa | 0.696 | 0.664 | 0.7 | 0.638 | 0.654 |
+ | race | 0.4067 | 0.4029 | 0.4364 | 0.3981 | 0.4344 |
+ | winogrande | 0.6748 | 0.6732 | 0.674 | 0.6606 | 0.6732 |
+ | truthfulqa | 0.3801 | 0.4202 | 0.4853 | 0.3478 | 0.5028 |

  ```