chargoddard committed
Commit d9acebe
1 Parent(s): ae96f79

Update README.md

Files changed (1)
  1. README.md +11 -12
README.md CHANGED
@@ -9,23 +9,19 @@ tags:
  - mergekit
  - llama
  ---
+ > 🚨 THIS IS A BASE MODEL 🚨
+ >
+ > This model is pruned from the base Llama 3 70B, which has no instruction tuning and randomly initialized special tokens.
+ >
+ > Using this with the Llama 3 instruction format is injecting random noise into latent space and will give you deranged results. (It's pretty funny actually.)
+ > Treat this as the untrained foundation model this is and use appropriate prompts.
+

  Meta's Llama 3 70B pruned to 42B parameters using the methodology described in [The Unreasonable Ineffectiveness of the Deeper Layers](https://arxiv.org/abs/2403.17887). Post-pruning trained using QLoRA for ~100M tokens from [JeanKaddour/minipile](https://huggingface.co/datasets/JeanKaddour/minipile).

  Layers to prune selected using [PruneMe](https://github.com/arcee-ai/PruneMe).

- Still evaluating, don't get too excited! Might be incredibly dumb. Check out these zero-shot MMLU numbers though:
-
-
- | Groups |Version|Filter|n-shot|Metric|Value | |Stderr|
- |------------------|-------|------|-----:|------|-----:|---|-----:|
- |mmlu |N/A |none | 0|acc |0.7319|± |0.0034|
- | - humanities |N/A |none | 0|acc |0.6582|± |0.0063|
- | - other |N/A |none | 0|acc |0.7927|± |0.0069|
- | - social_sciences|N/A |none | 0|acc |0.8466|± |0.0064|
- | - stem |N/A |none | 0|acc |0.6702|± |0.0079|
-
- 5-shot:
+ Still evaluating, don't get too excited! Might be incredibly dumb. Check out these numbers though:

  | Groups |Version|Filter|n-shot|Metric|Value | |Stderr|
  |------------------|-------|------|-----:|------|-----:|---|-----:|
@@ -34,5 +30,8 @@ Still evaluating, don't get too excited! Might be incredibly dumb. Check out the
  | - other |N/A |none | 5|acc |0.8101|± |0.0067|
  | - social_sciences|N/A |none | 5|acc |0.8668|± |0.0060|
  | - stem |N/A |none | 5|acc |0.6825|± |0.0079|
+ |winogrande| 1|none | 5|acc |0.8027|± |0.0112|
+ |hellaswag| 1|none | 10|acc_norm|0.8025|± |0.0040|
+

  [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
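
The new warning says to treat this as an untrained base model, so it should be prompted as a plain text-completion model rather than with the Llama 3 instruct/chat template. A minimal usage sketch under that assumption; the repo id below is a placeholder guess, not something stated in this commit:

```python
# Minimal completion-style usage, per the base-model warning above:
# plain text continuation, no Llama 3 instruct/chat template.
# The repo id below is a placeholder guess; substitute this model's actual id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "chargoddard/llama3-42b"  # placeholder, not confirmed by this card
tok = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "The main reason deeper transformer layers can often be pruned is"
inputs = tok(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tok.decode(out[0], skip_special_tokens=True))
```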
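The card only describes the pruning recipe in prose. As a rough illustration of the layer-selection idea from the linked paper (which PruneMe automates over a calibration dataset), the sketch below scores each window of consecutive decoder layers by how little it changes the hidden states on a single prompt. The model id, window size, and single-prompt cosine measure are simplifying assumptions, not the exact procedure used for this checkpoint:

```python
# Rough sketch of the layer-selection idea from "The Unreasonable
# Ineffectiveness of the Deeper Layers" (PruneMe automates this over a real
# calibration set). Here: one prompt, cosine distance between the hidden
# states entering and leaving each window of consecutive layers.
# The model id and window size below are illustrative assumptions.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-70B"  # assumption: any causal LM works here
window = 20                               # how many consecutive layers to drop

tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

text = "Paris is the capital of France, and its most famous landmark is"
inputs = tok(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    hidden = model(**inputs, output_hidden_states=True).hidden_states

# hidden has num_layers + 1 entries: embeddings, then one per decoder layer.
num_layers = len(hidden) - 1
scores = []
for start in range(num_layers - window + 1):
    a = hidden[start][:, -1].float()            # state entering the window
    b = hidden[start + window][:, -1].float()   # state leaving the window
    dist = 1.0 - F.cosine_similarity(a, b, dim=-1).mean().item()
    scores.append((dist, start))

dist, start = min(scores)
print(f"Most redundant window: layers {start}..{start + window - 1} "
      f"(cosine distance {dist:.4f}), candidates for pruning")
```

The windows with the smallest distance change the representation the least, so they are the natural candidates to remove before the QLoRA healing pass described above.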