---
license: other
datasets:
- JeanKaddour/minipile
language:
- en
---

Meta's Llama 3 70B pruned to 42B parameters using the methodology described in [The Unreasonable Ineffectiveness of the Deeper Layers](https://arxiv.org/abs/2403.17887). After pruning, the model was trained with QLoRA on ~100M tokens from [JeanKaddour/minipile](https://huggingface.co/datasets/JeanKaddour/minipile).

The layers to prune were selected using [PruneMe](https://github.com/arcee-ai/PruneMe).
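The paper's selection criterion compares the hidden state entering layer ℓ with the hidden state n layers later. It uses the angular distance (1/π)·arccos(cosine similarity) and drops the block of n layers whose endpoints are closest. The following minimal NumPy sketch shows that criterion on synthetic data. It is not PruneMe's code. The function names (`angular_distance`, `best_block_to_prune`, `prune_block`) and the toy setup are hypothetical. In real use, the hidden states would be collected from the model on calibration text.

```python
import numpy as np

def angular_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Mean angular distance in [0, 1] between paired rows of a and b."""
    cos = np.sum(a * b, axis=-1) / (
        np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1)
    )
    cos = np.clip(cos, -1.0, 1.0)  # guard against rounding just outside [-1, 1]
    return float(np.mean(np.arccos(cos) / np.pi))

def best_block_to_prune(hidden: list, n: int) -> tuple:
    """hidden[i] holds the states entering layer i (hidden[-1] is the final
    output). Returns the start i minimizing d(hidden[i], hidden[i + n])."""
    num_layers = len(hidden) - 1
    scores = [
        angular_distance(hidden[i], hidden[i + n])
        for i in range(num_layers - n + 1)
    ]
    start = int(np.argmin(scores))
    return start, scores[start]

def prune_block(layers: list, start: int, n: int) -> list:
    """Drop the n consecutive layers beginning at index start."""
    return layers[:start] + layers[start + n:]

# Hypothetical toy model: 10 "layers", where layers 4-6 barely change their input.
rng = np.random.default_rng(0)
hidden = [rng.standard_normal((8, 16))]  # 8 samples, hidden size 16
for i in range(10):
    if 4 <= i < 7:
        hidden.append(hidden[-1] + 0.01 * rng.standard_normal((8, 16)))
    else:
        hidden.append(rng.standard_normal((8, 16)))

start, dist = best_block_to_prune(hidden, n=3)
kept = prune_block(list(range(10)), start, n=3)
print(start, kept)  # 4 [0, 1, 2, 3, 7, 8, 9]
```

In the paper, this selection step is followed by a short QLoRA "healing" run, which corresponds to the minipile training described above.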

Still evaluating, so don't get too excited! It might be incredibly dumb. Check out these zero-shot MMLU numbers, though:


|      Groups      |Version|Filter|n-shot|Metric|Value |   |Stderr|
|------------------|-------|------|-----:|------|-----:|---|-----:|
|mmlu              |N/A    |none  |     0|acc   |0.7319|±  |0.0034|
| - humanities     |N/A    |none  |     0|acc   |0.6582|±  |0.0063|
| - other          |N/A    |none  |     0|acc   |0.7927|±  |0.0069|
| - social_sciences|N/A    |none  |     0|acc   |0.8466|±  |0.0064|
| - stem           |N/A    |none  |     0|acc   |0.6702|±  |0.0079|

5-shot MMLU:

|      Groups      |Version|Filter|n-shot|Metric|Value |   |Stderr|
|------------------|-------|------|-----:|------|-----:|---|-----:|
|mmlu              |N/A    |none  |     5|acc   |0.7669|±  |0.0034|
| - humanities     |N/A    |none  |     5|acc   |0.7296|±  |0.0062|
| - other          |N/A    |none  |     5|acc   |0.8101|±  |0.0067|
| - social_sciences|N/A    |none  |     5|acc   |0.8668|±  |0.0060|
| - stem           |N/A    |none  |     5|acc   |0.6825|±  |0.0079|
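The `Stderr` column gives a standard error for each accuracy. A normal-approximation 95% interval, value ± 1.96 × stderr, offers a rough way to read it. The following sketch applies that to the two overall MMLU rows from the tables above. The normal approximation and the 1.96 multiplier are assumptions of this sketch and are not part of the reported results.

```python
# Overall MMLU accuracy and stderr, copied from the two tables above.
results = {
    "0-shot": (0.7319, 0.0034),
    "5-shot": (0.7669, 0.0034),
}

def normal_ci(value: float, stderr: float, z: float = 1.96) -> tuple:
    """Approximate 95% interval, assuming a normal sampling distribution."""
    return value - z * stderr, value + z * stderr

for name, (acc, se) in results.items():
    lo, hi = normal_ci(acc, se)
    print(f"{name}: {acc:.4f} (95% CI ~{lo:.4f} to {hi:.4f})")
# 0-shot: 0.7319 (95% CI ~0.7252 to 0.7386)
# 5-shot: 0.7669 (95% CI ~0.7602 to 0.7736)
```

With these numbers, the two intervals do not overlap. Both runs score the same questions, so treating them as independent is only a rough check.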