chargoddard/llama3-42b-v0


chargoddard committed on Apr 21, 2024

Commit 5af5d78 · verified · 1 Parent(s): e4b5496

Create README.md

Files changed (1)

README.md +22 -0

README.md ADDED

	@@ -0,0 +1,22 @@

+---
+license: llama2
+datasets:
+- JeanKaddour/minipile
+language:
+- en
+---
+Meta's Llama 3 70B pruned to 42B parameters using the methodology described in [The Unreasonable Ineffectiveness of the Deeper Layers](https://arxiv.org/abs/2403.17887). After pruning, the model was trained with QLoRA on ~100M tokens from [JeanKaddour/minipile](https://huggingface.co/datasets/JeanKaddour/minipile).
+
+The layers to prune were selected using [PruneMe](https://github.com/arcee-ai/PruneMe).
+
+Still evaluating, so don't get too excited! It might be incredibly dumb. Check out these zero-shot MMLU numbers, though:
+
+|      Groups      |Version|Filter|n-shot|Metric|Value |   |Stderr|
+|------------------|-------|------|-----:|------|-----:|---|-----:|
+|mmlu              |N/A    |none  |     0|acc   |0.7319|±  |0.0034|
+| - humanities     |N/A    |none  |     0|acc   |0.6582|±  |0.0063|
+| - other          |N/A    |none  |     0|acc   |0.7927|±  |0.0069|
+| - social_sciences|N/A    |none  |     0|acc   |0.8466|±  |0.0064|
+| - stem           |N/A    |none  |     0|acc   |0.6702|±  |0.0079|
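
The README says the pruned layers were chosen with the method from the linked paper, which removes the contiguous block of layers whose input and output representations differ least in angular distance. The following is a minimal sketch of that selection step under stated assumptions. It is not PruneMe's code: the function names, the `hidden_states` layout, and the toy vectors are all hypothetical, and a real run would use hidden states collected from the model on a calibration dataset.

```python
import math


def angular_distance(u, v):
    """Angular distance in [0, 1]: arccos(cosine similarity) / pi."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to guard against floating-point drift outside [-1, 1].
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cos) / math.pi


def pick_block_to_prune(hidden_states, block_size):
    """Return (start_layer, mean_distance) for the most redundant block.

    Assumed layout (hypothetical): hidden_states[l] is a list of vectors,
    one per sample, holding the representation that enters layer l.
    The final entry is the output of the last layer.
    """
    num_layers = len(hidden_states) - 1
    best = None
    for start in range(num_layers - block_size + 1):
        before = hidden_states[start]
        after = hidden_states[start + block_size]
        mean_dist = sum(
            angular_distance(u, v) for u, v in zip(before, after)
        ) / len(before)
        if best is None or mean_dist < best[1]:
            best = (start, mean_dist)
    return best


# Toy example: 4 layers, 1 sample, 2-D representations (made-up numbers).
toy_states = [
    [[1.0, 0.0]],   # input to layer 0
    [[0.0, 1.0]],   # input to layer 1
    [[0.05, 1.0]],  # input to layer 2
    [[0.1, 1.0]],   # input to layer 3
    [[-1.0, 0.0]],  # output of layer 3
]
start, dist = pick_block_to_prune(toy_states, block_size=2)
print(f"prune layers {start}..{start + 1}, mean angular distance {dist:.4f}")
```

In this toy case layers 1 and 2 barely change the representation, so that block is the one selected. After removing such a block, the README's approach is to train the model briefly (here with QLoRA) to recover quality.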