---
license: other
datasets:
- JeanKaddour/minipile
language:
- en
---
This is Meta's Llama 3 70B pruned to 42B parameters using the methodology described in [The Unreasonable Ineffectiveness of the Deeper Layers](https://arxiv.org/abs/2403.17887). Layers to prune were selected using [PruneMe](https://github.com/arcee-ai/PruneMe).
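As a rough illustration of the selection step, the paper's metric is the angular distance between the hidden states entering a block of `n` consecutive layers and those leaving it: the block with the smallest distance changes the residual stream the least and is the best candidate to drop. A minimal sketch, assuming hidden states collected from a forward pass with `output_hidden_states=True` (function and variable names are illustrative, not PruneMe's actual API):

```python
import torch
import torch.nn.functional as F

def angular_distance(h_in: torch.Tensor, h_out: torch.Tensor) -> float:
    """Mean angular distance between two [tokens, hidden] activation tensors."""
    cos = F.cosine_similarity(h_in, h_out, dim=-1).clamp(-1.0, 1.0)
    return (torch.arccos(cos) / torch.pi).mean().item()

def best_block_to_prune(hidden_states: list[torch.Tensor], n: int) -> int:
    """hidden_states[l] is the input to layer l; returns the start index of
    the n-layer block whose removal should perturb the model least."""
    num_layers = len(hidden_states) - 1
    distances = [
        angular_distance(hidden_states[l], hidden_states[l + n])
        for l in range(num_layers - n + 1)
    ]
    return min(range(len(distances)), key=distances.__getitem__)
```

In practice the distances are averaged over a calibration set, and the selected block is then removed (e.g. by slicing the checkpoint with a mergekit passthrough config).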
After pruning, the model was healed with QLoRA training on ~100M tokens from [JeanKaddour/minipile](https://huggingface.co/datasets/JeanKaddour/minipile).
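The healing step might look roughly like the following QLoRA setup with transformers/peft/bitsandbytes. The local checkpoint path and every hyperparameter below are assumptions for illustration, not the settings actually used for this model:

```python
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# QLoRA: freeze the base weights in 4-bit NF4, train low-rank adapters on top.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "./llama3-42b-pruned",  # hypothetical path to the pruned checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,                   # illustrative rank, not the actual setting
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

dataset = load_dataset("JeanKaddour/minipile", split="train")
# ...tokenize and train with your preferred trainer until ~100M tokens are seen.
```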
Still evaluating, don't get too excited! Might be incredibly dumb. Check out these zero-shot MMLU numbers though:

| Groups |Version|Filter|n-shot|Metric|Value | |Stderr|
|------------------|-------|------|-----:|------|-----:|---|-----:|
|mmlu |N/A |none | 0|acc |0.7319|± |0.0034|
| - humanities |N/A |none | 0|acc |0.6582|± |0.0063|
| - other |N/A |none | 0|acc |0.7927|± |0.0069|
| - social_sciences|N/A |none | 0|acc |0.8466|± |0.0064|
| - stem |N/A |none | 0|acc |0.6702|± |0.0079|

5-shot:

| Groups |Version|Filter|n-shot|Metric|Value | |Stderr|
|------------------|-------|------|-----:|------|-----:|---|-----:|
|mmlu |N/A |none | 5|acc |0.7669|± |0.0034|
| - humanities |N/A |none | 5|acc |0.7296|± |0.0062|
| - other |N/A |none | 5|acc |0.8101|± |0.0067|
| - social_sciences|N/A |none | 5|acc |0.8668|± |0.0060|
| - stem |N/A |none | 5|acc |0.6825|± |0.0079|
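
For completeness, a minimal inference sketch with transformers; the repo id below assumes this card's location:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "chargoddard/llama3-42b-v0"  # assumed from this repo's name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```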