Meta's Llama 3 70B pruned to 42B parameters using the methodology described in The Unreasonable Ineffectiveness of the Deeper Layers. After pruning, the model was further trained with QLoRA on ~100M tokens from JeanKaddour/minipile.
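
The pruning itself amounts to deleting a contiguous block of decoder layers. A minimal sketch with transformers is below; the block indices are placeholders, not the layers actually removed for this model, and in practice the slicing is often done with a tool such as mergekit rather than by hand:

```python
# Minimal sketch of layer pruning: delete a contiguous block of decoder layers
# from a Llama-architecture model. drop_start/drop_end are placeholders, not the
# block actually removed for this model.
import torch
from torch import nn
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-70B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

drop_start, drop_end = 50, 70  # hypothetical block identified by the similarity analysis

kept = nn.ModuleList(
    layer for i, layer in enumerate(model.model.layers)
    if not (drop_start <= i < drop_end)
)
model.model.layers = kept
model.config.num_hidden_layers = len(kept)

# Save, then reload the checkpoint so layer indices and config stay consistent
# before any healing run.
model.save_pretrained("llama3-pruned")
```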

Layers to prune were selected using PruneMe.
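
PruneMe measures, for each candidate block of consecutive layers, how similar the hidden states entering and leaving the block are; the most redundant block is the one that gets pruned. A rough sketch of that measurement (not PruneMe's actual code; the model name, block size, and calibration texts are placeholders):

```python
# Illustrative sketch of the block-similarity measurement: for every candidate block
# of `block_size` consecutive layers, average the angular distance between the hidden
# states entering and leaving the block over a few sample texts. The block with the
# smallest distance is the best pruning candidate.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Meta-Llama-3-70B"  # assumed base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

block_size = 20   # hypothetical number of consecutive layers to consider dropping
texts = ["..."]   # in practice, a calibration set such as a slice of minipile

num_layers = model.config.num_hidden_layers
distances = torch.zeros(num_layers - block_size)

with torch.no_grad():
    for text in texts:
        inputs = tokenizer(text, return_tensors="pt").to(model.device)
        # hidden_states[i] is the input to decoder layer i
        hidden = model(**inputs, output_hidden_states=True).hidden_states
        for start in range(num_layers - block_size):
            h_in = hidden[start].float()
            h_out = hidden[start + block_size].float()
            cos = torch.nn.functional.cosine_similarity(h_in, h_out, dim=-1)
            # angular distance, averaged over tokens
            distances[start] += (torch.arccos(cos.clamp(-1, 1)) / torch.pi).mean().item()

distances /= len(texts)
best_start = int(distances.argmin())
print(f"most redundant block: layers {best_start}..{best_start + block_size - 1}")
```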

Still evaluating, don't get too excited! Might be incredibly dumb. Check out these zero-shot MMLU numbers though:

| Groups            | Version | Filter | n-shot | Metric | Value  | Stderr   |
|-------------------|---------|--------|-------:|--------|-------:|---------:|
| mmlu              | N/A     | none   |      0 | acc    | 0.7319 | ± 0.0034 |
| - humanities      | N/A     | none   |      0 | acc    | 0.6582 | ± 0.0063 |
| - other           | N/A     | none   |      0 | acc    | 0.7927 | ± 0.0069 |
| - social_sciences | N/A     | none   |      0 | acc    | 0.8466 | ± 0.0064 |
| - stem            | N/A     | none   |      0 | acc    | 0.6702 | ± 0.0079 |

5-shot:

| Groups            | Version | Filter | n-shot | Metric | Value  | Stderr   |
|-------------------|---------|--------|-------:|--------|-------:|---------:|
| mmlu              | N/A     | none   |      0 | acc    | 0.7669 | ± 0.0034 |
| - humanities      | N/A     | none   |      5 | acc    | 0.7296 | ± 0.0062 |
| - other           | N/A     | none   |      5 | acc    | 0.8101 | ± 0.0067 |
| - social_sciences | N/A     | none   |      5 | acc    | 0.8668 | ± 0.0060 |
| - stem            | N/A     | none   |      5 | acc    | 0.6825 | ± 0.0079 |
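
The tables above are in lm-evaluation-harness format; assuming that harness produced them, numbers in this ballpark could be reproduced along the following lines. The model path (the unquantized source repo, inferred from this page) and batch size are placeholders:

```python
# Sketch of re-running the MMLU evaluation with lm-evaluation-harness (assumed to be
# the harness behind the tables above); model path and batch size are placeholders.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=chargoddard/llama3-42b-v0,dtype=bfloat16",
    tasks=["mmlu"],
    num_fewshot=5,  # use 0 to reproduce the zero-shot table
    batch_size=4,
)
print(results["results"])  # per-subject and aggregate MMLU accuracies
```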

Built with Axolotl
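
The QLoRA healing run mentioned above was built with Axolotl; the card does not include the training config, but a rough transformers/peft equivalent looks like this (all hyperparameters below are placeholders, not the settings actually used):

```python
# Rough sketch of QLoRA fine-tuning ("healing") of the pruned model with
# transformers + peft + bitsandbytes; all hyperparameters are placeholders,
# not the Axolotl config actually used.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "llama3-pruned",  # the pruned checkpoint from the earlier sketch
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# Train on ~100M tokens of JeanKaddour/minipile with the usual Trainer / Axolotl loop.
```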
