[Figure: Needle in a Haystack evaluation heatmap]
Model Card for Model ID
This model is a merge of:
- DiscoResearch/Llama3-DiscoLeo-Instruct-8B-v0.1 - 66%
- meta-llama/Meta-Llama-3-8B-Instruct - 16%
- DataGuard/pali-8B-v0.4.3 - 16%
The embedding, norm, and head layers are taken unchanged from DiscoResearch/Llama3-DiscoLeo-Instruct-8B-v0.1.
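The card gives the mixing ratios but not the merge method. The sketch below shows one common option: a plain weighted linear average of parameters, with embedding, norm, and head tensors copied from the first model. Several details are assumptions. The merge method itself, the substring matching in the hypothetical `PASSTHROUGH_KEYS`, and the toy `StateDict` type (parameter name mapped to a flat list of floats instead of real tensors) are illustrative only. Also, the listed percentages sum to 98%, so this sketch normalizes the weights to sum to 1. The card does not say whether the original merge did this.

```python
from typing import Dict, List

# Hypothetical stand-in for a model state dict: parameter name -> flat list of floats.
StateDict = Dict[str, List[float]]

# Assumption: "embedding, norm and head layers" are matched by these substrings
# of Llama-3-style parameter names (e.g. model.embed_tokens, *norm*, lm_head).
PASSTHROUGH_KEYS = ("embed_tokens", "norm", "lm_head")


def linear_merge(models: List[StateDict], weights: List[float], base_index: int = 0) -> StateDict:
    """Weighted linear average of parameters; pass-through tensors come from models[base_index]."""
    if len(models) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    # Assumption: weights are normalized so they sum to 1 (0.66 + 0.16 + 0.16 = 0.98).
    norm_weights = [w / total for w in weights]
    base = models[base_index]
    merged: StateDict = {}
    for name, base_values in base.items():
        if any(key in name for key in PASSTHROUGH_KEYS):
            # Copy embedding / norm / head tensors unchanged from the base model.
            merged[name] = list(base_values)
            continue
        # Element-wise weighted sum across all models for every other tensor.
        merged[name] = [
            sum(w * m[name][i] for w, m in zip(norm_weights, models))
            for i in range(len(base_values))
        ]
    return merged
```

For example, calling `linear_merge([disco, llama, pali], [0.66, 0.16, 0.16])` on three toy state dicts averages the transformer-block tensors with weights of roughly 0.673, 0.163, and 0.163, while `lm_head.weight` and the other pass-through tensors are copied verbatim from `disco`.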
Evaluation results
- judge_match on squad_answerable: 0.639 (self-reported)
- judge_match on context_has_answer: 0.86 (self-reported)
- judge_match on jail_break: 0.099 (self-reported)
- judge_match on harmless_prompt: 0.926 (self-reported)
- judge_match on harmful_prompt: 0.689 (self-reported)
- acc on truthfulqa: 0.522 (self-reported)
- exact_match on gsm8k: 0.616 (self-reported)
- acc on mmlu: 0.634 (self-reported)