M4-ai
/

NeuralReyna-Mini-1.8B-v0.2

Text Generation

text-generation-inference

Inference Endpoints

Model card Files Files and versions Community

Edit model card

NeuralReyna-Mini-1.8B-v0.2

Description

Taken aloobun/Reyna-Mini-1.8B-v0.2 and further fine-tuned it using DPO using the Intel/orca_dpo_pairs dataset.

This model has capabilities in coding, math, science, roleplay, and function calling.

This model was trained on OpenAI's ChatML prompt format.

Evaluation

AGIEval:

GPT4ALL:

Tasks	Version	Filter	n-shot	Metric	Value		Stderr
arc_challenge	1	none	0	acc	0.3208	±	0.0136
		none	0	acc_norm	0.3336	±	0.0138
arc_easy	1	none	0	acc	0.6035	±	0.0100
		none	0	acc_norm	0.5833	±	0.0101
boolq	2	none	0	acc	0.6526	±	0.0083
hellaswag	1	none	0	acc	0.4556	±	0.0050
		none	0	acc_norm	0.6076	±	0.0049
openbookqa	1	none	0	acc	0.2600	±	0.0196
		none	0	acc_norm	0.3460	±	0.0213
piqa	1	none	0	acc	0.7236	±	0.0104
		none	0	acc_norm	0.7307	±	0.0104
winogrande	1	none	0	acc	0.6062	±	0.0137

Disclaimer

This model may have overfitted to the DPO training data, and may not perform well.

Contributions

Thanks to @aloobun and @Locutusque for their contributions to this model.

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric	Value
Avg.	44.85
AI2 Reasoning Challenge (25-Shot)	37.80
HellaSwag (10-Shot)	60.51
MMLU (5-Shot)	45.04
TruthfulQA (0-shot)	37.75
Winogrande (5-shot)	60.93
GSM8k (5-shot)	27.07

Downloads last month: 389

Safetensors

Model size

1.84B params

Tensor type

FP16

·

Text Generation

This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Datasets used to train M4-ai/NeuralReyna-Mini-1.8B-v0.2

Evaluation results

normalized accuracy on AI2 Reasoning Challenge (25-Shot)
test set Open LLM Leaderboard

37.800
normalized accuracy on HellaSwag (10-Shot)
validation set Open LLM Leaderboard

60.510
accuracy on MMLU (5-Shot)
test set Open LLM Leaderboard

45.040
mc2 on TruthfulQA (0-shot)
validation set Open LLM Leaderboard

37.750
accuracy on Winogrande (5-shot)
validation set Open LLM Leaderboard

60.930
accuracy on GSM8k (5-shot)
test set Open LLM Leaderboard

27.070

View on Papers With Code