Model Card for Fox-1-1.6B

This model is a base pretrained model and requires further fine-tuning for most use cases. For a more interactive experience, we recommend tensoropera/Fox-1-1.6B-Instruct-v0.1, the instruction-tuned version of Fox-1.

Fox-1 is a decoder-only transformer-based small language model (SLM) with 1.6B total parameters, developed by TensorOpera AI. The model was trained with a 3-stage data curriculum on 3 trillion tokens of text and code data at an 8K sequence length. Fox-1 uses Grouped Query Attention (GQA) with 4 key-value heads and 16 attention heads for faster inference.

For the full details of this model, please read the Fox-1 technical report and the release blog post.
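As a quick start, the snippet below is a minimal sketch of loading the base model with Hugging Face transformers and generating a completion. It assumes the standard AutoModelForCausalLM / AutoTokenizer API; the dtype, device placement, and decoding settings are illustrative, not recommendations from the model card.

```python
# Minimal sketch: load Fox-1-1.6B and generate a short completion.
# Generation settings are illustrative; adjust for your environment.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tensoropera/Fox-1-1.6B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # released weights are BF16
    device_map="auto",            # add trust_remote_code=True if the architecture requires it
)

prompt = "The quick brown fox"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```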

Benchmarks

We evaluated Fox-1 on ARC Challenge (25-shot), HellaSwag (10-shot), TruthfulQA (0-shot), MMLU (5-shot), Winogrande (5-shot), and GSM8k (5-shot), following the Open LLM Leaderboard's evaluation setup, and report the average score across the 6 benchmarks. The model was evaluated on a machine with 8x H100 GPUs. A reproduction sketch follows the table below.

| Benchmark | Fox-1-1.6B | Qwen-1.5-1.8B | Gemma-2B | StableLM-2-1.6B | OpenELM-1.1B |
|---|---|---|---|---|---|
| GSM8k | 36.39% | 34.04% | 17.06% | 17.74% | 2.27% |
| MMLU | 43.05% | 47.15% | 41.71% | 39.16% | 27.28% |
| ARC Challenge | 41.21% | 37.20% | 49.23% | 44.11% | 36.26% |
| HellaSwag | 62.82% | 61.55% | 71.60% | 70.46% | 65.23% |
| TruthfulQA | 38.66% | 39.37% | 33.05% | 38.77% | 36.98% |
| Winogrande | 60.62% | 65.51% | 65.51% | 65.27% | 61.64% |
| Average | 47.13% | 46.81% | 46.36% | 45.92% | 38.28% |
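The scores above can be reproduced with EleutherAI's lm-evaluation-harness. The sketch below assumes the v0.4 Python API (lm_eval.simple_evaluate) and the harness's standard task names; shot counts mirror the table, but the exact Open LLM Leaderboard configuration may differ slightly.

```python
# Hedged sketch: few-shot evaluation with lm-evaluation-harness (v0.4 API assumed).
import lm_eval

TASK_SHOTS = {
    "arc_challenge": 25,
    "hellaswag": 10,
    "truthfulqa_mc2": 0,
    "mmlu": 5,
    "winogrande": 5,
    "gsm8k": 5,
}

for task, shots in TASK_SHOTS.items():
    results = lm_eval.simple_evaluate(
        model="hf",
        model_args="pretrained=tensoropera/Fox-1-1.6B,dtype=bfloat16",
        tasks=[task],
        num_fewshot=shots,
        batch_size="auto",
    )
    print(task, results["results"][task])
```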

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric | Value |
|---|---|
| Avg. | 7.69 |
| IFEval (0-shot) | 27.66 |
| BBH (3-shot) | 7.40 |
| MATH Lvl 5 (4-shot) | 1.28 |
| GPQA (0-shot) | 1.79 |
| MuSR (0-shot) | 3.87 |
| MMLU-PRO (5-shot) | 4.13 |