Model Card for Fox-1-1.6B-Instruct

This model is an instruction tuned model which requires alignment before it can be used in production. We will release the chat version soon.

Fox-1 is a decoder-only transformer-based small language model (SLM) with 1.6B total parameters developed by TensorOpera AI. The model was pre-trained with a 3-stage data curriculum on 3 trillion tokens of text and code data in 8K sequence length. Fox-1 uses Grouped Query Attention (GQA) with 4 key-value heads and 16 attention heads for faster inference.

Fox-1-Instruct-v0.1 is an instruction-tuned (SFT) version of Fox-1-1.6B that has an 8K native context length. The model was finetuned with 5B tokens of instruction following and multi-turn conversation data.

For the full details of this model please read Fox-1 technical report and release blog post.

Getting-Started

The model and a live inference endpoint are available on the TensorOpera AI Platform.

For detailed deployment instructions, refer to the Step-by-Step Guide on how to deploy Fox-1-Instruct on the TensorOpera AI Platform.

Benchmarks

We evaluated Fox-1 on ARC Challenge (25-shot), HellaSwag (10-shot), TruthfulQA (0-shot), MMLU (5-shot), Winogrande (5-shot), and GSM8k (5-shot). We follow the Open LLM Leaderboard's evaluation setup and report the average score of the 6 benchmarks. The model was evaluated on a machine with 8*H100 GPUs.

Fox-1-1.6B-Instruct-v0.1 Fox-1-1.6B Qwen1.5-1.8B-Chat Gemma-2B-It OpenELM-1.1B-Instruct
GSM8k 39.20% 36.39% 18.20% 4.47% 0.91%
MMLU 44.99% 43.05% 45.77% 37.70% 25.70%
ARC Challenge 43.60% 41.21% 38.99% 43.34% 40.36%
HellaSwag 63.39% 62.82% 60.31% 62.72% 71.67%
TruthfulQA 44.12% 38.66% 40.57% 45.86% 45.96%
Winogrande 62.67% 60.62% 59.51% 61.33% 61.96%
Average 49.66% 47.13% 43.89% 42.57% 41.09%
Downloads last month
219
Safetensors
Model size
1.67B params
Tensor type
BF16
ยท
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Model tree for tensoropera/Fox-1-1.6B-Instruct-v0.1

Finetuned
(8)
this model
Quantizations
6 models

Space using tensoropera/Fox-1-1.6B-Instruct-v0.1 1