---
base_model: microsoft/Phi-3-mini-4k-instruct
inference: false
license: mit
license_link: https://huggingface.co/microsoft/Phi-3-mini-4k-instruct/resolve/main/LICENSE
language:
- en
pipeline_tag: text-generation
tags:
- nlp
- code
model_creator: microsoft
model_name: Phi-3-mini-4k-instruct
model_type: phi3
quantized_by: brittlewis12
---
# Phi 3 Mini 4K Instruct GGUF
**Updated with Microsoft’s latest model changes as of July 21, 2024**
Original model: [Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct)

Model creator: [Microsoft](https://huggingface.co/microsoft)
This repo contains GGUF format model files for Microsoft’s Phi 3 Mini 4K Instruct.
Phi-3-Mini-4K-Instruct is a 3.8B-parameter, lightweight, state-of-the-art open model trained on the Phi-3 datasets, which include both synthetic data and filtered publicly available website data, with a focus on high-quality and reasoning-dense properties.
Learn more on Microsoft’s [model page](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct).
## What is GGUF?
GGUF is a file format for representing AI models. It is the third version of the format, introduced by the llama.cpp team on August 21st, 2023. It is a replacement for GGML, which is no longer supported by llama.cpp.

Converted with llama.cpp build 3432 (revision 45f2c19), using autogguf.
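As a quick illustration (not an official quickstart), the sketch below downloads one quant from this repo and runs a chat completion with llama-cpp-python. The repo id and quant filename are assumptions, not confirmed file names; substitute a file actually listed in this repo.

```python
# Minimal sketch: download a GGUF quant and run it with llama-cpp-python.
# Assumes `pip install llama-cpp-python huggingface_hub`.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_path = hf_hub_download(
    repo_id="brittlewis12/Phi-3-mini-4k-instruct-GGUF",  # assumed repo id
    filename="phi-3-mini-4k-instruct.Q4_K_M.gguf",       # hypothetical quant filename
)

llm = Llama(model_path=model_path, n_ctx=4096)  # Phi 3 Mini supports a 4K context

# create_chat_completion applies the chat template stored in the GGUF metadata.
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain what the GGUF format is in one sentence."},
    ],
    max_tokens=128,
)
print(response["choices"][0]["message"]["content"])
```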
## Prompt template

```
<|system|>
{{system_prompt}}<|end|>
<|user|>
{{prompt}}<|end|>
<|assistant|>
```
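Raw completion APIs do not apply a chat template, so prompts must be rendered manually. A minimal sketch in Python (the helper name is illustrative, not part of the model or repo) that fills the template above for a single turn:

```python
def build_phi3_prompt(system_prompt: str, prompt: str) -> str:
    """Render the Phi-3 prompt template shown above for a single-turn exchange."""
    return (
        f"<|system|>\n{system_prompt}<|end|>\n"
        f"<|user|>\n{prompt}<|end|>\n"
        f"<|assistant|>\n"
    )

print(build_phi3_prompt("You are a helpful assistant.", "Write a haiku about autumn."))
```

Chat-style runtimes that read the template from GGUF metadata (such as llama.cpp's chat completion interface) typically apply this formatting automatically, so a helper like this is only needed when calling a raw completion endpoint.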
## Download & run with cnvrs on iPhone, iPad, and Mac!
cnvrs is the best app for private, local AI on your device:
- create & save Characters with custom system prompts & temperature settings
- download and experiment with any GGUF model you can find on HuggingFace!
- make it your own with custom Theme colors
- powered by Metal ⚡️ & Llama.cpp, with haptics during response streaming!
- try it out yourself today, on Testflight!
- follow cnvrs on twitter to stay up to date
## Original Model Evaluation
Comparison of the July 2024 update vs. the original April release:
| Benchmarks | Original | June 2024 Update |
|---|---|---|
| Instruction Extra Hard | 5.7 | 6.0 |
| Instruction Hard | 4.9 | 5.1 |
| Instructions Challenge | 24.6 | 42.3 |
| JSON Structure Output | 11.5 | 52.3 |
| XML Structure Output | 14.4 | 49.8 |
| GPQA | 23.7 | 30.6 |
| MMLU | 68.8 | 70.9 |
| Average | 21.9 | 36.7 |
### Original April release
As is now standard, we use few-shot prompts to evaluate the models at temperature 0. The prompts and number of shots come from a Microsoft internal tool for evaluating language models; in particular, we made no optimizations to the pipeline for Phi-3. More specifically, we did not change prompts, pick different few-shot examples, change the prompt format, or do any other form of optimization for the model.

The number of k-shot examples is listed per benchmark.
| Benchmark | Phi-3-Mini-4K-In 3.8b | Phi-2 2.7b | Mistral 7b | Gemma 7b | Llama-3-In 8b | Mixtral 8x7b | GPT-3.5 version 1106 |
|---|---|---|---|---|---|---|---|
| MMLU 5-Shot | 68.8 | 56.3 | 61.7 | 63.6 | 66.5 | 68.4 | 71.4 |
| HellaSwag 5-Shot | 76.7 | 53.6 | 58.5 | 49.8 | 71.1 | 70.4 | 78.8 |
| ANLI 7-Shot | 52.8 | 42.5 | 47.1 | 48.7 | 57.3 | 55.2 | 58.1 |
| GSM-8K 0-Shot; CoT | 82.5 | 61.1 | 46.4 | 59.8 | 77.4 | 64.7 | 78.1 |
| MedQA 2-Shot | 53.8 | 40.9 | 49.6 | 50.0 | 60.5 | 62.2 | 63.4 |
| AGIEval 0-Shot | 37.5 | 29.8 | 35.1 | 42.1 | 42.0 | 45.2 | 48.4 |
| TriviaQA 5-Shot | 64.0 | 45.2 | 72.3 | 75.2 | 67.7 | 82.2 | 85.8 |
| Arc-C 10-Shot | 84.9 | 75.9 | 78.6 | 78.3 | 82.8 | 87.3 | 87.4 |
| Arc-E 10-Shot | 94.6 | 88.5 | 90.6 | 91.4 | 93.4 | 95.6 | 96.3 |
| PIQA 5-Shot | 84.2 | 60.2 | 77.7 | 78.1 | 75.7 | 86.0 | 86.6 |
| SociQA 5-Shot | 76.6 | 68.3 | 74.6 | 65.5 | 73.9 | 75.9 | 68.3 |
| BigBench-Hard 0-Shot | 71.7 | 59.4 | 57.3 | 59.6 | 51.5 | 69.7 | 68.32 |
| WinoGrande 5-Shot | 70.8 | 54.7 | 54.2 | 55.6 | 65 | 62.0 | 68.8 |
| OpenBookQA 10-Shot | 83.2 | 73.6 | 79.8 | 78.6 | 82.6 | 85.8 | 86.0 |
| BoolQ 0-Shot | 77.6 | -- | 72.2 | 66.0 | 80.9 | 77.6 | 79.1 |
| CommonSenseQA 10-Shot | 80.2 | 69.3 | 72.6 | 76.2 | 79 | 78.1 | 79.6 |
| TruthfulQA 10-Shot | 65.0 | -- | 52.1 | 53.0 | 63.2 | 60.1 | 85.8 |
| HumanEval 0-Shot | 59.1 | 47.0 | 28.0 | 34.1 | 60.4 | 37.8 | 62.2 |
| MBPP 3-Shot | 53.8 | 60.6 | 50.8 | 51.5 | 67.7 | 60.2 | 77.8 |