arxiv:2404.14219

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Published on Apr 22

· Submitted by

akhaliq on Apr 23

#1 Paper of the day

Upvote

243

Authors:

Marah Abdin ,

Sam Ade Jacobs ,

Ammar Ahmad Awan ,

Jyoti Aneja ,

Ahmed Awadallah ,

Nguyen Bach ,

Amit Bahree ,

Arash Bakhtiari ,

Harkirat Behl ,

Alon Benhaim ,

Johan Bjorck ,

Sébastien Bubeck ,

Martin Cai ,

Caio César Teodoro Mendes ,

Weizhu Chen ,

Vishrav Chaudhary ,

Parul Chopra ,

Allie Del Giorno ,

Gustavo de Rosa ,

Matthew Dixon

Abstract

We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset for training, a scaled-up version of the one used for phi-2, composed of heavily filtered web data and synthetic data. The model is also further aligned for robustness, safety, and chat format. We also provide some initial parameter-scaling results with a 7B and 14B models trained for 4.8T tokens, called phi-3-small and phi-3-medium, both significantly more capable than phi-3-mini (e.g., respectively 75% and 78% on MMLU, and 8.7 and 8.9 on MT-bench).

View arXiv page View PDF Add to collection

Community

mrfakename

Apr 23

Nice! When will the weights be released under an open source license?

Vezora

Apr 23

•

edited Apr 23

Also important could we also get the 3.3T token dataset? 🤗 pretty please

seven89

Apr 23

•

edited Apr 23

h2m

Apr 23

pocket-size gpt-3.5?
14B matching GPT-4-0314 MT-Bench?
<3

ZappY-AI

Apr 23

weights or it didn't happen. Also, please make it apache 2.0.

zolicsaki

Apr 23

Weights?

lunarflu

Apr 23

Very cool! Would be awesome to increase visibility + experimentation by sharing the weights as well 🤗

siberparsomen

Apr 23

This is ıncredible. When we will see the weights?

thomwolf

Apr 23

•

edited Apr 23

So great to see the successor of Phi-1.5/2 – Looking forward to being able to play with the model and embed it locally everywhere!

razvanab

Apr 23

Weights Please 🙏

Alex805

Apr 23

These SLMs are better and better so far. Would be cool to get an apk to actually run them on mobile devices without termux. Two existing things that I know of, are with limited models support. And GPT 3.5 level of model quality is a good occasion to wrap it

maveriq

Apr 23

I agree that SLMs probably need more focus and have potential to make great strides on multiple fronts; be it accessbility, deployability, inference speed and new usecases. Ofcourse it means putting in more effort on dataset curation and maybe even the architecture. Phi series is the proof that focused data curation alone can improve performance quite a bit.

ajibawa-2023

Apr 23

Given recent events, I don't think weights will be available and forget about dataset. Even if weights are released it will be taken down next day for some testing or alignment or some other stuff only never to return. Great job guys!!

ThreeBlessings

Apr 23

I'm not sure what recent events you're referring to. I'll wait for the official statement before jumping to conclusions.

MouhuAI

Apr 23

When will the model be released?

edmond

Apr 23

LLama was never competitive, LLama 2 got in a few weeks beaten by Mistral, LLama 3 got in a few days beaten by Phi 3 ?
Damn if this is true Zuck might start to become seriously mad ... (even if phi is using LLama 2)

Jaward

Apr 23

Here's a quick walkthrough of the paper: https://huggingface.co/posts/Jaward/284702584639894

mikelabs

Apr 23

I saw weights are coming tomorrow (on Twitter, hopefully it's legit!). In any case, there's a plain-english rewrite of this paper available here if you want: https://www.aimodels.fyi/papers/arxiv/phi-3-technical-report-highly-capable-language

lukestanley

Apr 23

Surely the first reference needs fixing to say 2024 and to use capital letters in the right places?
Currently says: "References
[AI23] Meta AI. Introducing meta llama 3: The most capable openly available llm to date, 2023."

Surely it should be: "[AI23] Meta AI. Introducing Meta Llama 3: The most capable openly available LLM to date, 2024."?
@gugarosa
So looking forward to playing with this, well done all!

deshwalmahesh

Apr 23

How did you "filter" data for Phase-1 and 2? Was it manual? How did you ensure if it was automated?

Also, what was the criteria for "inducing reasoning" on the dataset and web?

mishig

Apr 23

it is now available on hugging chat 🔥 https://huggingface.co/chat/models/microsoft/Phi-3-mini-4k-instruct

retteghy

Apr 23

•

edited Apr 24

for some reason the one on hugging chat gives me crappy answers , e.g.:
Q: what files are needed for a chrome extension, what are their names?
A: To add unit tests for the URLAnalyzer class, you'll need to set up a testing framework like Jest. Here's an example of how you might write tests for the waitForClassPresence and analyzeUrl met [...] and so on, completely unrelated junk)

I tried the gguf Q4 version in gpt4all, and got much better results, only issue is with the stop token

clem

Apr 23

The weights just dropped - MIT license!

https://huggingface.co/microsoft/Phi-3-mini-128k-instruct
https://huggingface.co/microsoft/Phi-3-mini-4k-instruct

KrishnaKaasyap

Apr 23

Would love to see all models in this family on LMSYS arena!

Arena is like double blind peer review ++ randomized controlled trials in science! The golden standard to judge something. I hope some API provider like Together API would provide inference services for these family of models to us all and also for Arena!

clem

Apr 23

•

edited Apr 23

you can try the models with transformers (https://github.com/huggingface/transformers) or TGI already ( https://github.com/huggingface/text-generation-inference) cc @Narsil @lysandre

Broomva

Apr 23

Does it have any tuning for function calling? What dataset was used or how to fine tune it for agent applications?

Abecid

Apr 23

Wow people are actually starting to use hf comments now really cool 😎

damojay

Apr 23

To be able to finetune it for json mode and the ability to use it in mobile will have very nice impact!
Opens so many opportunities for agents in GPU poor devices

librarian-bot

Apr 24

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Stable LM 2 1.6B Technical Report (2024)
Nemotron-4 15B Technical Report (2024)
Latxa: An Open Language Model and Evaluation Suite for Basque (2024)
Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws (2024)
LAB: Large-Scale Alignment for ChatBots (2024)

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any Paper on Hugging Face checkout this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

nbroad

Apr 26

phi-4 will run on a toaster

ajithprabhakar

Apr 28

Here is my blog showcasing this paper : https://ajithp.com/2024/04/28/the-miniature-language-model-with-massive-potential-introducing-phi-3/

m-ric

Apr 29

•

edited Apr 29

This very high performance on some benchmarks (the paper claims a performance than Mixtral 8x7B) seems suspicious, given that the model scores way lower on Chatbot Arena: it has an ELO 1064 as of now, so it's good but below Mistral 7B-Instruct-0.2 (1073), and far below Mixtral (1114).