Model Summary

Phi-3-22b is a depth-upsampled version of the 14B Phi-3-medium-128k-instruct. We removed the bottom 8 layers from one copy of the 14B model and the top 8 layers from another copy, then stacked the two. We plan to do continued pretraining to improve performance. Because the model has not yet undergone continued pretraining, output quality may vary.
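
As a rough illustration of the layer bookkeeping described above, the sketch below computes which source layers would end up in the stacked model. It is not the script used to build this model. It assumes the 14B base has 40 decoder layers, that "bottom" means the layers closest to the input, and that the copy with its top layers removed is placed underneath the copy with its bottom layers removed. The function name `depth_upsample_plan` is hypothetical.

```python
# Illustrative sketch only: the layer count and stacking order are assumptions,
# not details taken from this model card.

def depth_upsample_plan(num_layers: int, trim: int) -> list[int]:
    """Return the source-layer index for each layer of the stacked model, bottom to top."""
    if not 0 <= trim <= num_layers:
        raise ValueError("trim must be between 0 and num_layers")
    # Copy A: drop the top `trim` layers, keeping layers 0 .. num_layers - trim - 1.
    lower_half = list(range(0, num_layers - trim))
    # Copy B: drop the bottom `trim` layers, keeping layers trim .. num_layers - 1.
    upper_half = list(range(trim, num_layers))
    # Stack copy A underneath copy B.
    return lower_half + upper_half

plan = depth_upsample_plan(num_layers=40, trim=8)
print(len(plan))     # 64
print(plan[30:34])   # [30, 31, 8, 9]
```

Under these assumptions the stacked model has 64 layers, and the seam between the two copies falls where source layer 31 is followed by source layer 8.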

A GGUF version is available, thanks to @mradermacher!

Loading the model:

!pip install flash-attn --no-build-isolation
!pip install peft bitsandbytes accelerate transformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
tokenizer = AutoTokenizer.from_pretrained("ontocord/phi-3-22b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("ontocord/phi-3-22b", 
    torch_dtype="auto", device_map="auto", trust_remote_code=True,  )

Basic test

with torch.no_grad():
  inputs = tokenizer("<|user|>\nHow to explain Internet for a medieval knight?<|end|>\n<|assistant|>\n", return_tensors="pt").to('cuda')
  # use_cache is a generate() argument, so it is passed there rather than to batch_decode().
  print(tokenizer.batch_decode(model.generate(**inputs, max_new_tokens=128, use_cache=True))[0])

Will produce:

<|user|> How to explain Internet for a medieval knight?<|end|><|assistant|> Ah, noble knight, let me attempt to explain this mystical realm known as the Internet in terms that might resonate with your medieval understanding.

Imagine, if you will, a vast kingdom stretching beyond the horizon, where countless villages, towns, and cities are connected by a network of roads, bridges, and pathways. This kingdom is not bound by physical borders, but instead, it exists in a realm beyond our own, accessible only through magical devices known as computers, tablets, and smartphones.

In this kingdom, information flows like a mighty river,...
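
The test prompt above follows a fixed single-turn layout: a `<|user|>` tag, the message, `<|end|>`, then an `<|assistant|>` tag that cues the reply. A minimal sketch of a helper that builds the same string is shown below. The name `build_phi3_prompt` is hypothetical, and the helper only covers the single-turn case used in this card.

```python
def build_phi3_prompt(user_message: str) -> str:
    """Wrap one user message in the single-turn prompt layout used in this card."""
    # Layout copied from the test prompt above; multi-turn or system prompts are not handled.
    return f"<|user|>\n{user_message}<|end|>\n<|assistant|>\n"

prompt = build_phi3_prompt("How to explain Internet for a medieval knight?")
print(repr(prompt))
```

The resulting string can be passed to `tokenizer(...)` exactly as in the basic test. If the tokenizer in this repository ships a chat template, `tokenizer.apply_chat_template` may be a more robust alternative, though that has not been verified here.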

To run on a Colab T4, try 4-bit

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch
tokenizer = AutoTokenizer.from_pretrained("ontocord/phi-3-22b", trust_remote_code=True)
# Request 4-bit weights through bitsandbytes via an explicit quantization config.
model = AutoModelForCausalLM.from_pretrained("ontocord/phi-3-22b",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True), device_map="auto", trust_remote_code=True)
with torch.no_grad():
  inputs = tokenizer("<|user|>\nHow to explain Internet for a medieval knight?<|end|>\n<|assistant|>\n", return_tensors="pt").to('cuda')
  # use_cache is a generate() argument, so it is passed there rather than to batch_decode().
  print(tokenizer.batch_decode(model.generate(**inputs, max_new_tokens=128, use_cache=True))[0])

Will produce:

<|user|> How to explain Internet for a medieval knight?<|end|><|assistant|> Ah, noble knight, let me attempt to explain this mystical network known as the Internet, using terms and analogies from your time.

Imagine a vast kingdom, stretching far beyond the horizon, where countless villages, towns, and cities are connected by roads, rivers, and paths. Each village is like a castle, filled with people who share knowledge, goods, stories, and news.

Now, imagine that instead of messengers, horses, or ships, there exists a magical network of invisible threads connecting all these villages. This network is invisible to the eye, yet it allows messages, scroll
import torch
with torch.no_grad():
  inputs = tokenizer("<|user|>\nExplain why it is surprising that one can build a language model small enough to fit on a phone, yet almost as powerful as ChatGPT. Just use one funny sentence.<|end|>\n<|assistant|>\n", return_tensors="pt").to('cuda')
  # use_cache is a generate() argument, so it is passed there rather than to batch_decode().
  print(tokenizer.batch_decode(model.generate(**inputs, max_new_tokens=128, use_cache=True))[0])

Will produce:

<|user|> Explain why it is surprising that one can build a language model small enough to fit on a phone, yet almost as powerful as ChatGPT. Just use one funny sentence.<|end|><|assistant|> "Who knew that fitting a ChatGPT rival in your pocket would be easier than fitting a penguin in a pocket-sized suit!"<|end|>
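
The 4-bit suggestion for the Colab T4 can be sanity-checked with simple arithmetic on the weight storage alone. The sketch below is a back-of-the-envelope estimate, not a measurement: it uses the 22.1B parameter count reported for this model, assumes a nominal 16 GiB of T4 memory (usable memory is lower), and ignores activations, the KV cache, quantization metadata, and any layers kept in higher precision. The helper name `weight_memory_gib` is hypothetical.

```python
def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30

NUM_PARAMS = 22.1e9   # parameter count reported for this model
T4_MEMORY_GIB = 16    # assumption: nominal T4 capacity; usable memory is lower

bf16_gib = weight_memory_gib(NUM_PARAMS, 16)
int4_gib = weight_memory_gib(NUM_PARAMS, 4)
print(f"BF16: {bf16_gib:.1f} GiB, 4-bit: {int4_gib:.1f} GiB")
print("4-bit weights fit on a T4:", int4_gib < T4_MEMORY_GIB)
```

By this estimate the BF16 weights alone are far larger than a single T4 can hold, while 4-bit weights leave some headroom for activations and the cache.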

Some harder reasoning tests of the model are available in Colab.

See the Phi-3-medium-128k-instruct model card for more details.

Model size: 22.1B params (Safetensors). Tensor type: BF16.