MixtureOfPhi3

MixtureOfPhi3 is a Mixture of Experts (MoE) made with the following models using mergekit:

This has been created using LazyMergekit-Phi3

This run is for development purposes only: merging two identical models brings no performance benefit. Once specialized fine-tunes of Phi3 models become available, it will serve as a starting point for building MoEs from them.

©️ Credits

These have been merged using the cheap_embed gate mode, in which each expert is assigned a vector representation derived from the words in its positive prompts, for example experts for scientific work, reasoning, math, etc.

Try your own using the link above!
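
To make the idea behind cheap_embed more concrete, here is a minimal sketch of prompt-embedding-based gating. It is not mergekit's implementation: the 3-dimensional `TOY_EMBEDDINGS` table, the comma-based tokenization, and the helper names `embed_prompt`, `gate_vectors`, and `route` are all hypothetical and exist only for illustration. A real merge would use the base model's tokenizer and embedding matrix.

```python
import math

# Hypothetical 3-dimensional token embeddings (illustration only); a real
# model would use its own tokenizer and embedding matrix.
TOY_EMBEDDINGS = {
    "research": [0.9, 0.1, 0.0],
    "logic":    [0.8, 0.2, 0.0],
    "math":     [1.0, 0.0, 0.1],
    "science":  [0.9, 0.0, 0.2],
    "creative": [0.0, 0.9, 0.3],
    "art":      [0.1, 1.0, 0.2],
}

def embed_prompt(prompt):
    """Average the embeddings of the known tokens in a comma-separated prompt."""
    tokens = [t.strip() for t in prompt.split(",")]
    vectors = [TOY_EMBEDDINGS[t] for t in tokens if t in TOY_EMBEDDINGS]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def gate_vectors(experts):
    """Build one gate vector per expert: the mean of its prompt embeddings."""
    gates = []
    for expert in experts:
        prompt_vecs = [embed_prompt(p) for p in expert["positive_prompts"]]
        dim = len(prompt_vecs[0])
        gates.append(
            [sum(v[i] for v in prompt_vecs) / len(prompt_vecs) for i in range(dim)]
        )
    return gates

def route(hidden, gates):
    """Softmax over the dot products between a hidden state and each gate vector."""
    scores = [sum(h * g for h, g in zip(hidden, gate)) for gate in gates]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

experts = [
    {"positive_prompts": ["research, logic, math, science"]},
    {"positive_prompts": ["creative, art"]},
]
gates = gate_vectors(experts)
weights = route(TOY_EMBEDDINGS["math"], gates)
print(weights)
```

In this toy setup a "math"-like hidden state gets a larger routing weight for the first expert than for the second, which is the behaviour the positive prompts are meant to encourage.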

🧩 Configuration

base_model: microsoft/Phi-3-mini-128k-instruct
gate_mode: cheap_embed
dtype: float16
experts:
  - source_model: microsoft/Phi-3-mini-128k-instruct
    positive_prompts: ["research, logic, math, science"]
  - source_model: microsoft/Phi-3-mini-128k-instruct
    positive_prompts: ["creative, art"]
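
As a rough sanity check on what a two-expert configuration like this produces, the sketch below estimates parameter counts. Everything in it is an assumption rather than something stated on this card: the Phi-3-mini dimensions (hidden size 3072, intermediate size 8192, 32 layers, vocabulary 32064), the premise that only the MLP block is duplicated per expert, and the premise that each layer gains one linear router of shape hidden size × number of experts.

```python
# Assumed Phi-3-mini dimensions (not stated on this card).
HIDDEN = 3072
INTERMEDIATE = 8192
LAYERS = 32
VOCAB = 32064
NUM_EXPERTS = 2

def mlp_params():
    # Fused gate/up projection plus down projection, no biases.
    return HIDDEN * 2 * INTERMEDIATE + INTERMEDIATE * HIDDEN

def attention_params():
    # Fused QKV projection plus output projection, no biases.
    return HIDDEN * 3 * HIDDEN + HIDDEN * HIDDEN

def dense_params():
    # Per layer: attention, MLP, and two RMSNorm weight vectors.
    per_layer = attention_params() + mlp_params() + 2 * HIDDEN
    # Plus input embeddings, untied LM head, and the final norm.
    return LAYERS * per_layer + 2 * VOCAB * HIDDEN + HIDDEN

def moe_params(num_experts):
    # Assumption: each extra expert adds one MLP copy per layer,
    # and every layer gets a linear router over the experts.
    router = HIDDEN * num_experts
    extra_per_layer = (num_experts - 1) * mlp_params() + router
    return dense_params() + LAYERS * extra_per_layer

dense_total = dense_params()
moe_total = moe_params(NUM_EXPERTS)
print(f"dense: {dense_total / 1e9:.2f}B, MoE: {moe_total / 1e9:.2f}B")
# prints "dense: 3.82B, MoE: 6.24B"
```

Under these assumptions the estimate lands on the 6.24B parameters reported for this model, which suggests the attention layers and embeddings are shared while the MLP blocks are stored once per expert.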

💻 Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "paulilioaica/MixtureOfPhi3"

# trust_remote_code=True allows the custom modeling code from the repository to run.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
)

prompt = "How many continents are there?"
# Phi-3 chat format: every turn is closed with <|end|>.
text = f"<|system|>\nYou are a helpful AI assistant.<|end|>\n<|user|>\n{prompt}<|end|>\n<|assistant|>\n"
input_ids = tokenizer.encode(text, return_tensors="pt")

# Sample up to 128 new tokens and print the full decoded sequence.
outputs = model.generate(input_ids, max_new_tokens=128, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(tokenizer.decode(outputs[0]))
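
The usage snippet above builds the prompt string by hand. For multi-turn conversations, a small helper can do the same job. The sketch below assumes the Phi-3 chat layout in which each turn is written as `<|role|>`, a newline, the content, and `<|end|>`; `build_phi3_prompt` is a hypothetical name, not part of any library.

```python
def build_phi3_prompt(messages):
    """Format a list of {"role", "content"} dicts as a Phi-3-style prompt.

    Assumes each turn is rendered as "<|role|>\ncontent<|end|>\n" and that
    the prompt ends with an open assistant turn for the model to complete.
    """
    parts = [f"<|{m['role']}|>\n{m['content']}<|end|>\n" for m in messages]
    return "".join(parts) + "<|assistant|>\n"

messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "How many continents are there?"},
]
prompt_text = build_phi3_prompt(messages)
print(prompt_text)
```

If the tokenizer ships a chat template, `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` is the more robust option, since it follows whatever format the model authors defined.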

Model size: 6.24B params (Safetensors, tensor type FP16)