How to Get Started with the Model

πŸš€ How to Use This Model for Inference

This model is a LoRA (PEFT) adapter fine-tuned on top of Phi-4 (Unsloth's 4-bit build). To use it, you need to:

  1. Load the base model
  2. Load the LoRA adapter
  3. Run inference

πŸ“Œ Install Required Libraries

Before running the code, make sure you have the necessary dependencies installed:

pip install unsloth peft transformers torch

πŸ“ Load and Run Inference


from unsloth import FastLanguageModel
from peft import PeftModel
import torch

# Load the base model
base_model_name = "unsloth/Phi-4-unsloth-bnb-4bit"
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=base_model_name,
    max_seq_length=4096,  # Must match fine-tuning
    load_in_4bit=True,
)

# Load the fine-tuned LoRA adapter
lora_model_name = "Machlovi/Phi_Fullshot"
model = PeftModel.from_pretrained(model, lora_model_name)

# Switch Unsloth into inference mode (enables its faster generation path)
FastLanguageModel.for_inference(model)

# Run inference
input_text = "Why do we need to go to see something?"
inputs = tokenizer(input_text, return_tensors="pt").to("cuda")

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=4)  # adjust for longer outputs

# Decode and print the response
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
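
Phi-4 is an instruction-tuned model, so you will usually get better results by formatting prompts with the tokenizer's chat template instead of passing raw text. A minimal sketch, assuming the tokenizer ships a chat template (the question and token budget are illustrative, not part of this model card):

# Sketch: prompt through the tokenizer's chat template.
messages = [
    {"role": "user", "content": "Why do we need to go to see something?"},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant-turn marker
    return_tensors="pt",
).to("cuda")

with torch.no_grad():
    outputs = model.generate(input_ids, max_new_tokens=64)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))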


πŸ’‘ Notes

  • The base model is quantized to 4-bit (bitsandbytes) for memory efficiency.
  • Ensure max_seq_length matches the training configuration (4096 here).
  • This model requires a GPU (CUDA) for inference; a quick check is sketched below.
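
Before loading anything, you can fail fast on machines without a CUDA device, since 4-bit bitsandbytes loading is GPU-only. A minimal sketch:

import torch

# 4-bit (bitsandbytes) models cannot run on CPU, so fail fast
# if no CUDA device is visible.
if not torch.cuda.is_available():
    raise RuntimeError("This model requires a CUDA GPU for 4-bit inference.")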

Uploaded model

  • Developed by: Machlovi
  • License: apache-2.0
  • Finetuned from model: unsloth/Phi-4-unsloth-bnb-4bit

This Phi-4 model was trained 2x faster with Unsloth and Hugging Face's TRL library.
