How to Get Started with the Model

πŸš€ How to Use This Model for Inference

This model is a LoRA (PEFT) adapter fine-tuned on top of Phi-4 (Unsloth's 4-bit build). To use it, you need to:

  1. Load the base model
  2. Load the LoRA adapter
  3. Run inference

πŸ“Œ Install Required Libraries

Before running the code, make sure you have the necessary dependencies installed:

pip install unsloth peft transformers torch

πŸ“ Load and Run Inference


from unsloth import FastLanguageModel
from peft import PeftModel
import torch

# Load the base model
base_model_name = "unsloth/Phi-4-unsloth-bnb-4bit"
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=base_model_name,
    max_seq_length=4096,  # Must match fine-tuning
    load_in_4bit=True,
)

# Load the fine-tuned LoRA adapter
lora_model_name = "Machlovi/Phi_Fullshot"
model = PeftModel.from_pretrained(model, lora_model_name)

# Switch Unsloth into inference mode (enables its faster generation path)
FastLanguageModel.for_inference(model)

# Run inference
input_text = "Why do we need to go to see something?"
inputs = tokenizer(input_text, return_tensors="pt").to("cuda")

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=4)  # adjust for longer outputs

# Decode and print the response
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
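
Phi-4 is an instruction-tuned model, so you will usually get better results by formatting prompts with the tokenizer's chat template instead of passing raw text. A minimal sketch, assuming the tokenizer ships a chat template (the question and token budget are illustrative, not part of this model card):

# Sketch: prompt through the tokenizer's chat template.
messages = [
    {"role": "user", "content": "Why do we need to go to see something?"},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant-turn marker
    return_tensors="pt",
).to("cuda")

with torch.no_grad():
    outputs = model.generate(input_ids, max_new_tokens=64)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))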


πŸ’‘ Notes

  • The base model is quantized to 4-bit (bitsandbytes) for memory efficiency.
  • Ensure max_seq_length matches the training configuration (4096 here).
  • This model requires a GPU (CUDA) for inference; a quick check is sketched below.
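
Before loading anything, you can fail fast on machines without a CUDA device, since 4-bit bitsandbytes loading is GPU-only. A minimal sketch:

import torch

# 4-bit (bitsandbytes) models cannot run on CPU, so fail fast
# if no CUDA device is visible.
if not torch.cuda.is_available():
    raise RuntimeError("This model requires a CUDA GPU for 4-bit inference.")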

Uploaded model

  • Developed by: Machlovi
  • License: apache-2.0
  • Finetuned from model: unsloth/Phi-4-unsloth-bnb-4bit

This Phi-4 model was trained 2x faster with Unsloth and Hugging Face's TRL library.
