mujib-llm-lora

A Pashto-focused LoRA fine-tune of Qwen2.5 7B, trained with Unsloth for fast, memory-efficient inference.

Model Information

Model Name: mujib-llm-lora
Developed by: mhalimi3008
License: Apache-2.0
Finetuned from model: unsloth/qwen2.5-7b-unsloth-bnb-4bit

This Qwen2 model was trained 2x faster with Unsloth.

Overview

mujib-llm-lora is an instruction-tuned Pashto-language model designed for:

Pashto conversations
Question answering
Text generation
Translation
Educational assistance
General NLP research

The model is optimized using:

LoRA fine-tuning
4-bit quantization
Unsloth acceleration
PEFT optimization
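
To give a feel for why LoRA fine-tuning is cheap, the sketch below counts the trainable parameters in the two low-rank factors against the dense weight they adapt. The layer size and rank are hypothetical round numbers chosen for illustration; they are not read from this model's adapter config.

```python
# Illustrative arithmetic only: the layer size and rank below are assumptions,
# not values taken from this model's adapter config.

def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the two low-rank factors A (rank x d_in) and B (d_out x rank)."""
    return rank * d_in + d_out * rank


def full_params(d_in: int, d_out: int) -> int:
    """Parameters in the frozen dense weight matrix W (d_out x d_in)."""
    return d_in * d_out


d_in = d_out = 4096   # hypothetical square projection layer
rank = 16             # hypothetical LoRA rank

lora = lora_trainable_params(d_in, d_out, rank)
full = full_params(d_in, d_out)
print(f"LoRA factors: {lora:,} parameters")
print(f"Dense weight: {full:,} parameters")
print(f"Trainable fraction: {lora / full:.4%}")
```

Under these assumed sizes the adapter trains well under 1% of the parameters in the layer, which is the main reason the approach fits on consumer GPUs.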

Features

Fast inference
Low VRAM usage
Optimized for consumer GPUs
Pashto language support
Instruction-following capability
Efficient 4-bit loading

Installation

# =====================================================
# Install dependencies
# (notebook cell; drop the leading "!" when running in a shell)
# =====================================================
!pip install -q unsloth transformers accelerate peft bitsandbytes
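
After installing, a quick sanity check can confirm that each package is importable before loading the model. This is a minimal sketch using only the standard library. The `missing_packages` helper and the `REQUIRED` list are illustrative names, and `torch` is included on the assumption that it is pulled in as a dependency.

```python
import importlib.util

# Import names for the packages installed above, plus torch (assumed to be
# installed as a dependency). Import names can differ from pip names.
REQUIRED = ["unsloth", "transformers", "accelerate", "peft", "bitsandbytes", "torch"]


def missing_packages(names):
    """Return the names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

The check only looks for the packages on the import path; it does not import them, so it will not catch a broken CUDA setup.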

Example Usage

# =====================================================
# Imports
# =====================================================
import torch
from unsloth import FastLanguageModel
from transformers import TextStreamer

# =====================================================
# Model Names
# =====================================================
# base_model is listed for reference only; loading lora_model below is
# expected to pull in the base weights named in the adapter config.
base_model = "unsloth/qwen2.5-7b-unsloth-bnb-4bit"
lora_model = "mhalimi3008/mujib-llm-lora"

# =====================================================
# Load Model + Tokenizer
# =====================================================
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = lora_model,
    max_seq_length = 2048,
    dtype = None,          # None leaves the dtype choice to Unsloth
    load_in_4bit = True,   # load the weights 4-bit quantized
)

# Enable faster inference
FastLanguageModel.for_inference(model)

# =====================================================
# Test Prompt
# (the Pashto instruction reads: "Introduce yourself in the Pashto language.")
# =====================================================
prompt = """### Instruction:
په پښتو ژبه خپل ځان معرفي کړه.

### Response:
"""

inputs = tokenizer(
    [prompt],
    return_tensors="pt"
).to("cuda")

# =====================================================
# Generate Response
# =====================================================
text_streamer = TextStreamer(tokenizer)

outputs = model.generate(
    **inputs,
    streamer=text_streamer,
    max_new_tokens=200,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)

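# =====================================================
# Decode Final Output
# =====================================================
# Slice off the prompt tokens so only the newly generated text is decoded.
prompt_length = inputs["input_ids"].shape[1]
response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)

print("\n\n========== FINAL RESPONSE ==========\n")
print(response)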
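
The prompt in the example follows an "### Instruction / ### Response" layout. Assuming that layout is what the model expects, the two small helpers below build such a prompt and recover the answer from a decoded string that still contains the prompt. `build_prompt` and `extract_response` are hypothetical names introduced here for illustration; they are not part of Unsloth or Transformers.

```python
RESPONSE_MARKER = "### Response:"


def build_prompt(instruction: str) -> str:
    """Wrap an instruction in the template used in the example above."""
    return f"### Instruction:\n{instruction.strip()}\n\n{RESPONSE_MARKER}\n"


def extract_response(decoded: str) -> str:
    """Return the text after the last response marker, or the whole string if absent."""
    _, found, tail = decoded.rpartition(RESPONSE_MARKER)
    return tail.strip() if found else decoded.strip()


prompt = build_prompt("په پښتو ژبه خپل ځان معرفي کړه.")
print(prompt)
```

`build_prompt` reproduces the example prompt exactly, so its result can be passed to the tokenizer in place of the hand-written string.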

Example Output

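سلام! زه یو مصنوعي ذهانت ماډل یم چې په پښتو ژبه خبرې کولی شم او ستاسو پوښتنو ته ځوابونه درکوم.

(English: "Hello! I am an artificial intelligence model that can speak Pashto and answer your questions.")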

Training Details

Base Model

unsloth/qwen2.5-7b-unsloth-bnb-4bit

Training Method

LoRA fine-tuning
PEFT
4-bit quantization
Unsloth optimized training

Intended Use

This model is intended for:

Pashto AI assistants
Chatbots
Research
Educational systems
NLP experimentation
Translation systems

Limitations

The model may generate inaccurate information.
Responses may occasionally mix languages.
Performance depends on dataset quality and coverage.
Human verification is recommended for important tasks.
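
Since responses may occasionally mix languages, a rough script check can flag outputs for human review. The sketch below is a heuristic under stated assumptions: it measures the share of letters in the Arabic Unicode script and cannot tell Pashto apart from Persian, Urdu, or Arabic. The function names and the 0.8 threshold are arbitrary choices made for illustration.

```python
import unicodedata


def arabic_script_ratio(text: str) -> float:
    """Fraction of alphabetic characters whose Unicode name starts with 'ARABIC'."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    arabic = sum(1 for ch in letters if unicodedata.name(ch, "").startswith("ARABIC"))
    return arabic / len(letters)


def looks_mixed(text: str, threshold: float = 0.8) -> bool:
    """Flag text where under `threshold` of the letters are Arabic-script.

    Text with no letters at all is also flagged, since its ratio is 0.0.
    """
    return arabic_script_ratio(text) < threshold


print(looks_mixed("په پښتو ژبه"))
print(looks_mixed("سلام hello world"))
```

A flagged response is only a candidate for review; technical terms and names in Latin script are often legitimate.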

Hardware Requirements

Recommended:

NVIDIA GPU
CUDA support
12 GB or more of VRAM
Python 3.10+
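
As a back-of-the-envelope check on the VRAM figure, the sketch below estimates the memory taken by the weights alone at different precisions. It assumes a nominal 7 billion parameters and ignores quantization metadata, activations, and the KV cache, so real usage will be higher.

```python
def weight_memory_gib(n_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


N_PARAMS = 7e9  # nominal "7B"; the exact parameter count differs

for label, bits in [("fp16/bf16", 16), ("4-bit", 4)]:
    print(f"{label:>10}: ~{weight_memory_gib(N_PARAMS, bits):.1f} GiB")
```

Under these assumptions the weights take roughly 13 GiB at 16-bit precision and roughly 3.3 GiB at 4-bit, which leaves headroom on a 12 GB card for the adapter, activations, and generation cache.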

Libraries Used

Transformers
Unsloth
PEFT
Accelerate
BitsAndBytes
PyTorch

Citation

@misc{mujib_llm_lora_2026,
  author       = {mhalimi3008},
  title        = {mujib-llm-lora},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/mhalimi3008/mujib-llm-lora}}
}

Acknowledgements

Special thanks to:

Unsloth
Qwen Team
Hugging Face
Transformers Library
PEFT Library
