Ministral-8B-Instruct-2410-4bit (Optimized via Unsloth)
This repository contains a 4-bit quantized, text-only version of Mistral AI's Ministral-8B-Instruct-2410. The model was quantized and patched locally using Unsloth to keep the VRAM footprint small and make training and inference practical on consumer-grade hardware (such as 16GB VRAM GPUs like the RTX 4070 Ti SUPER).
🦥 Key Features & Optimization
- Architecture: Pure text causal LM (`MistralForCausalLM`). Unlike multimodal variants, this model is stripped of vision configurations to prevent VRAM overhead and corrupted text generation ("word salad" issues).
- Quantization: 4-bit NormalFloat (`nf4`) via `bitsandbytes`, embedded directly into the shards.
- Memory Footprint: Down from 32GB to **5.75 GB**, making it fully compatible with 16GB GPU setups for both deep reasoning fine-tuning and inference.
- Vocabulary Size: 131,072 tokens, offering excellent multilingual compression, particularly for non-Latin scripts like Pashto.
- Attention Mechanism: Features a mix of `full_attention` and `sliding_attention` layers (36 layers total), preserving deep contextual relationships over long inference steps.
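The memory figures above can be sanity-checked with simple arithmetic. The sketch below is a rough estimate only: it assumes a round 8 billion parameters (inferred from the "8B" in the model name, not measured from this repo) and counts raw weight storage, ignoring activations and the KV cache.

```python
def weight_footprint_gb(num_params: float, bits_per_param: float) -> float:
    """Size of the raw weights in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: round parameter count implied by the "8B" in the model name.
NUM_PARAMS = 8e9

for label, bits in [("fp32", 32), ("bf16", 16), ("nf4", 4)]:
    print(f"{label}: {weight_footprint_gb(NUM_PARAMS, bits):.1f} GB")
```

Under these assumptions the 32GB starting point corresponds to 32-bit weights, and pure 4-bit storage would come to about 4 GB. The reported 5.75 GB is higher, plausibly because some tensors stay at higher precision and the quantization format stores extra constants; that explanation is an assumption and has not been checked against the shards.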
🚀 Quick Start (Inference & Fine-Tuning)
To use this model without triggering Hugging Face's weight reversion issues (which surface as a NotImplementedError), load it directly through Unsloth's fast patching pipeline.
Prerequisites
Make sure you have unsloth, torch, and transformers installed in your environment:
pip install unsloth
1. Fast Inference Code
import torch
from unsloth import FastLanguageModel
max_seq_length = 4096
dtype = None # Auto-detects (bfloat16 for modern GPUs)
load_in_4bit = True
# Load optimized 4-bit model directly from this Hub repo
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "nassimjp/Ministral-8B-Instruct-2410-4bit",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    device_map = "auto",
)
FastLanguageModel.for_inference(model)
# Standard Chat Template Test
messages = [{"role": "user", "content": "سلام، په پښتو ژبه ووایه چې ته څوک یې؟"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=100, temperature=0.7, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
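For decoder-only models, `generate` returns the prompt tokens followed by the continuation, so the `print` above shows the prompt again before the reply. A minimal sketch of stripping the prompt is shown below; `new_tokens_only` is a hypothetical helper name, not part of Unsloth or Transformers.

```python
def new_tokens_only(output_ids, prompt_length):
    """Return only the generated continuation, dropping the echoed prompt.

    Works on a plain list of token ids or on a 1-D tensor, since both
    support len() and slicing.
    """
    if prompt_length < 0 or prompt_length > len(output_ids):
        raise ValueError("prompt_length must be between 0 and len(output_ids)")
    return output_ids[prompt_length:]


# Toy example with made-up token ids: 3 prompt tokens, 2 generated tokens.
print(new_tokens_only([101, 102, 103, 7, 8], prompt_length=3))
```

With the inference code above, this would be called as `tokenizer.decode(new_tokens_only(outputs[0], inputs["input_ids"].shape[1]), skip_special_tokens=True)`.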
2. Fine-Tuning Prep (PEFT/LoRA Setup)
If you are setting up this model for downstream tasks (such as specialized Pashto reasoning/CoT data alignment), initialize your LoRA target modules like this:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth", # Crucial for 16GB VRAM hardware safety
    random_state = 3407,
    use_rslora = False,
)
print("✅ Ready for V7 fine-tuning sequence.")
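To get a feel for what `r = 16` and `lora_alpha = 16` imply, the sketch below computes the LoRA scaling factor and a rough count of trainable adapter parameters. The layer dimensions (`hidden`, `kv_dim`, `mlp`) are placeholder assumptions, not values read from this repo; substitute the figures from `model.config` before relying on the result. Only the layer count of 36 comes from the feature list above.

```python
import math


def lora_scaling(lora_alpha: float, r: int, use_rslora: bool = False) -> float:
    """Multiplier applied to the low-rank update: alpha/r, or alpha/sqrt(r) for rsLoRA."""
    return lora_alpha / math.sqrt(r) if use_rslora else lora_alpha / r


def lora_params_per_module(in_features: int, out_features: int, r: int) -> int:
    """Adapter size for one linear layer: A is (r x in), B is (out x r)."""
    return r * (in_features + out_features)


# ASSUMED placeholder dimensions; check model.config for the real ones.
hidden, kv_dim, mlp = 4096, 1024, 12288
num_layers = 36  # from the feature list in this model card

# (in_features, out_features) for each LoRA target module.
shapes = {
    "q_proj": (hidden, hidden),
    "k_proj": (hidden, kv_dim),
    "v_proj": (hidden, kv_dim),
    "o_proj": (hidden, hidden),
    "gate_proj": (hidden, mlp),
    "up_proj": (hidden, mlp),
    "down_proj": (mlp, hidden),
}

r = 16
per_layer = sum(lora_params_per_module(i, o, r) for i, o in shapes.values())
print("scaling:", lora_scaling(lora_alpha=16, r=r))
print("approx. trainable LoRA parameters:", per_layer * num_layers)
```

With `lora_alpha` equal to `r`, the update is applied at a scale of 1.0. Setting `use_rslora = True` would change the scale to `lora_alpha / sqrt(r)` without changing the parameter count.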
⚠️ Important Configuration Notes
- Padding Token: The base Mistral models do not have a default padding token. When the model is loaded via Unsloth, `pad_token = <pad>` is assigned automatically so that sequences of different lengths can be padded to a common length and batched safely.
- Model Type: Hard-mapped to `model_type: "mistral"`. Avoid manual conversion to vision/conditional blocks to maintain stability.
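The reason a padding token matters can be illustrated without loading the model. The sketch below pads a batch of token-id lists to a common length and builds the matching attention mask; `pad_batch` is a hypothetical helper and the `pad_id` of 0 is a made-up value for illustration. In practice the tokenizer does this itself when called with `padding=True`, using `tokenizer.pad_token_id`.

```python
def pad_batch(sequences, pad_id, side="right"):
    """Pad token-id lists to equal length and return (input_ids, attention_mask).

    The mask holds 1 for real tokens and 0 for padding, so the model can
    ignore the padded positions.
    """
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        padding, real = [pad_id] * n_pad, [1] * len(seq)
        if side == "left":
            input_ids.append(padding + list(seq))
            attention_mask.append([0] * n_pad + real)
        else:
            input_ids.append(list(seq) + padding)
            attention_mask.append(real + [0] * n_pad)
    return input_ids, attention_mask


# Toy batch with made-up token ids and a made-up pad id.
ids, mask = pad_batch([[5, 6, 7], [9]], pad_id=0)
print(ids)
print(mask)
```

Left padding (`side="left"`) is the usual choice for batched generation with decoder-only models, because it keeps each prompt's last real token adjacent to the first generated token.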
📜 Acknowledgements & License
- Base Model: Developed by Mistral AI. Released under the `mistral-research` license.
- Quantization Pipeline: Powered by Unsloth AI.