# Ursa_Minor-GGUF
Official GGUF quantizations of Sculptor-AI/Ursa_Minor.
## About
This repository contains the official quantized versions of Ursa_Minor, created by Sculptor-AI. These quantizations are optimized for both quality and performance.
## Model Description
Ursa_Minor is a reasoning-focused language model developed by ExplodingCB2 & Kaileh57 (Sculptor-AI). It is designed to tackle complex reasoning tasks, demonstrating capabilities in multi-step inference, logical deduction, and contextual understanding.
**Key Features:**
- Reasoning Prowess: Emphasizes strong reasoning abilities over sheer memorization
- Multi-Step Inference: Breaks down complex problems into smaller, manageable steps
- Logical Deduction: Applies logical rules and principles to arrive at valid conclusions
- Contextual Understanding: Grasps and utilizes contextual information to enhance reasoning accuracy
## Usage
If you are unsure how to use GGUF files, here are some common ways to run the model:
### llama.cpp
```bash
./main -m /path/to/Ursa_Minor.Q4_K_M.gguf -n 512 -p "What are the prime factors of 42?"
```

Note that recent llama.cpp builds have renamed the `main` binary to `llama-cli`; substitute accordingly.
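If you prefer Python, the llama-cpp-python bindings wrap the same runtime. A minimal sketch, assuming `llama-cpp-python` is installed (`pip install llama-cpp-python`) and the Q4_K_M file is in the working directory:

```python
from llama_cpp import Llama

# Load the quantized model; n_ctx sets the context window size.
llm = Llama(model_path="Ursa_Minor.Q4_K_M.gguf", n_ctx=2048)

# Run a single completion, mirroring the llama.cpp command above.
output = llm("What are the prime factors of 42?", max_tokens=512)
print(output["choices"][0]["text"])
```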
### text-generation-webui
Load the model in text-generation-webui by selecting the GGUF file from your models directory.
### LM Studio
Import the GGUF file directly in LM Studio to run the model locally.
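LM Studio can also serve the loaded model through its OpenAI-compatible local server (by default at `http://localhost:1234/v1`). A sketch using the `openai` Python client; the model identifier below is hypothetical, so use whatever name LM Studio displays for the loaded GGUF:

```python
from openai import OpenAI

# Point the client at LM Studio's local server; the API key is unused locally.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="ursa_minor",  # hypothetical identifier; use the name LM Studio shows
    messages=[{"role": "user", "content": "What are the prime factors of 42?"}],
)
print(response.choices[0].message.content)
```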
## Provided Quantizations
The following quantizations are available, sorted from smallest to largest file size:
| Type | Size | Quality | Inference Speed | Notes |
|---|---|---|---|---|
| Q2_K | 0.8 GB | Basic | Very Fast | Smallest size, acceptable for basic tasks |
| Q3_K_S | 0.9 GB | Improved | Fast | Good balance for limited resources |
| Q3_K_M | 0.9 GB | Improved+ | Fast | Slightly better than Q3_K_S |
| Q3_K_L | 1.0 GB | Good | Moderate | Recommended for good quality with small size |
| IQ4_XS | 1.0 GB | Good+ | Moderate | Improved quantization technique |
| Q4_K_S | 1.0 GB | Very Good | Fast | Recommended for most users |
| Q4_K_M | 1.1 GB | Very Good+ | Fast | Recommended for daily use |
| Q5_K_S | 1.2 GB | Excellent | Moderate | High-quality output |
| Q5_K_M | 1.2 GB | Excellent+ | Moderate | Enhanced quality over Q5_K_S |
| Q6_K | 1.4 GB | Superior | Moderate | Very high quality, close to F16 |
| Q8_0 | 1.7 GB | Near-perfect | Slow | Almost indistinguishable from F16 |
| F16 | 3.2 GB | Perfect | Very Slow | No quantization, full precision |
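To fetch one of these files programmatically, the `huggingface_hub` library works well. A minimal sketch; the exact filename (here assumed to be `Ursa_Minor.Q4_K_M.gguf`) should be checked against this repository's file listing:

```python
from huggingface_hub import hf_hub_download

# Download the Q4_K_M quantization to the local HF cache and return its path.
model_path = hf_hub_download(
    repo_id="Sculptor-AI/Ursa_Minor-GGUF",
    filename="Ursa_Minor.Q4_K_M.gguf",  # assumed filename; verify in the repo's file list
)
print(model_path)
```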
## Recommendations
- For most users: Q4_K_M provides an excellent balance of quality and size
- For limited resources: Q3_K_L or Q3_K_S offer good performance at smaller sizes
- For best quality: Q6_K or Q8_0 provide near-original model quality
## Quantization Quality Comparison

For a detailed comparison of quantization techniques and their effects on output quality, see Artefact2's analysis.
## Community and Support
If you have questions or need support with these quantized models, please open a discussion on our community page.
## Acknowledgments
We thank the community for their support and feedback in helping us improve and optimize these model quantizations.