# Ursa_Minor-GGUF
Official GGUF quantizations of Sculptor-AI/Ursa_Minor.
## About
This repository contains the official quantized versions of Ursa_Minor, created by Sculptor-AI. These quantizations are optimized for both quality and performance.
## Model Description
Ursa_Minor is a reasoning-focused language model developed by ExplodingCB2 & Kaileh57 (Sculptor-AI). It is designed to tackle complex reasoning tasks, demonstrating capabilities in multi-step inference, logical deduction, and contextual understanding.
**Key Features:**
- Reasoning Prowess: Emphasizes strong reasoning abilities over sheer memorization
- Multi-Step Inference: Breaks down complex problems into smaller, manageable steps
- Logical Deduction: Applies logical rules and principles to arrive at valid conclusions
- Contextual Understanding: Grasps and utilizes contextual information to enhance reasoning accuracy
## Usage
If you are unsure how to use GGUF files, here are some common ways to run the model:
### llama.cpp
```bash
./main -m /path/to/Ursa_Minor.Q4_K_M.gguf -n 512 -p "What are the prime factors of 42?"
```

Note that recent llama.cpp builds have renamed the `main` binary to `llama-cli`; substitute accordingly.
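If you prefer Python, the llama-cpp-python bindings wrap the same runtime. A minimal sketch, assuming `llama-cpp-python` is installed (`pip install llama-cpp-python`) and the Q4_K_M file is in the working directory:

```python
from llama_cpp import Llama

# Load the quantized model; n_ctx sets the context window size.
llm = Llama(model_path="Ursa_Minor.Q4_K_M.gguf", n_ctx=2048)

# Run a single completion, mirroring the llama.cpp command above.
output = llm("What are the prime factors of 42?", max_tokens=512)
print(output["choices"][0]["text"])
```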
### text-generation-webui
Load the model in text-generation-webui by selecting the GGUF file from your models directory.
### LM Studio
Import the GGUF file directly in LM Studio to run the model locally.
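LM Studio can also serve the loaded model through its OpenAI-compatible local server (by default at `http://localhost:1234/v1`). A sketch using the `openai` Python client; the model identifier below is hypothetical, so use whatever name LM Studio displays for the loaded GGUF:

```python
from openai import OpenAI

# Point the client at LM Studio's local server; the API key is unused locally.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="ursa_minor",  # hypothetical identifier; use the name LM Studio shows
    messages=[{"role": "user", "content": "What are the prime factors of 42?"}],
)
print(response.choices[0].message.content)
```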
## Provided Quantizations
The following quantizations are available, sorted from smallest to largest file size:
| Type | Size | Quality | Inference Speed | Notes |
|---|---|---|---|---|
| Q2_K | 0.8 GB | Basic | Very Fast | Smallest size, acceptable for basic tasks |
| Q3_K_S | 0.9 GB | Improved | Fast | Good balance for limited resources |
| Q3_K_M | 0.9 GB | Improved+ | Fast | Slightly better than Q3_K_S |
| Q3_K_L | 1.0 GB | Good | Moderate | Recommended for good quality with small size |
| IQ4_XS | 1.0 GB | Good+ | Moderate | Improved quantization technique |
| Q4_K_S | 1.0 GB | Very Good | Fast | Recommended for most users |
| Q4_K_M | 1.1 GB | Very Good+ | Fast | Recommended for daily use |
| Q5_K_S | 1.2 GB | Excellent | Moderate | High-quality output |
| Q5_K_M | 1.2 GB | Excellent+ | Moderate | Enhanced quality over Q5_K_S |
| Q6_K | 1.4 GB | Superior | Moderate | Very high quality, close to F16 |
| Q8_0 | 1.7 GB | Near-perfect | Slow | Almost indistinguishable from F16 |
| F16 | 3.2 GB | Perfect | Very Slow | No quantization, full precision |
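To fetch one of these files programmatically, the `huggingface_hub` library works well. A minimal sketch; the exact filename (here assumed to be `Ursa_Minor.Q4_K_M.gguf`) should be checked against this repository's file listing:

```python
from huggingface_hub import hf_hub_download

# Download the Q4_K_M quantization to the local HF cache and return its path.
model_path = hf_hub_download(
    repo_id="Sculptor-AI/Ursa_Minor-GGUF",
    filename="Ursa_Minor.Q4_K_M.gguf",  # assumed filename; verify in the repo's file list
)
print(model_path)
```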
## Recommendations
- For most users: Q4_K_M provides an excellent balance of quality and size
- For limited resources: Q3_K_L or Q3_K_S offer good performance at smaller sizes
- For best quality: Q6_K or Q8_0 provide near-original model quality
## Quantization Quality Comparison

For a detailed comparison of quantization techniques and their effects on output quality, see Artefact2's analysis.
## Community and Support
If you have questions or need support with these quantized models, please open a discussion on our community page.
## Acknowledgments
We thank the community for their support and feedback in helping us improve and optimize these model quantizations.