---
language:
- en
license: llama3.1
library_name: transformers
tags:
- Llama-3.1
- Instruct
- loyal AI
- GGUF
- finetune
- chat
- gpt4
- synthetic data
- roleplaying
- unhinged
- funny
- opinionated
- assistant
- companion
- friend
base_model: meta-llama/Llama-3.1-8B-Instruct
---

# Dobby-Mini-Unhinged-Llama-3.1-8B_GGUF

Dobby-Mini-Unhinged is a compact GGUF model based on Llama 3.1 8B Instruct (8 billion parameters). It is provided at **4-bit**, **6-bit**, and **8-bit** quantization levels, so you can trade memory footprint against output quality to match your hardware.

## Compatibility

This model is compatible with:

- **[LM Studio](https://lmstudio.ai/)**: A desktop application for running large language models locally.
- **[Ollama](https://ollama.com/)**: A tool for running, managing, and interacting with large language models locally.

## Quantization Levels

| **Quantization** | **Description** | **Use Case** |
|------------------|-----------------|--------------|
| **4-bit** | Highly compressed for minimal memory usage. Some loss in precision and quality, but well suited to lightweight devices with limited VRAM. | Testing, quick prototyping, or running on low-end GPUs and CPUs. |
| **6-bit** | A balance between compression and quality. Improved accuracy over 4-bit at a moderate increase in memory use. | Mid-range hardware, as a compromise between speed and precision. |
| **8-bit** | The highest-quality quantization offered here, closest to the original weights while still using less memory than FP16 or FP32 models. | High-performance systems where accuracy and precision are critical. |
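
To make the memory trade-off concrete, here is a back-of-the-envelope estimate of weight storage at each bit width. It is a simplification, not a measurement of the actual files: it assumes exactly 8 billion parameters and exactly the nominal number of bits per weight. Real GGUF files tend to be somewhat larger because quantized blocks also store scaling data, and running the model needs additional memory for the context cache and runtime.

```python
def approx_weight_size_gb(num_params, bits_per_weight):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    # bits -> bytes (divide by 8) -> gigabytes (divide by 1e9)
    return num_params * bits_per_weight / 8 / 1e9


# Nominal "8 billion parameters" from the model description (assumption: exact).
NUM_PARAMS = 8e9

# 16-bit is included as the FP16 baseline for comparison.
for bits in (4, 6, 8, 16):
    print(f"{bits:>2}-bit: ~{approx_weight_size_gb(NUM_PARAMS, bits):.1f} GB")
```

Under these assumptions the weights come to roughly 4 GB, 6 GB, and 8 GB for the three quantization levels, compared with about 16 GB at FP16.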

## Recommended Usage

Choose your quantization level based on the hardware you are using:

- **4-bit** for ultra-lightweight systems.
- **6-bit** for a balance of quality and speed on mid-tier hardware.
- **8-bit** for maximum quality on powerful GPUs.
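
The rule of thumb above can be sketched as a small helper that picks the highest bit width whose estimated weights fit in the available memory. Both the size estimate (nominal parameters times bits per weight) and the `headroom_gb` allowance for the context cache and runtime overhead are illustrative assumptions, not figures published for this model.

```python
# Nominal parameter count from the model description (assumption: exact).
NUM_PARAMS = 8e9


def pick_quantization(available_gb, headroom_gb=2.0, levels=(8, 6, 4)):
    """Return the highest bit width that fits in available_gb, or None.

    headroom_gb is a hypothetical allowance for the context cache and
    runtime overhead; tune it for your own setup.
    """
    for bits in sorted(levels, reverse=True):
        # Estimated weight storage in GB (1 GB = 1e9 bytes).
        weights_gb = NUM_PARAMS * bits / 8 / 1e9
        if weights_gb + headroom_gb <= available_gb:
            return bits
    return None
```

For example, `pick_quantization(24)` selects 8-bit, while a device with only 4 GB free gets `None`, meaning that none of the three levels fits under these assumptions.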

The model can be steered toward domain-specific tasks through prompting (for example, with a system prompt), which makes it a good fit for interactive applications such as chatbots, question answering, and creative writing.