SentientAGI
/

Dobby-Mini-Unhinged-Llama-3.1-8B_GGUF

Model card Files Files and versions Community

Dobby-Mini-Unhinged-Llama-3.1-8B_GGUF / README.md

salzubi401's picture

Update README.md

17d5af0 verified 17 days ago

|

history blame contribute delete

2.79 kB

	---
	language:
	- en
	license: llama3.1
	library_name: transformers
	tags:
	- Llama-3.1
	- Instruct
	- loyal AI
	- GGUF
	- finetune
	- chat
	- gpt4
	- synthetic data
	- roleplaying
	- unhinged
	- funny
	- opinionated
	- assistant
	- companion
	- friend
	base_model: meta-llama/Llama-3.1-8B-Instruct
	---

	# Dobby-Mini-Unhinged-Llama-3.1-8B_GGUF

	Dobby-Mini-Unhinged is a compact, high-performance GGUF model based on Llama 3.1 with 8 billion parameters. Designed for efficiency, this model supports quantization levels in 4-bit, 6-bit, and 8-bit, offering flexibility to run on various hardware configurations without compromising performance.

	## Compatibility

	This model is compatible with:

	- [LMStudio](https://lmstudio.ai/): An easy-to-use desktop application for running and fine-tuning large language models locally.
	- [Ollama](https://ollama.com/): A versatile tool for deploying, managing, and interacting with large language models seamlessly.

	## Quantization Levels

	\| Quantization \| Description \| Use Case \|
	\|------------------\|------------------------------------------------------------------------------------------------------------------------------------------------------\|----------------------------------------------------------------------------------------------------------\|
	\| 4-bit \| Highly compressed for minimal memory usage. Some loss in precision and quality, but great for lightweight devices with limited VRAM. \| Ideal for testing, quick prototyping, or running on low-end GPUs and CPUs. \|
	\| 6-bit \| Strikes a balance between compression and quality. Offers improved accuracy over 4-bit without requiring significant additional resources. \| Recommended for users with mid-range hardware aiming for a compromise between speed and precision. \|
	\| 8-bit \| Full-precision quantization for maximum quality while still optimizing memory usage compared to full FP16 or FP32 models. \| Perfect for high-performance systems where maintaining accuracy and precision is critical. \|

	## Recommended Usage

	Choose your quantization level based on the hardware you are using:
	- 4-bit for ultra-lightweight systems.
	- 6-bit for balance on mid-tier hardware.
	- 8-bit for maximum performance on powerful GPUs.

	This model supports prompt fine-tuning for domain-specific tasks, making it an excellent choice for interactive applications like chatbots, question answering, and creative writing.