---
base_model: gordicaleksa/YugoGPT
inference: false
language:
- sr
- hr
license: apache-2.0
model_creator: gordicaleksa
model_name: YugoGPT
model_type: mistral
quantized_by: Luka Secerovic
---
[![sr](https://img.shields.io/badge/lang-sr-green.svg)](https://huggingface.co/alkibijad/YugoGPT-GGUF/blob/main/README.md)
[![en](https://img.shields.io/badge/lang-en-red.svg)](https://huggingface.co/alkibijad/YugoGPT-GGUF/blob/main/README.en.md)
# About the model
[YugoGPT](https://huggingface.co/gordicaleksa/YugoGPT) is currently the best open-source base 7B LLM for BCS (Bosnian, Croatian, Serbian).
This repository contains the model in [GGUF](https://github.com/ggerganov/llama.cpp/tree/master) format, which makes local inference practical without expensive hardware.
# Versions
The model is quantized into a few smaller versions. Quantization slightly reduces quality but significantly speeds up inference.
We suggest the `Q4_1` version, as it's the fastest. If you'd rather fetch a file from code, see the sketch after the table.
| Name | Size (GB) | Note                                              |
|------|-----------|---------------------------------------------------|
| Q4_1 | 4.55      | Weights quantized to 4 bits. The fastest version. |
| Q8_0 | 7.7       | Weights quantized to 8 bits.                      |
| fp16 | 14.5      | Half-precision (16-bit) weights.                  |
| fp32 | 29        | Original 32-bit weights. Not recommended.         |
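If you'd rather download a version programmatically than through a UI, here is a minimal sketch using the `huggingface_hub` library. The GGUF file name below is an assumption; check the repository's file list for the exact name of the version you want.

```python
# Minimal download sketch (pip install huggingface_hub).
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="alkibijad/YugoGPT-GGUF",
    # Hypothetical file name -- verify it against the repo's file list.
    filename="yugogpt.Q4_1.gguf",
)
print(model_path)  # local path to the cached GGUF file
```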
# How to run this model locally?
## LM Studio - the easiest way ⚡️
Install [LM Studio](https://lmstudio.ai/).
- After installation, search for "alkibijad/YugoGPT":
![Search](./media/lm_studio_screen_1.png "Model search")
- Choose a model version (recommended `Q4_1`):
![Choose a model](./media/lm_studio_screen_2.1.png "Choose a model version")
- After the model finishes downloading, click "chat" on the left side and start chatting.
- [Optional] You can set up a system prompt, e.g. "You're a helpful assistant", or anything else you like.
![Chat](./media/lm_studio_screen_3.png "Chat")
That's it!
## llama.cpp - advanced 🤓
If you're an advanced user who wants to use the CLI and learn more about the `GGUF` format, head over to [llama.cpp](https://github.com/ggerganov/llama.cpp/tree/master) and follow the instructions there 🙂
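If you'd rather call the model from code than from the command line, the `llama-cpp-python` bindings can load any of the GGUF versions above. A minimal sketch, assuming `llama-cpp-python` is installed and the model path points at the `Q4_1` file you downloaded (the path is a placeholder):

```python
# Minimal inference sketch (pip install llama-cpp-python).
from llama_cpp import Llama

# Placeholder path -- point it at the GGUF file you downloaded.
llm = Llama(model_path="./yugogpt.Q4_1.gguf", n_ctx=2048)

# YugoGPT is a base model, so give it text to complete rather than a chat turn.
output = llm(
    "Glavni grad Srbije je",
    max_tokens=64,
    temperature=0.7,
)
print(output["choices"][0]["text"])
```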