---
base_model: gordicaleksa/YugoGPT
inference: false
language:
  - sr
  - hr
license: apache-2.0
model_creator: gordicaleksa
model_name: YugoGPT
model_type: mistral
quantized_by: Luka Secerovic
---
[![sr](https://img.shields.io/badge/lang-sr-green.svg)](https://huggingface.co/alkibijad/YugoGPT-GGUF/blob/main/README.md)
[![en](https://img.shields.io/badge/lang-en-red.svg)](https://huggingface.co/alkibijad/YugoGPT-GGUF/blob/main/README.en.md)

# About the model
[YugoGPT](https://huggingface.co/gordicaleksa/YugoGPT) is currently the best open-source base 7B LLM for BCS (Bosnian, Croatian, Serbian).

This repository contains the model in [GGUF](https://github.com/ggerganov/llama.cpp/tree/master) format, which is well suited to local inference and doesn't require expensive hardware.
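
For example, a GGUF file can be loaded directly from Python with the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) bindings. A minimal sketch, assuming you have already downloaded the `Q4_1` file and saved it as `yugogpt.Q4_1.gguf` (the exact file name is an assumption; check this repo's file list):

```python
# pip install llama-cpp-python
from llama_cpp import Llama

# Path to the downloaded GGUF file -- the file name here is an assumption,
# verify the real one in this repo's "Files and versions" tab.
llm = Llama(model_path="yugogpt.Q4_1.gguf", n_ctx=2048)

# YugoGPT is a base model, so plain text completion works best.
output = llm("Najveći grad u Srbiji je", max_tokens=64, stop=["\n"])
print(output["choices"][0]["text"])
```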

# Versions
The model is provided in several quantized versions. Quantization slightly reduces quality but significantly speeds up inference and shrinks memory requirements.

The `Q4_1` version is recommended, as it's the fastest.


| Name | Size (GB) | Note                                                    |
|------|-----------|---------------------------------------------------------|
| Q4_1 | 4.55      | Weights quantized to 4 bits. The fastest version.       |
| q8_0 | 7.7       | Weights quantized to 8 bits.                            |
| fp16 | 14.5      | Weights stored as 16-bit floats (half precision).       |
| fp32 | 29        | Original 32-bit weights. Not recommended for local use. |
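
If you'd rather fetch a specific version from code than through a UI, the `huggingface_hub` package can download a single file. A sketch, again assuming the `Q4_1` file is named `yugogpt.Q4_1.gguf` (the exact name may differ; check the repo's file list):

```python
# pip install huggingface_hub
from huggingface_hub import hf_hub_download

# Download one quantized file from this repo.
# The file name below is an assumption -- verify it in the repo's file list.
path = hf_hub_download(
    repo_id="alkibijad/YugoGPT-GGUF",
    filename="yugogpt.Q4_1.gguf",
)
print(f"Model saved to: {path}")
```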

# How to run this model locally?
## LM Studio - the easiest way ⚡️
Install [LM Studio](https://lmstudio.ai/).

- After installation, search for "alkibijad/YugoGPT":
![Search](./media/lm_studio_screen_1.png "Model search")
- Choose a model version (recommended `Q4_1`):
![Choose a model](./media/lm_studio_screen_2.1.png "Choose a model")
- After the model finishes downloading, click on "chat" on the left side and start chatting.
- [Optional] You can set up a system prompt, e.g. "You're a helpful assistant", or whatever else you prefer.
![Chat](./media/lm_studio_screen_3.png "Chat")

That's it!
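
LM Studio can also expose the model through a local, OpenAI-compatible HTTP server (the "Local Server" tab). A sketch assuming the default address `http://localhost:1234/v1`; the port and payload details may differ in your setup:

```python
# pip install requests
import requests

# LM Studio's local server speaks the OpenAI chat-completions protocol and
# listens on http://localhost:1234 by default -- adjust if you changed it.
response = requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        "messages": [
            {"role": "system", "content": "You're a helpful assistant."},
            {"role": "user", "content": "Koji je glavni grad Hrvatske?"},
        ],
        "temperature": 0.7,
        "max_tokens": 128,
    },
    timeout=60,
)
print(response.json()["choices"][0]["message"]["content"])
```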

## llama.cpp - advanced 🤓
If you're an advanced user who wants to work from the command line and learn more about the `GGUF` format, head over to [llama.cpp](https://github.com/ggerganov/llama.cpp/tree/master) and follow the instructions there 🙂
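
As a taste of what that looks like, here is a sketch that shells out to the CLI built from llama.cpp. The binary name and flags vary between llama.cpp versions (recent builds produce `llama-cli`, older ones `main`), and the model path is an assumption:

```python
import subprocess

# Run one completion through llama.cpp's CLI. Point the model path at
# your downloaded GGUF file; the name below is an assumption.
subprocess.run(
    [
        "./llama-cli",
        "-m", "yugogpt.Q4_1.gguf",          # path to the GGUF file
        "-p", "Najduža reka u Srbiji je",   # prompt
        "-n", "64",                         # number of tokens to generate
    ],
    check=True,
)
```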