---
base_model: gordicaleksa/YugoGPT
inference: false
language:
  - sr
  - hr
license: apache-2.0
model_creator: gordicaleksa
model_name: YugoGPT
model_type: mistral
quantized_by: Luka Secerovic
---


# About the model

YugoGPT is currently the best open-source base 7B LLM for BCS (Bosnian, Croatian, Serbian).

This repository contains the model in GGUF format, which is well suited for local inference and doesn't require expensive hardware.
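If you'd rather script the download than use a GUI, the official `huggingface_hub` client can fetch a single GGUF file from this repo. A minimal sketch; the exact filename inside the repo is an assumption, so check the repo's file list first:

```python
# pip install huggingface_hub
from huggingface_hub import hf_hub_download

# Download one quantized file from this repo into the local HF cache.
# NOTE: the filename below is a guess -- check the repo's "Files" tab
# for the real name before running this.
path = hf_hub_download(
    repo_id="alkibijad/YugoGPT-GGUF",
    filename="yugogpt-q4_1.gguf",  # hypothetical filename
)
print(path)  # local path of the downloaded model file
```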

# Versions

The model is compressed into several smaller versions. Compression reduces quality slightly, but significantly increases inference speed.

It's suggested to use the Q4_1 version as it's the fastest one.

| Name | Size (GB) | Note |
| ---- | --------- | ---- |
| Q4_1 | 4.55 | Weights quantized to 4 bits. The fastest version. |
| q8_0 | 7.7 | Weights quantized to 8 bits. |
| fp16 | 14.5 | Weights in 16-bit floating point. |
| fp32 | 29 | Original 32-bit weights. Not recommended. |
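As a sanity check on the sizes above, a rough estimate is parameter count times bits per weight; quantized formats also store per-block scales, so the real files come out somewhat larger. A back-of-the-envelope sketch, assuming roughly 7.24B parameters for a Mistral-7B-style model:

```python
# Rough GGUF size estimate: params * bits_per_weight / 8, ignoring the
# per-block scale/offset metadata that quantized formats add on top.
PARAMS = 7.24e9  # approximate parameter count of a Mistral-7B-style model

for name, bits in [("Q4_1", 4), ("q8_0", 8), ("fp16", 16), ("fp32", 32)]:
    gb = PARAMS * bits / 8 / 1e9
    print(f"{name}: ~{gb:.1f} GB")
# fp16 (~14.5 GB) and fp32 (~29 GB) match the table exactly; Q4_1 and q8_0
# come out larger in practice because of the quantization metadata.
```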

# How to run this model locally?

## LMStudio - the easiest way ⚡️

- Install LMStudio.
- After installation, search for "alkibijad/YugoGPT".
- Choose a model version (Q4_1 is recommended).
- After the model finishes downloading, click on "Chat" on the left side and start chatting.
- [Optional] You can set up a system prompt, e.g. "You're a helpful assistant", or whatever else you want.

That's it!

## llama.cpp - advanced 🤓

If you're an advanced user and want to use the CLI and learn more about the GGUF format, go to [llama.cpp](https://github.com/ggerganov/llama.cpp) and follow the instructions 🙂
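If you'd rather stay in Python than build the C++ tools, the `llama-cpp-python` bindings wrap the same engine. A minimal sketch, assuming a downloaded Q4_1 file with a hypothetical name:

```python
# pip install llama-cpp-python
from llama_cpp import Llama

# Load the quantized model; the filename is hypothetical -- use whatever
# you actually downloaded from this repo.
llm = Llama(model_path="yugogpt-q4_1.gguf", n_ctx=2048)

# YugoGPT is a base model (not instruction-tuned), so plain text
# completion works best.
out = llm("Glavni grad Srbije je", max_tokens=32)
print(out["choices"][0]["text"])
```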