---
language:
- sr
license: mit
tags:
- text-generation-inference
- transformers
- mistral
base_model: datatab/Yugo55-GPT-v4
datasets:
- datatab/alpaca-cleaned-serbian-full
- datatab/open-orca-slim-serbian
---

# Yugo55-GPT-v4-4bit

- **Developed by:** datatab
- **License:** mit
- **Quantized from model:** datatab/Yugo55-GPT-v4

## 🧩 Configuration

```yaml
models:
  - model: datatab/Serbian-Mistral-Orca-Slim-v1
    parameters:
      weight: 1.0
  - model: mlabonne/AlphaMonarch-7B
    parameters:
      weight: 1.0
  - model: datatab/YugoGPT-Alpaca-v1-epoch1-good
    parameters:
      weight: 1.0
merge_method: linear
dtype: float16
```
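The configuration above is in mergekit's YAML format (a linear merge of three models). As a rough sketch, not part of the original card, the merge could be reproduced with mergekit's Python API, assuming `mergekit` is installed and the YAML above is saved as `config.yaml`; the output path and option values are placeholders.

```python
# Hedged sketch: reproduce the linear merge with mergekit's Python API.
# Assumes `pip install mergekit` and the YAML above saved as config.yaml.
import yaml
import torch

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

with open("config.yaml", "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    merge_config,
    "./Yugo55-GPT-v4",  # output directory (placeholder)
    options=MergeOptions(
        cuda=torch.cuda.is_available(),
        copy_tokenizer=True,
    ),
)
```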
## 🏆 Results

> Results obtained through the Serbian LLM evaluation, released by Aleksa Gordić: [serbian-llm-eval](https://github.com/gordicaleksa/serbian-llm-eval)
> * Evaluation was conducted on a 4-bit version of the model due to hardware resource constraints.

| MODEL | ARC-E | ARC-C | Hellaswag | BoolQ | Winogrande | OpenbookQA | PiQA |
| --- | --- | --- | --- | --- | --- | --- | --- |
| **Yugo55-GPT-v4-4bit** | 51.41 | 36.00 | 57.51 | 80.92 | 65.75 | 34.70 | 70.54 |
| Yugo55A-GPT | 51.52 | 37.78 | 57.52 | 84.40 | 65.43 | 35.60 | 69.43 |
## 💻 Usage

```terminal
!pip -q install git+https://github.com/huggingface/transformers # need to install from github
!pip install -q datasets loralib sentencepiece
!pip -q install bitsandbytes accelerate
```

```python
from IPython.display import HTML, display

def set_css():
    # Word-wrap long outputs in Colab cells. The CSS body was empty in the
    # original card; the usual word-wrap style is assumed here.
    display(HTML('''<style>pre { white-space: pre-wrap; }</style>'''))

get_ipython().events.register('pre_run_cell', set_css)
```

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "datatab/Yugo55-GPT-v4-4bit",
    torch_dtype="auto",
    device_map="auto",  # the generation code below expects the model on CUDA
)
tokenizer = AutoTokenizer.from_pretrained("datatab/Yugo55-GPT-v4-4bit")
```

```python
from transformers import TextStreamer

def generate(question="", input="Odgovaraj uvek na Srpskom jeziku!!!"):
    # Alpaca-style prompt in Serbian: "Below is an instruction that describes
    # a task, paired with an input that provides further context. Write a
    # response that appropriately completes the request."
    alpaca_prompt = """Ispod je uputstvo koje opisuje zadatak, upareno sa unosom koji pruža dodatni kontekst. Napišite odgovor koji na odgovarajući način kompletira zahtev.

### Uputstvo:
{}

### Unos:
{}

### Odgovor:
{}"""

    inputs = tokenizer(
        [
            alpaca_prompt.format(
                question,  # instruction
                input,     # input ("Always answer in Serbian!!!")
                "",        # output - leave this blank for generation!
            )
        ],
        return_tensors="pt",
    ).to("cuda")

    text_streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    _ = model.generate(
        **inputs,
        streamer=text_streamer,
        max_new_tokens=1024,
        temperature=0.1,
        repetition_penalty=1.11,
        top_p=0.92,
        top_k=1,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
        do_sample=True,
        use_cache=True,
    )
```

```python
# "List all the planets of the solar system and tell me which is the largest planet"
generate("Nabroj mi sve planete suncevog sistema i reci mi koja je najveca planeta")
```

```python
# "What is the difference between a llama, a vicuña and an alpaca?"
generate("Koja je razlika između lame, vikune i alpake?")
```

```python
# "Write a short e-mail to Sam Altman giving reasons for an open-source GPT-4"
generate("Napišite kratku e-poruku Semu Altmanu dajući razloge za GPT-4 otvorenog koda")
```
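Since the card installs `bitsandbytes` and `accelerate`, the model can also be loaded with an explicit 4-bit quantization config instead of `torch_dtype="auto"` on memory-constrained GPUs. A minimal sketch; the `nf4` quant type and compute dtype below are assumptions, not settings from the original card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Assumed quantization settings; the original card does not pin these.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "datatab/Yugo55-GPT-v4-4bit",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("datatab/Yugo55-GPT-v4-4bit")
```

The `generate` helper above works unchanged with a model loaded this way.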