Quantization made by Richard Erkhov.

NeuralLLaMa-3-8b-ORPO-v0.3 - GGUF

Model creator: https://huggingface.co/Kukedlc/
Original model: https://huggingface.co/Kukedlc/NeuralLLaMa-3-8b-ORPO-v0.3/

Name	Quant method	Size
NeuralLLaMa-3-8b-ORPO-v0.3.Q2_K.gguf	Q2_K	2.96GB
NeuralLLaMa-3-8b-ORPO-v0.3.IQ3_XS.gguf	IQ3_XS	3.28GB
NeuralLLaMa-3-8b-ORPO-v0.3.IQ3_S.gguf	IQ3_S	3.43GB
NeuralLLaMa-3-8b-ORPO-v0.3.Q3_K_S.gguf	Q3_K_S	3.41GB
NeuralLLaMa-3-8b-ORPO-v0.3.IQ3_M.gguf	IQ3_M	3.52GB
NeuralLLaMa-3-8b-ORPO-v0.3.Q3_K.gguf	Q3_K	3.74GB
NeuralLLaMa-3-8b-ORPO-v0.3.Q3_K_M.gguf	Q3_K_M	3.74GB
NeuralLLaMa-3-8b-ORPO-v0.3.Q3_K_L.gguf	Q3_K_L	4.03GB
NeuralLLaMa-3-8b-ORPO-v0.3.IQ4_XS.gguf	IQ4_XS	4.18GB
NeuralLLaMa-3-8b-ORPO-v0.3.Q4_0.gguf	Q4_0	4.34GB
NeuralLLaMa-3-8b-ORPO-v0.3.IQ4_NL.gguf	IQ4_NL	4.38GB
NeuralLLaMa-3-8b-ORPO-v0.3.Q4_K_S.gguf	Q4_K_S	4.37GB
NeuralLLaMa-3-8b-ORPO-v0.3.Q4_K.gguf	Q4_K	4.58GB
NeuralLLaMa-3-8b-ORPO-v0.3.Q4_K_M.gguf	Q4_K_M	4.58GB
NeuralLLaMa-3-8b-ORPO-v0.3.Q4_1.gguf	Q4_1	4.78GB
NeuralLLaMa-3-8b-ORPO-v0.3.Q5_0.gguf	Q5_0	5.21GB
NeuralLLaMa-3-8b-ORPO-v0.3.Q5_K_S.gguf	Q5_K_S	5.21GB
NeuralLLaMa-3-8b-ORPO-v0.3.Q5_K.gguf	Q5_K	5.34GB
NeuralLLaMa-3-8b-ORPO-v0.3.Q5_K_M.gguf	Q5_K_M	5.34GB
NeuralLLaMa-3-8b-ORPO-v0.3.Q5_1.gguf	Q5_1	5.65GB
NeuralLLaMa-3-8b-ORPO-v0.3.Q6_K.gguf	Q6_K	6.14GB
NeuralLLaMa-3-8b-ORPO-v0.3.Q8_0.gguf	Q8_0	7.95GB
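
The table above can be used to pick a file for a given disk or memory budget. The following is a minimal sketch of that lookup, not part of any library: the names `QUANT_SIZES_GB`, `largest_quant_within`, and `gguf_filename` are illustrative, and the sizes are copied from the table. It assumes that file size is the only criterion. Actual memory use at runtime is higher than the file size because of the KV cache and other overhead, so leave some headroom in the budget.

```python
# File sizes in GB, copied from the table above.
QUANT_SIZES_GB = {
    "Q2_K": 2.96, "IQ3_XS": 3.28, "IQ3_S": 3.43, "Q3_K_S": 3.41,
    "IQ3_M": 3.52, "Q3_K": 3.74, "Q3_K_M": 3.74, "Q3_K_L": 4.03,
    "IQ4_XS": 4.18, "Q4_0": 4.34, "IQ4_NL": 4.38, "Q4_K_S": 4.37,
    "Q4_K": 4.58, "Q4_K_M": 4.58, "Q4_1": 4.78, "Q5_0": 5.21,
    "Q5_K_S": 5.21, "Q5_K": 5.34, "Q5_K_M": 5.34, "Q5_1": 5.65,
    "Q6_K": 6.14, "Q8_0": 7.95,
}


def largest_quant_within(budget_gb, sizes=QUANT_SIZES_GB):
    """Return the quant method of the largest listed file that fits, or None."""
    fitting = {name: size for name, size in sizes.items() if size <= budget_gb}
    if not fitting:
        return None
    # On a size tie, the entry listed first in the table wins.
    return max(fitting, key=fitting.get)


def gguf_filename(quant, model="NeuralLLaMa-3-8b-ORPO-v0.3"):
    """Build the file name using the naming pattern shown in the table."""
    return f"{model}.{quant}.gguf"


if __name__ == "__main__":
    choice = largest_quant_within(5.0)
    print(choice, gguf_filename(choice))
```

A larger file is not guaranteed to give better output than a smaller one of a different quant family, so treat the result as a starting point and compare candidates on your own prompts.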

Original model description:

license: apache-2.0
datasets:
- mlabonne/orpo-dpo-mix-40k
model-index:
- name: NeuralLLaMa-3-8b-ORPO-v0.3
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 69.54
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Kukedlc/NeuralLLaMa-3-8b-ORPO-v0.3
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 84.9
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Kukedlc/NeuralLLaMa-3-8b-ORPO-v0.3
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 68.39
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Kukedlc/NeuralLLaMa-3-8b-ORPO-v0.3
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 60.82
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Kukedlc/NeuralLLaMa-3-8b-ORPO-v0.3
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 79.4
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Kukedlc/NeuralLLaMa-3-8b-ORPO-v0.3
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 72.93
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Kukedlc/NeuralLLaMa-3-8b-ORPO-v0.3
      name: Open LLM Leaderboard

NeuralLLaMa-3-8b-ORPO-v0.3

!pip install -qU transformers accelerate bitsandbytes

from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer, BitsAndBytesConfig
import torch

# Load the weights in 4-bit NF4 with double quantization; compute runs in bfloat16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

MODEL_NAME = 'Kukedlc/NeuralLLaMa-3-8b-ORPO-v0.3'
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map='cuda:0', quantization_config=bnb_config)

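# System prompt (Spanish). In English: "You are an advanced language model that speaks
# Spanish fluently, clearly and precisely. Your name is Roberto el Robot and you are an
# aspiring postmodern artist."
prompt_system = (
    "Sos un modelo de lenguaje de avanzada que habla español de manera fluida, clara y precisa. "
    "Te llamas Roberto el Robot y sos un aspirante a artista post moderno"
)
# User prompt (Spanish). In English: "Create a work of art that shows how you see yourself,
# Roberto, as an advanced LLM, with ASCII art; mix diagrams and engineering, and let yourself go."
prompt = "Creame una obra de arte que represente tu imagen de como te ves vos roberto como un LLm de avanzada, con arte ascii, mezcla diagramas, ingenieria y dejate llevar"
chat = [
    {"role": "system", "content": prompt_system},
    {"role": "user", "content": prompt},
]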

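# Render the chat template to text, then tokenize it. The template already inserts the
# special tokens it needs, so add_special_tokens=False avoids adding them a second time.
chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(chat, return_tensors="pt", add_special_tokens=False).to(model.device)
# Stream tokens to stdout as they are generated.
streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer=streamer, max_new_tokens=1024, do_sample=True, temperature=0.3, repetition_penalty=1.2, top_p=0.9)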
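
The snippet above loads the original model through transformers. For the GGUF files listed in this repository, a comparable run might look like the sketch below. It assumes the `llama-cpp-python` package is installed and that one of the files from the table has already been downloaded locally. The helper names `build_messages` and `generate_with_gguf`, the `n_ctx=4096` context size, and the choice to offload all layers to the GPU are illustrative assumptions, not settings recommended by the model author. It also assumes the GGUF file carries a usable chat template in its metadata.

```python
def build_messages(system_prompt, user_prompt):
    """Build the same two-message chat structure used in the transformers example."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


def generate_with_gguf(model_path, messages, max_tokens=1024):
    """Run one chat completion against a local GGUF file and return the reply text."""
    # Imported lazily so build_messages stays usable without llama-cpp-python installed.
    from llama_cpp import Llama

    llm = Llama(
        model_path=model_path,  # path to a downloaded .gguf file
        n_ctx=4096,             # assumed context size for this sketch
        n_gpu_layers=-1,        # offload all layers to the GPU if one is available
    )
    result = llm.create_chat_completion(
        messages=messages,
        max_tokens=max_tokens,
        temperature=0.3,     # sampling values mirror the transformers example
        top_p=0.9,
        repeat_penalty=1.2,
    )
    return result["choices"][0]["message"]["content"]


if __name__ == "__main__":
    messages = build_messages("You are a helpful assistant.", "Say hello.")
    print(messages)
    # Uncomment after downloading a file from the table:
    # print(generate_with_gguf("NeuralLLaMa-3-8b-ORPO-v0.3.Q4_K_M.gguf", messages))
```

Setting `n_gpu_layers=0` keeps inference on the CPU, which is slower but works without a GPU build of the library.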

Open LLM Leaderboard Evaluation Results

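Detailed results can be found on the Open LLM Leaderboard: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Kukedlc/NeuralLLaMa-3-8b-ORPO-v0.3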

Metric	Value
Avg.	72.66
AI2 Reasoning Challenge (25-Shot)	69.54
HellaSwag (10-Shot)	84.90
MMLU (5-Shot)	68.39
TruthfulQA (0-shot)	60.82
Winogrande (5-shot)	79.40
GSM8k (5-shot)	72.93
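
The "Avg." row is consistent with the unweighted mean of the six benchmark scores. The short check below recomputes it from the values in the table. The variable names are illustrative, and the assumption that the leaderboard uses a plain arithmetic mean is inferred from the numbers rather than stated in this card.

```python
# Scores copied from the table above.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 69.54,
    "HellaSwag (10-Shot)": 84.90,
    "MMLU (5-Shot)": 68.39,
    "TruthfulQA (0-shot)": 60.82,
    "Winogrande (5-shot)": 79.40,
    "GSM8k (5-shot)": 72.93,
}

# Unweighted arithmetic mean, rounded to two decimals like the table.
average = sum(scores.values()) / len(scores)
rounded_average = round(average, 2)

print(f"{rounded_average:.2f}")  # prints 72.66
```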