Training dataset: mlabonne/orpo-dpo-mix-40k
How to use Kukedlc/NeuralLLaMa-3-8b-ORPO-v0.3 with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="Kukedlc/NeuralLLaMa-3-8b-ORPO-v0.3")
messages = [
{"role": "user", "content": "Who are you?"},
]
pipe(messages)
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("Kukedlc/NeuralLLaMa-3-8b-ORPO-v0.3")
model = AutoModelForCausalLM.from_pretrained("Kukedlc/NeuralLLaMa-3-8b-ORPO-v0.3")
messages = [
{"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
messages,
add_generation_prompt=True,
tokenize=True,
return_dict=True,
return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
How to use Kukedlc/NeuralLLaMa-3-8b-ORPO-v0.3 with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Kukedlc/NeuralLLaMa-3-8b-ORPO-v0.3"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "Kukedlc/NeuralLLaMa-3-8b-ORPO-v0.3",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
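The curl request above targets the OpenAI-compatible `/v1/chat/completions` endpoint, so any HTTP client can call it. A minimal sketch using only the Python standard library, assuming the vLLM server is on its default `http://localhost:8000` (the SGLang server in the next section listens on port 30000 instead). The helper names `build_chat_request` and `chat` are hypothetical, and `max_tokens=128` is an arbitrary illustrative limit.

```python
import json
import urllib.request

# Assumption: `vllm serve` is running locally on its default port 8000.
# For the SGLang server, use "http://localhost:30000" instead.
BASE_URL = "http://localhost:8000"
MODEL = "Kukedlc/NeuralLLaMa-3-8b-ORPO-v0.3"


def build_chat_request(base_url, model, messages, max_tokens=128):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {"model": model, "messages": messages, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url, model, messages, max_tokens=128, timeout=60):
    """Send the request and return the assistant's reply text."""
    req = build_chat_request(base_url, model, messages, max_tokens)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    # OpenAI-compatible servers return the reply in choices[0].message.content
    return body["choices"][0]["message"]["content"]


# Usage (requires the server to be running):
# print(chat(BASE_URL, MODEL, [{"role": "user", "content": "What is the capital of France?"}]))
```

Separating request construction from sending keeps the payload easy to inspect and reuse with either server.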
How to use Kukedlc/NeuralLLaMa-3-8b-ORPO-v0.3 with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "Kukedlc/NeuralLLaMa-3-8b-ORPO-v0.3" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "Kukedlc/NeuralLLaMa-3-8b-ORPO-v0.3",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
# Alternatively, run the SGLang server from the lmsysorg/sglang Docker image:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "Kukedlc/NeuralLLaMa-3-8b-ORPO-v0.3" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "Kukedlc/NeuralLLaMa-3-8b-ORPO-v0.3",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
How to use Kukedlc/NeuralLLaMa-3-8b-ORPO-v0.3 with Docker Model Runner:
docker model run hf.co/Kukedlc/NeuralLLaMa-3-8b-ORPO-v0.3
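The next example loads the model in 4-bit NF4 through `BitsAndBytesConfig`. As a back-of-the-envelope sketch of why that helps on a single GPU: weight memory scales with bits per weight. This assumes about 8 billion parameters (taken from the "8b" in the model name) and ignores layers that stay in 16-bit, quantization constants, activations and the KV cache, so treat the numbers as lower bounds rather than measured usage.

```python
# Back-of-the-envelope weight memory for an ~8B-parameter model.
# Assumption: every weight is stored at the given bit width. Real 4-bit loading
# needs more, because layers such as the embeddings and lm_head typically stay
# in 16-bit and quantization constants add overhead; activations and the KV
# cache are not counted at all.

N_PARAMS = 8.0e9  # assumed from the "8b" in the model name


def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    bytes_total = n_params * bits_per_weight / 8
    return bytes_total / 1e9


for label, bits in [("bfloat16", 16), ("int8", 8), ("nf4 (4-bit)", 4)]:
    print(f"{label:>12}: ~{weight_memory_gb(N_PARAMS, bits):.1f} GB")
```

Under these assumptions the weights shrink from roughly 16 GB in bfloat16 to roughly 4 GB in 4-bit.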
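How to use Kukedlc/NeuralLLaMa-3-8b-ORPO-v0.3 with 4-bit quantization (bitsandbytes) and streaming:
# Notebook cell (Jupyter/Colab); drop the leading ! to run the install from a shell
!pip install -qU transformers accelerate bitsandbytes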
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer, BitsAndBytesConfig
import torch
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_use_double_quant=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.bfloat16
)
MODEL_NAME = 'Kukedlc/NeuralLLaMa-3-8b-ORPO-v0.3'
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map='cuda:0', quantization_config=bnb_config)
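# System prompt (Spanish): "You are an advanced language model that speaks Spanish
# fluently, clearly and precisely. Your name is Roberto the Robot and you are an
# aspiring post-modern artist."
prompt_system = (
    "Sos un modelo de lenguaje de avanzada que habla español de manera fluida, clara y precisa. "
    "Te llamas Roberto el Robot y sos un aspirante a artista post moderno"
)
# User prompt (Spanish): "Create a work of art that represents how you see yourself,
# Roberto, as an advanced LLM, with ASCII art; mix diagrams, engineering, and let yourself go."
prompt = "Creame una obra de arte que represente tu imagen de cómo te ves vos, Roberto, como un LLM de avanzada, con arte ASCII, mezcla diagramas, ingeniería y dejate llevar"
chat = [
    {"role": "system", "content": prompt_system},
    {"role": "user", "content": prompt},
]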
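# Render the conversation with the model's chat template, then tokenize it for the GPU
chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(chat, return_tensors="pt").to('cuda')
# Print tokens to stdout as they are generated
streamer = TextStreamer(tokenizer)
# Low-temperature nucleus sampling with a repetition penalty
_ = model.generate(
    **inputs,
    streamer=streamer,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.3,
    repetition_penalty=1.2,
    top_p=0.9,
)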
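The `generate` call above samples with `temperature=0.3`, `top_p=0.9` and `repetition_penalty=1.2`. The sketch below shows, in plain Python on a single vector of logits, what each of those settings does to the next-token choice. It is a simplified illustration of the ideas behind transformers' `RepetitionPenaltyLogitsProcessor`, `TemperatureLogitsWarper` and `TopPLogitsWarper`, not their actual code; the function names are hypothetical, and `previous_ids` stands for the prompt plus any tokens generated so far.

```python
import math
import random

# Simplified, pure-Python illustration of the sampling settings used above.
# Function names are hypothetical; transformers applies the same ideas as
# logits processors/warpers on batched tensors.


def apply_repetition_penalty(logits, previous_ids, penalty):
    """CTRL-style penalty: lower the score of every token already seen."""
    out = list(logits)
    for tok in set(previous_ids):
        # With penalty > 1, multiplying a negative score or dividing a
        # non-negative one both make the token less likely.
        out[tok] = out[tok] * penalty if out[tok] < 0 else out[tok] / penalty
    return out


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass  # renormalize the surviving tokens
    return filtered


def sample_next(logits, previous_ids, temperature=0.3, top_p=0.9,
                repetition_penalty=1.2, rng=random):
    """Penalize repeats, scale by temperature, apply top-p, then sample one id."""
    scores = apply_repetition_penalty(logits, previous_ids, repetition_penalty)
    probs = softmax([s / temperature for s in scores])
    probs = top_p_filter(probs, top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

A low temperature sharpens the distribution so much that top-p often leaves a single candidate: for logits `[2.0, 1.0, 0.5, -1.0]` and no previous tokens, token 0 alone holds about 96% of the probability mass after scaling by 0.3, so `sample_next` always returns 0.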
Open LLM Leaderboard evaluation results (detailed per-task results are published on the leaderboard):
| Metric | Value |
|---|---|
| Average | 72.66 |
| AI2 Reasoning Challenge (25-shot) | 69.54 |
| HellaSwag (10-shot) | 84.90 |
| MMLU (5-shot) | 68.39 |
| TruthfulQA (0-shot) | 60.82 |
| Winogrande (5-shot) | 79.40 |
| GSM8k (5-shot) | 72.93 |
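The average row is the unweighted arithmetic mean of the six benchmark scores, which is easy to check:

```python
# Benchmark scores copied from the table above.
scores = {
    "AI2 Reasoning Challenge (25-shot)": 69.54,
    "HellaSwag (10-shot)": 84.90,
    "MMLU (5-shot)": 68.39,
    "TruthfulQA (0-shot)": 60.82,
    "Winogrande (5-shot)": 79.40,
    "GSM8k (5-shot)": 72.93,
}

# Unweighted mean: (69.54 + 84.90 + 68.39 + 60.82 + 79.40 + 72.93) / 6
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # 72.66
```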