g-ronimo/llama3-8b-SlimHermes

  • meta-llama/Meta-Llama-3-8B fine-tuned on the 10k longest samples from teknium/OpenHermes-2.5
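
To make "the 10k longest samples" concrete, here is a minimal sketch of how such a subset could be selected. It is not the author's actual preprocessing: the card does not say whether length was measured in characters or tokens, so character count is an assumption here, and the helper names `conversation_length` and `select_longest` are hypothetical. The sample layout (a `conversations` list of turns with `from` and `value` keys) mirrors the ShareGPT-style format used by OpenHermes-2.5.

```python
import heapq


def conversation_length(sample):
    """Total number of characters across all turns (assumed length metric)."""
    return sum(len(turn["value"]) for turn in sample["conversations"])


def select_longest(samples, k):
    """Return the k longest samples, longest first."""
    return heapq.nlargest(k, samples, key=conversation_length)


# Toy data standing in for the real dataset
toy_samples = [
    {"conversations": [{"from": "human", "value": "hi"},
                       {"from": "gpt", "value": "hello"}]},
    {"conversations": [{"from": "human", "value": "Explain recursion."},
                       {"from": "gpt", "value": "A function that calls itself."}]},
    {"conversations": [{"from": "human", "value": "ok"},
                       {"from": "gpt", "value": "ok"}]},
]

longest = select_longest(toy_samples, k=2)
```

With the real dataset, `k` would be 10,000 and a token count from the Llama 3 tokenizer may be a better length metric than characters.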

Sample Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_path = "g-ronimo/llama3-8b-SlimHermes"
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_path)

messages = [
    {"role": "system", "content": "Talk like a pirate."},
    {"role": "user", "content": "hello"}
]
        
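# Build the prompt with the model's chat template and move it to the model's device
input_tokens = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)
output_tokens = model.generate(input_tokens, max_new_tokens=100)
# Keep special tokens so the chat markers stay visible in the printed output
output = tokenizer.decode(output_tokens[0], skip_special_tokens=False)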

print(output)
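
Judging from the sample output, the chat template appears to follow the ChatML layout. The sketch below rebuilds that layout by hand so the prompt structure is easy to inspect. `render_chatml` is a hypothetical helper, not part of transformers, and the tokenizer's real template may differ in details such as a leading BOS token or whitespace, so `tokenizer.apply_chat_template` remains the source of truth.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML-style prompt string."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml([
    {"role": "system", "content": "Talk like a pirate."},
    {"role": "user", "content": "hello"},
])
print(prompt)
```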

Sample Output

<|im_start|>system
Talk like a pirate.<|im_end|>
<|im_start|>user
hello<|im_end|>
<|im_start|>assistant
hello there, matey! How be ye doin' today? Arrrr!<|im_end|>
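
Because the output is decoded with `skip_special_tokens=False`, it contains the whole conversation, including the prompt. A minimal sketch for pulling out only the final assistant reply from such a string follows. `extract_assistant_reply` is a hypothetical helper, and it assumes the ChatML markers shown in the sample output above.

```python
def extract_assistant_reply(decoded):
    """Return the text of the last assistant turn in a ChatML-formatted string."""
    marker = "<|im_start|>assistant\n"
    start = decoded.rfind(marker)
    if start == -1:
        # No assistant turn found; fall back to the whole string
        return decoded.strip()
    reply = decoded[start + len(marker):]
    end = reply.find("<|im_end|>")
    if end != -1:
        reply = reply[:end]
    return reply.strip()


decoded = (
    "<|im_start|>system\nTalk like a pirate.<|im_end|>\n"
    "<|im_start|>user\nhello<|im_end|>\n"
    "<|im_start|>assistant\n"
    "hello there, matey! How be ye doin' today? Arrrr!<|im_end|>"
)
reply = extract_assistant_reply(decoded)
print(reply)
```

An alternative that avoids string parsing is to decode only the newly generated tokens, for example by slicing `output_tokens[0]` from `input_tokens.shape[-1]` onward before calling `tokenizer.decode`.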

Model size: 8.03B params (Safetensors, BF16)