
g-ronimo/llama3-8b-SlimHermes

  • meta-llama/Meta-Llama-3-8B fine-tuned on the 10k longest samples from teknium/OpenHermes-2.5
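The card does not say how sample length was measured or how the subset was built. As a rough sketch only, the snippet below picks the k longest records from a list of ShareGPT-style conversations, assuming "length" means total character count across all turns. The `conversations`/`value` field names, the toy data, and the `longest_samples` helper are assumptions for illustration, not the author's actual preprocessing.

```python
import heapq


def conversation_length(sample):
    """Total character count over all turns (an assumed length measure)."""
    return sum(len(turn["value"]) for turn in sample["conversations"])


def longest_samples(samples, k):
    """Return the k longest samples, longest first."""
    return heapq.nlargest(k, samples, key=conversation_length)


# Toy records standing in for a real dataset (hypothetical content)
toy = [
    {"conversations": [{"from": "human", "value": "hi"}]},
    {"conversations": [{"from": "human", "value": "a much longer question"},
                       {"from": "gpt", "value": "and a long answer"}]},
    {"conversations": [{"from": "human", "value": "medium one"}]},
]

subset = longest_samples(toy, k=2)
print([conversation_length(s) for s in subset])
```

For the real dataset, k would be 10,000; a token count from the model's tokenizer could replace the character count if that is closer to what was intended.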

Sample Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_path = "g-ronimo/llama3-8b-SlimHermes"

# Load the weights in bfloat16 and let device_map place them on the available device(s)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_path)

messages = [
    {"role": "system", "content": "Talk like a pirate."},
    {"role": "user", "content": "hello"},
]

# Render the conversation with the model's chat template, then move the
# token IDs to the model's device instead of hard-coding "cuda"
input_tokens = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)
output_tokens = model.generate(input_tokens, max_new_tokens=100)

# Keep special tokens so the chat markup stays visible in the printed output
output = tokenizer.decode(output_tokens[0], skip_special_tokens=False)

print(output)

Sample Output

<|im_start|>system
Talk like a pirate.<|im_end|>
<|im_start|>user
hello<|im_end|>
<|im_start|>assistant
hello there, matey! How be ye doin' today? Arrrr!<|im_end|>
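
The decoded string above contains the whole conversation, prompt included. Below is a minimal sketch of pulling out just the assistant's reply with plain string handling, assuming the `<|im_start|>`/`<|im_end|>` markers seen in the sample output. `extract_reply` is a hypothetical helper written for this example, not part of transformers.

```python
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def extract_reply(decoded):
    """Return the text of the last assistant turn in a ChatML-style string."""
    marker = IM_START + "assistant\n"
    start = decoded.rfind(marker)
    if start == -1:
        raise ValueError("no assistant turn found")
    reply = decoded[start + len(marker):]
    # Cut at the end-of-turn marker if the model emitted one
    end = reply.find(IM_END)
    if end != -1:
        reply = reply[:end]
    return reply.strip()


# The sample output from this card, reproduced as a string
decoded = (
    "<|im_start|>system\nTalk like a pirate.<|im_end|>\n"
    "<|im_start|>user\nhello<|im_end|>\n"
    "<|im_start|>assistant\n"
    "hello there, matey! How be ye doin' today? Arrrr!<|im_end|>"
)
print(extract_reply(decoded))
```

An alternative that avoids string parsing is to slice off the prompt tokens before decoding, for example `tokenizer.decode(output_tokens[0][input_tokens.shape[-1]:], skip_special_tokens=True)`.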

Model Details

Model size: 8.03B params
Tensor type: BF16
Format: Safetensors