ExLlamaV2 quant (exl2 / 4.0 bpw) made with ExLlamaV2 v0.1.1
Other EXL2 quants:
Quant | Model Size | lm_head |
---|---|---|
Llama-3-8B-Instruct-abliterated-dpomix
This model is an experimental DPO fine-tune of an abliterated Llama 3 8B Instruct model on the full mlabonne/orpo-dpo-mix-40k dataset. On the Nous benchmark suite, it scores slightly higher than Llama 3 8B Instruct (52.26 vs. 51.34 on average) while being uncensored.
Evaluation
Nous
Model | Average | AGIEval | GPT4All | TruthfulQA | Bigbench |
---|---|---|---|---|---|
mlabonne/Llama-3-8B-Instruct-abliterated-dpomix | 52.26 | 41.6 | 69.95 | 54.22 | 43.26 |
meta-llama/Meta-Llama-3-8B-Instruct | 51.34 | 41.22 | 69.86 | 51.65 | 42.64 |
failspy/Meta-Llama-3-8B-Instruct-abliterated-v3 | 51.21 | 40.23 | 69.5 | 52.44 | 42.69 |
abacusai/Llama-3-Smaug-8B | 49.65 | 37.15 | 69.12 | 51.66 | 40.67 |
mlabonne/OrpoLlama-3-8B | 48.63 | 34.17 | 70.59 | 52.39 | 37.36 |
meta-llama/Meta-Llama-3-8B | 45.42 | 31.1 | 69.95 | 43.91 | 36.7 |
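The Average column appears to be the unweighted mean of the four benchmark scores. A minimal sketch that checks this against the rows above (the `scores` dictionary, the `mean_of_benchmarks` helper, and the tolerance are illustrative choices, not part of the original evaluation tooling):

```python
# Scores copied from the table above:
# (reported average, AGIEval, GPT4All, TruthfulQA, Bigbench)
scores = {
    "mlabonne/Llama-3-8B-Instruct-abliterated-dpomix": (52.26, 41.6, 69.95, 54.22, 43.26),
    "meta-llama/Meta-Llama-3-8B-Instruct": (51.34, 41.22, 69.86, 51.65, 42.64),
    "failspy/Meta-Llama-3-8B-Instruct-abliterated-v3": (51.21, 40.23, 69.5, 52.44, 42.69),
    "abacusai/Llama-3-Smaug-8B": (49.65, 37.15, 69.12, 51.66, 40.67),
    "mlabonne/OrpoLlama-3-8B": (48.63, 34.17, 70.59, 52.39, 37.36),
    "meta-llama/Meta-Llama-3-8B": (45.42, 31.1, 69.95, 43.91, 36.7),
}


def mean_of_benchmarks(row):
    """Unweighted mean of the four benchmark scores (everything after the reported average)."""
    benchmarks = row[1:]
    return sum(benchmarks) / len(benchmarks)


# Assumed tolerance: the table is rounded to two decimals,
# so allow slightly more than half a hundredth of slack.
TOLERANCE = 0.006

for name, row in scores.items():
    computed = mean_of_benchmarks(row)
    status = "ok" if abs(computed - row[0]) <= TOLERANCE else "mismatch"
    print(f"{name}: reported {row[0]:.2f}, computed {computed:.4f} ({status})")
```

Every row falls within the rounding tolerance, which is consistent with a plain mean, though the card itself does not say how the average was computed.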
Usage
# Notebook cell: install dependencies (drop the leading "!" in a regular shell)
!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "mlabonne/Llama-3-8B-Instruct-abliterated-dpomix"
messages = [{"role": "user", "content": "What is a large language model?"}]

# Format the conversation with the model's chat template
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Load the model in half precision and place it on the available device(s)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Sample a completion; generated_text holds the prompt followed by the completion
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
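For reference, `apply_chat_template` with `add_generation_prompt=True` turns the message list into a single prompt string. The sketch below rebuilds that string by hand, assuming this model keeps the stock Llama 3 Instruct chat template (the card does not state this explicitly). `build_llama3_prompt` is a hypothetical helper, not a transformers function, and the tokenizer's own template remains the source of truth.

```python
def build_llama3_prompt(messages, add_generation_prompt=True):
    """Hand-rolled approximation of the stock Llama 3 Instruct chat format.

    Assumption: the model uses the unmodified Llama 3 Instruct template.
    """
    # The beginning-of-text token appears once, before the first message
    prompt = "<|begin_of_text|>"
    for message in messages:
        # Each turn is a role header, a blank line, the trimmed content, and an end-of-turn token
        prompt += f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
        prompt += f"{message['content'].strip()}<|eot_id|>"
    if add_generation_prompt:
        # An open assistant header cues the model to write the reply
        prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt


messages = [{"role": "user", "content": "What is a large language model?"}]
print(build_llama3_prompt(messages))
```

Seeing the raw string makes it easier to spot template mismatches, for example when a different inference backend is fed a prompt that was formatted for another model family.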