base_model:
- yuvraj17/Llama-3-8B-spectrum-25
- mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated
tags:
- merge
- mergekit
- lazymergekit
- yuvraj17/Llama-3-8B-spectrum-25
- mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated
license: apache-2.0
language:
- en
pipeline_tag: text-generation
Llama3-8B-abliterated-Spectrum-slerp
Llama3-8B-abliterated-Spectrum-slerp is a merge of the following models using LazyMergekit:
- yuvraj17/Llama-3-8B-spectrum-25
- mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated
Introduction to Model Merging
Model merging, also known as model fusion, is an effective technique that combines the parameters of multiple separate models with different capabilities into a single, more general model, without needing access to the original training data or expensive computation. Mergekit supports a number of methods for combining the capabilities of different models.
For a deeper dive into the different merging techniques, see Merge Large Language Models with mergekit.
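To make "merging the parameters" concrete, here is a minimal sketch of the simplest possible method, a weighted linear average of two models' weights. It assumes each model is represented as a dictionary mapping tensor names to NumPy arrays with identical shapes; the helper `linear_merge` and the toy tensors are hypothetical stand-ins for real checkpoints, not mergekit's API.

```python
import numpy as np


def linear_merge(state_dicts, weights):
    """Weighted average of several models' parameters (hypothetical helper).

    Every state dict must contain the same tensor names and shapes,
    which holds for fine-tunes of the same base architecture.
    """
    if len(state_dicts) != len(weights):
        raise ValueError("need one weight per model")
    total = sum(weights)
    merged = {}
    for name in state_dicts[0]:
        # Normalize the weights so they sum to 1, then average tensor by tensor.
        merged[name] = sum(w / total * sd[name] for w, sd in zip(weights, state_dicts))
    return merged


# Toy "models": one tiny layer each, with identical names and shapes.
model_a = {"layer.weight": np.array([[1.0, 2.0], [3.0, 4.0]])}
model_b = {"layer.weight": np.array([[3.0, 2.0], [1.0, 0.0]])}

merged = linear_merge([model_a, model_b], weights=[0.5, 0.5])
print(merged["layer.weight"])
```

With equal weights every entry of the merged tensor is the midpoint of the two inputs; methods such as SLERP replace this straight-line average with a different interpolation rule.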
Introduction to SLERP Merging
Spherical Linear Interpolation (SLERP) is a method used to smoothly interpolate between two vectors. It maintains a constant rate of change and preserves the geometric properties of the spherical space in which the vectors reside.
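For two weight vectors $p_0$ and $p_1$ and an interpolation factor $t \in [0, 1]$ (symbols introduced here for illustration), the standard SLERP formula first measures the angle $\Omega$ between the vectors and then weights each endpoint by a sine term:

$$
\Omega = \arccos\left(\frac{p_0 \cdot p_1}{\lVert p_0 \rVert \, \lVert p_1 \rVert}\right), \qquad
\mathrm{slerp}(p_0, p_1; t) = \frac{\sin\big((1 - t)\,\Omega\big)}{\sin \Omega}\, p_0 + \frac{\sin(t\,\Omega)}{\sin \Omega}\, p_1
$$

At $t = 0$ this returns $p_0$ and at $t = 1$ it returns $p_1$. When $\Omega$ is close to zero, $\sin \Omega$ also approaches zero, so implementations typically fall back to ordinary linear interpolation in that case.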
SLERP is currently the most popular merging method. It is preferred over traditional linear interpolation because, instead of following a straight line, the interpolation moves along the surface of a sphere, and it has shown improved performance across a diverse range of tasks.
However, SLERP can combine only two models at a time, although multiple models can be merged hierarchically, as shown in Mistral-7B-Merge-14-v0.1.
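The following NumPy sketch implements the SLERP interpolation described here and contrasts it with a straight-line interpolation, then shows the hierarchical pattern of merging two models first and merging the result with a third. It assumes weights are plain arrays; the fallback threshold `dot_threshold=0.9995` and `eps` are illustrative choices, and the three "models" are hypothetical toy vectors. This is a sketch of the technique, not mergekit's own code.

```python
import numpy as np


def lerp(t, v0, v1):
    """Straight-line (linear) interpolation between two weight tensors."""
    return (1.0 - t) * v0 + t * v1


def slerp(t, v0, v1, dot_threshold=0.9995, eps=1e-8):
    """Spherical linear interpolation between two weight tensors.

    The angle is measured between flattened, normalized copies of the tensors.
    If they are nearly colinear, sin(omega) is close to zero, so fall back to lerp.
    """
    a_unit = v0.ravel() / (np.linalg.norm(v0) + eps)
    b_unit = v1.ravel() / (np.linalg.norm(v1) + eps)
    dot = float(np.clip(np.dot(a_unit, b_unit), -1.0, 1.0))
    if abs(dot) > dot_threshold:
        return lerp(t, v0, v1)
    omega = np.arccos(dot)  # angle between the two tensors
    sin_omega = np.sin(omega)
    s0 = np.sin((1.0 - t) * omega) / sin_omega
    s1 = np.sin(t * omega) / sin_omega
    return s0 * v0 + s1 * v1


# Two orthogonal unit vectors standing in for two models' weights.
v0 = np.array([1.0, 0.0])
v1 = np.array([0.0, 1.0])
print(np.linalg.norm(lerp(0.5, v0, v1)))   # ≈ 0.707: the straight line cuts inside the circle
print(np.linalg.norm(slerp(0.5, v0, v1)))  # ≈ 1.0: the arc stays on the circle

# Hierarchical merging of three hypothetical models: merge two, then merge the result with a third.
model_a = np.array([1.0, 0.0, 0.0])
model_b = np.array([0.0, 1.0, 0.0])
model_c = np.array([0.0, 0.0, 1.0])
ab = slerp(0.5, model_a, model_b)
abc = slerp(0.5, ab, model_c)
print(abc)
```

In the hierarchical example, model_c ends up with the largest component because it receives half of the final step on its own, while model_a and model_b share the other half; the merge order and the `t` chosen at each step therefore determine how much each model contributes.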
🧩 Configuration
```yaml
slices:
  - sources:
      - model: yuvraj17/Llama-3-8B-spectrum-25
        layer_range: [0, 32]
      - model: mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated
        layer_range: [0, 32]
merge_method: slerp
base_model: yuvraj17/Llama-3-8B-spectrum-25
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
```
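In this configuration, `t` is the SLERP interpolation factor: in mergekit's SLERP, `t = 0` keeps the `base_model` (yuvraj17/Llama-3-8B-spectrum-25) and `t = 1` takes the other model. A list of values defines a gradient across the 32 layers, so attention tensors move from the base model toward the abliterated model with depth, while MLP tensors do the reverse, and all remaining tensors use 0.5. The sketch below approximates how per-layer values could be derived, assuming the gradient anchors are spaced evenly from the first to the last layer and linearly interpolated; this is an assumption, not mergekit's exact code, and `t_for_layer` / `t_for_tensor` are hypothetical helpers.

```python
import numpy as np

# Values copied from the configuration above.
GRADIENTS = {
    "self_attn": [0, 0.5, 0.3, 0.7, 1],
    "mlp": [1, 0.5, 0.7, 0.3, 0],
}
DEFAULT_T = 0.5  # applies to tensors not matched by any filter
NUM_LAYERS = 32  # layer_range: [0, 32]


def t_for_layer(gradient, layer_idx, num_layers=NUM_LAYERS):
    """Linearly interpolate a gradient list at a layer's relative depth.

    Assumption: anchor points are spaced evenly from the first layer (0.0)
    to the last layer (1.0).
    """
    depth = layer_idx / (num_layers - 1)
    anchors = np.linspace(0.0, 1.0, num=len(gradient))
    return float(np.interp(depth, anchors, gradient))


def t_for_tensor(tensor_name, layer_idx):
    """Use the gradient whose filter string appears in the tensor name."""
    for filt, gradient in GRADIENTS.items():
        if filt in tensor_name:
            return t_for_layer(gradient, layer_idx)
    return DEFAULT_T


for layer in (0, 8, 16, 24, 31):
    attn = t_for_tensor(f"model.layers.{layer}.self_attn.q_proj.weight", layer)
    mlp = t_for_tensor(f"model.layers.{layer}.mlp.up_proj.weight", layer)
    print(f"layer {layer:2d}: self_attn t={attn:.3f}  mlp t={mlp:.3f}")
```

Because the two gradients are mirror images, the attention and MLP values sum to 1 at every layer under this assumption, so each layer draws its attention and MLP weights from opposite ends of the interpolation.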
💻 Usage
```python
!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "yuvraj17/Llama3-8B-abliterated-Spectrum-slerp"
messages = [{"role": "user", "content": "What is a large language model?"}]

tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```
Evaluation Results
Coming soon
Special Thanks & References
- Maxime Labonne for the easy-to-use Colab notebook Merging LLMs with MergeKit and the accompanying blog post
- The authors of mergekit