LeroyDyer/Mixtral_AI_128k
(128k token context window.)
This is a merge of pre-trained language models created using mergekit.
Merge Details
Merge Method
This model was merged using the SLERP merge method.
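SLERP interpolates between two models' weights along the surface of a hypersphere rather than along a straight line, which preserves each tensor's magnitude better than plain averaging. Below is a minimal per-tensor sketch of the idea; the function name and the linear fallback for near-parallel tensors are illustrative, not mergekit's actual code:

import numpy as np

def slerp(t, w0, w1, eps=1e-8):
    # Normalize flattened copies to measure the angle between the tensors.
    v0 = w0.flatten() / np.linalg.norm(w0)
    v1 = w1.flatten() / np.linalg.norm(w1)
    theta = np.arccos(np.clip(np.dot(v0, v1), -1.0, 1.0))
    if np.sin(theta) < eps:
        # Nearly parallel tensors: plain linear interpolation is stable.
        return (1 - t) * w0 + t * w1
    # Interpolate along the great circle between the two weight tensors.
    return (np.sin((1 - t) * theta) * w0 + np.sin(t * theta) * w1) / np.sin(theta)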
Models Merged
The following models were included in the merge:
PREVIOUS MERGES
rvv-karma/BASH-Coder-Mistral-7B
Locutusque/Hercules-3.1-Mistral-7B - Unhinging
KoboldAI/Mistral-7B-Erebus-v3 - NSFW
Locutusque/Hyperion-2.1-Mistral-7B - CHAT
Severian/Nexus-IKM-Mistral-7B-Pytorch - Thinking
NousResearch/Hermes-2-Pro-Mistral-7B - Generalizing
mistralai/Mistral-7B-Instruct-v0.2 - BASE
Nitral-AI/ProdigyXBioMistral_7B
Nitral-AI/Infinite-Mika-7b
Nous-Yarn-Mistral-7b-128k
yanismiraoui/Yarn-Mistral-7b-128k-sharded
KEY MERGES
Nous-Yarn-Mistral-7b-128k
is a state-of-the-art language model for long context, further pretrained on long context data for 1500 steps using the YaRN extension method. It is an extension of Mistral-7B-v0.1 and supports a 128k token context window.
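For intuition, YaRN-style extension selectively rescales the RoPE inverse frequencies: long-wavelength (low-frequency) dimensions are stretched by the extension factor while short-wavelength dimensions are left untouched, with a smooth ramp between. The sketch below is illustrative only, not the exact YaRN implementation; the cutoff parameters and the assumed 8k original context are assumptions for this example:

import numpy as np

def yarn_like_inv_freq(dim=128, base=10000.0, scale=16.0,
                       orig_ctx=8192, beta_fast=32.0, beta_slow=1.0):
    # Standard RoPE inverse frequencies, one per pair of hidden dims.
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    wavelength = 2 * np.pi / inv_freq  # tokens per full rotation
    # Ramp from 0 (keep original frequency) to 1 (fully interpolate)
    # as the wavelength approaches the original context length.
    low, high = orig_ctx / beta_fast, orig_ctx / beta_slow
    ramp = np.clip((wavelength - low) / (high - low), 0.0, 1.0)
    # Blend original frequencies with position-interpolated ones.
    return inv_freq * (1 - ramp) + (inv_freq / scale) * ramp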
Severian/Nexus-IKM-Mistral-7B-Pytorch
has been fine-tuned to convergence using a novel Phased Training approach on a unique dataset, which resulted in a model with greater capability for generating insights and solving problems in complex, multi-disciplinary settings. This includes an improved ability to draw links between different pieces of knowledge, reason through complex scenarios, and propose innovative solutions that cut across various domains, including science, technology, environmental studies, and the humanities.
Configuration
The following YAML configuration was used to produce this model:
slices:
  - sources:
      - model: yanismiraoui/Yarn-Mistral-7b-128k-sharded
        layer_range: [0, 32]
      - model: LeroyDyer/Mixtral_AI
        layer_range: [0, 32]
Optionally, the equivalent models: syntax can be used:
models:
  - model: LeroyDyer/Mixtral_AI
    # LARGER MODEL MUST BE BASE
  - model: yanismiraoui/Yarn-Mistral-7b-128k-sharded
merge_method: slerp
base_model: yanismiraoui/Yarn-Mistral-7b-128k-sharded
parameters:
  t:
    - filter: self_attn
      value: [0.3, 0.6, 0.4, 0.6, 0.7]
    - filter: mlp
      value: [0.7, 0.4, 0.6, 0.4, 0.3]
    - value: 0.5 # fallback for rest of tensors
dtype: float16
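The five-element t lists above are gradients: each anchor value is spread across the 32 layers, so self-attention tensors lean toward the Yarn base in early layers and toward Mixtral_AI in later ones (and the MLP gradient mirrors this). A sketch of how such a gradient plausibly expands to one interpolation factor per layer, assuming mergekit interpolates the anchors linearly across the layer stack:

import numpy as np

anchors = [0.3, 0.6, 0.4, 0.6, 0.7]  # self_attn gradient from the config
num_layers = 32
# Space the anchors evenly across the layer stack and interpolate between.
x = np.linspace(0, len(anchors) - 1, num_layers)
t_per_layer = np.interp(x, np.arange(len(anchors)), anchors)
print(t_per_layer.round(3))  # t applied to each layer's self_attn tensors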
LOAD MODEL
%pip install llama-index-embeddings-huggingface
%pip install llama-index-llms-llama-cpp
!pip install llama-index
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.llama_cpp import LlamaCPP
from llama_index.llms.llama_cpp.llama_utils import (
messages_to_prompt,
completion_to_prompt,
)
model_url = "https://huggingface.co/LeroyDyer/Mixtral_AI_128k_7b/resolve/main/Mixtral_AI_128k_7b_q8_0.gguf"
llm = LlamaCPP(
# You can pass in the URL to a GGUF model to download it automatically
model_url=model_url,
# optionally, you can set the path to a pre-downloaded model instead of model_url
model_path=None,
temperature=0.1,
max_new_tokens=256,
# this model supports a 128k token context window, but we set it lower to reduce memory use
context_window=3900,
# kwargs to pass to __call__()
generate_kwargs={},
# kwargs to pass to __init__()
# set to at least 1 to use GPU
model_kwargs={"n_gpu_layers": 1},
# transform inputs into Llama2 format
messages_to_prompt=messages_to_prompt,
completion_to_prompt=completion_to_prompt,
verbose=True,
)
prompt = input("Enter your prompt: ")
response = llm.complete(prompt)
print(response.text)
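The SimpleDirectoryReader and VectorStoreIndex imports above suggest a retrieval-augmented setup. A minimal sketch of wiring the loaded LLM into a query engine; the ./data folder and the BAAI/bge-small-en-v1.5 embedding model are assumptions for this example:

from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Route all llama-index calls through the local GGUF model and a small embedder.
Settings.llm = llm
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
print(query_engine.query("Summarize the documents."))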
# Load model directly
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained(
    "LeroyDyer/Mixtral_AI_128k_7b-GGUF",
    trust_remote_code=True,
)
model = AutoModelForCausalLM.from_pretrained(
    "LeroyDyer/Mixtral_AI_128k_7b-GGUF",
    use_flash_attention_2=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
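A short generation sketch with the loaded model; the prompt and sampling settings are illustrative:

inputs = tokenizer("Explain the YaRN context-extension method.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))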