
This is a merge of ehartford's Dolphin 2.1 fine-tune of Mistral 7B 0.1

https://huggingface.co/ehartford/dolphin-2.1-mistral-7b

and

the LIMARP dataset applied as a LoRA at 0.5 weight

https://huggingface.co/lemonilia/limarp-llama2-v2/
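
Below is a minimal sketch of how such a merge can be done with transformers and PEFT. This is not the exact script used for this model: the 0.5 weighting here is applied by scaling the LoRA's B matrices (equivalent to halving each LoRA delta) before merging, and it assumes the adapter repo is in standard PEFT format.

```python
# Minimal sketch (not the exact script used here): merge a LoRA into the
# base weights at 0.5 strength with transformers + PEFT.
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "ehartford/dolphin-2.1-mistral-7b",
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "lemonilia/limarp-llama2-v2")

# Each LoRA delta is (B @ A) * alpha/r, so scaling B by 0.5 applies the
# adapter at half weight.
for module in model.modules():
    if hasattr(module, "lora_B"):
        for name in module.lora_B:
            module.lora_B[name].weight.data *= 0.5

merged = model.merge_and_unload()  # bake the scaled adapter into the weights
merged.save_pretrained("Mistral_7B_Dolphin2.1_LIMA0.5_fp16")
```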

The model is intended to be RP-focused, smart, fast, and lightweight for users with low VRAM.

I've already built the EXL2 4bpw quant (linked below). It runs 8k context in around 6GB of VRAM and responds to a full context at roughly 30 tokens/s (tested on my 3060) when the exl2_hf loader is used with FlashAttention-2 enabled.

The model has been tested by several users on the SillyTavern Discord server and ran on Horde for a full day, with good results.

https://huggingface.co/RossAscends/Mistral7B_Dolphin2.1_LIMARP0.5_4bpw_exl2
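
For anyone scripting against the quant directly, here's a minimal sketch using the exllamav2 Python API (the exl2_hf loader in text-generation-webui wraps the same backend); the local path and sampler values are illustrative assumptions:

```python
# Minimal sketch: load the 4bpw EXL2 quant with the exllamav2 Python API.
# The model_dir path and sampler settings are illustrative assumptions.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "Mistral7B_Dolphin2.1_LIMARP0.5_4bpw_exl2"  # local download
config.prepare()
config.max_seq_len = 8192  # the 8k context mentioned above

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)  # load weights, splitting across available GPUs

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8

print(generator.generate_simple("Hello!", settings, num_tokens=128))
```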

Both the Mistral and ChatML context presets work.
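
For reference, a small sketch of what the two formats look like when built by hand (Dolphin 2.1 was trained with ChatML; the placeholder strings are mine):

```python
# Illustrative prompt construction for the two supported formats.
system = "You are a helpful roleplay assistant."  # placeholder system prompt
user = "Hello!"

# Standard Mistral-instruct format:
mistral_prompt = f"<s>[INST] {user} [/INST]"

# ChatML format (what Dolphin 2.1 was trained with):
chatml_prompt = (
    f"<|im_start|>system\n{system}<|im_end|>\n"
    f"<|im_start|>user\n{user}<|im_end|>\n"
    f"<|im_start|>assistant\n"
)
```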

Full weights: https://huggingface.co/RossAscends/Mistral_7B_Dolphin2.1_LIMA0.5_fp16
