This repo contains the files needed to fix 'Dolphin 2.9.1 DBRX' so it works with Llama.cpp (until Llama.cpp itself is updated to support it natively). The script in this repo converts Dolphin's weights into a set of weights that Llama.cpp's GGUF conversion script can then handle. To use it:

  1. Download Dolphin's original weights from https://huggingface.co/cognitivecomputations/dolphin-2.9.1-dbrx
  2. Download this repo's files.
  3. Create a temp folder; at its peak it will hold nearly 250 GB of files.
  4. Create an output folder for the converted weights (also nearly 250 GB, so about 500 GB of free space is needed for the script to run, or roughly 750 GB if you keep Dolphin's original weights around).
  5. Edit the Python script from this repo and replace the paths at the end with your actual paths.
  6. Run the script and wait for it to finish; depending on your PC's speed, this can take a few hours.
  7. Move the output folder's safetensors files and the files from this repo into a single folder (see the sketch after this list).
  8. Now you can run Llama.cpp's conversion script on that folder to turn it into a GGUF. You may delete the temp files and Dolphin's original weights once you no longer need them.
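
The sketch below illustrates step 7 under the assumption that the converted weights and this repo's files simply need to sit side by side in one folder; the paths and folder names are placeholders, and this helper is not part of the repo's script.

```python
# Hypothetical helper for step 7: gather the converted safetensors files and
# this repo's files into a single folder that Llama.cpp's GGUF conversion
# script can be pointed at. All paths are placeholders.
import shutil
from pathlib import Path

output_dir = Path("/path/to/output")    # converted weights from step 6
repo_dir = Path("/path/to/this-repo")   # files downloaded in step 2
final_dir = Path("/path/to/final")      # folder handed to the GGUF converter
final_dir.mkdir(parents=True, exist_ok=True)

for src in (output_dir, repo_dir):
    for f in src.iterdir():
        if f.is_file():
            shutil.copy2(f, final_dir / f.name)
```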

Note about inference: you will likely have to insert the BOS token explicitly yourself with this model (Dolphin's BOS token is <|endoftext|>). A raw completion prompt in a frontend would therefore look like this:

<|endoftext|><|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Who are you?<|im_end|>
<|im_start|>assistant
I am a helpful assistant.<|im_end|>
<|im_start|>user
You like bigfoot?<|im_end|>
<|im_start|>assistant
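
If your frontend does not handle this for you, a minimal sketch of sending such a raw completion is shown below. It assumes llama-cpp-python, which is not part of this repo; the GGUF filename is a placeholder, and whether the BOS token also gets added automatically depends on your frontend's settings, so watch for duplicates.

```python
# Minimal sketch of a raw completion with the BOS token inserted by hand.
# Assumes llama-cpp-python; the model path is a placeholder.
from llama_cpp import Llama

llm = Llama(model_path="dolphin-2.9.1-dbrx.gguf", n_ctx=4096)

prompt = (
    "<|endoftext|>"            # BOS token inserted explicitly
    "<|im_start|>system\n"
    "You are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\n"
    "Who are you?<|im_end|>\n"
    "<|im_start|>assistant\n"
)

out = llm(prompt, max_tokens=256, stop=["<|im_end|>"])
print(out["choices"][0]["text"])
```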