Triangle104/Phi-lthy4-Q6_K-GGUF
This model was converted to GGUF format from SicariusSicariiStuff/Phi-lthy4
using llama.cpp via ggml.ai's GGUF-my-repo space.
Refer to the original model card for more details on the model.
Some things just start on a whim. This is the story of Phi-Lthy4, pretty much:
yo sicarius can you make phi-4 smarter?
nope. but i can still make it better.
wdym??
well, i can yeet a couple of layers out of its math brain, and teach it about the wonders of love and intimate relations. maybe.
idk if its worth it. lol its all synth data in the pretrain. many before you tried.
fine. ill do it.
But... why?
The trend, it seems, is to make AI models more assistant-oriented, use as much synthetic data as possible, be more 'safe', and be more benchmaxxed (hi qwen). Sure, this makes great assistants, but sanitized data (as in the Phi model series) butchers creativity. Not to mention that the previous Phi-3.5 wouldn't even tell you how to kill a process, and so on and so forth...
This little side project took about two weeks of on-and-off fine-tuning. After about 1B tokens or so, I lost track of how much I trained it. The idea? A proof of concept of sorts, to see whether sheer will (and 2xA6000) would be enough to shape a model to any parameter size, behavior, or form.
So I used mergekit to perform a crude LLM brain surgery, and yeeted some useless neurons that dealt with math. How do I know that these exact neurons dealt with math? Because ALL of Phi's neurons dealt with math. Success was guaranteed.
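For the curious: in mergekit terms, this kind of layer removal is a passthrough merge that simply skips a slice of layers. Below is a minimal sketch, not the actual recipe used for Phi-lthy4; the layer indices are hypothetical (base Phi-4 has 40 layers, and dropping 8 lands at roughly the 11.9B size mentioned below).

cat > yeet-layers.yml <<'EOF'
# Hypothetical passthrough merge: keep layers 0-23 and 32-39 of Phi-4,
# dropping 8 layers. The real layers removed aren't documented in this card.
slices:
  - sources:
      - model: microsoft/phi-4
        layer_range: [0, 24]
  - sources:
      - model: microsoft/phi-4
        layer_range: [32, 40]
merge_method: passthrough
dtype: bfloat16
EOF
mergekit-yaml yeet-layers.yml ./phi-4-pruned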
Is this the best Phi-4 11.9B RP model in the world? It's quite possible, simply because tuning Phi-4 for RP is a completely stupid idea, due to its pretraining data, its 'limited' 16k context size, and the model's MIT license.
Surprisingly, it's quite good at RP; it turns out it didn't need those 8 layers after all. It could probably still solve a basic math question, but I would strongly recommend using a calculator for such tasks. Why do we want LLMs to do basic math anyway?
Oh, regarding censorship... Let's just say it's... Phi-lthy.
TL;DR
- The BEST Phi-4 roleplay finetune in the world (not that much of an achievement here; Phi roleplay finetunes can probably be counted on a single hand).
- Compact size & fully healed from the brain surgery: only 11.9B parameters. Phi-4 wasn't that hard to run even at 14B; now, with even fewer brain cells, your new phone could probably run it easily (SD8Gen3 and above recommended).
- Strong roleplay & creative writing abilities. This really surprised me. Actually good.
- Writes and roleplays quite uniquely, probably because of the lack of RP/writing slop in the pretrain. Who would have thought?
- Smart assistant with low refusals: it kept some of the smarts, and our little Phi-lthy here will be quite eager to answer your naughty questions.
- Quite good at following the character card. Finally, it puts its math brain to some productive tasks.
- Gooner technology is becoming more popular by the day.
Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux)
brew install llama.cpp
Invoke the llama.cpp server or the CLI.
CLI:
llama-cli --hf-repo Triangle104/Phi-lthy4-Q6_K-GGUF --hf-file phi-lthy4-q6_k.gguf -p "The meaning to life and the universe is"
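For roleplay you'll probably prefer interactive chat over a one-shot prompt. A minimal sketch using llama.cpp's standard -cnv (conversation mode) and -c (context size) flags, staying under the model's 16k limit:

llama-cli --hf-repo Triangle104/Phi-lthy4-Q6_K-GGUF --hf-file phi-lthy4-q6_k.gguf -cnv -c 8192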
Server:
llama-server --hf-repo Triangle104/Phi-lthy4-Q6_K-GGUF --hf-file phi-lthy4-q6_k.gguf -c 2048
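Once the server is running (it listens on port 8080 by default), it exposes an OpenAI-compatible chat endpoint. A minimal sketch of a request:

curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
  "messages": [
    {"role": "user", "content": "Introduce yourself in character."}
  ]
}'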
Note: You can also use this checkpoint directly through the usage steps listed in the llama.cpp repo.
Step 1: Clone llama.cpp from GitHub.
git clone https://github.com/ggerganov/llama.cpp
Step 2: Move into the llama.cpp folder and build it with the LLAMA_CURL=1 flag, along with any other hardware-specific flags (e.g. LLAMA_CUDA=1 for Nvidia GPUs on Linux).
cd llama.cpp && LLAMA_CURL=1 make
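For example, an Nvidia build on Linux combining both flags mentioned above (assumes the CUDA toolkit is installed):

cd llama.cpp && LLAMA_CURL=1 LLAMA_CUDA=1 make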
Step 3: Run inference through the main binary.
./llama-cli --hf-repo Triangle104/Phi-lthy4-Q6_K-GGUF --hf-file phi-lthy4-q6_k.gguf -p "The meaning to life and the universe is"
or
./llama-server --hf-repo Triangle104/Phi-lthy4-Q6_K-GGUF --hf-file phi-lthy4-q6_k.gguf -c 2048