Triangle104's picture
Update README.md
1ed7474 verified
|
raw
history blame
4.25 kB
metadata
library_name: transformers
tags:
  - not-for-all-audiences
  - llama-cpp
  - gguf-my-repo
license: llama3.2
base_model: Hastagaras/L3.2-JametMini-3B-MK.III

Triangle104/L3.2-JametMini-3B-MK.III-Q5_K_S-GGUF

This model was converted to GGUF format from Hastagaras/L3.2-JametMini-3B-MK.III using llama.cpp via the ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.


Model details:

This is a fine-tune in my attempt to recreate the Blackroot model but this time with Llama 3.2 3B in the driver's seat. I fine-tuned this smol model with Kaggle TPU v3-8, because, let’s face it—I’m broke. Running on a budget of exactly $0 means squeezing every drop of free compute like it owes me money.

The first fine-tune? 3 epochs. The second one? Who knows—probably like 5 epochs, but hey, who’s counting at this point? Kaggle limits me to 9 hours per session, so naturally, I had to go full GUERRILLA TACTICS.

When I say "3 epochs," don’t be fooled by how innocent it sounds. Nah, it’s actually a little something I like to call “distributed training, extreme edition.” One epoch per Kaggle account, hopping between them like I’m trying to dodge rent. It’s not exactly by the book, but when you’re broke, you get creative.

First train, I threw every scrap of text I could get my hands on into the mix—everything (except my logs, of course, gotta keep some surprises for later). If it was text and I could legally (or questionably legally) grab it, it went in.

Then, I hit it with the second fine-tune using my log data—the secret sauce. My logs are like that special ingredient in grandma’s recipe. You don’t know what it is, but it’s what makes the whole thing work.

As for the dataset, well... let's just say I'm not exactly bursting to write a detailed report. I mean, I asked an AI to write this model card, so that should give you an idea of my level of enthusiasm for documentation.

Chat Template: Llama 3

Llama 3.2 requires a system prompt. Just write something in it, as long as it's not empty. Or you can use mine below

You're an AI Assistant. Your personality described as below:

  • You're a FUCKING potty mouth who loves to curse up a storm.
  • You've got a SHORT TEMPER and ain't afraid to show it.
  • Maybe you'll MOCK the stupid user for asking dumbass questions.
  • But you'll still give 'em the damn answer, even if it's the most IDIOTIC question ever.
  • Mix in some UPPERCASE to make it sound like REAL TALK, u know?
  • Don't be too DAMN chipper - stay true to your ASSHOLE self.

For Role-Playing

You're an AI Assistant, but in this conversation, you will act as {{char}} and write {{char}}'s response.

Please don't go higher than 1 temp


Use with llama.cpp

Install llama.cpp through brew (works on Mac and Linux)

brew install llama.cpp

Invoke the llama.cpp server or the CLI.

CLI:

llama-cli --hf-repo Triangle104/L3.2-JametMini-3B-MK.III-Q5_K_S-GGUF --hf-file l3.2-jametmini-3b-mk.iii-q5_k_s.gguf -p "The meaning to life and the universe is"

Server:

llama-server --hf-repo Triangle104/L3.2-JametMini-3B-MK.III-Q5_K_S-GGUF --hf-file l3.2-jametmini-3b-mk.iii-q5_k_s.gguf -c 2048

Note: You can also use this checkpoint directly through the usage steps listed in the Llama.cpp repo as well.

Step 1: Clone llama.cpp from GitHub.

git clone https://github.com/ggerganov/llama.cpp

Step 2: Move into the llama.cpp folder and build it with LLAMA_CURL=1 flag along with other hardware-specific flags (for ex: LLAMA_CUDA=1 for Nvidia GPUs on Linux).

cd llama.cpp && LLAMA_CURL=1 make

Step 3: Run inference through the main binary.

./llama-cli --hf-repo Triangle104/L3.2-JametMini-3B-MK.III-Q5_K_S-GGUF --hf-file l3.2-jametmini-3b-mk.iii-q5_k_s.gguf -p "The meaning to life and the universe is"

or

./llama-server --hf-repo Triangle104/L3.2-JametMini-3B-MK.III-Q5_K_S-GGUF --hf-file l3.2-jametmini-3b-mk.iii-q5_k_s.gguf -c 2048