Triangle104
/

Roleplay-Hermes-3-Llama-3.1-8B-Q5_K_M-GGUF

Inference Endpoints

Model card Files Files and versions Community

Roleplay-Hermes-3-Llama-3.1-8B-Q5_K_M-GGUF / README.md

Triangle104's picture

Update README.md

7994850 verified 23 days ago

|

history blame contribute delete

3.47 kB

	---
	library_name: transformers
	tags:
	- roleplay
	- rp
	- human
	- llama-cpp
	- gguf-my-repo
	license: apache-2.0
	datasets:
	- ResplendentAI/NSFW_RP_Format_DPO
	- Undi95/Weyaxi-humanish-dpo-project-noemoji
	base_model: vicgalle/Roleplay-Hermes-3-Llama-3.1-8B
	---

	# Triangle104/Roleplay-Hermes-3-Llama-3.1-8B-Q5_K_M-GGUF
	This model was converted to GGUF format from [`vicgalle/Roleplay-Hermes-3-Llama-3.1-8B`](https://huggingface.co/vicgalle/Roleplay-Hermes-3-Llama-3.1-8B) using llama.cpp via the ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
	Refer to the [original model card](https://huggingface.co/vicgalle/Roleplay-Hermes-3-Llama-3.1-8B) for more details on the model.

	---
	Model details:
	-
	A DPO-tuned Hermes-3-Llama-3.1-8B to behave more "humanish", i.e.,
	avoiding AI assistant slop. It also works for role-play (RP). To achieve
	this, the model was fine-tuned over a series of datasets:


	Undi95/Weyaxi-humanish-dpo-project-noemoji, to make the model react as a human, rejecting assistant-like or too neutral responses.
	ResplendentAI/NSFW_RP_Format_DPO, to steer the model
	towards using the action format in RP settings. Works best if in the
	first message you also use this format naturally (see example)







	Usage example




	conversation = [{'role': 'user', 'content': """With my face blushing in red Tell me about your favorite film!"""}]

	prompt = tokenizer.apply_chat_template(conversation, tokenize=False, add_generation_prompt=True)

	inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

	outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.8)



	The response is




	blushing Aw, that's a tough one! There are so many great films out
	there. I'd have to say one of my all-time favorites is "Eternal Sunshine
	of the Spotless Mind" - it's such a unique and thought-provoking love
	story. But really, there are so many amazing films! What's your
	favorite? I hope mine is at least somewhat decent!




	Note: you can use system prompts for better results, describing the persona.

	---
	## Use with llama.cpp
	Install llama.cpp through brew (works on Mac and Linux)

	```bash
	brew install llama.cpp

	```
	Invoke the llama.cpp server or the CLI.

	### CLI:
	```bash
	llama-cli --hf-repo Triangle104/Roleplay-Hermes-3-Llama-3.1-8B-Q5_K_M-GGUF --hf-file roleplay-hermes-3-llama-3.1-8b-q5_k_m.gguf -p "The meaning to life and the universe is"
	```

	### Server:
	```bash
	llama-server --hf-repo Triangle104/Roleplay-Hermes-3-Llama-3.1-8B-Q5_K_M-GGUF --hf-file roleplay-hermes-3-llama-3.1-8b-q5_k_m.gguf -c 2048
	```

	Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the Llama.cpp repo as well.

	Step 1: Clone llama.cpp from GitHub.
	```
	git clone https://github.com/ggerganov/llama.cpp
	```

	Step 2: Move into the llama.cpp folder and build it with `LLAMA_CURL=1` flag along with other hardware-specific flags (for ex: LLAMA_CUDA=1 for Nvidia GPUs on Linux).
	```
	cd llama.cpp && LLAMA_CURL=1 make
	```

	Step 3: Run inference through the main binary.
	```
	./llama-cli --hf-repo Triangle104/Roleplay-Hermes-3-Llama-3.1-8B-Q5_K_M-GGUF --hf-file roleplay-hermes-3-llama-3.1-8b-q5_k_m.gguf -p "The meaning to life and the universe is"
	```
	or
	```
	./llama-server --hf-repo Triangle104/Roleplay-Hermes-3-Llama-3.1-8B-Q5_K_M-GGUF --hf-file roleplay-hermes-3-llama-3.1-8b-q5_k_m.gguf -c 2048
	```