cgus/Hermes-3-Llama-3.1-8B-lorablated-exl2

4-bit precision


cgus committed on Aug 19, 2024

Commit 00227a4 (1 parent: ed267f4)

Update README.md

Files changed (1)

README.md +26 -3


@@ -1,13 +1,36 @@
 ---
 base_model:
-- NousResearch/Hermes-3-Llama-3.1-8B
-library_name: transformers
 tags:
 - mergekit
 - merge
 ---
-# 🪽 Hermes-3-Llama-3.1-8B-lorablated
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/4Hbw5n68jKUSBQeTqQIeT.png)
 <center>70B version: <a href="https://huggingface.co/mlabonne/Hermes-3-Llama-3.1-70B-lorablated/"><i>mlabonne/Hermes-3-Llama-3.1-70B-lorablated</i></a></center>

 ---
 base_model:
+- mlabonne/Hermes-3-Llama-3.1-8B-lorablated
+license: llama3
 tags:
 - mergekit
 - merge
 ---
+# Hermes-3-Llama-3.1-8B-lorablated-exl2
+Model: [Hermes-3-Llama-3.1-8B-lorablated](https://huggingface.co/mlabonne/Hermes-3-Llama-3.1-8B-lorablated)
+Created by: [mlabonne](https://huggingface.co/mlabonne)
+Based on: [Hermes-3-Llama-3.1-8B](https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-8B)
+## Quants
+[4bpw h6](https://huggingface.co/cgus/Hermes-3-Llama-3.1-8B-lorablated-exl2/tree/main)
+[4.5bpw h6](https://huggingface.co/cgus/Hermes-3-Llama-3.1-8B-lorablated-exl2/tree/4.5bpw-h6)
+[5bpw h6](https://huggingface.co/cgus/Hermes-3-Llama-3.1-8B-lorablated-exl2/tree/5bpw-h6)
+[6bpw h6](https://huggingface.co/cgus/Hermes-3-Llama-3.1-8B-lorablated-exl2/tree/6bpw-h6)
+[8bpw h8](https://huggingface.co/cgus/Hermes-3-Llama-3.1-8B-lorablated-exl2/tree/8bpw-h8)
+## Quantization notes
+Quantized with ExLlamaV2 0.1.8 using the default dataset.
+I'm not sure how well it works with Text-Generation-WebUI (TGW), since this model uses some unusual RoPE mechanics and I don't know how TGW handles them.
+For some reason, this model ran extremely slowly with my TGW install but worked perfectly fine with TabbyAPI.
+## How to run
+I recommend using TabbyAPI for this model. It requires a decent Nvidia RTX card on Windows/Linux or a decent AMD GPU on Linux.
+The model must be fully loaded into GPU memory to work, so if your GPU has too little VRAM, use the [GGUF version](https://huggingface.co/mlabonne/Hermes-3-Llama-3.1-8B-lorablated-GGUF) instead.
+If you have an Nvidia GTX card, you should also use GGUF instead.
+# Original model card
+# Hermes-3-Llama-3.1-8B-lorablated
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/4Hbw5n68jKUSBQeTqQIeT.png)
 <center>70B version: <a href="https://huggingface.co/mlabonne/Hermes-3-Llama-3.1-70B-lorablated/"><i>mlabonne/Hermes-3-Llama-3.1-70B-lorablated</i></a></center>
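
The Quants list in the new README links each quant to a branch of this repository, with 4bpw h6 on `main`. Below is a minimal sketch of fetching one of them with the `huggingface_hub` package (assumed to be installed; the README itself does not name a download method). The helper names `quant_revision` and `download_quant` and the `"4.5bpw-h6"`-style keys are illustrative choices, not part of the repository; the branch names are taken from the links in the list.

```python
from typing import Optional

# Repository and branch names are copied from the links in the Quants list.
REPO_ID = "cgus/Hermes-3-Llama-3.1-8B-lorablated-exl2"

# Hypothetical lookup table: quant label -> git revision (branch) holding it.
QUANT_REVISIONS = {
    "4bpw-h6": "main",  # the 4bpw quant lives on the default branch
    "4.5bpw-h6": "4.5bpw-h6",
    "5bpw-h6": "5bpw-h6",
    "6bpw-h6": "6bpw-h6",
    "8bpw-h8": "8bpw-h8",
}


def quant_revision(quant: str) -> str:
    """Return the branch that holds the given quant, or raise ValueError."""
    try:
        return QUANT_REVISIONS[quant]
    except KeyError:
        known = ", ".join(sorted(QUANT_REVISIONS))
        raise ValueError(f"unknown quant {quant!r}; expected one of: {known}") from None


def download_quant(quant: str, local_dir: Optional[str] = None) -> str:
    """Download every file of the chosen quant and return the local path.

    huggingface_hub is imported lazily so the lookup helper above stays
    usable without the package or network access.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        revision=quant_revision(quant),
        local_dir=local_dir,
    )


if __name__ == "__main__":
    # Example: fetch the 6bpw quant into a local folder (the path is an assumption).
    path = download_quant("6bpw-h6", local_dir="models/Hermes-3-8B-lorablated-6bpw")
    print(f"Model files downloaded to: {path}")
```

The resulting folder can then be placed wherever your backend looks for models; the exact location depends on how TabbyAPI or another loader is configured and is not specified here.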