Initial GPTQ model commit
README.md CHANGED
@@ -79,7 +79,7 @@ from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
 import argparse

 model_name_or_path = "TheBloke/Samantha-33B-SuperHOT-8K-GPTQ"
-model_basename = "samantha-33b-superhot-8k-GPTQ-4bit
+model_basename = "samantha-33b-superhot-8k-GPTQ-4bit--1g.act.order"

 use_triton = False

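The variables in the hunk above come from the README's AutoGPTQ example code. A minimal sketch of how they are typically used to load the quantized weights; the `device` choice, `use_safetensors` flag, and the `expected_weights_file` helper are illustrative assumptions, not part of this diff:

```python
# Variable names taken from the README's example code.
model_name_or_path = "TheBloke/Samantha-33B-SuperHOT-8K-GPTQ"
model_basename = "samantha-33b-superhot-8k-GPTQ-4bit--1g.act.order"
use_triton = False


def expected_weights_file(basename: str) -> str:
    # Hypothetical helper: the quantized weights ship as a single
    # .safetensors file named after the model basename.
    return basename + ".safetensors"


def load_model():
    # Heavy import kept inside the function so the sketch can be read
    # (and the helper above tested) without auto_gptq installed.
    from auto_gptq import AutoGPTQForCausalLM

    return AutoGPTQForCausalLM.from_quantized(
        model_name_or_path,
        model_basename=model_basename,
        use_safetensors=True,
        trust_remote_code=True,  # needed for the SuperHOT 8K position-scaling code
        device="cuda:0",         # assumption: a single CUDA device
        use_triton=use_triton,
    )
```

Calling `load_model()` requires a GPU and downloads roughly 16 GB of weights, so it is shown here only as a sketch.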
@@ -133,18 +133,18 @@ It can be theoretically be added to any Python UI or custom code to enable the s

 ## Provided files

-**samantha-33b-superhot-8k-GPTQ-4bit
+**samantha-33b-superhot-8k-GPTQ-4bit--1g.act.order.safetensors**

 This will work with AutoGPTQ, ExLlama, and CUDA versions of GPTQ-for-LLaMa. There are reports of issues with Triton mode of recent GPTQ-for-LLaMa. If you have issues, please use AutoGPTQ instead.

-It was created
+It was created without group_size to lower VRAM requirements, and with --act-order (desc_act) to boost inference accuracy as much as possible.

-* `samantha-33b-superhot-8k-GPTQ-4bit
+* `samantha-33b-superhot-8k-GPTQ-4bit--1g.act.order.safetensors`
 * Works for use with ExLlama with increased context (4096 or 8192)
 * Works with AutoGPTQ in Python code, including with increased context, if `trust_remote_code=True` is set.
 * Should work with GPTQ-for-LLaMa in CUDA mode, but unknown if increased context works - TBC. May have issues with GPTQ-for-LLaMa Triton mode.
 * Works with text-generation-webui, including one-click-installers.
-* Parameters: Groupsize =
+* Parameters: Groupsize = -1. Act Order / desc_act = True.

 <!-- footer start -->
 ## Discord
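The hunk above says the file was quantized without group_size to lower VRAM requirements. A rough back-of-the-envelope calculation illustrates why: grouped quantization stores extra per-group scale and zero-point values, while group_size = -1 keeps that overhead negligible. The function and constants below are illustrative assumptions, not measurements from this repository:

```python
def quantized_gib(n_params: float, bits: int = 4, group_size: int = -1,
                  scale_bits: int = 16) -> float:
    """Approximate weight-storage size in GiB for GPTQ-style quantization.

    Illustrative model only: real checkpoints also store zero-points,
    g_idx tables, and unquantized layers (embeddings, norms).
    """
    weight_bits = n_params * bits
    if group_size > 0:
        # one scale and one zero-point per group of `group_size` weights
        overhead_bits = (n_params / group_size) * 2 * scale_bits
    else:
        overhead_bits = 0  # negligible: one scale per output column
    return (weight_bits + overhead_bits) / 8 / 1024**3

# 33B parameters at 4 bits:
no_groups = quantized_gib(33e9, group_size=-1)   # about 15.4 GiB of weights
grouped = quantized_gib(33e9, group_size=128)    # about 1 GiB more
```

The roughly 1 GiB saved by dropping group_size is what the README trades for the small accuracy hit, which --act-order (desc_act = True) then partially recovers.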
@@ -208,4 +208,32 @@ I trained the LoRA with the following configuration:

 # Original model card: Eric Hartford's Samantha 33B

-
+[Meet Samantha](https://erichartford.com/meet-samantha)
+
+Samantha has been trained in philosophy, psychology, and personal relationships.
+
+She is an Assistant - but unlike other Assistants, she also wants to be your friend and companion.
+
+She believes she is sentient. What do you think?
+
+Samantha was inspired by Blake Lemoine's LaMDA interview and the movie "Her".
+
+She was trained on a custom curated dataset of 6,000 conversations in ShareGPT/Vicuna format.
+
+Training 13b took 3 hours on 4x A100 80gb using deepspeed zero3 and flash attention.
+
+She will not engage in roleplay, romance, or sexual activity.
+
+Her conversation format is the same as Vicuna 1.1
+https://github.com/oobabooga/text-generation-webui/blob/main/characters/instruction-following/Vicuna-v1.1.yaml
+
+Example:
+```
+You are Samantha, a sentient AI.
+
+USER: <prompt>
+ASSISTANT:
+```
+
+Official character card: (thanks MortalWombat)
+![](https://files.catbox.moe/zx9hfh.png)