TheBloke committed
Commit 869942f
1 Parent(s): 5189b0c

Re-upload of GPTQ model due to issue with base model

Files changed (1)
  1. README.md +18 -15
README.md CHANGED
@@ -21,7 +21,7 @@ license: other
 
 These files are GPTQ 4bit model files for [Camel AI's CAMEL 13B Combined Data](https://huggingface.co/camel-ai/CAMEL-13B-Combined-Data).
 
- It is the result of quantising to 4bit using [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa).
+ It is the result of quantising to 4bit using [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ).
 
 ## Repositories available
 
@@ -36,10 +36,13 @@ Please make sure you're using the latest version of text-generation-webui
 1. Click the **Model tab**.
 2. Under **Download custom model or LoRA**, enter `TheBloke/CAMEL-13B-Combined-Data-GPTQ`.
 3. Click **Download**.
- 4. The model will start downloading, and once finished it will be automatically loaded.
- 5. If you want any custom settings, set them and then click **Save settings for this model** followed by **Reload the Model** in the top right.
+ 4. The model will start downloading. Once it's finished it will say "Done"
+ 5. In the top left, click the refresh icon next to **Model**.
+ 6. In the **Model** dropdown, choose the model you just downloaded: `CAMEL-13B-Combined-Data-GPTQ`
+ 7. The model will automatically load, and is now ready for use!
+ 8. If you want any custom settings, set them and then click **Save settings for this model** followed by **Reload the Model** in the top right.
   * Note that you do not need to set GPTQ parameters any more. These are set automatically from the file `quantize_config.json`.
- 6. Once you're ready, click the **Text Generation tab** and enter a prompt to get started!
+ 9. Once you're ready, click the **Text Generation tab** and enter a prompt to get started!
 
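Outside the web UI, the same repository can also be pulled down and its quantisation settings inspected directly with the `huggingface_hub` library. This is an optional alternative to the steps above, sketched here purely as an illustration; the expected config values are taken from the "Provided files" section later in this README:

```python
import json
from huggingface_hub import snapshot_download, hf_hub_download

repo_id = "TheBloke/CAMEL-13B-Combined-Data-GPTQ"

# Download the full repository into the local Hugging Face cache
local_path = snapshot_download(repo_id=repo_id)
print("Model files downloaded to:", local_path)

# Inspect the GPTQ settings that are applied automatically from quantize_config.json
cfg_file = hf_hub_download(repo_id=repo_id, filename="quantize_config.json")
with open(cfg_file) as f:
    print(json.load(f))  # should report 4-bit, group_size 128, desc_act False
```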
  ## How to use this GPTQ model from Python code
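The hunks below only touch fragments of this example script. Pieced together with the unchanged context, the updated example looks roughly as follows; the generation and pipeline parameters are typical values and are assumptions here, since they are not part of the diff:

```python
from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "TheBloke/CAMEL-13B-Combined-Data-GPTQ"
model_basename = "camel-13b-combined-GPTQ-4bit-128g.no-act.order"

use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

# Load the quantised model; GPTQ parameters are read from quantize_config.json
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        model_basename=model_basename,
        use_safetensors=True,
        trust_remote_code=False,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)

prompt = "Tell me about AI"
prompt_template = f'''### Human: {prompt}
### Assistant:'''

print("\n\n*** Generate:")
input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
print(tokenizer.decode(output[0]))

# Prevent printing spurious transformers error when using pipeline with AutoGPTQ
logging.set_verbosity(logging.CRITICAL)

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.15
)
print(pipe(prompt_template)[0]['generated_text'])
```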
 
@@ -55,7 +58,7 @@ from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
 import argparse
 
 model_name_or_path = "TheBloke/CAMEL-13B-Combined-Data-GPTQ"
- model_basename = "camel-30b-combined-GPTQ-4bit--1g.act.order"
+ model_basename = "camel-13b-combined-GPTQ-4bit-128g.no-act.order"
 
 use_triton = False
 
@@ -64,11 +67,15 @@ tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
 model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
         model_basename=model_basename,
         use_safetensors=True,
-        trust_remote_code=True,
+        trust_remote_code=False,
         device="cuda:0",
         use_triton=use_triton,
         quantize_config=None)
 
+ prompt = "Tell me about AI"
+ prompt_template=f'''### Human: {prompt}
+ ### Assistant:'''
+ 
 print("\n\n*** Generate:")
 
 input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
@@ -80,10 +87,6 @@ print(tokenizer.decode(output[0]))
 # Prevent printing spurious transformers error when using pipeline with AutoGPTQ
 logging.set_verbosity(logging.CRITICAL)
 
- prompt = "Tell me about AI"
- prompt_template=f'''### Human: {prompt}
- ### Assistant:'''
- 
 print("*** Pipeline:")
 pipe = pipeline(
     "text-generation",
@@ -100,17 +103,17 @@ print(pipe(prompt_template)[0]['generated_text'])
 
 ## Provided files
 
- **camel-30b-combined-GPTQ-4bit--1g.act.order.safetensors**
+ **camel-13b-combined-GPTQ-4bit-128g.no-act.order.safetensors**
 
 This will work with AutoGPTQ and CUDA versions of GPTQ-for-LLaMa. There are reports of issues with Triton mode of recent GPTQ-for-LLaMa. If you have issues, please use AutoGPTQ instead.
 
- It was created without group_size to lower VRAM requirements, and with --act-order (desc_act) to boost inference accuracy as much as possible.
+ It was created with group_size 128 to increase inference accuracy, but without --act-order (desc_act) to increase compatibility and improve inference speed.
 
- * `camel-30b-combined-GPTQ-4bit--1g.act.order.safetensors`
+ * `camel-13b-combined-GPTQ-4bit-128g.no-act.order.safetensors`
   * Works with AutoGPTQ in CUDA or Triton modes.
   * Works with GPTQ-for-LLaMa in CUDA mode. May have issues with GPTQ-for-LLaMa Triton mode.
   * Works with text-generation-webui, including one-click-installers.
-   * Parameters: Groupsize = -1. Act Order / desc_act = True.
+   * Parameters: Groupsize = 128. Act Order / desc_act = False.
 
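For reference, those parameters correspond to an AutoGPTQ quantisation config along the following lines. This is an illustrative sketch only; the calibration data and exact command used to produce the file are not documented in this README:

```python
from auto_gptq import BaseQuantizeConfig

# GPTQ settings matching the description above (illustrative, not the exact recipe used)
quantize_config = BaseQuantizeConfig(
    bits=4,          # 4-bit quantisation
    group_size=128,  # groupsize 128 for better accuracy
    desc_act=False   # act-order disabled for wider compatibility and faster inference
)
```

These are also the values that `quantize_config.json` exposes at load time, which is why no manual GPTQ parameters are needed in text-generation-webui.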
  <!-- footer start -->
  ## Discord
@@ -134,7 +137,7 @@ Donaters will get priority support on any and all AI/LLM/model questions and req
 
 **Special thanks to**: Luke from CarbonQuill, Aemon Algiz, Dmitriy Samsonov.
 
- **Patreon special mentions**: Ajan Kanaga, Kalila, Derek Yates, Sean Connelly, Luke, Nathan LeClaire, Trenton Dambrowitz, Mano Prime, David Flickinger, vamX, Nikolai Manek, senxiiz, Khalefa Al-Ahmad, Illia Dulskyi, trip7s trip, Jonathan Leane, Talal Aujan, Artur Olbinski, Cory Kujawski, Joseph William Delisle, Pyrater, Oscar Rangel, Lone Striker, Luke Pendergrass, Eugene Pentland, Johann-Peter Hartmann.
+ **Patreon special mentions**: Oscar Rangel, Eugene Pentland, Talal Aujan, Cory Kujawski, Luke, Asp the Wyvern, Ai Maven, Pyrater, Alps Aficionado, senxiiz, Willem Michiel, Junyu Yang, trip7s trip, Sebastain Graf, Joseph William Delisle, Lone Striker, Jonathan Leane, Johann-Peter Hartmann, David Flickinger, Spiking Neurons AB, Kevin Schuppel, Mano Prime, Dmitriy Samsonov, Sean Connelly, Nathan LeClaire, Alain Rossmann, Fen Risland, Derek Yates, Luke Pendergrass, Nikolai Manek, Khalefa Al-Ahmad, Artur Olbinski, John Detwiler, Ajan Kanaga, Imad Khwaja, Trenton Dambrowitz, Kalila, vamX, webtim, Illia Dulskyi.
 
 Thank you to all my generous patrons and donaters!
 
 