TheBloke committed
Commit 6972878
1 Parent(s): 65a1231

First commit of GPTQ model
README.md ADDED

---
datasets:
- gozfarb/ShareGPT_Vicuna_unfiltered
---

# VicUnlocked-30B-LoRA GPTQ

This repo contains GPTQ-format 4-bit quantised models of [Neko Institute of Science's VicUnLocked 30B LoRA](https://huggingface.co/Neko-Institute-of-Science/VicUnLocked-30b-LoRA).

The files in this repo are the result of merging the above LoRA with the original LLaMA 30B, then quantising to 4-bit using [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa).

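For anyone wanting to reproduce the merge step, below is a minimal sketch using `peft` and `transformers`. The repo IDs and output directory are placeholders, and this illustrates the general merge-then-save workflow rather than the exact script used to build this repo.

```python
# Hypothetical sketch of merging a LoRA adapter into the base model with peft.
# Paths are placeholders; this is not the exact script used for this repo.
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel

base_id = "Neko-Institute-of-Science/LLaMA-30B-HF"            # base LLaMA 30B in HF format
lora_id = "Neko-Institute-of-Science/VicUnLocked-30b-LoRA"    # the LoRA adapter

base = LlamaForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, lora_id)

# Fold the LoRA weights into the base weights and drop the adapter wrappers.
merged = model.merge_and_unload()

merged.save_pretrained("vicunlocked-30b-hf")
LlamaTokenizer.from_pretrained(base_id).save_pretrained("vicunlocked-30b-hf")
```
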
## Repositories available

* [4-bit, 5-bit and 8-bit GGML models for CPU inference](https://huggingface.co/TheBloke/VicUnlocked-30B-LoRA-GGML).
* [4-bit GPTQ models for GPU inference](https://huggingface.co/TheBloke/VicUnlocked-30B-LoRA-GPTQ).
* [float16 HF format model for GPU inference and further conversions](https://huggingface.co/TheBloke/VicUnlocked-30B-LoRA-HF).

## How to easily download and use this model in text-generation-webui

Open the text-generation-webui UI as normal. (If you prefer to download the files from a script instead of the UI, a sketch is included after these steps.)

1. Click the **Model tab**.
2. Under **Download custom model or LoRA**, enter `TheBloke/VicUnlocked-30B-LoRA-GPTQ`.
3. Click **Download**.
4. Wait until it says it's finished downloading.
5. Click the **Refresh** icon next to **Model** in the top left.
6. In the **Model drop-down**, choose the model you just downloaded, `VicUnlocked-30B-LoRA-GPTQ`.
7. If you see an error in the bottom right, ignore it - it's temporary.
8. Fill out the `GPTQ parameters` on the right: `Bits = 4`, `Groupsize = None`, `model_type = Llama`.
9. Click **Save settings for this model** in the top right.
10. Click **Reload the Model** in the top right.
11. Once it says it's loaded, click the **Text Generation tab** and enter a prompt!

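If you would rather fetch the files outside the web UI, the following is a minimal sketch using `huggingface_hub`. The destination directory is an assumption about a typical text-generation-webui layout.

```python
# Hedged example: download the whole repo with huggingface_hub.
# The local_dir below assumes a standard text-generation-webui checkout.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="TheBloke/VicUnlocked-30B-LoRA-GPTQ",
    local_dir="text-generation-webui/models/VicUnlocked-30B-LoRA-GPTQ",
    local_dir_use_symlinks=False,
)
```
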
## Provided files

**Compatible file - VicUnlocked-30B-GPTQ-4bit.act-order.safetensors**

In the `main` branch - the default one - you will find `VicUnlocked-30B-GPTQ-4bit.act-order.safetensors`.

This file will work with all versions of GPTQ-for-LLaMa, giving it maximum compatibility.

It was created without groupsize to minimise VRAM requirements, and with the `--act-order` parameter to improve inference quality.

* `VicUnlocked-30B-GPTQ-4bit.act-order.safetensors`
  * Works with all versions of GPTQ-for-LLaMa code, both Triton and CUDA branches.
  * Works with AutoGPTQ.
  * Works with text-generation-webui one-click-installers.
  * Parameters: Groupsize = None, act-order.
  * Command used to create the GPTQ:
    ```
    llama.py /workspace/vicunlocked-30b/HF wikitext2 --wbits 4 --true-sequential --act-order --save_safetensors /workspace/vicunlocked-30b/gptq/VicUnlocked-30B-GPTQ-4bit.act-order.safetensors
    ```
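
Since the file is reported above to work with AutoGPTQ, here is a minimal, hedged loading sketch. The `model_basename` matches the safetensors filename in this repo; the rest reflects a typical AutoGPTQ setup rather than an officially tested recipe.

```python
# Hedged AutoGPTQ loading sketch (assumes auto-gptq and transformers are installed).
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

repo = "TheBloke/VicUnlocked-30B-LoRA-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(repo, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(
    repo,
    model_basename="VicUnlocked-30B-GPTQ-4bit.act-order",  # filename without the .safetensors suffix
    use_safetensors=True,
    device="cuda:0",
)

# Vicuna-style prompt, following the template mentioned in the original model card below.
prompt = "USER: Write a short poem about llamas.\nASSISTANT:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```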


# Original model card

# Convert tools
https://github.com/practicaldreamer/vicuna_to_alpaca

# Training tool
https://github.com/oobabooga/text-generation-webui

ATM I'm using 2023.05.04v0 of the dataset and training full context.

# Notes:
So I will only be training 1 epoch, as full-context 30b takes so long to train.
This 1 epoch will take me 8 days lol, but luckily these LoRAs feel fully functional at epoch 1, as shown by my 13b one.
Also, I will be uploading checkpoints almost every day. I could train another epoch if there's enough demand for it.

Update: Since I will not be training beyond 1 epoch, @Aeala is training for the full 3: https://huggingface.co/Aeala/VicUnlocked-alpaca-half-30b-LoRA - but it's half context, if you care about that. Also, @Aeala's just about done.

Update: Training finished at epoch 1. These 8 days sure felt long. I only have one A6000, lads, there's only so much I can do. Also, RIP gozfarb, IDK what happened to him.

# How to test?
1. Download LLaMA-30B-HF if you have not: https://huggingface.co/Neko-Institute-of-Science/LLaMA-30B-HF
2. Make a folder called VicUnLocked-30b-LoRA in the loras folder.
3. Download adapter_config.json and adapter_model.bin into VicUnLocked-30b-LoRA.
4. Load ooba: ```python server.py --listen --model LLaMA-30B-HF --load-in-8bit --chat --lora VicUnLocked-30b-LoRA```
5. Select instruct mode and choose the Vicuna-v1.1 template. (A rough equivalent using transformers and peft directly is sketched after these steps.)

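As a rough, unofficial alternative to the web UI, approximately the same test can be run directly with transformers and peft; this assumes bitsandbytes is available for 8-bit loading and that the LoRA files were downloaded into `loras/VicUnLocked-30b-LoRA`.

```python
# Rough equivalent of the ooba command above, loading the base model in 8-bit
# and attaching the LoRA at runtime. Assumes bitsandbytes is installed.
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel

base_id = "Neko-Institute-of-Science/LLaMA-30B-HF"
base = LlamaForCausalLM.from_pretrained(base_id, load_in_8bit=True, device_map="auto")
model = PeftModel.from_pretrained(base, "loras/VicUnLocked-30b-LoRA")
tokenizer = LlamaTokenizer.from_pretrained(base_id)

# Vicuna-v1.1 style prompt, per the template note above.
prompt = "USER: What dataset were you trained on?\nASSISTANT:"
inputs = tokenizer(prompt, return_tensors="pt").to(0)  # move inputs to the first GPU
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0], skip_special_tokens=True))
```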

# Training Log
https://wandb.ai/neko-science/VicUnLocked/runs/vx8yzwi7
VicUnlocked-30B-GPTQ-4bit.act-order.safetensors ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:1c55b158251901afd8671ff738e95913ea38094f84a0e9903d8851799b8ee9d2
size 16940128404
config.json ADDED

{
  "_name_or_path": "/workspace/models/LLaMA-30B-HF",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 6656,
  "initializer_range": 0.02,
  "intermediate_size": 17920,
  "max_position_embeddings": 2048,
  "model_type": "llama",
  "num_attention_heads": 52,
  "num_hidden_layers": 60,
  "pad_token_id": 0,
  "rms_norm_eps": 1e-06,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.29.2",
  "use_cache": true,
  "vocab_size": 32000
}
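
As a quick sanity check, the config values above imply a parameter count of roughly 32.5B, the usual "LLaMA 30B" sizing. A back-of-the-envelope estimate, ignoring biases and norm weights:

```python
# Rough parameter count from the config above (LLaMA-style layers, no biases).
hidden, inter, layers, vocab = 6656, 17920, 60, 32000

attn  = 4 * hidden * hidden   # q, k, v and output projections per layer
mlp   = 3 * hidden * inter    # gate, up and down projections per layer
embed = 2 * vocab * hidden    # input embeddings plus untied LM head

total = layers * (attn + mlp) + embed
print(f"~{total / 1e9:.1f}B parameters")  # prints ~32.5B
```
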
generation_config.json ADDED

{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 0,
  "transformers_version": "4.29.2"
}
special_tokens_map.json ADDED

{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer.model ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
size 499723
tokenizer_config.json ADDED

{
  "add_bos_token": true,
  "add_eos_token": false,
  "bos_token": {
    "__type": "AddedToken",
    "content": "<s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "clean_up_tokenization_spaces": false,
  "eos_token": {
    "__type": "AddedToken",
    "content": "</s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": null,
  "sp_model_kwargs": {},
  "tokenizer_class": "LlamaTokenizer",
  "unk_token": {
    "__type": "AddedToken",
    "content": "<unk>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  }
}