How do you force download-model.py to download act-order models by default?


It's kinda tiring to have to manually clone or wget the act-order version.
Anyone know how to do that?

I don't. I've not looked into how HF's automatic download system works. My first guess would be that it looks for pytorch_model.bin.index.json to tell it what to grab.

I don't know if it'd work to add a file in that format for GPTQs. I guess the layers are still structured and named the same even though their weight contents are in a format HF can't process, so it might be worth a go.
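If anyone wants to experiment with that idea, here's a completely untested sketch: load the GPTQ state dict, then write a pytorch_model.bin.index.json whose weight_map points every tensor name at the single GPTQ file. Whether HF's downloader (or transformers) actually accepts this for a .pt GPTQ checkpoint is exactly the open question.

import json
import torch

# Example file from this repo; assumes the .pt is a plain state dict of tensors
gptq_file = "vicuna-13B-1.1-GPTQ-4bit-128g.no-act-order.pt"
state_dict = torch.load(gptq_file, map_location="cpu")

index = {
    "metadata": {"total_size": sum(t.numel() * t.element_size() for t in state_dict.values())},
    # Point every tensor name at the one GPTQ file
    "weight_map": {name: gptq_file for name in state_dict},
}

with open("pytorch_model.bin.index.json", "w") as f:
    json.dump(index, f, indent=2)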

The script downloads both versions, and oobabooga tends to load the safetensors version, which gives gibberish output.
Manually deleting that file feels like a waste of time and bandwidth.
I'll keep digging.

One way would just be to implement a manual download method that could be called at the start of your process:

import wget
from pathlib import Path

repo_files = ["config.json", "generation_config.json", "special_tokens_map.json", "tokenizer.model"]
model_files = ["vicuna-13B-1.1-GPTQ-4bit-128g.no-act-order.pt"]
repo = "TheBloke/vicuna-13B-1.1-GPTQ-4bit-128g"
base_folder = "/var/tmp"

# One folder per repo, e.g. /var/tmp/TheBloke_vicuna-13B-1.1-GPTQ-4bit-128g
dest_dir = Path(base_folder) / repo.replace("/", "_")
dest_dir.mkdir(parents=True, exist_ok=True)

# Grab the shared repo files plus only the model file(s) we actually want
for get_file in repo_files + model_files:
    url = f"https://huggingface.co/{repo}/resolve/main/{get_file}"
    print(f"\nDownloading {url}")
    wget.download(url, out=str(dest_dir))
tomj@Eddie ~/src $ python hf_model_grab.py

Downloading https://huggingface.co/TheBloke/vicuna-13B-1.1-GPTQ-4bit-128g/resolve/main/config.json
100% [..................................................................................] 583 / 583
Downloading https://huggingface.co/TheBloke/vicuna-13B-1.1-GPTQ-4bit-128g/resolve/main/generation_config.json
100% [..................................................................................] 137 / 137
Downloading https://huggingface.co/TheBloke/vicuna-13B-1.1-GPTQ-4bit-128g/resolve/main/special_tokens_map.json
100% [..................................................................................] 411 / 411
Downloading https://huggingface.co/TheBloke/vicuna-13B-1.1-GPTQ-4bit-128g/resolve/main/tokenizer.model
100% [............................................................................] 499723 / 499723
Downloading https://huggingface.co/TheBloke/vicuna-13B-1.1-GPTQ-4bit-128g/resolve/main/vicuna-13B-1.1-GPTQ-4bit-128g.no-act-order.pt
  9% [......                                                              ]  694099968 / 7255476788
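An alternative that avoids the wget module would be huggingface_hub. An untested sketch, assuming a reasonably recent version of the library, with the same repo and file names as above:

from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="TheBloke/vicuna-13B-1.1-GPTQ-4bit-128g",
    local_dir="/var/tmp/TheBloke_vicuna-13B-1.1-GPTQ-4bit-128g",
    # Grab the small repo files plus only the no-act-order checkpoint,
    # skipping the act-order / safetensors variants entirely
    allow_patterns=["*.json", "*.model", "*no-act-order*"],
)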

Coming back to this, I understand better now what you meant by auto installation. To achieve this going forward, I am now doing the following in my repos:

  1. I have renamed the files so that the compatible version is called compat.no-act-order.safetensors and the one that requires the latest code is called latest.act-order.safetensors. This means that if you do an automatic installation, the compat file will be loaded in preference (because it sorts first alphabetically).

You still end up with two files, which is a bit of a waste of time and bandwidth, but at least the compatible one gets loaded and you don't get gibberish.

  2. On my most recent repo (https://huggingface.co/TheBloke/stable-vicuna-13B-GPTQ), I have gone one step further: I have two branches. The main branch contains only the compat file, and there is a latest branch with the latest file. Therefore anyone doing an ooba auto install will always get just one file downloaded, the compatible version. And if someone wants the act-order version, they can clone the latest branch and likewise get only one file downloaded (see the download sketch after the instructions below).

  3. And then I'm starting to add these instructions to all GPTQ repos to aid people in automatic downloading:

How to easily download and use this model in text-generation-webui

Load text-generation-webui as you normally do.

  1. Click the Model tab.
  2. Under Download custom model or LoRA, enter this repo name: TheBloke/stable-vicuna-13B-GPTQ.
  3. Click Download.
  4. Wait until it says it's finished downloading.
  5. As this is a GPTQ model, fill in the GPTQ parameters on the right: Bits = 4, Groupsize = 128, model_type = Llama
  6. Now click the Refresh icon next to Model in the top left.
  7. In the Model drop-down: choose this model: stable-vicuna-13B-GPTQ.
  8. Click Reload the Model in the top right.
  9. Once it says it's loaded, click the Text Generation tab and enter a prompt!
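To go with point 2 above: if someone wants just the act-order file without cloning the whole repo, something like this should pull only the latest branch (a sketch, again assuming huggingface_hub; the local folder path is just an example):

from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="TheBloke/stable-vicuna-13B-GPTQ",
    revision="latest",  # the branch that carries the act-order file
    local_dir="/var/tmp/TheBloke_stable-vicuna-13B-GPTQ-latest",
)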

The way your other repo is set up seemed to work really well. The only issue I had was having to rename the folder (repository name) to include the group size, e.g. TheBloke_stable-vicuna-13B-4bit-128g-GPTQ. A custom config entry works fine too. It probably only matters on first load.
Seems like a great model by the way. I am looking forward to checking it out some more.

Ah OK, so if you rename the model folder it auto-detects the GPTQ params?

Yeah, with the bits and groupsize in the repository name it doesn't need any config to be set during load. The wildcards for detecting that are configured in models/config.yaml. By default, anything in a folder whose name contains 4bit-128g gets loaded as you'd expect. That way the auto installer should just load it correctly on the first go, rather than setting config and reloading.
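For anyone curious how that detection works conceptually, here is a toy sketch (not the webui's actual code; the pattern and setting names below are just illustrative of what models/config.yaml expresses):

import re

# Hypothetical pattern table; the real wildcards live in models/config.yaml
pattern_settings = {
    r".*4bit.*128g.*": {"wbits": 4, "groupsize": 128, "model_type": "llama"},
}

def detect_params(model_folder_name):
    # Return settings for the first pattern that matches the folder name
    for pattern, settings in pattern_settings.items():
        if re.match(pattern, model_folder_name, re.IGNORECASE):
            return settings
    return {}  # no match: the user has to set the params in the UI

print(detect_params("TheBloke_stable-vicuna-13B-4bit-128g-GPTQ"))
# {'wbits': 4, 'groupsize': 128, 'model_type': 'llama'}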

OK thanks! I used to name the model folders GPTQ-4bit-128g, but then it seemed a bit long-winded given the files had that detail as well.

Maybe I'll go back to putting it in the model as well. Or see if I can PR a change to ooba to read it from the model file.

Although I'm hoping the community will move to using AutoGPTQ soon, and that provides a quantize_config.json file with all the params in it.
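For reference, the contents are roughly along these lines (illustrative values written from memory; the exact fields can vary between AutoGPTQ versions):

import json

quantize_config = {
    "bits": 4,
    "group_size": 128,
    "desc_act": False,  # True for act-order models
}

with open("quantize_config.json", "w") as f:
    json.dump(quantize_config, f, indent=2)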

Sure, glad to help a little.
Single GGML .bin files in the main models folder do get checked by filename. But for GPTQ and PyTorch models that need the extra files, I think it checks the folder name only. There's definitely some room for improvement there. Reading from the model filename does seem preferable even when it's inside a folder, but as far as I could tell that isn't the current behavior.

I think so long as people are still using custom config entries, we should make sure they are clicking the "Save settings for this model" button. That will save the overrides for that specific model to config-user.yaml and it will ignore the defaults. Otherwise they'll have to set the config in the UI each time.

I've been through the auto-download procedure again and revised my README to incorporate the advice about saving the model settings. So here are the new generic instructions for my GPTQ models. Model names will be set appropriately for each respective repo.

How to download and use a model in text-generation-webui

Open the text-generation-webui UI as normal.

  1. Click the Model tab.
  2. Under Download custom model or LoRA, enter the repo to download, for example: TheBloke/wizardLM-7B-GPTQ.
  3. Click Download.
  4. Wait until it says it's finished downloading.
  5. Click the Refresh icon next to Model in the top left.
  6. In the Model drop-down: choose the model you just downloaded, eg wizardLM-7B-GPTQ.
  7. If you see an error in the bottom right, ignore it - it's temporary.
  8. If this is a GPTQ model, fill in the GPTQ parameters on the right: Bits = 4, Groupsize = 128, model_type = Llama
  9. Click Save settings for this model in the top right.
  10. Click Reload the Model in the top right.
  11. Once it says it's loaded, click the Text Generation tab and enter a prompt!
