Vicuna Weights

| Weights version | Link | FastChat version compatibility | Base Model | Release Date | Fine-tuning Data |
| --- | --- | --- | --- | --- | --- |
| v1.5 | 7B, 7B-16k, 13B, 13B-16k | >=0.2.21 | Llama 2 | Aug. 1, 2023 | 370M tokens |
| v1.3 | 7B, 13B, 33B | >=0.2.1 | Llama 1 | Jun. 22, 2023 | 370M tokens |
| v1.1 | 7B, 13B | >=0.2.1 | Llama 1 | Apr. 12, 2023 | - |
| v0 | 7B-delta, 13B-delta | <=0.1.10 | Llama 1 | Mar. 30, 2023 | - |
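
The compatibility column of the table above can be turned into a quick programmatic check. The following is a minimal sketch, not part of FastChat: the names `COMPAT`, `parse_version`, and `is_compatible` are hypothetical, and the parser assumes plain dotted-integer version strings such as `0.2.21`.

```python
# Version bounds transcribed from the table above.
# All names in this sketch are illustrative, not a FastChat API.
COMPAT = {
    "v1.5": (">=", (0, 2, 21)),
    "v1.3": (">=", (0, 2, 1)),
    "v1.1": (">=", (0, 2, 1)),
    "v0": ("<=", (0, 1, 10)),
}


def parse_version(text):
    """Turn '0.2.21' into (0, 2, 21). Assumes plain dotted integers."""
    return tuple(int(part) for part in text.split("."))


def is_compatible(weights_version, fastchat_version):
    """Check a FastChat version string against the bound for a weights version."""
    op, bound = COMPAT[weights_version]
    installed = parse_version(fastchat_version)
    return installed >= bound if op == ">=" else installed <= bound
```

For example, `is_compatible("v1.5", "0.2.20")` is false because v1.5 requires FastChat 0.2.21 or newer, while `is_compatible("v0", "0.1.10")` is true.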

Updates

  • Major updates of weights v1.5

    • Use Llama2 as the base model.
    • Provide 16K context length versions using linear RoPE scaling.
  • Major updates of weights v1.3

    • Train with twice the amount of ShareGPT data compared to previous versions.
    • Provide merged weights directly instead of delta weights.
  • Major updates of weights v1.1

    • Refactor the tokenization and separator. In Vicuna v1.1, the separator has been changed from ### to the EOS token </s>. This change makes it easier to determine the generation stop criteria and enables better compatibility with other libraries.
    • Fix the supervised fine-tuning loss computation for better model quality.
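
To illustrate why the separator change in v1.1 simplifies stopping, the sketch below contrasts the two approaches. It is a simplified illustration rather than FastChat's actual generation loop: both function names are hypothetical, and the EOS token id is whatever the tokenizer in use reports.

```python
def truncate_at_separator(generated_text, separator="###"):
    """v0-style stop: scan the decoded text and cut at the first separator string."""
    index = generated_text.find(separator)
    if index == -1:
        return generated_text
    return generated_text[:index].rstrip()


def should_stop(token_id, eos_token_id):
    """v1.1-style stop: a single comparison against the tokenizer's EOS token id."""
    return token_id == eos_token_id
```

With a textual separator, the output has to be decoded and searched as a string. With the EOS token, the check is a comparison of token ids, which is the stop condition most generation libraries already support.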

Prompt Template

Example prompt (weights v1.1, v1.3, v1.5)

A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.

USER: Hello!
ASSISTANT: Hello!</s>
USER: How are you?
ASSISTANT: I am good.</s>

See a full prompt template here and example output here.
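
The layout of the example prompt above can be reproduced with a small helper. This is a sketch under stated assumptions: `build_prompt` is a hypothetical function, and the line breaks between turns follow the example as rendered on this page. The full prompt template referenced above remains the authoritative source for exact whitespace.

```python
SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)


def build_prompt(turns, system=SYSTEM):
    """Build a v1.1-style prompt.

    turns: list of (user_message, assistant_reply) pairs. Pass None as the
    reply of the last pair to leave the assistant turn open for generation.
    """
    lines = [system, ""]
    for user_msg, assistant_msg in turns:
        lines.append(f"USER: {user_msg}")
        if assistant_msg is None:
            # Open turn: the model is expected to continue from here.
            lines.append("ASSISTANT:")
        else:
            # Completed assistant turns end with the EOS token </s>.
            lines.append(f"ASSISTANT: {assistant_msg}</s>")
    return "\n".join(lines)
```

Calling `build_prompt([("Hello!", "Hello!"), ("How are you?", None)])` yields a prompt that ends with an open `ASSISTANT:` line for the model to complete.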

Example prompt (weights v0)

A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.

### Human: Hello!
### Assistant: Hello!
### Human: How are you?
### Assistant: I am good.

See the full prompt template here.

How to Apply Delta Weights (Only Needed for Weights v0)

We release Vicuna weights v0 as delta weights to comply with the LLaMA model license. You can add our delta to the original LLaMA weights to obtain the Vicuna weights. Instructions:

  1. Get the original LLaMA weights in the Hugging Face format by following the instructions here.
  2. Use the following scripts to get Vicuna weights by applying our delta. They will automatically download delta weights from our Hugging Face account.

NOTE: Weights v1.1 are only compatible with transformers>=4.28.0 and fschat>=0.2.0. Please update your local packages accordingly. A fresh install that follows the FastChat installation instructions should give you the correct versions.

Vicuna-7B

This conversion command needs around 30 GB of CPU RAM. See the "Low CPU Memory Conversion" section below if you do not have enough memory. Replace /path/to/* with the real paths.

python3 -m fastchat.model.apply_delta \
    --base-model-path /path/to/llama-7b \
    --target-model-path /path/to/output/vicuna-7b \
    --delta-path lmsys/vicuna-7b-delta-v1.1

Vicuna-13B

This conversion command needs around 60 GB of CPU RAM. See the "Low CPU Memory Conversion" section below if you do not have enough memory. Replace /path/to/* with the real paths.

python3 -m fastchat.model.apply_delta \
    --base-model-path /path/to/llama-13b \
    --target-model-path /path/to/output/vicuna-13b \
    --delta-path lmsys/vicuna-13b-delta-v1.1
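
Conceptually, applying a delta means adding it to the base weights parameter by parameter. The toy sketch below shows that idea on plain Python lists. It is an illustration only, not the implementation in `fastchat.model.apply_delta`: the function `add_delta` and the dictionary layout are hypothetical.

```python
def add_delta(base, delta):
    """Return {name: base[name] + delta[name]} elementwise.

    base, delta: dicts mapping parameter names to flat lists of floats.
    """
    if base.keys() != delta.keys():
        raise ValueError("base and delta must contain the same parameter names")
    merged = {}
    for name in base:
        if len(base[name]) != len(delta[name]):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [b + d for b, d in zip(base[name], delta[name])]
    return merged
```

For instance, `add_delta({"w": [1.0, 2.0]}, {"w": [0.5, -1.0]})` gives `{"w": [1.5, 1.0]}`.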

Low CPU Memory Conversion

You can try these methods to reduce the CPU RAM requirement of weight conversion.

  1. Append --low-cpu-mem to the commands above. This splits large weight files into smaller ones and uses the disk as temporary storage, which can keep peak memory below 16 GB.
  2. Create a large swap file and rely on the operating system to automatically utilize the disk as virtual memory.
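
As a rough sanity check on the RAM figures quoted for the two conversion commands, the back-of-envelope estimate below assumes that the base weights and the delta are both held in memory in 16-bit precision at the same time. That assumption is not stated in this document, and the helper `estimate_conversion_ram_gb` is hypothetical.

```python
def estimate_conversion_ram_gb(num_params_billion, bytes_per_param=2, copies=2):
    """Estimate RAM in GB (1 GB = 1e9 bytes) for holding `copies` full models.

    Assumes 16-bit weights (2 bytes per parameter) and two resident copies
    (base and delta). Both are assumptions made for this sketch.
    """
    return num_params_billion * bytes_per_param * copies


ram_7b = estimate_conversion_ram_gb(7)    # 28, close to the quoted ~30 GB
ram_13b = estimate_conversion_ram_gb(13)  # 52, close to the quoted ~60 GB
```

The remaining gap to the quoted figures would be plausible overhead from the Python runtime and temporary buffers.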

FAQ

Tokenizer issues

Tokenizer issues are frequently reported (https://github.com/lm-sys/FastChat/issues/408). Some of them stem not only from FastChat or the Vicuna weights but also from how the base LLaMA model was converted.

We suggest that you use transformers>=4.28.0 and redo the weight conversion for the base LLaMA model. After applying the delta, you should have a file named special_tokens_map.json in your converted weight folder for either v0 or v1.1. The contents of this file should be the same as this file: https://huggingface.co/lmsys/vicuna-13b-delta-v0/blob/main/special_tokens_map.json. If the file is not present, please copy the special_tokens_map.json and tokenizer_config.json files from https://huggingface.co/lmsys/vicuna-13b-delta-v0/tree/main to your converted weight folder. This works for both v0 and v1.1.
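
The check described above can be scripted. The sketch below assumes you have already downloaded the reference special_tokens_map.json to a local path. The function name `check_special_tokens_map` and both path arguments are hypothetical.

```python
import json
from pathlib import Path


def check_special_tokens_map(converted_dir, reference_file):
    """Compare special_tokens_map.json in converted_dir with a reference copy.

    Returns "missing" if the file is absent, "mismatch" if the parsed JSON
    differs from the reference, and "ok" otherwise.
    """
    candidate = Path(converted_dir) / "special_tokens_map.json"
    if not candidate.is_file():
        return "missing"
    with open(candidate, encoding="utf-8") as f:
        found = json.load(f)
    with open(reference_file, encoding="utf-8") as f:
        expected = json.load(f)
    return "ok" if found == expected else "mismatch"
```

A result of "missing" corresponds to the case where the two tokenizer files should be copied into the converted weight folder. A result of "mismatch" suggests redoing the base model conversion with a recent transformers version.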