Llama13b - Quantized using AutoSmoothQuant. No zero-points.
Base model:
Quantization:
- Quantized using AutoSmoothQuant, following the recommendation in the base w8a8 vLLM PR (https://github.com/vllm-project/vllm/pull/1508)
- Reference document: https://docs.google.com/document/d/1L3JX945StZFbtrl2jDLcMLcnmbRUQaKtQ-eHXxgnd6g/edit?usp=sharing
- See the section "Download the allenai/c4 dataset that the KV cache quant PR uses" for commands to download the calibration dataset.
- See the section "Creating a w8a8 model" for commands and instructions to create the w8a8 model.
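The recipe behind the steps above can be sketched in a few lines. This is a minimal numpy illustration of the SmoothQuant idea (migrate activation outliers into the weights, then quantize both sides with scale-only symmetric int8, i.e. no zero-points), not AutoSmoothQuant's actual implementation; the helper names and the choice α=0.5 are illustrative.

```python
import numpy as np

def smooth_scales(act_absmax, weight, alpha=0.5):
    """SmoothQuant-style per-input-channel smoothing factors.

    act_absmax: per-channel max |activation|, shape (in_features,)
    weight:     layer weight, shape (out_features, in_features)
    """
    w_absmax = np.abs(weight).max(axis=0)
    return np.maximum(act_absmax, 1e-8) ** alpha / np.maximum(w_absmax, 1e-8) ** (1.0 - alpha)

def quantize_sym_int8(t, axis=None):
    """Symmetric int8 quantization: one scale per tensor (or per row), no zero-point."""
    absmax = np.abs(t).max(axis=axis, keepdims=axis is not None)
    scale = np.maximum(absmax, 1e-8) / 127.0
    q = np.clip(np.round(t / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 16)).astype(np.float32)   # (out_features, in_features)
x = rng.normal(size=(4, 16)).astype(np.float32)   # (batch, in_features)
x[:, 3] *= 50.0                                   # simulate one activation-outlier channel

# Migrate the outlier into the weights: (x / s) @ (w * s).T == x @ w.T in exact arithmetic
s = smooth_scales(np.abs(x).max(axis=0), w)
x_s, w_s = x / s, w * s

qx, sx = quantize_sym_int8(x_s)           # per-tensor activation scale
qw, sw = quantize_sym_int8(w_s, axis=1)   # per-output-channel weight scales

y_ref = x @ w.T                           # fp32 reference
y_q = (qx * sx) @ (qw * sw).T             # dequantized w8a8 matmul
```

After smoothing, both operands quantize reasonably well with a single scale (and no zero-point), which is what makes the w8a8 scheme workable despite outlier channels.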
Files added on top of the base model:
- We add an added_tokens.json to deal with https://huggingface.co/NousResearch/Nous-Hermes-Llama2-13b/discussions/1#64c2c399819b150fbbff0acf
- The w8a8 model is, by default, stored in the /quantized_model directory.
- We copied the following files from the base-model directory so this HF model is self-contained.
- generation_config.json
- special_tokens_map.json
- tokenizer_config.json
- tokenizer.json
- tokenizer.model