Llama13b - Quantized using AutoSmoothQuant. No zero-points.
Base model:
Quantization:
- Quantized using AutoSmoothQuant, following the recommendation in the base w8a8 vLLM PR (https://github.com/vllm-project/vllm/pull/1508)
- Reference document: https://docs.google.com/document/d/1L3JX945StZFbtrl2jDLcMLcnmbRUQaKtQ-eHXxgnd6g/edit?usp=sharing
- See the section "Download the allenai/c4 dataset that the KV cache quant PR uses" for commands to download the calibration dataset.
- See the section "Creating a w8a8 model" for commands and instructions to create the w8a8 model.
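The recipe behind the steps above can be sketched in a few lines. This is a minimal numpy illustration of the SmoothQuant idea (migrate activation outliers into the weights, then quantize both sides with scale-only symmetric int8, i.e. no zero-points), not AutoSmoothQuant's actual implementation; the helper names and the choice α=0.5 are illustrative.

```python
import numpy as np

def smooth_scales(act_absmax, weight, alpha=0.5):
    """SmoothQuant-style per-input-channel smoothing factors.

    act_absmax: per-channel max |activation|, shape (in_features,)
    weight:     layer weight, shape (out_features, in_features)
    """
    w_absmax = np.abs(weight).max(axis=0)
    return np.maximum(act_absmax, 1e-8) ** alpha / np.maximum(w_absmax, 1e-8) ** (1.0 - alpha)

def quantize_sym_int8(t, axis=None):
    """Symmetric int8 quantization: one scale per tensor (or per row), no zero-point."""
    absmax = np.abs(t).max(axis=axis, keepdims=axis is not None)
    scale = np.maximum(absmax, 1e-8) / 127.0
    q = np.clip(np.round(t / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 16)).astype(np.float32)   # (out_features, in_features)
x = rng.normal(size=(4, 16)).astype(np.float32)   # (batch, in_features)
x[:, 3] *= 50.0                                   # simulate one activation-outlier channel

# Migrate the outlier into the weights: (x / s) @ (w * s).T == x @ w.T in exact arithmetic
s = smooth_scales(np.abs(x).max(axis=0), w)
x_s, w_s = x / s, w * s

qx, sx = quantize_sym_int8(x_s)           # per-tensor activation scale
qw, sw = quantize_sym_int8(w_s, axis=1)   # per-output-channel weight scales

y_ref = x @ w.T                           # fp32 reference
y_q = (qx * sx) @ (qw * sw).T             # dequantized w8a8 matmul
```

After smoothing, both operands quantize reasonably well with a single scale (and no zero-point), which is what makes the w8a8 scheme workable despite outlier channels.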
Files added on top of the base model:
- We add an added_tokens.json to deal with https://huggingface.co/NousResearch/Nous-Hermes-Llama2-13b/discussions/1#64c2c399819b150fbbff0acf
- The w8a8 model is, by default, stored in the /quantized_model directory.
- We copied the following files from the base-model directory so this HF model is self-contained.
- generation_config.json
- special_tokens_map.json
- tokenizer_config.json
- tokenizer.json
- tokenizer.model