---
license: apache-2.0
---

A toy Llama adapted from [JackFram/llama-160m](https://huggingface.co/JackFram/llama-160m) with special tokens added.

This checkpoint can be loaded into MASE's `LlamaQuantizedForCausalLM` as follows:
```python
from transformers.models.llama import LlamaTokenizer

from chop.models.manual.llama_quantized import (
    LlamaQuantizedConfig,
    LlamaQuantizedForCausalLM,
)

name = "Cheng98/llama-160m"
tokenizer = LlamaTokenizer.from_pretrained(name)

# Override quant_config to quantize the model;
# the default config leaves the Llama unquantized.
config = LlamaQuantizedConfig.from_pretrained(
    name,
    # quant_config="./quant_config_na.toml",
)

llama = LlamaQuantizedForCausalLM.from_pretrained(
    name,
    config=config,
)
```