---
license: apache-2.0
language:
- zh
library_name: transformers
quantized_by: chienweichang
---

# Breeze-7B-Instruct-v1_0-AWQ

- Model creator: [MediaTek Research](https://huggingface.co/MediaTek-Research)
- Original model: [Breeze-7B-Instruct-v1_0](https://huggingface.co/MediaTek-Research/Breeze-7B-Instruct-v1_0)

## Description

This repo contains AWQ model files for MediaTek Research's [Breeze-7B-Instruct-v1_0](https://huggingface.co/MediaTek-Research/Breeze-7B-Instruct-v1_0).

### About AWQ

AWQ is an efficient, accurate and blazing-fast low-bit weight quantization method, currently supporting 4-bit quantization. It offers faster Transformers-based inference than GPTQ, with quality equivalent to or better than the most commonly used GPTQ settings.
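
For reference, AWQ repos like this one are typically produced with [AutoAWQ](https://github.com/casper-hansen/AutoAWQ). Below is a minimal sketch; the calibration data and exact settings used for this repo are not documented here, so the config shown is only AutoAWQ's common default, not necessarily what was used:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "MediaTek-Research/Breeze-7B-Instruct-v1_0"
quant_path = "Breeze-7B-Instruct-v1_0-AWQ"
# Typical AWQ settings: 4-bit weights, group size 128, zero-point enabled.
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the full-precision model, run AWQ calibration, then save the quantized weights.
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```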

AWQ models are currently supported on Linux and Windows, with NVIDIA GPUs only. macOS users: please use GGUF models instead.

It is supported by:

- [Text Generation Webui](https://github.com/oobabooga/text-generation-webui) - using Loader: AutoAWQ
- [vLLM](https://github.com/vllm-project/vllm) - version 0.2.2 or later (needed for support of all model types)
- [Hugging Face Text Generation Inference (TGI)](https://github.com/huggingface/text-generation-inference) - see the example after this list
- [Transformers](https://huggingface.co/docs/transformers) version 4.35.0 and later, from any code or client that supports Transformers
- [AutoAWQ](https://github.com/casper-hansen/AutoAWQ) - for use from Python code
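
For example, here is a minimal sketch of serving this model with TGI's Docker image (the image tag, port, and volume mount are illustrative; the essential parts are `--model-id` and `--quantize awq`):

```shell
docker run --gpus all --shm-size 1g -p 8080:80 \
    -v $PWD/data:/data \
    ghcr.io/huggingface/text-generation-inference:1.4 \
    --model-id chienweichang/Breeze-7B-Instruct-v1_0-AWQ \
    --quantize awq
```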

## Multi-user inference server: vLLM

Documentation on installing and using vLLM [can be found here](https://vllm.readthedocs.io/en/latest/).

- Please ensure you are using vLLM version 0.2.2 or later.
- When using vLLM as a server, pass the `--quantization awq` parameter.

For example:

```shell
python3 -m vllm.entrypoints.api_server \
    --model chienweichang/Breeze-7B-Instruct-v1_0-AWQ \
    --quantization awq \
    --max-model-len 2048 \
    --dtype auto
```
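
Once the server is up, you can query it over HTTP. A quick check, assuming the demo `api_server` above is running on its default port 8000 (it accepts sampling parameters such as `max_tokens` and `temperature` in the JSON body; the prompt asks which mountain in Taiwan is the highest):

```shell
curl http://localhost:8000/generate \
    -H "Content-Type: application/json" \
    -d '{"prompt": "[INST] 台灣最高的山是哪座? [/INST]", "max_tokens": 128, "temperature": 0.8}'
```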

- When using vLLM from Python code, again set `quantization="awq"`.

For example:

```python
from vllm import LLM, SamplingParams

prompts = [
    "告訴我AI是什麼",       # "Tell me what AI is"
    "(291 - 150) 是多少?",  # "What is (291 - 150)?"
    "台灣最高的山是哪座?",  # "Which mountain is the highest in Taiwan?"
]
# Wrap each prompt in the instruction template.
prompt_template = '''[INST] {prompt} [/INST]
'''
prompts = [prompt_template.format(prompt=prompt) for prompt in prompts]

sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

llm = LLM(model="chienweichang/Breeze-7B-Instruct-v1_0-AWQ", quantization="awq", dtype="half", max_model_len=2048)

outputs = llm.generate(prompts, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```

## Inference from Python code using Transformers

### Install the necessary packages

- Requires: [Transformers](https://huggingface.co/docs/transformers) 4.37.0 or later.
- Requires: [AutoAWQ](https://github.com/casper-hansen/AutoAWQ) 0.1.8 or later.

```shell
pip3 install --upgrade "autoawq>=0.1.8" "transformers>=4.37.0"
```
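
To confirm which versions were actually installed (an optional sanity check):

```shell
python3 -c "import importlib.metadata as m; print(m.version('autoawq'), m.version('transformers'))"
```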

If you have problems installing [AutoAWQ](https://github.com/casper-hansen/AutoAWQ) using the pre-built wheels, install it from source instead:

```shell
pip3 uninstall -y autoawq
git clone https://github.com/casper-hansen/AutoAWQ
cd AutoAWQ
pip3 install .
```

### Transformers example code (requires Transformers 4.37.0 and later)

```python
from transformers import AutoTokenizer, pipeline, TextStreamer, AutoModelForCausalLM

checkpoint = "chienweichang/Breeze-7B-Instruct-v1_0-AWQ"
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    device_map="auto",
    use_safetensors=True,
)
tokenizer = AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True)

# Stream generated tokens to stdout as they are produced.
streamer = TextStreamer(tokenizer, skip_prompt=True)

# Create a pipeline for text generation.
text_generation_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    use_cache=True,
    device_map="auto",
    max_length=32768,
    do_sample=True,
    top_k=5,
    num_return_sequences=1,
    streamer=streamer,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.eos_token_id,
)

# Run inference through the pipeline.
print("pipeline output: ", text_generation_pipeline("請問台灣最高的山是?"))  # "What is the highest mountain in Taiwan?"
```
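
As an alternative to the pipeline, you can build the prompt with the tokenizer's chat template and call `model.generate` directly. This is a sketch that assumes the tokenizer ships a chat template, as the original Breeze-7B-Instruct tokenizer does:

```python
# Build a prompt from chat messages, then generate and decode only the new tokens.
messages = [{"role": "user", "content": "請問台灣最高的山是?"}]  # "What is the highest mountain in Taiwan?"
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=True, top_k=5)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```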