# arcee-ai/SuperNova-Medius-CM-w4a16

**Model Name:** SuperNova Medius Compressed Model (W4A16)

**Model ID:** `arcee-ai/SuperNova-Medius-CM-w4a16`

## Overview

SuperNova-Medius-CM-w4a16 is a quantized version of the [arcee-ai/SuperNova-Medius](https://huggingface.co/arcee-ai/SuperNova-Medius) model. It was compressed with GPTQ post-training quantization, which reduces model size and accelerates inference while keeping quality close to the original model. The quantization scheme is W4A16: weights are quantized to 4 bits, while activations remain in 16-bit precision.
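For a rough sense of the savings, here is a back-of-the-envelope estimate of the weight memory footprint. This is a sketch only: it assumes the base model has roughly 14B parameters and ignores the per-group scales/zero-points and the unquantized `lm_head`.

```python
# Hedged estimate: ~14B parameters assumed; quantization metadata
# (scales, zero-points) and the 16-bit lm_head are ignored.
params = 14e9

bf16_gb = params * 16 / 8 / 1e9  # 16-bit weights -> ~28 GB
w4_gb = params * 4 / 8 / 1e9     # 4-bit weights  -> ~7 GB

print(f"bf16 weights: ~{bf16_gb:.0f} GB; W4A16 weights: ~{w4_gb:.0f} GB")
```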
## Model Details

- **Base Model:** arcee-ai/SuperNova-Medius
- **Quantization Method:** GPTQ (post-training quantization)
- **Quantization Parameters:**
  - Targets: Linear layers
  - Scheme: W4A16 (weights quantized to 4 bits, activations kept in 16-bit precision)
  - Ignored Layers: `lm_head`
  - Dampening Fraction: 0.1
- **Calibration Dataset:** neuralmagic/LLM_compression_calibration
- **Number of Calibration Samples:** 1024
- **Maximum Sequence Length:** 4096
- **Random Seed:** 42
## Intended Use

This model is designed for developers and researchers who need a smaller, faster version of SuperNova-Medius for inference, especially in environments with limited computational resources.
## How to Use

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("arcee-ai/SuperNova-Medius-CM-w4a16")
model = AutoModelForCausalLM.from_pretrained("arcee-ai/SuperNova-Medius-CM-w4a16")

input_text = "Hello, how are you?"
input_ids = tokenizer.encode(input_text, return_tensors="pt")

output = model.generate(input_ids)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
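Since W4A16 compressed-tensors checkpoints are also supported by vLLM (which is pinned in the dependencies below), the model can be served there as well. A minimal sketch, assuming a vLLM build with compressed-tensors support:

```python
from vllm import LLM, SamplingParams

# vLLM picks up the quantization format from the checkpoint's config.
llm = LLM(model="arcee-ai/SuperNova-Medius-CM-w4a16")
params = SamplingParams(max_tokens=128, temperature=0.7)

outputs = llm.generate(["Hello, how are you?"], params)
print(outputs[0].outputs[0].text)
```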
## Quantization Details

The quantization process was executed using the following script:

```python
# Quantization script
import torch
from datasets import load_dataset
from transformers import AutoTokenizer
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import SparseAutoModelForCausalLM, oneshot
from llmcompressor.transformers.compression.helpers import calculate_offload_device_map

# Parameters
MODEL_ID = "./arcee-ai/SuperNova-Medius"
NUM_CALIBRATION_SAMPLES = 1024
MAX_SEQUENCE_LENGTH = 4096
SEED = 42

# Device map calculation (reserves headroom for GPTQ Hessian buffers)
device_map = calculate_offload_device_map(
    MODEL_ID,
    num_gpus=torch.cuda.device_count(),
    reserve_for_hessians=True,
    torch_dtype=torch.bfloat16,
)

# Load model and tokenizer
model = SparseAutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    device_map=device_map,
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Load and preprocess the calibration dataset
DATASET_ID = "neuralmagic/LLM_compression_calibration"
ds = load_dataset(DATASET_ID)
ds = ds["train"].shuffle(seed=SEED).select(range(NUM_CALIBRATION_SAMPLES))

def preprocess(example):
    # Render each chat sample with the model's chat template
    return {"text": tokenizer.apply_chat_template(example["messages"], tokenize=False)}

ds = ds.map(preprocess)

def tokenize(sample):
    return tokenizer(
        sample["text"],
        padding=False,
        max_length=MAX_SEQUENCE_LENGTH,
        truncation=True,
        add_special_tokens=False,
    )

ds = ds.map(tokenize)

# Quantization recipe: 4-bit weights on all Linear layers except lm_head
recipe = GPTQModifier(
    targets="Linear",
    scheme="W4A16",
    ignore=["lm_head"],
    dampening_frac=0.1,
)

# Apply one-shot quantization
oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,
    oneshot_device=device_map,
    max_seq_length=MAX_SEQUENCE_LENGTH,
    num_calibration_samples=NUM_CALIBRATION_SAMPLES,
    accelerator_config={
        "split_batches": True,
        "dispatch_batches": None,
        "even_batches": True,
        "use_seedable_sampler": True,
        "non_blocking": False,
        "gradient_accumulation_kwargs": None,
        "use_configured_state": False,
    },
)

# Save the quantized model in compressed format
SAVE_DIR = "./arcee-ai/SuperNova-Medius-CM-w4a16"
model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)
```
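After saving, it can be worth sanity-checking that the compressed checkpoint actually carries a quantization config. A minimal sketch; the exact keys written to `config.json` depend on the compressed-tensors version, so treat the field names as assumptions:

```python
import json
import os

# SAVE_DIR as in the script above; save_compressed=True writes a
# quantization_config block (key names may vary across versions).
with open(os.path.join("./arcee-ai/SuperNova-Medius-CM-w4a16", "config.json")) as f:
    config = json.load(f)

print(json.dumps(config.get("quantization_config", {}), indent=2)[:500])
```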
## Dependencies

The quantization process was executed with the following package versions:

- **Python Version:** 3.9.x
- **Packages:**
  - torch: 2.5.1
  - transformers: 4.46.2
  - llmcompressor: 0.5.0
  - vllm: 0.6.4
  - datasets: 3.1.0
  - huggingface_hub: 0.24.7
  - compressed-tensors: 0.8.0

A full list of installed packages is available in the requirements.txt file.
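To confirm a local environment matches these pins before reproducing the quantization, a quick sanity-check sketch; the lookups use the distribution names exactly as listed above, which may need dash/underscore normalization on some Python versions:

```python
from importlib.metadata import version, PackageNotFoundError

# Pins taken from the list above.
pins = {
    "torch": "2.5.1",
    "transformers": "4.46.2",
    "llmcompressor": "0.5.0",
    "vllm": "0.6.4",
    "datasets": "3.1.0",
    "huggingface_hub": "0.24.7",
    "compressed-tensors": "0.8.0",
}

for name, expected in pins.items():
    try:
        installed = version(name)
    except PackageNotFoundError:
        installed = "not installed"
    flag = "OK" if installed == expected else "MISMATCH"
    print(f"{name}: expected {expected}, found {installed} [{flag}]")
```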
## Training Data

The model was quantized using 1,024 samples from the neuralmagic/LLM_compression_calibration dataset. Each sample was rendered with the model's chat template and tokenized to fit the model's expected input format (see the script above).
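For reference, the raw calibration samples are chat transcripts. A quick way to inspect one, assuming the dataset's `messages` field used by the preprocessing step above:

```python
from datasets import load_dataset

ds = load_dataset("neuralmagic/LLM_compression_calibration", split="train")
print(ds[0]["messages"][:2])  # first two chat turns of the first sample
```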
## Evaluation Results

Evaluation metrics comparing the quantized model to the original model will be provided in future updates.
## Limitations and Biases

- **Performance Degradation:** While quantization reduces model size and increases speed, it may introduce slight performance degradation compared to the original model.
- **Inherited Biases:** The model may carry over biases present in the original SuperNova-Medius model. Users should exercise caution and critically evaluate the model's outputs.
## Acknowledgements

- **Original Model:** arcee-ai/SuperNova-Medius
- **Quantization Tools:** LLM Compressor
- **Contributors:** Edward Kim and Jaro Uljanovs
## Citation

If you use this model, please cite:

```bibtex
@misc{SuperNovaMediusCMW4A16,
  author       = {Edward Kim and Jaro Uljanovs},
  title        = {SuperNova Medius Compressed Model W4A16},
  year         = {2024},
  howpublished = {\url{https://huggingface.co/arcee-ai/SuperNova-Medius-CM-w4a16}},
}
```
---

# Model Card Template

## [Model Name]

**Model ID:** [Repository/Model ID]

### Overview

[Provide a concise description of the model, its purpose, and any unique features.]

### Model Details

- **Base Model:** [Link to or name of the base model]
- **Model Architecture:** [Describe the architecture]
- **Quantization Method (if applicable):** [Details about quantization]
- **Training Data:** [Brief description of the dataset(s) used]
- **Parameters:**
  - Targets: [Layer types targeted for quantization]
  - Scheme: [Quantization scheme]
  - Ignored Layers: [Layers excluded from quantization]
  - Dampening Fraction: [Value used if applicable]
- **Calibration Dataset (if applicable):** [Dataset used for calibration]
- **Number of Calibration Samples:** [Number]
- **Maximum Sequence Length:** [Value]
- **Random Seed:** [Value]

### Intended Use

[Explain the intended applications and scope of use for the model.]

### How to Use

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("[Model ID]")
model = AutoModelForCausalLM.from_pretrained("[Model ID]")

input_text = "Your input text here."
input_ids = tokenizer.encode(input_text, return_tensors="pt")

output = model.generate(input_ids)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

### [Training and] Quantization Details

[Provide scripts and detailed steps used in training and/or quantization.]

```python
# Example script
import torch
# ... [rest of the script]
```

### Dependencies

- **Python Version:** [Version]
- **Packages:**
  - [Package Name]: [Version]
  - List all critical packages and their versions.

### Training Data

[Provide detailed information about the training data, including sources, preprocessing steps, and any relevant statistics.]

### Evaluation Results

[Present evaluation metrics, benchmarks, and any comparisons with other models.]

### Limitations and Biases

[List known limitations, potential biases, and ethical considerations.]

### Acknowledgements

- **Contributors:** [Names of contributors]
- **Resources:** [Any libraries, datasets, or tools that were instrumental]

### Citation

[Provide citation information.]

```bibtex
@misc{ModelName,
  author       = {[Author Names]},
  title        = {[Model Title]},
  year         = {[Year]},
  howpublished = {\url{[Model URL]}},
}
```

**Note:** This template is designed to provide a comprehensive overview of a machine learning model, facilitating reproducibility and transparency. Feel free to add or remove sections based on the specific needs of your project.