sharpenb committed on
Commit
d1d714d
·
1 Parent(s): 3340537

68a71a2c3263e13f3e11bc8b8389ddccc62045db0960bab75110884231ea7b2f

README.md ADDED
@@ -0,0 +1,59 @@
+ ---
+ license: apache-2.0
+ library_name: pruna-engine
+ thumbnail: "https://assets-global.website-files.com/646b351987a8d8ce158d1940/64ec9e96b4334c0e1ac41504_Logo%20with%20white%20text.svg"
+ metrics:
+ - memory_disk
+ - memory_inference
+ - inference_latency
+ - inference_throughput
+ - inference_CO2_emissions
+ - inference_energy_consumption
+ ---
+ <!-- header start -->
+ <!-- 200823 -->
+ <div style="width: auto; margin-left: auto; margin-right: auto">
+ <a href="https://www.pruna.ai/" target="_blank" rel="noopener noreferrer">
+ <img src="https://i.imgur.com/eDAlcgk.png" alt="PrunaAI" style="width: 100%; min-width: 400px; display: block; margin: auto;">
+ </a>
+ </div>
+ <!-- header end -->
+
+ # Simply make AI models cheaper, smaller, faster, and greener!
+
+ ## Results
+
+ ![image info](./plots.png)
+
+ ## Setup
+
+ You can run the smashed model by:
+ 1. Installing and importing the `pruna-engine` package (version 0.2.6). Install it with `pip install pruna-engine==0.2.6 --extra-index-url https://pypi.nvidia.com --extra-index-url https://pypi.ngc.nvidia.com`. See [PyPI](https://pypi.org/project/pruna-engine/) for details on the package.
+ 2. Downloading the model files to `model_path`. You can do this through the Hugging Face Hub using this repository's name, or by downloading the files manually.
+ 3. Loading the model.
+ 4. Running the model.
+
+ You can achieve this by running the following code:
+
+ ```python
+ from transformers.utils.hub import cached_file
+ from pruna_engine.PrunaModel import PrunaModel  # Step (1): install and import `pruna-engine` package.
+
+ ...
+ model_path = cached_file("PrunaAI/REPO", "model")  # Step (2): download the model files at `model_path`.
+ smashed_model = PrunaModel.load_model(model_path)  # Step (3): load the model.
+ y = smashed_model(x)  # Step (4): run the model.
+ ```
+
+ ## Configurations
+
+ The configuration is stored in `config.json`.
+
+ ## License
+
+ We follow the same license as the original model. Please check the license of the original model before using this model.
+
+ ## Want to compress other models?
+
+ - Contact us and tell us which model to compress next [here](https://www.pruna.ai/contact).
+ - Request access to easily compress your own AI models [here](https://z0halsaff74.typeform.com/pruna-access?typeform-source=www.pruna.ai).
config.json ADDED
@@ -0,0 +1 @@
+ {"pruner": "None", "pruning_ratio": 0.0, "factorizer": "None", "quantizer": "huggingface-gptq", "n_quantization_bits": 2, "output_deviation": 0.005, "compiler": "None", "static_batch": true, "static_shape": true, "controlnet": "None", "unet_dim": 4, "device": "cuda", "cache_dir": "/ceph/hdd/staff/charpent/.cache/models", "max_batch_size": 1, "image_height": "None", "image_width": "None", "version": "None", "tokenizer_name": "placeholder", "qtype_weight": "torch.qint8", "qtype_activation": "torch.quint8", "qobserver": "<class 'torch.ao.quantization.observer.MinMaxObserver'>", "qscheme": "torch.per_tensor_symmetric", "qconfig": "x86", "group_size": 128, "damp_percent": 0.1}
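The single-line `config.json` is easier to read once the disabled stages are filtered out. Note that the file marks a disabled stage with the string `"None"`, not JSON `null`. The sketch below is illustrative only and is not part of `pruna-engine`: the `STAGES` tuple, `active_stages`, and `effective_bits_per_weight` are hypothetical names, and treating `pruner`, `factorizer`, `quantizer`, and `compiler` as the compression stages is an assumption read off the field names. The storage estimate further assumes one 16-bit scale and one zero point per quantization group, which this repository does not confirm.

```python
import json

# A subset of the fields from config.json, with values copied from the file.
CONFIG_JSON = """
{"pruner": "None", "pruning_ratio": 0.0, "factorizer": "None",
 "quantizer": "huggingface-gptq", "n_quantization_bits": 2,
 "compiler": "None", "group_size": 128, "damp_percent": 0.1}
"""

# Assumption: these four keys name the compression stages.
STAGES = ("pruner", "factorizer", "quantizer", "compiler")


def active_stages(config):
    """Return the stages whose value is not the literal string "None"."""
    return {name: config[name] for name in STAGES if config[name] != "None"}


def effective_bits_per_weight(bits, group_size, scale_bits=16):
    """Estimate storage per weight, assuming one scale and one zero point per group."""
    per_group_overhead = scale_bits + bits  # scale plus a zero point of `bits` bits
    return bits + per_group_overhead / group_size


config = json.loads(CONFIG_JSON)
stages = active_stages(config)
bits = effective_bits_per_weight(config["n_quantization_bits"], config["group_size"])

print(stages)  # {'quantizer': 'huggingface-gptq'}
print(bits)    # 2.140625
```

Under these assumptions, only the GPTQ quantizer is active, and the per-group metadata adds roughly 0.14 bits on top of the nominal 2 bits per weight.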
model/config.json ADDED
The diff for this file is too large to render. See raw diff
 
model/generation_config.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "pad_token_id": 0,
+   "transformers_version": "4.35.2"
+ }
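The three token IDs in `generation_config.json` tell downstream code where a generated sequence begins, where it ends, and which positions are padding. As a rough illustration of how they might be used, the sketch below trims a list of token IDs at the first end-of-sequence token and drops the start and padding tokens. `strip_special_tokens` is a hypothetical helper, not a `transformers` or `pruna-engine` function, and the sample IDs `15043` and `3186` are made up for the example.

```python
import json

# Contents of model/generation_config.json, copied from the diff above.
GENERATION_CONFIG_JSON = """
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 0,
  "transformers_version": "4.35.2"
}
"""


def strip_special_tokens(token_ids, config):
    """Stop at the first EOS token and drop BOS and PAD tokens before it."""
    skipped = {config["bos_token_id"], config["pad_token_id"]}
    cleaned = []
    for token_id in token_ids:
        if token_id == config["eos_token_id"]:
            break  # everything after EOS is ignored
        if token_id not in skipped:
            cleaned.append(token_id)
    return cleaned


generation_config = json.loads(GENERATION_CONFIG_JSON)
# Made-up sequence: BOS, two ordinary tokens, EOS, then padding.
cleaned = strip_special_tokens([1, 15043, 3186, 2, 0, 0], generation_config)
print(cleaned)  # [15043, 3186]
```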
model/model ADDED
Binary file (668 Bytes).
 
model/smasher_config.json ADDED
@@ -0,0 +1 @@
+ {"load_function": "hf-gptq", "api_key": "pruna_c4c77860c62a2965f6bc281841ee1d7bd3", "verify_url": "http://johnrachwan.pythonanywhere.com", "model_specific": {"model_specific": {}}}
plots.png ADDED