Upload folder using huggingface_hub (#1)
- 91dc58b33fc9da251c024a73f68bae61cd71a8a3feac0d4fd19de4a1d6d01662 (a9f22e9668ccac2985e9d37d62da4f7e513174ce)
- 939055ccd460fcc113674ebabfe1c0189cd76552a395e0947df70fb3525f5691 (3a777ca872fd793bfdd12a7a3b4ec7556c50edaf)
- README.md +59 -0
- config.json +1 -0
- model/config.json +0 -0
- model/generation_config.json +9 -0
- model/model +0 -0
- model/model.safetensors +3 -0
- model/smasher_config.json +1 -0
- plots.png +0 -0
README.md
ADDED
@@ -0,0 +1,59 @@
---
license: apache-2.0
library_name: pruna-engine
thumbnail: "https://assets-global.website-files.com/646b351987a8d8ce158d1940/64ec9e96b4334c0e1ac41504_Logo%20with%20white%20text.svg"
metrics:
- memory_disk
- memory_inference
- inference_latency
- inference_throughput
- inference_CO2_emissions
- inference_energy_consumption
---
<!-- header start -->
<!-- 200823 -->
<div style="width: auto; margin-left: auto; margin-right: auto">
<a href="https://www.pruna.ai/" target="_blank" rel="noopener noreferrer">
<img src="https://i.imgur.com/eDAlcgk.png" alt="PrunaAI" style="width: 100%; min-width: 400px; display: block; margin: auto;">
</a>
</div>
<!-- header end -->

# Simply make AI models cheaper, smaller, faster, and greener!

## Results

![image info](./plots.png)

## Setup

You can run the smashed model by:
1. Installing and importing the `pruna-engine` (version 0.2.6) package. Use `pip install pruna --extra-index-url https://pypi.nvidia.com --extra-index-url https://pypi.ngc.nvidia.com` for installation. See [PyPI](https://pypi.org/project/pruna-engine/) for details on the package.
2. Downloading the model files to `model_path`, either through the Hugging Face Hub using this repository's name or by downloading them manually.
3. Loading the model.
4. Running the model.

You can achieve this by running the following code:

```python
from transformers.utils.hub import cached_file
from pruna_engine.PrunaModel import PrunaModel  # Step (1): install and import `pruna-engine` package.

...
model_path = cached_file("PrunaAI/REPO", "model")  # Step (2): download the model files at `model_path`.
smashed_model = PrunaModel.load_model(model_path)  # Step (3): load the model.
y = smashed_model(x)  # Step (4): run the model.
```

## Configurations

The configuration details are in `config.json`.

## License

We follow the same license as the original model. Please check the license of the original model before using this model.

## Want to compress other models?

- Contact us and tell us which model to compress next [here](https://www.pruna.ai/contact).
- Request access to easily compress your own AI models [here](https://z0halsaff74.typeform.com/pruna-access?typeform-source=www.pruna.ai).
config.json
ADDED
@@ -0,0 +1 @@
{"pruner": "None", "pruning_ratio": 0.0, "factorizer": "None", "quantizer": "huggingface-gptq", "n_quantization_bits": 4, "output_deviation": 0.005, "compiler": "None", "static_batch": true, "static_shape": true, "controlnet": "None", "unet_dim": 4, "device": "cuda", "cache_dir": "/ceph/hdd/staff/charpent/.cache/models", "max_batch_size": 1, "image_height": "None", "image_width": "None", "version": "None", "tokenizer_name": "placeholder", "qtype_weight": "torch.qint8", "qtype_activation": "torch.quint8", "qobserver": "<class 'torch.ao.quantization.observer.MinMaxObserver'>", "qscheme": "torch.per_tensor_symmetric", "qconfig": "x86", "group_size": 128, "damp_percent": 0.1}
model/config.json
ADDED
The diff for this file is too large to render.
model/generation_config.json
ADDED
@@ -0,0 +1,9 @@
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 32000,
  "temperature": 0.9,
  "top_p": 0.6,
  "transformers_version": "4.35.2"
}
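
To make the `temperature` and `top_p` values above concrete, here is a simplified, pure-Python sketch of temperature scaling followed by nucleus (top-p) filtering. It is not the `transformers` implementation: the function `top_p_filter` and the example logits are hypothetical, and in `transformers` these settings normally only take effect when sampling is enabled.

```python
import math


def top_p_filter(logits, temperature=0.9, top_p=0.6):
    """Return {token_index: probability} for the tokens kept by top-p filtering."""
    # Temperature scaling: values below 1.0 sharpen the distribution.
    scaled = [value / temperature for value in logits]
    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the most probable tokens until their cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    # Renormalise the surviving probabilities so they sum to 1.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


# Two equally likely tokens and one unlikely token (made-up logits).
nucleus = top_p_filter([1.0, 1.0, 0.0])
print(sorted(nucleus))  # [0, 1]
```

With these made-up logits, the two leading tokens together exceed the 0.6 threshold, so the third token is never sampled.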
model/model
ADDED
Binary file (668 Bytes).
model/model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:aeca36082347707c16346c033c2a5afe80feb1a2f68499e87d08b00e0a85fb58
size 3896714608
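
The three lines above are a Git LFS pointer, not the weights themselves: they record the spec version, the SHA-256 of the real file, and its size in bytes. As a rough sketch, the pointer can be parsed like this; `parse_lfs_pointer` is a hypothetical helper written for illustration, not part of any library.

```python
# The pointer text committed above, copied verbatim.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:aeca36082347707c16346c033c2a5afe80feb1a2f68499e87d08b00e0a85fb58
size 3896714608
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer into its version, hash algorithm, digest, and size."""
    # Each non-empty line is "<key> <value>".
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line.strip())
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["hash_algorithm"], len(pointer["digest"]))  # sha256 64
print(f"{pointer['size_bytes'] / 2**30:.2f} GiB")          # 3.63 GiB
```

The size field shows that the real `model.safetensors` is roughly 3.63 GiB, which is what a download of this repository should fetch.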
model/smasher_config.json
ADDED
@@ -0,0 +1 @@
{"load_function": "hf-gptq", "api_key": "pruna_c4c77860c62a2965f6bc281841ee1d7bd3", "verify_url": "http://johnrachwan.pythonanywhere.com", "model_specific": {"model_specific": {}}}
plots.png
ADDED