yujiepan committed on
Commit
812428e
1 Parent(s): 0041bbf

Create README.md

  1. README.md +33 -0
README.md ADDED
@@ -0,0 +1,33 @@
+ ---
+ library_name: transformers
+ pipeline_tag: text-generation
+ inference: true
+ widget:
+   - text: Hello!
+     example_title: Hello world
+     group: Python
+ ---
+
+ This model is for debugging. It is randomly initialized using the config from [meta-llama/Meta-Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct), but at a much smaller size.
+
+ Code:
+ ```python
+ from awq import AutoAWQForCausalLM
+ from transformers import AutoTokenizer
+
+ model_path = "yujiepan/meta-llama-3.1-tiny-random-hidden128"
+ # 4-bit weights with zero points, quantized in groups of 64
+ quant_config = {
+     "zero_point": True,
+     "q_group_size": 64,
+     "w_bit": 4,
+     "version": "GEMM",
+ }
+ # Load model
+ model = AutoAWQForCausalLM.from_pretrained(
+     model_path, low_cpu_mem_usage=True, use_cache=False, device_map='cuda',
+ )
+ tokenizer = AutoTokenizer.from_pretrained(model_path)
+
+ # Quantize
+ model.quantize(tokenizer, quant_config=quant_config)
+ ```
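
To make the `quant_config` fields above more concrete, here is a minimal pure-Python sketch of group-wise 4-bit quantization with a zero point. It is an illustration only, not AutoAWQ's implementation: real AWQ also rescales weight channels using activation statistics, which is omitted here. The function names (`quantize_group`, `dequantize_group`, `quantize_row`) and the toy weight row are hypothetical.

```python
def quantize_group(weights, w_bit=4):
    """Asymmetric (zero-point) quantization of one group of floats."""
    q_max = 2 ** w_bit - 1  # 15 for w_bit=4
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / q_max
    if scale == 0:
        scale = 1.0  # constant group: avoid division by zero
    # The zero point is the integer code that represents 0.0.
    zero_point = max(0, min(q_max, round(-w_min / scale)))
    codes = [max(0, min(q_max, round(w / scale) + zero_point)) for w in weights]
    return codes, scale, zero_point


def dequantize_group(codes, scale, zero_point):
    """Map integer codes back to approximate float weights."""
    return [(c - zero_point) * scale for c in codes]


def quantize_row(row, q_group_size=64, w_bit=4):
    """Split a weight row into groups and quantize each group independently."""
    if len(row) % q_group_size != 0:
        raise ValueError("row length must be a multiple of q_group_size")
    return [
        quantize_group(row[i:i + q_group_size], w_bit)
        for i in range(0, len(row), q_group_size)
    ]


# Toy row of 128 weights (hypothetical data), giving two groups of 64.
row = [((i * 37) % 101 - 50) / 50 for i in range(128)]
groups = quantize_row(row, q_group_size=64, w_bit=4)
```

Each group stores its own `scale` and `zero_point`, so a smaller `q_group_size` tracks the local weight range more closely at the cost of more metadata per weight.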