mgoin committed
Commit 3df03a8
1 Parent(s): b173607

Update README.md

Files changed (1)
  1. README.md +0 -69
README.md CHANGED
@@ -14,75 +14,6 @@ license_name: llama3
 license_link: LICENSE
 ---
 
-Quantized with auto-gptq:
-```python
-import argparse
-import gc
-
-from transformers import AutoTokenizer
-from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
-from datasets import load_dataset
-
-parser = argparse.ArgumentParser()
-parser.add_argument("--model-id", type=str)
-parser.add_argument("--save-dir", type=str)
-parser.add_argument("--channelwise", action="store_true")
-parser.add_argument("--num-samples", type=int, default=512)
-parser.add_argument("--max-seq-len", type=int, default=2048)
-
-
-def preprocess(example):
-    # Render each chat transcript into a single text string for calibration
-    return {"text": tokenizer.apply_chat_template(example["messages"], tokenize=False)}
-
-
-if __name__ == "__main__":
-    args = parser.parse_args()
-
-    # Sample calibration data from ultrachat_200k
-    dataset = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft[:5%]")
-    tokenizer = AutoTokenizer.from_pretrained(args.model_id)
-    ds = dataset.shuffle().select(range(args.num_samples))
-    ds = ds.map(preprocess)
-
-    examples = [
-        tokenizer(
-            example["text"], padding=False, max_length=args.max_seq_len, truncation=True,
-        ) for example in ds
-    ]
-
-    if args.channelwise:
-        group_size = -1
-    else:
-        group_size = 128
-
-    quantize_config = BaseQuantizeConfig(
-        bits=4,                 # Marlin only supports 4-bit
-        group_size=group_size,  # 128 for grouped, -1 for channelwise
-        desc_act=False,         # Marlin does not support act_order=True
-    )
-
-    model = AutoGPTQForCausalLM.from_pretrained(
-        args.model_id,
-        quantize_config,
-        device_map="auto")
-    model.quantize(examples)
-
-    gptq_save_dir = "./tmp-gptq"
-    print(f"Saving gptq model to {gptq_save_dir}")
-    model.save_pretrained(gptq_save_dir)
-    tokenizer.save_pretrained(gptq_save_dir)
-
-    # Free the GPTQ model before reloading in Marlin format
-    del model
-    gc.collect()
-
-    print("Reloading in marlin format")
-    marlin_model = AutoGPTQForCausalLM.from_quantized(
-        gptq_save_dir,
-        use_marlin=True,
-        device_map="auto")
-
-    print("Saving in marlin format")
-    marlin_model.save_pretrained(args.save_dir)
-    tokenizer.save_pretrained(args.save_dir)
-```
-
 ## Model Details
 
 Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8 and 70B sizes. The Llama 3 instruction tuned models are optimized for dialogue use cases and outperform many of the available open source chat models on common industry benchmarks. Further, in developing these models, we took great care to optimize helpfulness and safety.
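The removed script writes a Marlin-format checkpoint under `--save-dir`. For reference, a minimal sketch of loading that checkpoint for inference, assuming the same auto_gptq `from_quantized(..., use_marlin=True)` API used in the script above; the path and prompt here are illustrative:

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

save_dir = "./marlin-model"  # hypothetical: whatever --save-dir was set to

tokenizer = AutoTokenizer.from_pretrained(save_dir)
# Load the repacked weights with the Marlin kernel enabled
model = AutoGPTQForCausalLM.from_quantized(save_dir, use_marlin=True, device_map="auto")

# Build a chat-formatted prompt and generate
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What is GPTQ quantization?"}],
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```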
 