ssuncheol committed
Commit 010bd27
1 Parent(s): f0b4587

Update README.md

Files changed (1): README.md (+114 -0)

---
license: mit
pipeline_tag: text-generation
tags:
- text-generation-inference
language:
- en
---

# phi-3-mini-128k-instruct-int4

- Original model: [microsoft/Phi-3-mini-128k-instruct](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct)
- Quantized using [intel/auto-round](https://github.com/intel/auto-round)

## Description

**Phi-3-mini-128k-instruct-int4** is an int4 model, quantized with group_size 128 from [microsoft/Phi-3-mini-128k-instruct](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct).

The model was quantized using AutoRound, an advanced weight-only quantization algorithm for LLMs released by [Intel](https://github.com/intel).

You can find more details in the [GitHub repository](https://github.com/intel/auto-round).
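
For intuition, here is a toy, round-to-nearest sketch of what weight-only int4 quantization with group_size 128 means: every group of 128 weights shares one scale, and each weight is stored as a 4-bit integer. AutoRound itself goes further and tunes the rounding with gradient-based optimization, so treat this only as an illustration of the storage format:

```
import torch

w = torch.randn(256)                                  # a row of weights (2 groups)
groups = w.view(-1, 128)                              # split into groups of 128
scale = groups.abs().amax(dim=1, keepdim=True) / 7    # one symmetric scale per group
q = torch.clamp(torch.round(groups / scale), -8, 7)   # 4-bit integer codes in [-8, 7]
w_hat = (q * scale).view(-1)                          # dequantized weights
print((w - w_hat).abs().max())                        # worst-case rounding error
```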

## Training details

### Clone the AutoRound repository

```
git clone https://github.com/intel/auto-round
```

### Enter the examples/language-modeling folder and install its requirements

```
cd auto-round/examples/language-modeling
pip install -r requirements.txt
```

### Install FlashAttention-2

```
pip install flash_attn==2.5.8
```
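
To confirm the build is importable before running the example script, a quick optional sanity check:

```
import flash_attn
print(flash_attn.__version__)  # should print 2.5.8
```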

Here is a simplified command for quantization:

```
python main.py \
  --model_name "microsoft/Phi-3-mini-128k-instruct" \
  --bits 4 \
  --group_size 128 \
  --train_bs 1 \
  --gradient_accumulate_steps 8 \
  --deployment_device 'gpu' \
  --output_dir "./save_ckpt"
```
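
If you prefer to drive the quantization from Python rather than the example script, AutoRound also exposes an API. The sketch below mirrors the command above (bits=4, group_size=128); exact constructor arguments can differ between auto-round versions, so treat it as an outline rather than the exact recipe used for this checkpoint:

```
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "microsoft/Phi-3-mini-128k-instruct"
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# int4 weights, one scale per group of 128 -- matching the flags above
autoround = AutoRound(model, tokenizer, bits=4, group_size=128)
autoround.quantize()
autoround.save_quantized("./save_ckpt")
```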

## Model inference

### Install the necessary packages

```
pip install auto_gptq
pip install optimum
pip install -U accelerate bitsandbytes datasets peft transformers
```
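
Optionally, verify that the GPTQ runtime and its companions import cleanly before loading the model:

```
import auto_gptq, optimum, accelerate, transformers
print(transformers.__version__)
```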

### Example code

```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

torch.random.manual_seed(0)

# Load the int4 checkpoint; trust_remote_code is needed for the Phi-3 architecture.
model = AutoModelForCausalLM.from_pretrained(
    "ssuncheol/phi-3-mini-128k-instruct-int4",
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("ssuncheol/phi-3-mini-128k-instruct-int4")

# A short multi-turn chat in the format expected by the chat template.
messages = [
    {"role": "system", "content": "You are a helpful digital assistant. Please provide safe, ethical and accurate information to the user."},
    {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
    {"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together: 1. Banana and dragonfruit smoothie: Blend bananas and dragonfruits together with some milk and honey. 2. Banana and dragonfruit salad: Mix sliced bananas and dragonfruits together with some lemon juice and honey."},
    {"role": "user", "content": "What about solving a 2x + 3 = 7 equation?"},
]

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

# Greedy decoding; return only the newly generated text.
generation_args = {
    "max_new_tokens": 500,
    "return_full_text": False,
    "temperature": 0.0,
    "do_sample": False,
}

output = pipe(messages, **generation_args)
print(output[0]['generated_text'])
```
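
Equivalently, you can skip the pipeline wrapper and call `model.generate` directly; this is standard transformers usage, not specific to this checkpoint:

```
# Tokenize the chat with the model's template and generate greedily.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=500, do_sample=False)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```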

## License

The model is licensed under the MIT license.