Commit 7479fcd by Sreenington (parent: 8693275): Create README.md

Files changed (1): README.md (+77 -0)
---
license: mit
language:
- en
tags:
- AWQ
- phi3
---

# Phi 3 mini 4k instruct - AWQ
- Model creator: [Microsoft](https://huggingface.co/microsoft)
- Original model: [Phi 3 mini 4k Instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct)

<!-- description start -->
## Description

This repo contains AWQ model files for [Microsoft's Phi 3 mini 4k Instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct).

### About AWQ

AWQ is an efficient, accurate, and fast low-bit weight quantization method, currently supporting 4-bit quantization. Compared to GPTQ, it offers faster Transformers-based inference.

AWQ is also supported by the continuous-batching server [vLLM](https://github.com/vllm-project/vllm), which allows AWQ models to be used for high-throughput concurrent inference in multi-user server scenarios. Note that, at the time of writing, overall throughput is still lower than running vLLM with unquantised models; however, AWQ enables the use of much smaller GPUs, which can simplify deployment and reduce cost. For example, a 70B model can run on 1 x 48 GB GPU instead of 2 x 80 GB.

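The GPU-savings claim follows from simple weight-memory arithmetic. A rough sketch (`weight_memory_gb` is an illustrative helper, not part of any library, and it ignores activation and KV-cache overhead):

```python
def weight_memory_gb(n_params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory needed to hold model weights, in decimal gigabytes."""
    bytes_total = n_params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# A 70B model in 16-bit weights needs ~140 GB (hence 2 x 80 GB GPUs),
# while the same model in 4-bit AWQ weights needs ~35 GB (fits on 1 x 48 GB GPU).
print(weight_memory_gb(70, 16))  # 140.0
print(weight_memory_gb(70, 4))   # 35.0
```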
## Model Details

Phi-3-Mini-4K-Instruct is a 3.8B-parameter, lightweight, state-of-the-art open model trained on the Phi-3 datasets, which include both synthetic data and filtered publicly available website data, with a focus on high-quality, reasoning-dense properties.
The model belongs to the Phi-3 family; the Mini version comes in two variants, [4K](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) and [128K](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct), which is the context length (in tokens) that it can support.

The model underwent a post-training process that incorporates both supervised fine-tuning and direct preference optimization for instruction following and safety.
When assessed against benchmarks testing common sense, language understanding, math, code, long context, and logical reasoning, Phi-3 Mini-4K-Instruct showed robust, state-of-the-art performance among models with fewer than 13 billion parameters.

Resources and Technical Documentation:

- [Phi-3 Microsoft Blog](https://aka.ms/phi3blog-april)
- [Phi-3 Technical Report](https://aka.ms/phi3-tech-report)
- [Phi-3 on Azure AI Studio](https://aka.ms/phi3-azure-ai)
- Phi-3 GGUF: [4K](https://aka.ms/Phi3-mini-4k-instruct-gguf)
- Phi-3 ONNX: [4K](https://aka.ms/Phi3-mini-4k-instruct-onnx)

## Prompt Format
<pre>
<|user|>
How to explain the Internet for a medieval knight?<|end|>
<|assistant|>
</pre>

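As a quick sanity check of the layout above, a tiny helper can render a message list into this format (`format_phi3_prompt` is hypothetical, not part of any library; real deployments should prefer the tokenizer's `apply_chat_template`):

```python
def format_phi3_prompt(messages):
    """Render [{'role': ..., 'content': ...}] messages into the Phi-3 chat layout."""
    # Hypothetical helper: each turn is <|role|>\n{content}<|end|>,
    # and the assistant turn is left open for generation.
    parts = [f"<|{m['role']}|>\n{m['content']}<|end|>" for m in messages]
    parts.append("<|assistant|>")
    return "\n".join(parts)

prompt = format_phi3_prompt(
    [{"role": "user", "content": "How to explain the Internet for a medieval knight?"}]
)
print(prompt)
# <|user|>
# How to explain the Internet for a medieval knight?<|end|>
# <|assistant|>
```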
## How to use

### Using vLLM
```python
from vllm import LLM, SamplingParams

# Create a sampling params object.
sampling_params = SamplingParams(max_tokens=128)

# Create an LLM.
llm = LLM(model="Sreenington/Phi-3-mini-4k-instruct-AWQ", quantization="AWQ")

# Prompt in the Phi-3 chat format shown above
prompt = """
<|user|>
How to explain the Internet for a medieval knight?<|end|>
<|assistant|>
"""

outputs = llm.generate(prompt, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}\nGenerated text:\n{generated_text!r}")
```
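vLLM can also serve this model behind an OpenAI-compatible HTTP API. A minimal launch sketch (flag names per vLLM's CLI at the time of writing; adjust to your installed version):

```shell
# Serve the AWQ model on the default port (8000) with a 4k context window
python -m vllm.entrypoints.openai.api_server \
    --model Sreenington/Phi-3-mini-4k-instruct-AWQ \
    --quantization awq \
    --max-model-len 4096
```

Once the server is running, any OpenAI-compatible client can send requests to `http://localhost:8000/v1`.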