geopar committed
Commit 157d912
1 Parent(s): 0f5f40e

Update README.md

Files changed (1):
  1. README.md (+43, -1)
README.md CHANGED
@@ -30,8 +30,15 @@ The prompt format is the same as the [Zephyr](https://huggingface.co/HuggingFace
 ```
 
- The quantized model can be utilized through the tokenizer's [chat template](https://huggingface.co/docs/transformers/main/chat_templating) functionality as follows:
+ # Using the model with Hugging Face
+
+ First, install the dependencies:
+
+ ```
+ pip install autoawq transformers
+ ```
+
+ The quantized model can be utilized through the tokenizer's [chat template](https://huggingface.co/docs/transformers/main/chat_templating) functionality as follows:
 
 ```python
@@ -78,6 +85,41 @@ outputs = model.generate(input_prompt, max_new_tokens=256, do_sample=True)
 print(tokenizer.batch_decode(outputs)[0])
 ```
 
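+ For reference, a condensed, self-contained sketch of the pattern the example above follows (the model id and the final two lines are taken from this README; the loading and templating calls are standard transformers APIs, and the Greek prompt is only illustrative):
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "ilsp/Meltemi-7B-Instruct-v1-AWQ"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ # AWQ checkpoints load through transformers once autoawq is installed
+ model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
+
+ # The chat template renders the messages into the Zephyr-style prompt format
+ messages = [{"role": "user", "content": "Πες μου αν έχεις συνείδηση."}]  # "Tell me if you have consciousness."
+ input_prompt = tokenizer.apply_chat_template(
+     messages, add_generation_prompt=True, return_tensors="pt"
+ ).to(model.device)
+
+ outputs = model.generate(input_prompt, max_new_tokens=256, do_sample=True)
+ print(tokenizer.batch_decode(outputs)[0])
+ ```
+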
+ # Using the model with vLLM
+
+ Install vLLM:
+
+ ```
+ pip install vllm
+ ```
+
+ Then use it from the Python API:
+
+ ```python
+ from vllm import LLM, SamplingParams
+ from transformers import AutoTokenizer
+
+ tokenizer = AutoTokenizer.from_pretrained(
+     "ilsp/Meltemi-7B-Instruct-v1-AWQ",
+     trust_remote_code=False
+ )
+
+ # apply_chat_template expects a list of {"role": ..., "content": ...} messages per prompt
+ prompts = [[{"role": "user", "content": "Πες μου αν έχεις συνείδηση."}]]  # "Tell me if you have consciousness."
+ prompts = [tokenizer.apply_chat_template(p, add_generation_prompt=True, tokenize=False) for p in prompts]
+
+ sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=256)
+ llm = LLM(model="ilsp/Meltemi-7B-Instruct-v1-AWQ", tokenizer="ilsp/Meltemi-7B-Instruct-v1-AWQ", quantization="awq")
+
+ outputs = llm.generate(prompts, sampling_params)
+
+ for output in outputs:
+     prompt = output.prompt
+     generated_text = output.outputs[0].text
+     print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
+ ```
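+
+ The same model can also be served through vLLM's OpenAI-compatible server (a minimal sketch; the entrypoint and flags below are standard vLLM options rather than commands stated in this README):
+
+ ```
+ python -m vllm.entrypoints.openai.api_server \
+     --model ilsp/Meltemi-7B-Instruct-v1-AWQ \
+     --quantization awq
+ ```
+
+ The server exposes `/v1/chat/completions`, which applies the model's chat template on the server side.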
+
  # Ethical Considerations
 
  This model has not been aligned with human preferences, and therefore might generate misleading, harmful, and toxic content.