ferran-espuna
committed
Update README.md
README.md CHANGED
@@ -63,7 +63,33 @@ This model card corresponds to the fp8-quantized version of Salamandra-2b-instru
 The entire Salamandra family is released under a permissive [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0).
 
 
-##
+## How to Use
+
+The following example code works under ``Python 3.9.16``, ``vllm==0.6.3.post1``, ``torch==2.4.0`` and ``torchvision==0.19.0``, though it should run on
+any current version of the libraries. This example provides a chat interface for the model.
+
+```
+from vllm import LLM, SamplingParams
+
+model_name = "BSC-LT/salamandra-2b-instruct-fp8"
+llm = LLM(model=model_name)
+
+messages = []
+
+while True:
+    user_input = input("user >> ")
+    if user_input.lower() == "exit":
+        print("Chat ended.")
+        break
+
+    messages.append({'role': 'user', 'content': user_input})
+
+    outputs = llm.chat(messages, sampling_params=SamplingParams(temperature=0.5, stop_token_ids=[5], max_tokens=200))[0].outputs
+    model_output = outputs[0].text
+    print(f'assistant >> {model_output}')
+
+    messages.append({'role': 'assistant', 'content': model_output})
+```
+
 
 ### Author
 International Business Machines (IBM).
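
For a quick non-interactive check, the calls from the new section can be collapsed into a single-turn script. The following is a minimal sketch reusing the model name and sampling settings from the diff above; the question string is an illustrative placeholder, not part of the model card:

```
from vllm import LLM, SamplingParams

# Same checkpoint as in the README section added above.
llm = LLM(model="BSC-LT/salamandra-2b-instruct-fp8")

# A single user turn instead of the interactive loop;
# the question text is a placeholder for illustration.
messages = [{"role": "user", "content": "What does fp8 quantization change about a model?"}]

# Sampling settings copied from the chat-loop example
# (stop_token_ids=[5] mirrors the stop token used there).
params = SamplingParams(temperature=0.5, stop_token_ids=[5], max_tokens=200)

# llm.chat applies the model's chat template and returns one
# RequestOutput per conversation.
print(llm.chat(messages, sampling_params=params)[0].outputs[0].text)
```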
|