mohit-sarvam committed · Commit b9fae8f · verified · 1 parent: 1948f63

Update README.md

Files changed (1): README.md (+48 −1)
README.md CHANGED
@@ -58,4 +58,51 @@ else:
 
 print("thinking content:", thinking_content)
 print("content:", content)
-```
+```
+
+## VLLM Deployment
+
+For deployment, you can use `vllm>=0.8.5` to create an OpenAI-compatible API endpoint:
+```shell
+vllm serve sarvamai/sarvam-M
+```
+
+For inference, and for switching between thinking and non-thinking modes, refer to the Python code below:
+```python
+from openai import OpenAI
+
+# Modify OpenAI's API key and API base to use vLLM's API server.
+openai_api_key = "EMPTY"
+openai_api_base = "http://localhost:8000/v1"
+
+client = OpenAI(
+    api_key=openai_api_key,
+    base_url=openai_api_base,
+)
+
+models = client.models.list()
+model = models.data[0].id
+
+messages = [{"role": "user", "content": "How many letter r in word strawberry?"}]
+
+# By default, the model is in thinking mode.
+# If you want to disable thinking, add:
+# extra_body={"chat_template_kwargs": {"enable_thinking": False}}
+response = client.chat.completions.create(model=model, messages=messages)
+output_text = response.choices[0].message.content
+
+if "</think>" in output_text:
+    thinking_content = output_text.split("</think>")[0].rstrip("\n")
+    content = output_text.split("</think>")[-1].lstrip("\n").removesuffix("</s>")
+else:
+    thinking_content = ""
+    content = output_text.removesuffix("</s>")
+
+print("reasoning_content:", thinking_content)
+print("content:", content)
+
+# For the next round, add the assistant's response and reasoning to the messages.
+messages.append(
+    {"role": "assistant", "content": content, "reasoning_content": thinking_content}
+)
+```
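
One subtlety in the parsing step: `str.rstrip("</s>")` strips any trailing run of the characters `<`, `/`, `s`, and `>` rather than the literal `</s>` suffix, so `str.removesuffix` (Python 3.9+) is the safer choice. The `</think>` handling can be factored into a small standalone helper — the name `split_thinking` below is illustrative, not part of the README — which can be exercised on a synthetic completion string without a running server:

```python
def split_thinking(output_text: str) -> tuple[str, str]:
    """Split a raw completion into (thinking_content, content).

    str.removesuffix drops only the literal "</s>" suffix;
    str.rstrip("</s>") would instead strip any trailing '<', '/',
    's', or '>' characters, corrupting answers that end in 's'.
    """
    if "</think>" in output_text:
        thinking, _, answer = output_text.partition("</think>")
        return thinking.rstrip("\n"), answer.lstrip("\n").removesuffix("</s>")
    return "", output_text.removesuffix("</s>")


# Example with a synthetic completion string (no server needed):
raw = "Count the r's one by one.</think>\nThere are 3 r's in strawberry.</s>"
thinking_content, content = split_thinking(raw)
print("reasoning_content:", thinking_content)
print("content:", content)
```

This keeps the request/response code and the output-parsing logic separate, so the same helper works for both thinking and non-thinking responses.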