Update README.md
README.md
language:
- ta
- te
base_model:
- mistralai/Mistral-Small-3.1-24B-Base-2503
---

# Model Information

`sarvam-m` is a multilingual, hybrid-reasoning, text-only language model built on Mistral-Small. This post-trained version delivers exceptional improvements over the base model:

- +20% average improvement on Indian language benchmarks
- +21.6% enhancement on math benchmarks
- +17.6% boost on programming benchmarks

Performance gains are even more impressive at the intersection of Indian languages and mathematics, with an outstanding +86% improvement in romanized Indian language GSM-8K benchmarks.
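The percentages above are relative gains over the base model. A quick sketch of that computation (the scores below are hypothetical, chosen only to illustrate how a +86% figure arises):

```python
def relative_gain(base_score: float, new_score: float) -> float:
    """Percentage improvement of new_score over base_score."""
    return (new_score - base_score) / base_score * 100

# hypothetical illustration: a base score of 25.0 rising to 46.5
print(round(relative_gain(25.0, 46.5), 1))  # 86.0
```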

Learn more about Sarvam-M in our detailed [blog post](https://www.sarvam.ai/blogs/sarvam-m).

# Key Features

- **Hybrid Thinking Mode**: A single versatile model supporting both "think" and "non-think" modes. Use the think mode for complex logical reasoning, mathematical problems, and coding tasks, or switch to non-think mode for efficient, general-purpose conversation.

- **Advanced Indic Skills**: Specifically post-trained on Indian languages alongside English, embodying a character that authentically reflects and emphasizes Indian cultural values.

- **Superior Reasoning Capabilities**: Outperforms most similarly-sized models on coding and math benchmarks, demonstrating exceptional reasoning abilities.

- **Seamless Chatting Experience**: Full support for both Indic scripts and romanized versions of Indian languages, providing a smooth and accessible multilingual conversation experience.
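In think mode, hybrid-reasoning models typically emit their reasoning inside delimiter tags before the final answer. As a minimal sketch, assuming `<think>...</think>` delimiters (an assumption of this sketch, not something this page confirms), the two parts can be separated with plain string handling:

```python
def split_thinking(output_text: str) -> tuple[str, str]:
    """Separate reasoning from the final answer in a think-mode response.

    Assumes the reasoning is wrapped in <think>...</think> tags; in
    non-think mode the reasoning part is simply empty.
    """
    if "</think>" in output_text:
        reasoning, _, answer = output_text.partition("</think>")
        reasoning = reasoning.replace("<think>", "").strip()
        return reasoning, answer.strip()
    return "", output_text.strip()

reasoning, answer = split_thinking("<think>2 + 2 is 4.</think>The answer is 4.")
print(reasoning)  # 2 + 2 is 4.
print(answer)     # The answer is 4.
```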

# Quickstart

The following code snippet demonstrates how to use `sarvam-m` with Transformers.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "sarvamai/sarvam-m"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)

# ... (model loading, generation, and output parsing elided in this diff view) ...

print("reasoning content:", reasoning_content)
print("content:", content)
```

# vLLM Deployment

For easy deployment, we can use `vllm>=0.8.5` and create an OpenAI-compatible API endpoint with `vllm serve sarvamai/sarvam-m`.

For more control, we can use vLLM from Python, which lets us explicitly enable or disable thinking mode.
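As a sketch of that per-request toggle, assuming the server forwards `chat_template_kwargs` (with an `enable_thinking` flag) from the request's `extra_body` to the chat template — both names are assumptions here, borrowed from how other hybrid-reasoning models are commonly served on vLLM:

```python
# Sketch only: `chat_template_kwargs` and `enable_thinking` are assumptions,
# not confirmed by this page. The dict mirrors the kwargs one would pass to
# an OpenAI-compatible chat.completions.create call.
def build_request(messages: list[dict], thinking: bool) -> dict:
    return {
        "model": "sarvamai/sarvam-m",
        "messages": messages,
        # forwarded to the chat template by the server
        "extra_body": {"chat_template_kwargs": {"enable_thinking": thinking}},
    }

req = build_request([{"role": "user", "content": "Solve 17 * 23."}], thinking=True)
print(req["extra_body"]["chat_template_kwargs"]["enable_thinking"])  # True
```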

```python
from openai import OpenAI

# ... (client setup, request, and response parsing elided in this diff view) ...

messages.append(
    {"role": "assistant", "content": output_text}
)
```
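The `messages.append(...)` call above is what enables multi-turn conversation: each assistant reply is appended to the history before the next user turn, so every request carries the full context. A minimal, model-free sketch of that accumulation:

```python
messages = [{"role": "user", "content": "Hi!"}]

def add_assistant_turn(messages: list[dict], output_text: str) -> None:
    # append the model's reply so the next request sees the full history
    messages.append({"role": "assistant", "content": output_text})

add_assistant_turn(messages, "Hello! How can I help?")
messages.append({"role": "user", "content": "Tell me a joke."})

print([m["role"] for m in messages])  # ['user', 'assistant', 'user']
```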