---
language:
- en
- hi
- bn
- mr
- te
- ta
- kn
- ml
- gu
- as
- pa
license: unknown
tags:
- Krutrim
- language-model
---

# Krutrim-2

## Model Overview

Krutrim-2 is a 12B-parameter language model developed by the OLA Krutrim team. It is based on the Mistral-NeMo 12B architecture and has undergone continual pretraining on 500B tokens spanning web data, code, math, Indic languages, Indian-context data, synthetic data, and books. Following pretraining, the model was finetuned on 1.5M data points covering a diverse range of tasks, including knowledge recall, math, reasoning, coding, safety & non-compliance, instruction following, creative writing, and role-playing.

After fine-tuning, the model underwent Direct Preference Optimization (DPO) with 300K data points to enhance alignment across multiple aspects. DPO was applied to improve response helpfulness, safety, and compliance, making the model more robust against harmful prompts, reducing biases, and improving factual consistency.

## Key Features

- Supports long context up to 128K tokens
- Available in both pre-trained and instruction-tuned versions
- Supports English and the 22 scheduled Indian languages (a quick tokenizer check is sketched below)
- Demonstrates robust knowledge of Indic culture and context, responding with an Indian-centric perspective unless specified otherwise

## Model Developer

- OLA Krutrim Team

## Model Dates

- Krutrim-2 was trained between Dec 2024 and Jan 2025.

## Release History

| Model Name | Release Date | Release Note | Reference |
|------------|--------------|--------------|-----------|
| Krutrim-2-Base-0131 | 2025-01-31 | Continually pre-trained on the Mistral-NeMo 12B base | [Here](https://huggingface.co/krutrim-ai-labs/Krutrim-2-base-0131) |
| Krutrim-2-Instruct-0131 | 2025-01-31 | Finetuned and DPO-aligned version of Krutrim-2-Base-0131 | [Here](https://huggingface.co/krutrim-ai-labs/Krutrim-2-instruct-0131) |

## Data Freshness

- The dataset includes information up to April 2024.
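As a quick check of the multilingual support noted under Key Features, the sketch below loads the tokenizer from the instruct checkpoint listed in the Release History table and tokenizes a short Hindi sentence. This is a minimal illustration only: it assumes the Hub repo linked above is publicly accessible, and the example sentence and the expectation about token granularity are assumptions, not part of the official documentation.

```python
from transformers import AutoTokenizer

# Repo ID taken from the Release History table above.
model_id = "krutrim-ai-labs/Krutrim-2-instruct-0131"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Illustrative Hindi sentence ("Hello, how are you?").
# With a 131,072-entry vocabulary, Devanagari text is expected to map to
# whole-word or sub-word tokens rather than byte fallbacks (assumption).
text = "नमस्ते, आप कैसे हैं?"
ids = tokenizer.encode(text, add_special_tokens=False)
print(len(ids))
print(tokenizer.convert_ids_to_tokens(ids))
```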
## Model Architecture

- Layers: 40
- Model (Embedding) Dimension: 5,120
- Head Dimension: 128
- FFN Hidden Dimension: 14,336
- Activation Function: SiLU
- Number of Heads: 32
- Number of KV-Heads: 8 (GQA)
- Rotary Embeddings: Theta = 1M
- Vocabulary Size: 131,072 (2^17)
- Architecture Type: Transformer Decoder (Auto-regressive Language Model)

## Evaluation Results

### English/Code/Math Benchmarks

| Dataset | Mistral-NeMo-12B-Base | Krutrim-1 | Mistral-NeMo-12B-Instruct | Krutrim-2-Instruct-0131 |
|-----------------------------|-----------------------|-----------|---------------------------|-------------------------|
| HellaSwag | 83% | 73% | 82% | 83% |
| Winogrande | 73% | 67% | 74% | 77% |
| CommonSenseQA | 62% | 39% | 70% | 74% |
| MMLU | 69% | 44% | 68% | 63% |
| OpenBookQA | 48% | 44% | 46% | 49% |
| TriviaQA | 75% | 52% | 72% | 62% |
| NaturalQuestions | 32% | 19% | 28% | 26% |
| TruthfulQA | 48% | 38% | 54% | 59% |
| GSM8K | 17% | 9% | 74% | 71% |
| ARC_Challenge | 58% | 42% | 59% | 60% |
| ARC_Easy | 82% | 70% | 80% | 82% |
| HumanEval (pass@10) | 32% | 0% | 23% | 80% |

### Indic Benchmarks

| Dataset | Mistral-Nemo-Instruct-2407 | Krutrim-1 | Krutrim-2-Instruct-0131 |
|-----------------------------------------|----------------------------|-----------|-------------------------|
| IndicSentiment (0-shot) | 70% | 65% | 95% |
| IndicCOPA (0-shot) | 58% | 51% | 80% |
| IndicXParaphrase (0-shot) | 74% | 67% | 88% |
| IndicXNLI (3-shot) | 52% | 17% | 58% |
| CrossSumIN (1-shot) (chrF++) | 17% | 4% | 21% |
| FloresIN (1-shot, xx-en) (chrF++) | 50% | 54% | 58% |
| FloresIN (1-shot, en-xx) (chrF++) | 34% | 41% | 46% |

## Usage

To use the model, load it with `AutoModelForCausalLM` as follows:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "path/to/Krutrim-2_model"

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Add custom chat template
tokenizer.chat_template = """{% for message in messages %}{% if message['role'] == 'system' %}{{ '<|system|>\n' + message['content'] + '\n' }}{% elif message['role'] == 'user' %}{{ '<|user|>\n' + message['content'] + '\n' }}{% elif message['role'] == 'assistant' %}{% if not loop.last %}{{ '<|assistant|>\n' + message['content'] + eos_token + '\n' }}{% else %}{{ '<|assistant|>\n' + message['content'] + eos_token }}{% endif %}{% endif %}{% if loop.last and add_generation_prompt %}{{ '<|assistant|>\n' }}{% endif %}{% endfor %}"""
print(tokenizer.get_chat_template())

# Build the prompt from a system message and a user turn
prompt_dict = [
    {"role": "system", "content": "You are an AI assistant."},
    {"role": "user", "content": "Who are you?"},
]
prompt = tokenizer.apply_chat_template(prompt_dict, add_generation_prompt=True, tokenize=False)
inputs = tokenizer(prompt, return_tensors="pt")
inputs.pop("token_type_ids", None)

# Generate response
outputs = model.generate(
    **inputs,
    max_length=4096,
    temperature=0.5,
    top_k=50,
    top_p=0.9,
    repetition_penalty=1.2,
    num_return_sequences=1,
    do_sample=True,
    eos_token_id=2,
)

# Keep only the text generated after the prompt
response_list = [tokenizer.decode(output).split(prompt)[1] for output in outputs]
```

Note: The provided chat template helps generate the best response by structuring conversations optimally for the model.
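For quick single-turn experiments, the same checkpoint can also be driven through the `transformers` text-generation pipeline. The sketch below is an illustration under stated assumptions: the repo ID comes from the Release History table, `accelerate` is assumed to be installed for `device_map="auto"`, the hand-written prompt string simply mirrors the custom chat template defined above, and the sampling settings copy the example values rather than officially recommended defaults.

```python
from transformers import pipeline

# Repo ID from the Release History table; substitute a local path if preferred.
generator = pipeline(
    "text-generation",
    model="krutrim-ai-labs/Krutrim-2-instruct-0131",
    torch_dtype="auto",
    device_map="auto",
)

# Prompt written out in the same format the chat template above produces.
prompt = "<|system|>\nYou are an AI assistant.\n<|user|>\nWho are you?\n<|assistant|>\n"

out = generator(
    prompt,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.5,
    top_k=50,
    top_p=0.9,
    repetition_penalty=1.2,
)

# The pipeline returns the prompt plus the completion by default; strip the prompt.
print(out[0]["generated_text"][len(prompt):])
```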
## Limitations

The model was trained on a dataset that includes content from the internet, which may contain toxic language, biases, and unsafe content. As a result, the model may:

- Amplify biases present in the training data
- Generate toxic responses, especially when prompted with toxic inputs
- Provide inaccurate, incomplete, or redundant answers
- Generate responses in languages inconsistent with the prompt

## Ethical Considerations

- The model may produce biased or offensive outputs based on its training data.
- Users should apply human oversight when using the model for decision-making in sensitive areas.
- While safeguards have been implemented, the model may still generate socially undesirable text in certain contexts.