---
language:
- en
- hi
- bn
- mr
- te
- ta
- kn
- ml
- gu
- as
- pa
license: other
license_name: krutrim-community-license-agreement-version-1.0
license_link: LICENSE.md


tags:
- Krutrim
- language-model
---
# Krutrim-1
[![Static Badge](https://img.shields.io/badge/Github-Krutrim_1-yellow?logo=huggingface)](https://huggingface.co/krutrim-ai-labs/Krutrim-1-instruct)[![Static Badge](https://img.shields.io/badge/Github-Krutrim_1-yellow?logo=github)](https://github.com/ola-krutrim/Krutrim-1-7B)[![Static Badge](https://img.shields.io/badge/Krutrim_Cloud-Krutrim_1-orange?logo=data:image/png%2bxml;base64,iVBORw0KGgoAAAANSUhEUgAAACAAAAAgCAYAAABzenr0AAADpUlEQVRYCbVXTUhbQRDeRJqi2JSS1BQtgpCa0kiPehdNi6dWbfWgF0s9eGtPFSFG8VDMpSiCB28KQou0NwsS9NA/Dx4qNP1TUIqSmlKSFjQx4vabbXbJz8vLe2kz8GX3zc7MN2/2J/sszLichekN4A7gBZxpcLQ/0gijfQq8BFLAf5ELiBIEfgNEZgSxtA/5liw2eD4EfgJGSLVsyPcBQLFMiR3WIUAraCm6F4hFMQ2JB1afgFKI9Hw+IubVYhnQwvpSBnKZ2GfEvlgoiTMYeFNGcpnEK3AQV548gkYalbslLiGWdEtl2QbOpZ9FMzg4yGprazNVpvrr6+tseXlZy+cXlFeAAzk4i07eW29sbPB/kampqbyYGTzEyagC5wHKJG+v6lWgqamJdXV1wY2xhYUFtr1NBcwWnQqQYRJwUQK3gOeArjidTkakJMfHx6y+vp4tLi6KZ5/Px1ZWVkTf5M9tstcsP/SifFarlQcCAX50dKRm4/T0lPf19ann9vZ2Xl1dzZubm3lVVZVe2XPHxDS8k2Ra7fj4uCKSnUgkwnt7e+Uj393d5ZQUSSqV4sFgMJeo0DNxsx0tYtLR2x8eHorA4XCY19TUqECZCZAB1gDf398XtvTT0dGhbAvFh37Hip9LgKbYbDZWWVkpxtbW1tjBgdo1rKGhQegTiQQbHR1lbreb9fT0qDgtLS2qr9MR3AkYFMyW3pwkGo3yzs5OPjAwwFdXV4WOfra2tpSv3W5X+snJSaXXiU/chaeAHLu7u1VQrQ6VXhJgWyqT/v5+pZfjGu0OdEx3EZJTW1sbX1pa4pgGgZmZGT40NCTIMisgDy5MC3c4HEYSEItwlkjMQi7Cvb095etyufjc3ByfmJhQuiJxiVscREYdlN3w8DA/OTnhsVhM6YqQadndpAToKNZdiLmBvV4vTyaTYgo2Nze5xWLRCl5MR0exOv5NTcPY2Jiaf2zTYkSFxkX56RwgCQBUBUNSUVEh7OicoP3e2trKpqenGf1fGBTi8ufaPoGiULZZ+sbGRh6Px9WWk52RkZEsO514j3PJ6Zlure8BQ0E8Hg+fn58X2zIUCnG/38/r6uqM+L4Fx9/jFZ1cuQzFN8BIoFJsviJ20Xm6DqN4GZKIIqYbMCQOWL0GSnlLLR+6rVBMU0I75B4QAbSCGtF9h+99QO42dM0L3ZRp1Zr9OCWfrFu2FrW8lmuN5erOQuED7gLXAPl5TjHk5/kH9J8BdBc39Hn+BxqB1clokCTRAAAAAElFTkSuQmCC)](https://cloud.olakrutrim.com/console/inference-service?section=models&modelName=krutrim&artifactName=Krutrim-1&artifactType=model)[![Static 
Badge](https://img.shields.io/badge/Krutrim_AI_Labs-Krutrim_1-blue?logo=data:image/svg%2bxml;base64,PHN2ZyB3aWR0aD0iMzYiIGhlaWdodD0iMzYiIHZpZXdCb3g9IjAgMCAzNiAzNiIgZmlsbD0ibm9uZSIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj4KPHJlY3Qgd2lkdGg9IjM2IiBoZWlnaHQ9IjM2IiByeD0iMTgiIGZpbGw9IiMxMEE1NTQiLz4KPHBhdGggZD0iTTI2LjQxNCAxMi41OTE5SDE5LjMzVjE1LjY0OTlDMjAuMDM0IDE1LjIzOTIgMjAuODQwNyAxNS4wMzM5IDIxLjc1IDE1LjAzMzlDMjIuNzkxMyAxNS4wMzM5IDIzLjY0MiAxNS4zNTY1IDI0LjMwMiAxNi4wMDE5QzI0Ljk3NjcgMTYuNjQ3MiAyNS4zMTQgMTcuNTQxOSAyNS4zMTQgMTguNjg1OUMyNS4zMTQgMTkuMzMxMiAyNS4xODkzIDIwLjA0OTkgMjQuOTQgMjAuODQxOUMyNC43MDUzIDIxLjYzMzkgMjQuMzE2NyAyMi40NDA1IDIzLjc3NCAyMy4yNjE5TDIxLjIgMjEuODMxOUMyMS41MzczIDIxLjM3NzIgMjEuODE2IDIwLjkwNzkgMjIuMDM2IDIwLjQyMzlDMjIuMjU2IDE5LjkzOTkgMjIuMzY2IDE5LjQ0MTIgMjIuMzY2IDE4LjkyNzlDMjIuMzY2IDE4LjM4NTIgMjIuMjQ4NyAxOC4wMDM5IDIyLjAxNCAxNy43ODM5QzIxLjc5NCAxNy41NjM5IDIxLjUwMDcgMTcuNDUzOSAyMS4xMzQgMTcuNDUzOUMyMC43OTY3IDE3LjQ1MzkgMjAuMTQ0IDE3Ljc2MTkgMjAuMTQ0IDE3Ljc2MTlDMjAuMTQ0IDE3Ljc2MTkgMTkuMTE0NyAxOC4xODcyIDE4Ljg4IDE4LjQyMTlWMjMuODU1OUgxNi4zODJWMjEuMDYxOUMxNS44OTggMjEuMzQwNSAxNS40MDY3IDIxLjU1MzIgMTQuOTA4IDIxLjY5OTlDMTQuNDI0IDIxLjg0NjUgMTMuODU5MyAyMS45MTk5IDEzLjIxNCAyMS45MTk5QzEyLjQwNzMgMjEuOTE5OSAxMS42NjY3IDIxLjc3MzIgMTAuOTkyIDIxLjQ3OTlDMTAuMzMyIDIxLjE3MTkgOS44MDQgMjAuNzI0NSA5LjQwOCAyMC4xMzc5QzkuMDEyIDE5LjU1MTIgOC44MTQgMTguODE3OSA4LjgxNCAxNy45Mzc5QzguODE0IDE3LjExNjUgOS4wMTIgMTYuNDEyNSA5LjQwOCAxNS44MjU5QzkuODA0IDE1LjIyNDUgMTAuMzU0IDE0Ljc2MjUgMTEuMDU4IDE0LjQzOTlDMTEuNzYyIDE0LjEwMjUgMTIuNTc2IDEzLjkzMzkgMTMuNSAxMy45MzM5QzEzLjkxMDcgMTMuOTMzOSAxNC4zMjEzIDEzLjk0ODUgMTQuNzMyIDEzLjk3NzlDMTUuMTU3MyAxNC4wMDcyIDE1LjQ4NzMgMTQuMDU4NSAxNS43MjIgMTQuMTMxOUwxNS41MDIgMTYuNTczOUMxNS4wMzI3IDE2LjQ1NjUgMTQuNTEyIDE2LjM5NzkgMTMuOTQgMTYuMzk3OUMxMy4yNTA3IDE2LjM5NzkgMTIuNzE1MyAxNi41MzcyIDEyLjMzNCAxNi44MTU5QzExLjk1MjcgMTcuMDc5OSAxMS43NjIgMTcuNDUzOSAxMS43NjIgMTcuOTM3OUMxMS43NjIgMTguNTI0NSAxMS45NDUzIDE4LjkyNzkgMTIuMzEyIDE5LjE0NzlDMTIuNjc4NyAxOS4zNjc5IDEzLjA3NDcgMTkuNDc3OSAxMy41IDE5LjQ3NzlDMTQuMTE2IDE5LjQ3NzkgMTQuNjU4NyAxOS4zMzg1IDE1LjEyOCAxOS4wNTk5QzE1LjYxMiAxOC43ODEyIDE2LjAzIDE4LjQ1ODUgMTYuMzgyIDE4LjA5MTlWMTIuNTkxOUg4VjEwLjE3MTlIMjYuNDE0VjEyLjU5MTlaIiBmaWxsPSJ3aGl0ZSIvPgo8cGF0aCBkPSJNMjIuMDc0IDI4Ljk4MTlDMjEuNjkyNyAyOS4xNzI1IDIxLjIzOCAyOS4zNDg1IDIwLjcxIDI5LjUwOTlDMjAuMTY3MyAyOS42NzEyIDE5LjUyMiAyOS43NTE5IDE4Ljc3NCAyOS43NTE5QzE4LjA0MDcgMjkuNzUxOSAxNy4zODggMjkuNjEyNSAxNi44MTYgMjkuMzMzOUMxNi4yNDQgMjkuMDY5OSAxNS43OTY3IDI4LjY5NTkgMTUuNDc0IDI4LjIxMTlDMTUuMTM2NyAyNy43NDI1IDE0Ljk2OCAyNy4xOTI1IDE0Ljk2OCAyNi41NjE5QzE0Ljk2OCAyNS41MDU5IDE1LjM0MiAyNC42NjI1IDE2LjA5IDI0LjAzMTlDMTYuODIzMyAyMy40MTU5IDE3LjQyOTMgMjMuMDYzOSAxOC44MDggMjIuOTc1OUwxOS4wNzIgMjUuMjQxOUMxOC4zMjQgMjUuMjg1OSAxOC4yNjA3IDI1LjQyNTIgMTcuOTgyIDI1LjY1OTlDMTcuNzAzMyAyNS45MDkyIDE3LjU2NCAyNi4xOTUyIDE3LjU2NCAyNi41MTc5QzE3LjU2NCAyNy4xOTI1IDE4LjAxMTMgMjcuNTI5OSAxOC45MDYgMjcuNTI5OUMxOS4yNDMzIDI3LjUyOTkgMTkuNTg4IDI3LjQ3ODUgMTkuOTQgMjcuMzc1OUMyMC4yOTIgMjcuMjczMiAyMC43MTczIDI3LjA5NzIgMjEuMjE2IDI2Ljg0NzlMMjIuMDc0IDI4Ljk4MTlaIiBmaWxsPSJ3aGl0ZSIvPgo8L3N2Zz4K)](https://ai-labs.olakrutrim.com/models/Krutrim-LLM-1)

## Model Overview
Krutrim is a large language model (LLM) trained on 2 trillion tokens, a multilingual foundation model designed to serve Indian demographic needs through equitable representation of the country's many native languages. Its training data incorporates the largest known Indic-language dataset, mitigating the data-scarcity obstacles that hold back model parity across these languages. Evaluations demonstrate Krutrim's strong performance on Indic-language benchmarks, surpassing or matching state-of-the-art models despite being trained with significantly fewer FLOPs. Krutrim also matches or exceeds models trained on comparable FLOPs on English benchmarks (e.g., it beats LLaMA-2 on 10 out of 16 tasks, with an average score of 0.57 vs. LLaMA-2's 0.55), evidencing flexible multilingual fluency. Through deliberate design choices that redress endemic data imbalances, Krutrim LLM marks meaningful progress toward ethical, globally representative AI foundation models.

## Key Features
- 7B-parameter dense transformer, comparable to the similarly sized LLaMA-2 model
- Natively multilingual, delivering best-in-class performance for a 7B model on Indic benchmarks
- Exceeds similarly sized models on multilingual Indic generation tasks, including creative writing, summarization, and translation
- Available in an instruction-tuned version

## Model Developer
- OLA Krutrim Team

## Model Dates
- Krutrim-1 was trained between Oct 2023 and Nov 2023.

## Release History

| Model Name | Release Date |Release Note | Reference|
|------------|-------------|-------------|-------------|
| Krutrim-1-Base  | 2024-01-31 | Trained from scratch | |
| Krutrim-1-Instruct  | 2024-01-31 | SFT on Krutrim-1 Base |[Here](https://huggingface.co/krutrim-ai-labs/Krutrim-1-instruct)|


## Data Freshness
- The dataset includes information up to April 2023.

## Model Architecture
- Layers: 32
- Max Sequence Length: 4096
- Hidden Dimension: 4608
- Head Dimension: 96
- Number of Heads: 48
- Number of KV-Heads: 8 (GQA)
- Vocabulary Size: 70400
- Architecture Type: Transformer Decoder (Auto-regressive Language Model)
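These hyperparameters are mutually consistent (48 heads × 96 head dimension = 4608 hidden dimension) and roughly account for the stated 7B parameter count. The sketch below is an estimate only: the FFN inner dimension and embedding tying are not listed on this card, so a conventional 4× FFN and untied embeddings are assumed.

```
# Back-of-the-envelope parameter count from the architecture listed above.
# Assumed (not stated on this card): 4x FFN with two projection matrices,
# untied input/output embeddings; biases and layer norms ignored.
n_layers, d_model, d_head = 32, 4608, 96
n_heads, n_kv_heads, vocab = 48, 8, 70400

assert n_heads * d_head == d_model  # 48 * 96 = 4608

attn = (n_heads * d_head * d_model            # Q projection
        + 2 * n_kv_heads * d_head * d_model   # K and V (GQA: only 8 KV heads)
        + n_heads * d_head * d_model)         # output projection
ffn = 2 * d_model * (4 * d_model)             # up and down projections
emb = 2 * vocab * d_model                     # input embedding + LM head

total = n_layers * (attn + ffn) + emb
print(f"~{total / 1e9:.1f}B parameters")      # ~7.7B under these assumptions
```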

## Evaluation Results

### English Comparison between Llama2Chat-7B and Krutrim-1-7B

| Task               | Llama2Chat | Krutrim-1-7B |
|--------------------|--------------|------------|
| arc               | 0.517        | **0.557**  |
| bigbench          | **0.359**    | 0.330      |
| boolq            | 0.803   | **0.843**      |
| copa             | 0.78         | **0.82**   |
| hellaswag        | **0.754**    | 0.740      |
| jeopardy         | **0.306**        | 0.286  |
| lambadaopenai    | **0.695**    | 0.682      |
| logiqa           | **0.332**        | 0.3195 |
| mathqa           | 0.436    | **0.440**      |
| mmlu             | 0.472        | **0.495**  |
| openbookqa       | 0.44         | **0.464**  |
| piqa             | 0.7601   | **0.7726**     |
| simplearithmetic | **0.160**        | 0.077  |
| squad            | 0.3565       | **0.369**  |
| winograd         | **0.8645**   | 0.828      |
| winogrande       | 0.681        | **0.697**  |
| **average**      | **0.54**     | **0.54**   |


### Indic Language Benchmarks

| Model            | bn   | gu   | hi   | kn   | ml   | mr   | ta   | te   |
|------------------|------|------|------|------|------|------|------|------|
| **IndicCOPA**    |      |      |      |      |      |      |      |      |
| Krutrim-1-7B        | 0.89 | 0.83 | 0.86 | 0.88 | 0.88 | 0.87 | 0.89 | 0.89 |
| GPT-3.5          | 0.77 | 0.73 | 0.77 | 0.74 | 0.75 | 0.70 | 0.72 | 0.75 |
| Airawata         | -    | -    | 0.74 | -    | -    | -    | -    | -    |
| Kan-LLaMA        | -    | -    | -    | 0.74 | -    | -    | -    | -    |
| Tam-LLaMA        | -    | -    | -    | -    | -    | -    | 0.77 | -    |
| **IndicQA**      |      |      |      |      |      |      |      |      |
| Krutrim-1-7B        | 0.65 | 0.64 | 0.64 | 0.60 | 0.66 | 0.58 | 0.75 | 0.83 |
| Airawata         | -    | -    | 0.62 | -    | -    | -    | -    | -    |
| Kan-LLaMA        | -    | -    | -    | 0.52 | -    | -    | -    | -    |
| Tam-LLaMA        | -    | -    | -    | -    | -    | -    | 0.35 | -    |
| **IndicSentiment**|      |      |      |      |      |      |      |      |
| Krutrim-1-7B        | 0.95 | 0.96 | 0.96 | 0.95 | 0.96 | 0.97 | 0.94 | 0.95 |
| GPT-3.5          | 0.50 | 0.81 | 0.96 | 0.60 | 0.75 | 0.88 | 0.51 | 0.53 |
| Airawata         | -    | -    | 0.84 | -    | -    | -    | -    | -    |
| Kan-LLaMA        | -    | -    | -    | 0.85 | -    | -    | -    | -    |
| Tam-LLaMA        | -    | -    | -    | -    | -    | -    | 0.78 | -    |
| **IndicTranslation**|   |      |      |      |      |      |      |      |
| Krutrim-1-7B        | 0.88 | 0.89 | 0.95 | 0.88 | 0.89 | 0.92 | -    | 0.88 |
| Airawata         | -    | -    | 0.94 | -    | -    | -    | -    | -    |
| Kan-LLaMA        | -    | -    | -    | 0.59 | -    | -    | -    | -    |
| **IndicXParaphrase**|  |      |      |      |      |      |      |      |
| Krutrim-1-7B        | 0.91 | -    | 0.97 | 0.82 | 0.90 | 0.94 | -    | 0.61 |
| Airawata         | -    | -    | 0.60 | -    | -    | -    | -    | -    |
| Kan-LLaMA        | -    | -    | -    | 0.59 | -    | -    | -    | -    |

## Usage

To use the instruct model, you can load it with `AutoModelForCausalLM` as follows:
```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "krutrim-ai-labs/Krutrim-1-instruct"
# Load model and tokenizer, and move the model to the available device
device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, trust_remote_code=True).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Apply Chat Template
chat_template ="{% for message in messages %}{% if message['role'] == 'system' %}{{ '<|SYSTEM|> ' + message['content'] + '\n' }}{% elif message['role'] == 'user' %}{{ '<|USER|> ' + message['content'] + '\n' }}{% elif message['role'] == 'assistant' %}{% if not loop.last %}{{ '<|RESPONSE|>\n'  + message['content'] + eos_token + '\n' }}{% else %}{{ '<|RESPONSE|>\n'  + message['content'] + eos_token }}{% endif %}{% endif %}{% if loop.last and add_generation_prompt %}{{ '<|RESPONSE|>\n' }}{% endif %}{% endfor %}"
tokenizer.chat_template = chat_template

prompt_dict = [
    {"role": "system", "content": "You are an AI assistant."},
    {"role": "user", "content": "Who are you?"}
]

prompts = tokenizer.apply_chat_template(prompt_dict, add_generation_prompt=True, tokenize=False)
inputs = tokenizer(prompts, return_tensors='pt').to(device)
inputs.pop("token_type_ids", None)

# Generate response
outputs = model.generate(
    **inputs,
    max_new_tokens=100
)

response = tokenizer.decode(outputs[0])
print(response)
```
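With the template above, the example prompt renders as `<|SYSTEM|> ...` and `<|USER|> ...` lines followed by the `<|RESPONSE|>` generation marker. Since `generate` returns the prompt and the completion together, a common follow-up (a convenience, not part of the official recipe on this card) is to decode only the newly generated tokens:

```
# Decode only the tokens produced after the prompt, dropping special tokens.
prompt_len = inputs["input_ids"].shape[-1]
reply = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)
print(reply)
```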

## Limitations
The model was trained on a dataset that includes content from the internet, which may contain toxic language, biases, and unsafe content. As a result, the model may:
- Amplify biases present in the training data
- Generate toxic responses, especially when prompted with toxic inputs
- Provide inaccurate, incomplete, or redundant answers
- Generate responses in languages inconsistent with the prompt

## Ethical Considerations
- The model may produce biased or offensive outputs based on its training data.
- Users should apply human oversight when using the model for decision-making in sensitive areas.
- While safeguards have been implemented, the model may still generate socially undesirable text in certain contexts.

## License
This code repository and the model weights are licensed under the [Krutrim Community License.](LICENSE.md)

## Contact
Contributions are welcome! If you have any improvements or suggestions, feel free to submit a pull request on GitHub.

## Citation
```
@misc{KrutrimLLM2024,
  author = {Aditya Kallappa and Palash Kamble and Vinayak Dhruv and Akshat Patidar and Arveti Manjunath and Raghav Awasthi and Divyansh Rajput and Pidathala Sowjanya and Rahul Kumar and Rishabh Nahata and Sanket Shah and Sulabh Katiyar and Soham Pendurkar and Sindhu Pawar and Abhinav Ravi and Deepak Kumar and Kumar Ashish and Pranav Raveendran and Bidyapathi Ray},
  title = {Krutrim LLM},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/ola-krutrim/Krutrim-1-7B}}
}
```