aashish1904 committed on
Commit
157c4ca
1 Parent(s): 97759bb

Upload README.md with huggingface_hub

  1. README.md +187 -0
README.md ADDED
@@ -0,0 +1,187 @@
+
+ ---
+
+ tags:
+ - BharatGPT
+ - CoRover
+ language:
+ - hi
+ - pa
+ - gu
+ - kn
+ - mr
+ - te
+ - ml
+ - or
+ - ta
+ - ur
+ - bn
+ - en
+ license: other
+
+ ---
+
+ [![QuantFactory Banner](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)](https://hf.co/QuantFactory)
+
+
+ # QuantFactory/BharatGPT-3B-Indic-GGUF
+ This is a quantized version of [CoRover/BharatGPT-3B-Indic](https://huggingface.co/CoRover/BharatGPT-3B-Indic), created using llama.cpp.
+
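The quantized files in this repository target llama.cpp-compatible runtimes rather than the `transformers` examples further down. The following is a minimal sketch, assuming the `llama-cpp-python` bindings (with `huggingface_hub` installed) and assuming the GGUF file embeds a chat template. The file pattern `*Q4_K_M.gguf` and the helper names `load_gguf` and `chat` are hypothetical; check the repository's file list for the quantization levels actually published.

```python
from typing import Any

REPO_ID = "QuantFactory/BharatGPT-3B-Indic-GGUF"
# Hypothetical glob pattern: the quantization level and file name are assumptions.
GGUF_PATTERN = "*Q4_K_M.gguf"


def load_gguf(filename: str = GGUF_PATTERN, n_ctx: int = 2048) -> Any:
    """Download (on first use) and load one GGUF file from the Hub."""
    # Imported lazily so the helpers below work without llama-cpp-python installed.
    from llama_cpp import Llama

    return Llama.from_pretrained(
        repo_id=REPO_ID,
        filename=filename,
        n_ctx=n_ctx,
        verbose=False,
    )


def chat(llm: Any, language: str, question: str, max_tokens: int = 256) -> str:
    """Ask one question, requesting the reply language via the system prompt."""
    messages = [
        {"role": "system", "content": f"You are a helpful assistant who responds in {language}"},
        {"role": "user", "content": question},
    ]
    result = llm.create_chat_completion(messages=messages, max_tokens=max_tokens)
    # The result follows the OpenAI-style chat completion layout.
    return result["choices"][0]["message"]["content"]
```

Under these assumptions, `chat(load_gguf(), "Hindi", "भारत की राजधानी क्या है")` would fetch the matching file on first use and return the reply as a string.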
+ # Original Model Card
+
+ ### Model Description
+
+ This model is fine-tuned to generate multilingual output across Indic languages. It has been trained on a diverse, curated dataset comprising Hindi, Punjabi, Marathi, Malayalam, Oriya, Kannada, Gujarati, Bengali, Urdu, Tamil, and Telugu, and is optimized for natural language tasks such as translation, summarization, and conversational generation in these languages. The training data is authentic Indian conversational data in 12 languages (the 11 Indic languages above and English, per the language tags of this card).
+
+ The model is not designed for direct use as a standalone chatbot, because it lacks the latest data updates. It is best suited for S-RAG (Secure Retrieval-Augmented Generation) or for fine-tuning with your own data. For enhanced performance, integration with the **[Conversational Gen AI platform](https://builder.corover.ai)** is recommended but not mandatory. The platform enables the creation of multi-modal and multilingual AI Agents, Co-Pilots, and Virtual Assistants (such as ChatBots, VoiceBots, and VideoBots) using a sovereign AI and composite AI approach that combines classic NLP, grounded generative AI, and generally available LLMs.
+
+ - **Developed by:** CoRover.ai
+ - **Model type:** Fine-tuned language model for multilingual text generation and text completion
+ - **Language(s) (NLP):** Hindi, Punjabi, Marathi, Malayalam, Oriya, Kannada, Gujarati, Bengali, Urdu, Tamil, Telugu
+ - **Learn (Become C-CAP: CoRover Certified AI Professional):** [Get Certified in 1 Hour](https://www.udemy.com/course/corover-certified-ai-associate/?referralCode=0EFDC465CE65DF66C021)
+
+ ## How to Get Started with the Model
+
+ Update `transformers` and `bitsandbytes` before running the examples: `pip install -U transformers bitsandbytes`. The examples also pass `device_map="auto"`, which requires the `accelerate` package.
+
+ Use the code below to get started with the model.
+
+ ## English
+ ```python
+ import torch
+ from transformers import pipeline
+
+ model_id = "CoRover/BharatGPT-3B-Indic"
+
+ # Build a text-generation pipeline in bfloat16.
+ # device_map="auto" places the weights on the available device(s).
+ pipe = pipeline(
+     "text-generation",
+     model=model_id,
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
+ )
+
+ # The system prompt requests the response language.
+ messages = [
+     {"role": "system", "content": "You are a helpful assistant who responds in English"},
+     {"role": "user", "content": "who created you?"},
+ ]
+ outputs = pipe(
+     messages,
+     max_new_tokens=256,
+ )
+
+ # The last message of the returned conversation is the assistant's reply.
+ print(outputs[0]["generated_text"][-1])
+ ```
+
+ ## Hindi
+ ```python
+ import torch
+ from transformers import pipeline
+
+ model_id = "CoRover/BharatGPT-3B-Indic"
+
+ # Build a text-generation pipeline in bfloat16.
+ # device_map="auto" places the weights on the available device(s).
+ pipe = pipeline(
+     "text-generation",
+     model=model_id,
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
+ )
+
+ # The system prompt requests the response language.
+ messages = [
+     {"role": "system", "content": "You are a helpful assistant who responds in Hindi"},
+     {"role": "user", "content": "भारत की राजधानी क्या है"},
+ ]
+ outputs = pipe(
+     messages,
+     max_new_tokens=256,
+ )
+
+ # The last message of the returned conversation is the assistant's reply.
+ print(outputs[0]["generated_text"][-1])
+ ```
+
+ ## Gujarati
+ ```python
+ import torch
+ from transformers import pipeline
+
+ model_id = "CoRover/BharatGPT-3B-Indic"
+
+ # Build a text-generation pipeline in bfloat16.
+ # device_map="auto" places the weights on the available device(s).
+ pipe = pipeline(
+     "text-generation",
+     model=model_id,
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
+ )
+
+ # The system prompt requests the response language.
+ messages = [
+     {"role": "system", "content": "You are a helpful assistant who responds in Gujarati"},
+     {"role": "user", "content": "શું છે ભારતની રાજધાની"},
+ ]
+ outputs = pipe(
+     messages,
+     max_new_tokens=256,
+ )
+
+ # The last message of the returned conversation is the assistant's reply.
+ print(outputs[0]["generated_text"][-1])
+ ```
+
+ ## Marathi
+ ```python
+ import torch
+ from transformers import pipeline
+
+ model_id = "CoRover/BharatGPT-3B-Indic"
+
+ # Build a text-generation pipeline in bfloat16.
+ # device_map="auto" places the weights on the available device(s).
+ pipe = pipeline(
+     "text-generation",
+     model=model_id,
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
+ )
+
+ # The system prompt requests the response language.
+ messages = [
+     {"role": "system", "content": "You are a helpful assistant who responds in Marathi"},
+     {"role": "user", "content": "भारताची राजधानी कोणती?"},
+ ]
+ outputs = pipe(
+     messages,
+     max_new_tokens=256,
+ )
+
+ # The last message of the returned conversation is the assistant's reply.
+ print(outputs[0]["generated_text"][-1])
+ ```
+
+ ## Malayalam
+ ```python
+ import torch
+ from transformers import pipeline
+
+ model_id = "CoRover/BharatGPT-3B-Indic"
+
+ # Build a text-generation pipeline in bfloat16.
+ # device_map="auto" places the weights on the available device(s).
+ pipe = pipeline(
+     "text-generation",
+     model=model_id,
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
+ )
+
+ # The system prompt requests the response language.
+ messages = [
+     {"role": "system", "content": "You are a helpful assistant who responds in Malayalam"},
+     {"role": "user", "content": "ഭരത് കി രാജധാനി ഉണ്ട്"},
+ ]
+ outputs = pipe(
+     messages,
+     max_new_tokens=256,
+ )
+
+ # The last message of the returned conversation is the assistant's reply.
+ print(outputs[0]["generated_text"][-1])
+ ```
+
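The five examples above differ only in the language named in the system prompt and in the user question. The sketch below shows one way to factor that out. The helper names `build_messages`, `extract_reply`, `ask`, and `load_pipeline` are hypothetical and not part of the model card, and the code assumes the pipeline returns the conversation as a list of message dicts, as the examples above rely on.

```python
from typing import Any

MODEL_ID = "CoRover/BharatGPT-3B-Indic"


def build_messages(language: str, question: str) -> list[dict[str, str]]:
    """Return a chat prompt whose system message asks for replies in `language`."""
    return [
        {"role": "system", "content": f"You are a helpful assistant who responds in {language}"},
        {"role": "user", "content": question},
    ]


def extract_reply(outputs: list[dict[str, Any]]) -> str:
    """Return only the text of the final (assistant) message."""
    return outputs[0]["generated_text"][-1]["content"]


def ask(pipe: Any, language: str, question: str, max_new_tokens: int = 256) -> str:
    """Run one question through an already-built text-generation pipeline."""
    outputs = pipe(build_messages(language, question), max_new_tokens=max_new_tokens)
    return extract_reply(outputs)


def load_pipeline() -> Any:
    """Build the pipeline with the same settings as the examples above."""
    # Imported lazily so the pure helpers can be used without torch installed.
    import torch
    from transformers import pipeline

    return pipeline(
        "text-generation",
        model=MODEL_ID,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )
```

With `pipe = load_pipeline()`, a call such as `ask(pipe, "Marathi", "भारताची राजधानी कोणती?")` returns the reply text rather than the whole message dict, and the model is loaded once for all languages.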
+ ## Training Details
+
+ ### Training Data
+
+ - **Language Spectrum**: A balanced representation of Hindi, Punjabi, Marathi, Malayalam, Oriya, Kannada, Gujarati, Bengali, Urdu, Tamil, and Telugu, capturing the unique syntactic structures of each language.
+
+ ## Usage and Limitations
+
+ - **License:** Non-commercial; for academic and research purposes only. For commercial use, please visit the [Conversational Gen AI platform](https://builder.corover.ai) or [Contact Us](https://corover.ai/contact/).
+
+ - **Terms of Use:** [Terms and Conditions](https://corover.ai/terms-conditions/)
+
+ - **Responsible AI Framework**: [CoRover's Responsible AI Framework](https://corover.ai/responsible-generative-ai-key-factors-for-ai-safety-and-trust/)
+
+ ## Hardware & Software
+
+ The model was fine-tuned on the following hardware:
+
+ - NVIDIA A100 GPUs, whose tensor cores reduced training time for a model of this scale. High-bandwidth GPU interconnects supported parallel processing of the large multilingual datasets.