---
library_name: transformers
tags:
- hindi
- bilingual
license: llama2
language:
- hi
- en
---

# LLama3-Gaja-Hindi-8B-v0.1

## Overview

LLama3-Gaja-Hindi-8B-v0.1 is a bilingual English/Hindi model developed and released by [Cognitivelab.in](https://www.cognitivelab.in/) as an extension of the Ambari series. It is built on [Llama 3 8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) and fine-tuned on a curated dataset of translated instruction pairs, which specializes it for natural language understanding and, in particular, for following instructions.

<img src="https://cdn-uploads.huggingface.co/production/uploads/6442d975ad54813badc1ddf7/G0u9L6RQJFinST0chQmfL.jpeg" width="500px">

## Generate

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Cognitive-Lab/LLama3-Gaja-Hindi-8B-v0.1"

# Load the model in bfloat16 on the GPU, plus its tokenizer
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).to("cuda")
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# Conversation: the system prompt followed by a single user turn
messages = [
    {"role": "system", "content": "You are Gaja, an AI assistant created by Cognitivelab and trained on top of Llama 3 Large language model (LLM), proficient in English and Hindi. You can respond in both languages based on the user's request."},
    {"role": "user", "content": "Who are you"},
]

# Render the chat template and append the assistant header so the model answers next
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to("cuda")

# Stop on either the end-of-text token or the end-of-turn token
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators,
    pad_token_id=tokenizer.eos_token_id,  # Llama 3 defines no pad token
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

# Drop the prompt tokens and decode only the newly generated answer
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
```

## Multi-turn Chat

The example below runs an interactive multi-turn chat with LLama3-Gaja-Hindi-8B-v0.1. It keeps the conversation in `messages`, streams each reply to the terminal, and appends the reply to the history so that later turns have context. Submit an empty line to exit.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers import GenerationConfig, TextStreamer

model_id = "Cognitive-Lab/LLama3-Gaja-Hindi-8B-v0.1"

model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).to("cuda")
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# Chat history, seeded with the system prompt; user and assistant turns are appended to it
messages = [
    {"role": "system", "content": "You are Gaja, an AI assistant created by Cognitivelab and trained on top of Llama 3 Large language model (LLM), proficient in English and Hindi. You can respond in both languages based on the user's request."},
]

# Sampling settings shared by every turn
generation_config = GenerationConfig(
    repetition_penalty=1.2,
    max_new_tokens=1024,  # prompt + reply must fit in Llama 3's 8K-token context
    temperature=0.2,
    top_p=0.95,
    top_k=40,
    do_sample=True,
    eos_token_id=[tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|eot_id|>")],
    pad_token_id=tokenizer.eos_token_id,  # Llama 3 defines no pad token
    use_cache=True,
    return_dict_in_generate=True,
)


def process_user_input(user_input):
    """Add the user's turn, stream the model's reply, and store it in the history."""
    messages.append({"role": "user", "content": user_input})

    # Render the whole conversation; the template already inserts <|begin_of_text|>,
    # so the prompt is not re-tokenized with a second BOS token
    input_ids = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_tensors="pt",
    ).to("cuda")

    # Stream only the new tokens to stdout, hiding the prompt and special tokens
    streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    print("\033[32mResponse: \033[0m")  # green "Response:" label
    generated = model.generate(
        inputs=input_ids,
        generation_config=generation_config,
        streamer=streamer,
    )

    # Decode just the tokens produced after the prompt and store them as the assistant turn
    new_tokens = generated.sequences[0][input_ids.shape[-1]:]
    final_response = tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
    messages.append({"role": "assistant", "content": final_response})


# Main interaction loop: an empty input ends the chat
while True:
    print("=================================================================================")
    user_input = input("Input: ")
    if not user_input.strip():  # .strip() ignores leading/trailing whitespace
        break
    process_user_input(user_input)
```
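
If you instead decode the full output sequence with special tokens kept, the latest assistant turn can be recovered from the raw text by locating the final assistant header. The sketch below assumes the Llama 3 prompt format used by this model; `extract_last_reply` is a hypothetical helper, not part of `transformers`. It searches for `<|eot_id|>` only after the last header, so a reply cut off by `max_new_tokens` (no closing token) is still returned.

```python
ASSISTANT_HEADER = "<|start_header_id|>assistant<|end_header_id|>"
EOT = "<|eot_id|>"
END_OF_TEXT = "<|end_of_text|>"


def extract_last_reply(decoded):
    """Return the text of the final assistant turn in a decoded Llama 3 sequence.

    Hypothetical helper; raises ValueError if no assistant header is present.
    """
    start = decoded.rfind(ASSISTANT_HEADER)
    if start == -1:
        raise ValueError("no assistant turn found in the decoded text")
    start += len(ASSISTANT_HEADER)

    # Look for the end-of-turn token only after the last header
    end = decoded.find(EOT, start)
    if end == -1:
        # Generation stopped early (e.g. max_new_tokens): keep the rest of the text
        end = len(decoded)

    reply = decoded[start:end]
    # Generation may also have stopped on <|end_of_text|> instead of <|eot_id|>
    return reply.replace(END_OF_TEXT, "").strip()
```

The returned string can be appended to the history as `{"role": "assistant", "content": ...}`.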

## Prompt format

The default system prompt is `You are Gaja, an AI assistant created by Cognitivelab and trained on top of Llama 3 Large language model (LLM), proficient in English and Hindi. You can respond in both languages based on the user's request.`

Conversations use the Llama 3 chat format:

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{{ system_prompt }}<|eot_id|><|start_header_id|>user<|end_header_id|>

{{ user_message_1 }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{{ model_answer_1 }}<|eot_id|><|start_header_id|>user<|end_header_id|>

{{ user_message_2 }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```
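
To inspect or debug prompts without loading the tokenizer, the template in this section can be rendered by hand. The sketch below assumes the format is exactly as shown (one blank line after each role header, `<|eot_id|>` closing each turn, and an open assistant header at the end). `build_prompt` is a hypothetical helper; for actual inference, `tokenizer.apply_chat_template` is the safer choice because it uses the template shipped with the tokenizer.

```python
SYSTEM_PROMPT = (
    "You are Gaja, an AI assistant created by Cognitivelab and trained on top of "
    "Llama 3 Large language model (LLM), proficient in English and Hindi. "
    "You can respond in both languages based on the user's request."
)


def build_prompt(system_prompt, turns):
    """Render a conversation in the Llama 3 format (hypothetical helper).

    turns: list of (role, text) pairs, where role is "user" or "assistant".
    The result ends with an open assistant header so the model writes the next reply.
    """
    def block(role, text):
        # Role header, one blank line, the message, then the end-of-turn token
        return f"<|start_header_id|>{role}<|end_header_id|>\n\n{text}<|eot_id|>"

    prompt = "<|begin_of_text|>" + block("system", system_prompt)
    for role, text in turns:
        prompt += block(role, text)
    # Open the assistant turn (what add_generation_prompt=True does)
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt


if __name__ == "__main__":
    print(build_prompt(SYSTEM_PROMPT, [("user", "Who are you")]))
```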

## Benchmarks

Coming soon.

## Bilingual Instruct Fine-tuning

The model was trained with supervised fine-tuning using low-rank adaptation (LoRA), focused on bilingual instruction following: it was taught to respond in English or Hindi, depending on the language specified in the user's prompt or instruction.
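
For reference, in the standard LoRA formulation each adapted weight matrix $W_0 \in \mathbb{R}^{d \times k}$ stays frozen and only a low-rank update $BA$ is trained. This card does not state the rank $r$, the scaling $\alpha$, or which layers were adapted, so the numbers below use an assumed $r = 16$ on a $4096 \times 4096$ matrix (Llama 3 8B has a hidden size of 4096) purely for illustration.

$$
h = W_0 x + \Delta W x = W_0 x + \frac{\alpha}{r} B A x,
\qquad B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k},\quad r \ll \min(d, k)
$$

$$
\underbrace{r(d + k)}_{\text{trainable}} = 16 \times 8192 = 131{,}072
\quad\text{vs.}\quad
\underbrace{dk}_{\text{full matrix}} = 4096^2 = 16{,}777{,}216
$$

Under these assumptions the adapter trains about 0.78% of the parameters of the matrix it modifies.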

## References

- [Ambari-7B-Instruct Model](https://huggingface.co/Cognitive-Lab/Ambari-7B-Instruct-v0.1)