	---
	library_name: transformers
	tags:
	- hindi
	- bilingual
	license: llama2
	language:
	- hi
	- en
	---

	# LLama3-Gaja-Hindi-8B-v0.1

	## Overview

	LLama3-Gaja-Hindi-8B-v0.1 is a bilingual English/Hindi model that extends the Ambari series, developed and released by [Cognitivelab.in](https://www.cognitivelab.in/). It is specialized for natural language understanding tasks, particularly instruction following. It is built on the [Llama 3 8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) model and fine-tuned on a curated dataset of translated instruction pairs.

	<img src="https://cdn-uploads.huggingface.co/production/uploads/6442d975ad54813badc1ddf7/G0u9L6RQJFinST0chQmfL.jpeg" width="500px">

	## Generate
	```python
	import torch
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_id = "Cognitive-Lab/LLama3-Gaja-Hindi-8B-v0.1"
	model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).to("cuda")
	tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

	# Conversation so far: a system prompt followed by one user turn
	messages = [
	    {"role": "system", "content": "You are Gaja, an AI assistant created by Cognitivelab and trained on top of Llama 3 Large language model (LLM), proficient in English and Hindi. You can respond in both languages based on the user's request."},
	    {"role": "user", "content": "Who are you"},
	]

	# Apply the chat template and tokenize in one step
	input_ids = tokenizer.apply_chat_template(
	    messages,
	    add_generation_prompt=True,
	    return_tensors="pt",
	).to("cuda")

	outputs = model.generate(
	    input_ids,
	    max_new_tokens=256,
	    # Stop at the end-of-turn token used by the Llama 3 chat format
	    eos_token_id=tokenizer.convert_tokens_to_ids("<|eot_id|>"),
	    do_sample=True,
	    temperature=0.6,
	    top_p=0.9,
	)

	# Drop the prompt tokens and decode only the newly generated ones
	response = outputs[0][input_ids.shape[-1]:]
	print(tokenizer.decode(response, skip_special_tokens=True))
	```


	## Multi-turn Chat

	To hold a multi-turn conversation with LLama3-Gaja-Hindi-8B-v0.1, you can follow the example code below:

	```python
	import torch
	from transformers import AutoModelForCausalLM, AutoTokenizer
	from transformers import GenerationConfig, TextStreamer

	model = AutoModelForCausalLM.from_pretrained("Cognitive-Lab/LLama3-Gaja-Hindi-8B-v0.1", torch_dtype=torch.bfloat16).to("cuda")
	tokenizer = AutoTokenizer.from_pretrained("Cognitive-Lab/LLama3-Gaja-Hindi-8B-v0.1", trust_remote_code=True)

	ASSISTANT_HEADER = "<|start_header_id|>assistant<|end_header_id|>"
	EOT_TOKEN = "<|eot_id|>"

	# Conversation history, starting with the system prompt
	messages = [
	    {"role": "system", "content": "You are Gaja, an AI assistant created by Cognitivelab and trained on top of Llama 3 Large language model (LLM), proficient in English and Hindi. You can respond in both languages based on the user's request."},
	]


	def process_user_input(user_input):
	    """Append the user's turn, stream a reply, and store it in the history."""
	    messages.append({"role": "user", "content": user_input})

	    # Render the whole conversation as a prompt string
	    prompt_formatted_message = tokenizer.apply_chat_template(
	        messages,
	        add_generation_prompt=True,
	        tokenize=False,
	    )

	    # Configure generation parameters
	    generation_config = GenerationConfig(
	        repetition_penalty=1.2,
	        max_new_tokens=8000,
	        temperature=0.2,
	        top_p=0.95,
	        top_k=40,
	        bos_token_id=tokenizer.bos_token_id,
	        eos_token_id=tokenizer.convert_tokens_to_ids(EOT_TOKEN),
	        pad_token_id=tokenizer.pad_token_id,
	        do_sample=True,
	        use_cache=True,
	        return_dict_in_generate=True,
	        output_attentions=False,
	        output_hidden_states=False,
	        output_scores=False,
	    )

	    streamer = TextStreamer(tokenizer)
	    # The chat template already inserts <|begin_of_text|>, so skip special
	    # tokens here to avoid a duplicate BOS token.
	    batch = tokenizer(
	        prompt_formatted_message.strip(),
	        return_tensors="pt",
	        add_special_tokens=False,
	    )
	    print("\033[32mResponse: \033[0m")  # Green "Response:" label before the streamed text
	    generated = model.generate(
	        inputs=batch["input_ids"].to("cuda"),
	        generation_config=generation_config,
	        streamer=streamer,
	    )

	    # Decode the full sequence, keeping special tokens so the last turn can be located
	    assistant_response = tokenizer.decode(generated["sequences"].cpu().tolist()[0])
	    assistant_start_index = assistant_response.rfind(ASSISTANT_HEADER)
	    eot_index = assistant_response.rfind(EOT_TOKEN)

	    # Fail loudly if the final assistant turn cannot be found or was never closed
	    if assistant_start_index == -1 or eot_index < assistant_start_index:
	        raise RuntimeError("Failed to parse the assistant turn from the generated text")
	    final_response = assistant_response[assistant_start_index + len(ASSISTANT_HEADER):eot_index].strip()

	    # Store the reply so the next turn sees the full conversation
	    messages.append({"role": "assistant", "content": final_response})


	# Main interaction loop: an empty input ends the chat
	while True:
	    print("=================================================================================")
	    user_input = input("Input: ")
	    if not user_input.strip():
	        break
	    process_user_input(user_input)
	```
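
The response-extraction step in the example above can be checked on its own, without loading the model. The sketch below is a minimal standalone version, assuming the decoded text still contains the Llama 3 special tokens. `extract_last_assistant_reply` is a hypothetical helper name, not part of `transformers`.

```python
ASSISTANT_HEADER = "<|start_header_id|>assistant<|end_header_id|>"
EOT_TOKEN = "<|eot_id|>"


def extract_last_assistant_reply(decoded: str) -> str:
    """Return the text of the last assistant turn in a decoded transcript."""
    start = decoded.rfind(ASSISTANT_HEADER)
    if start == -1:
        raise ValueError("no assistant header found in decoded text")
    start += len(ASSISTANT_HEADER)
    # Search for the end-of-turn token only after the assistant header, so an
    # earlier turn's <|eot_id|> is never mistaken for the end of this reply.
    end = decoded.find(EOT_TOKEN, start)
    if end == -1:
        end = len(decoded)  # generation stopped before emitting <|eot_id|>
    return decoded[start:end].strip()


sample = (
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
    "Who are you<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    "I am Gaja.<|eot_id|>"
)
print(extract_last_assistant_reply(sample))
```

Searching forward from the assistant header also covers the case where generation hits `max_new_tokens` before the model closes its turn.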

	## Prompt format

	System prompt: `You are Gaja, an AI assistant created by Cognitivelab and trained on top of Llama 3 Large language model (LLM), proficient in English and Hindi. You can respond in both languages based on the user's request.`
	```
	<|begin_of_text|><|start_header_id|>system<|end_header_id|>

	{{ system_prompt }}<|eot_id|><|start_header_id|>user<|end_header_id|>

	{{ user_message_1 }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

	{{ model_answer_1 }}<|eot_id|><|start_header_id|>user<|end_header_id|>

	{{ user_message_2 }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
	```
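
To make the layout above concrete, here is a minimal sketch that renders a list of chat messages into that string form using plain Python. `render_llama3_prompt` is a hypothetical helper written for illustration. For actual inference, `tokenizer.apply_chat_template` remains the authoritative way to build the prompt.

```python
def render_llama3_prompt(messages, add_generation_prompt=True):
    """Render {"role", "content"} dicts into the prompt layout shown above."""
    parts = ["<|begin_of_text|>"]
    for message in messages:
        # Each turn is a role header, a blank line, the content, then <|eot_id|>.
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content']}<|eot_id|>"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its answer.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


example_messages = [
    {"role": "system", "content": "You are Gaja."},
    {"role": "user", "content": "Who are you"},
]
print(render_llama3_prompt(example_messages))
```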

	## Benchmarks
	Coming soon.

	## Bilingual Instruct Fine-tuning

	The model was trained with supervised fine-tuning using low-rank adaptation (LoRA) on bilingual instruction data. This stage teaches the model to respond in either English or Hindi, depending on the language specified in the user prompt or instruction.
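
To give a sense of why low-rank adaptation keeps fine-tuning cheap, the sketch below compares trainable parameter counts for a single weight matrix. The 4096 x 4096 shape matches the hidden size of Llama 3 8B. The rank of 16 is an illustrative assumption, since the rank used for this model is not stated here.

```python
def full_param_count(d_in: int, d_out: int) -> int:
    """Parameters updated when the whole weight matrix is fine-tuned."""
    return d_in * d_out


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters updated by low-rank adaptation.

    The original matrix W (d_out x d_in) stays frozen; only the two low-rank
    factors B (d_out x rank) and A (rank x d_in) are trained.
    """
    return rank * (d_in + d_out)


hidden_size = 4096  # hidden size of Llama 3 8B
rank = 16           # assumed for illustration only

full = full_param_count(hidden_size, hidden_size)
lora = lora_param_count(hidden_size, hidden_size, rank)
print(f"full fine-tuning: {full:,} parameters")
print(f"LoRA (rank {rank}): {lora:,} parameters")
```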

	## References

	- [Ambari-7B-Instruct Model](https://huggingface.co/Cognitive-Lab/Ambari-7B-Instruct-v0.1)