
Eli: A Bilingual Hindi-English Large Language Model

Introduction

Eli is an innovative, open-source bilingual Hindi-English Large Language Model (LLM) designed to bridge the linguistic gap between Hindi and English. Developed with meticulous attention to detail, Eli represents a pioneering effort to broaden the scope of LLMs to diverse languages.

Purpose Behind Eli

Why We Built Eli:

  • Language Adaptation: Enhance language adaptability within LLMs for Hindi and English.
  • Efficient Training: Train and finetune on a compact dataset of 1 billion tokens.
  • Optimized Processes: Identify and implement the most efficient training processes.
  • World Knowledge Acquisition: Observe how the model acquires and processes world knowledge.
  • Training Method Optimization: Optimize training methods tailored to each development stage.

Development Stages

Pre-training

  • Objective: Familiarize Eli with a newly enriched vocabulary.
  • Method: Full-weight pre-training on a 500-million-token corpus using 2x A100 GPUs, taking about 25 hours (a rough sketch of this setup follows the list).
  • Outcome: Improved Hindi token prediction and generation capabilities.
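For illustration, here is a minimal sketch of what this stage can look like with Hugging Face transformers. The base checkpoint, the added Hindi tokens, the corpus file, and the hyperparameters are all placeholders, not Eli's actual training configuration.

import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base_model = "meta-llama/Meta-Llama-3-8B"  # placeholder base checkpoint
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Enrich the vocabulary with new Hindi tokens (illustrative list only)
tokenizer.add_tokens(["नमस्ते", "भारत", "भाषा"])

model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)
model.resize_token_embeddings(len(tokenizer))  # give the new tokens trainable embedding rows

# Plain-text Hindi/English corpus, tokenized for causal-LM pre-training
dataset = load_dataset("text", data_files={"train": "hindi_english_corpus.txt"})["train"]
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=2048),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="eli-pretrain",
        per_device_train_batch_size=4,
        gradient_accumulation_steps=8,
        bf16=True,
        num_train_epochs=1,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()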

Bilingual Next Token Prediction and Translation

  • Inspired by: The OpenHathi series by Sarvam AI.
  • Dataset: 200,000 tokens, with translation using IndicTrans2.
  • Method: Alternating sentences between Hindi and English for enhanced alignment and balanced exposure (see the interleaving sketch below).
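As a rough illustration of this step, the snippet below interleaves parallel sentence pairs so each training sequence alternates between the two languages. It assumes the English-Hindi pairs have already been produced (for example with IndicTrans2); the exact interleaving recipe used for Eli may differ.

from typing import Iterable

def interleave_bilingual(pairs: Iterable[tuple[str, str]]) -> str:
    """Alternate Hindi and English sentences so the model sees both
    languages for the same content within one training sequence."""
    chunks = []
    for i, (english, hindi) in enumerate(pairs):
        # Swap the leading language every sentence for balanced exposure
        chunks.append(f"{hindi} {english}" if i % 2 == 0 else f"{english} {hindi}")
    return " ".join(chunks)

# Tiny illustrative pairs (stand-ins for IndicTrans2 output)
pairs = [
    ("India is a diverse country.", "भारत एक विविधतापूर्ण देश है।"),
    ("It has many languages.", "यहाँ कई भाषाएँ बोली जाती हैं।"),
]
print(interleave_bilingual(pairs))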

Bilingual Instruct Fine-tuning

  • Objective: Enhance model responsiveness in both English and Hindi.
  • Method: Supervised fine-tuning with low-rank adaptation (LoRA) on various instruction datasets (see the sketch below).
  • Outcome: A fine-tuned model available on Hugging Face, with a 4-bit quantized version for hands-on experience.
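The sketch below shows what LoRA-based supervised fine-tuning can look like with the peft and trl libraries. The instruction dataset, base checkpoint, and hyperparameters are placeholders rather than Eli's actual setup, and argument names vary a little across trl releases.

import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM
from trl import SFTConfig, SFTTrainer

base = "meta-llama/Meta-Llama-3-8B"  # placeholder base checkpoint
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)

# Low-rank adapters on the attention projections only
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Any instruction dataset with a "text" column works for this sketch
dataset = load_dataset("tatsu-lab/alpaca", split="train[:1%]")

trainer = SFTTrainer(
    model=model,
    args=SFTConfig(
        output_dir="eli-sft-lora",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        bf16=True,
    ),
    train_dataset=dataset,
    peft_config=lora,
)
trainer.train()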

DPO Fine-tuning

  • Objective: Refine model preferences using Direct Preference Optimization.
  • Method: Translate the Anthropic/hh-rlhf preference dataset and fine-tune on it with DPO (see the sketch below).
  • Outcome: Ongoing comprehensive evaluation.
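Below is a hedged sketch of this stage using trl's DPOTrainer. It assumes the translated hh-rlhf pairs have already been reshaped into prompt/chosen/rejected records (the file path is a placeholder), and the argument names follow a recent trl release.

import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

checkpoint = "Neohumans-ai/Eli"  # start from the instruct-tuned model
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# Placeholder file: translated hh-rlhf pairs with "prompt", "chosen", "rejected" fields
prefs = load_dataset("json", data_files="hh_rlhf_hindi_english.jsonl", split="train")

trainer = DPOTrainer(
    model=model,  # a frozen reference copy is created automatically when ref_model is omitted
    args=DPOConfig(
        output_dir="eli-dpo",
        beta=0.1,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        bf16=True,
    ),
    train_dataset=prefs,
    processing_class=tokenizer,  # older trl releases take tokenizer= instead
)
trainer.train()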

Learnings and Future Directions

Challenges:

  • World Knowledge: Occasional hallucinations in response to specific queries.
  • Translation: Requires more training data for nuanced translations.
  • Fine-tuning: Future iterations will balance full-weight and LoRA fine-tuning based on further testing.

What's Next:

  • Romanized Hindi: Incorporate Romanized Hindi for added linguistic versatility.
  • Continuous Learning: Refine data pipelines, increase the training dataset to 10-15 billion Hindi tokens, and improve efficiency.

Generate

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("Neohumans-ai/Eli", torch_dtype=torch.bfloat16).to("cuda")
tokenizer = AutoTokenizer.from_pretrained("Neohumans-ai/Eli", trust_remote_code=True)

# Existing messages list
messages = [
    {"role": "system", "content": " You are Eli, an AI assistant created by NeoHumans-ai and trained on top of Llama 3 Large language model (LLM), proficient in English and Hindi. You can respond in both languages based on the user's request."},
    {"role": "user", "content": "Who are you"}
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to("cuda")

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=tokenizer.convert_tokens_to_ids("<|eot_id|>"),
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
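To try the 4-bit option mentioned in the fine-tuning stage, one route is to quantize the weights on the fly with bitsandbytes while loading. This is a sketch that assumes a CUDA GPU and the bitsandbytes package; it is not necessarily how the published 4-bit checkpoint was produced.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "Neohumans-ai/Eli",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Neohumans-ai/Eli", trust_remote_code=True)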

Multi-turn Chat

To use the Eli model, you can follow the example code below:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers import GenerationConfig, TextStreamer

model = AutoModelForCausalLM.from_pretrained("Neohumans-ai/Eli", torch_dtype=torch.bfloat16).to("cuda")
tokenizer = AutoTokenizer.from_pretrained("Neohumans-ai/Eli", trust_remote_code=True)

# Existing messages list
messages = [
    {"role": "system", "content": " You are Eli, an AI assistant created by NeoHumans-ai and trained on top of Llama 3 Large language model (LLM), proficient in English and Hindi. You can respond in both languages based on the user's request."},
]

# Function to add user input and generate response
def process_user_input(user_input):
    global messages
    # Add user's input to messages list
    messages.append({"role": "user", "content": user_input})

    # Prepare the prompt for generation
    prompt_formatted_message = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=False
    )

    # Configure generation parameters
    generation_config = GenerationConfig(
        repetition_penalty=1.2,
        max_new_tokens=8000,
        temperature=0.2,
        top_p=0.95,
        top_k=40,
        bos_token_id=tokenizer.bos_token_id,
        eos_token_id=tokenizer.convert_tokens_to_ids("<|eot_id|>"),
        pad_token_id=tokenizer.pad_token_id,
        do_sample=True,
        use_cache=True,
        return_dict_in_generate=True,
        output_attentions=False,
        output_hidden_states=False,
        output_scores=False,
    )

    streamer = TextStreamer(tokenizer)
    # The chat template already adds <|begin_of_text|>, so skip the default special tokens here
    batch = tokenizer(prompt_formatted_message, return_tensors="pt", add_special_tokens=False)
    print("\033[32mResponse: \033[0m")  # Green "Response:" label printed before the streamed output
    # Generate response, streaming tokens to stdout as they are produced
    generated = model.generate(
        inputs=batch["input_ids"].to("cuda"),
        attention_mask=batch["attention_mask"].to("cuda"),
        generation_config=generation_config,
        streamer=streamer,
    )

    # Decode the full sequence (prompt + new tokens) and pull out the assistant turn
    assistant_response = tokenizer.decode(generated["sequences"].cpu().tolist()[0])
    # Find the last assistant header and the final end-of-turn token
    assistant_start_index = assistant_response.rfind("<|start_header_id|>assistant<|end_header_id|>")
    eot_index = assistant_response.rfind("<|eot_id|>")

    # Extract the text between the last assistant header and the end-of-turn token
    if assistant_start_index != -1 and eot_index > assistant_start_index:
        final_response = assistant_response[
            assistant_start_index + len("<|start_header_id|>assistant<|end_header_id|>") : eot_index
        ].strip()
    else:
        raise ValueError("Failed to parse the assistant turn from the generated output")

    # Append the extracted response so the chat history grows across turns
    messages.append({"role": "assistant", "content": final_response})

# Main interaction loop
while True:
    print("=================================================================================")
    user_input = input("Input: ")  # Prompt user for input
    
    # An empty line ends the chat
    if not user_input.strip():
        break
    process_user_input(user_input)  # Process the user's input and stream the response

Prompt format

System prompt: You are Eli, an AI assistant created by NeoHumans-ai and trained on top of the Llama 3 Large Language Model (LLM), proficient in English and Hindi. You can respond in both languages based on the user's request.

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{{ system_prompt }}<|eot_id|><|start_header_id|>user<|end_header_id|>

{{ user_message_1 }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{{ model_answer_1 }}<|eot_id|><|start_header_id|>user<|end_header_id|>

{{ user_message_2 }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
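For reference, the same single-turn layout can be assembled by hand when you are not using apply_chat_template; the helper below is purely illustrative.

def format_prompt(system_prompt: str, user_message: str) -> str:
    # Assemble a single-turn prompt in the Llama 3 chat layout shown above
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        f"{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    )

print(format_prompt("You are Eli, an AI assistant ...", "Who are you?"))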

Benchmarks

Coming soon.

Conclusion

Eli is designed to handle multi-turn chat conversations and understands Hinglish, making it highly effective for bilingual and code-mixed language contexts. Explore Eli’s capabilities on Hugging Face and experience the model firsthand on chat.cognitivelab.in.

Weights and datasets are available on Hugging Face.

Stay tuned for more updates as we continue to evolve and enrich Eli.
