Vakya-Mini-Extended-100M (vakya-v1.2.1)

Vakya is a lightweight English-to-Hindi (en-hi) translation model, fine-tuned with supervised fine-tuning (SFT) on the falcon-h1-tiny-multilingual-100m-instruct base model. It offers quick and fairly accurate Hindi translations of English sentences. Its small size (108M parameters) allows it to run comfortably on laptop-grade GPUs.

Vakya-Mini-Extended belongs to the first generation of Lightweight Indic Translator (LIT) models, which provide accurate and fast translation in local, memory-constrained environments. The Extended model was trained on a larger translation corpus, which improves overall translation quality.

The Vakya series will be made available in the following sizes: Mini (100M), Standard (270M), and Large (500M).

Estimated parameters: ~100M

Architecture: Falcon-H1

Intended use: English-Hindi Translations
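
As a rough check on the laptop-GPU claim, the memory needed just to hold the weights can be estimated from the 108M parameter count quoted above. This is a back-of-the-envelope sketch: the helper name and the dtype table are illustrative, and the figures exclude activations, any generation cache, and framework overhead, so real usage will be higher.

```python
# Rough weight-memory estimate. The parameter count comes from this README;
# everything else here is an illustrative assumption.
PARAMS = 108_000_000

# Bytes used per parameter for some common dtypes.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1}


def weight_memory_mib(n_params: int, dtype: str) -> float:
    """Return the memory needed to hold the weights alone, in MiB."""
    return n_params * BYTES_PER_PARAM[dtype] / (1024 ** 2)


for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: {weight_memory_mib(PARAMS, dtype):.0f} MiB")
```

Even in float32 the weights alone stay well under 1 GiB, which is consistent with running on a 4GB card.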

Training data

Source: en-hi-instruct-structured dataset (https://huggingface.co/datasets/DireDreadlord/en-hi-instruct-structured)
Rows: ~1,660,000, templated with a custom .jinja chat format
Training: trained for 2,000 steps on an RTX 3050 (4GB VRAM)
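
To illustrate what templating a row into a chat format can look like, here is a minimal sketch that turns one sentence pair into a user/assistant conversation. The function name, the argument names, and the sample pair are hypothetical, and the instruction text is borrowed from the usage example in this README; the actual dataset columns and the custom .jinja template may differ.

```python
# Hypothetical sketch: not the actual template used for training.
INSTRUCTION = "Translate the following English sentence into Hindi:\n\n"


def row_to_messages(english: str, hindi: str) -> list[dict]:
    """Build a two-turn chat (user prompt, assistant answer) from one sentence pair."""
    return [
        {"role": "user", "content": INSTRUCTION + english},
        {"role": "assistant", "content": hindi},
    ]


# Illustrative pair, not taken from the dataset.
example = row_to_messages("I work at the market.", "मैं बाज़ार में काम करता हूँ।")
for turn in example:
    print(turn["role"], "->", turn["content"])
```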

Usage

Install requirements:

pip install -r requirements.txt
pip install transformers datasets accelerate safetensors

Usage (Hugging Face Hub)

You can load the model directly from the Hugging Face Hub:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


device = "cuda" if torch.cuda.is_available() else "cpu"

model_id = "DireDreadlord/Vakya-Mini-Extended-100M"

tokenizer = AutoTokenizer.from_pretrained(model_id)

model = AutoModelForCausalLM.from_pretrained(model_id, dtype="auto")
model.eval()
model.to(device)

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model.resize_token_embeddings(len(tokenizer))


sentence = "I work at the market."

messages = [
    {
        "role": "user",
        "content": "Translate the following English sentence into Hindi:\n\n" + sentence,
    }
]

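# return_dict=True makes apply_chat_template return a mapping (input_ids, attention_mask)
# rather than a bare tensor, so it can be moved to the device and unpacked into generate().
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt", return_dict=True)
input_ids = {k: v.to(device) for k, v in input_ids.items()}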


outputs = model.generate(**input_ids, max_new_tokens=128, do_sample=False)

# Keep only the newly generated tokens by slicing off the prompt tokens.
prompt_len = input_ids["input_ids"].shape[1]
output_text = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True).strip()

print(output_text)
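
To translate more than one sentence, the steps above could be wrapped in a small helper. This is a minimal sketch, assuming `tokenizer` and `model` have already been loaded as in the snippet above; the names `translate` and `PROMPT` are illustrative and not part of the model's API.

```python
# Illustrative helper: expects an already-loaded Hugging Face tokenizer and model.
PROMPT = "Translate the following English sentence into Hindi:\n\n"


def translate(sentence, tokenizer, model, max_new_tokens=128):
    """Translate one English sentence and return only the generated text."""
    messages = [{"role": "user", "content": PROMPT + sentence}]
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_tensors="pt",
        return_dict=True,  # return a mapping rather than a bare tensor
    )
    # Move every input tensor to the device the model lives on.
    inputs = {k: v.to(model.device) for k, v in inputs.items()}
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Drop the prompt tokens so only the newly generated tokens are decoded.
    prompt_len = inputs["input_ids"].shape[1]
    return tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True).strip()
```

With the model loaded, it could be called as `print(translate("I work at the market.", tokenizer, model))`.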

Limitations

The model is exceptionally light (108M parameters), so it may hallucinate under heavy use.
The model is for experimental use only; use it as such and under the terms of its license.