lumatic-ai/BongLlama-1.1B-Chat-alpha-v0

Introducing BongLlama by LumaticAI: a fine-tuned version of TinyLlama 1.1B Chat trained on a Bengali dataset.

BongLlama

Model Details

Model Description

BongLlama is part of our company's initiative to develop Indic and regional large language models. At LumaticAI we continuously help our clients build custom AI solutions for their organizations, and we have taken the initiative to launch open-source models specific to particular regions and languages.

BongLlama is an LLM built for West Bengal on a Bengali dataset. It is a 1.1B-parameter model. We fine-tuned TinyLlama/TinyLlama-1.1B-Chat-v1.0 on a 10k-example Bengali dataset, lumatic-ai/BongChat-10k-v0, to obtain our BongLlama 1.1B Chat Alpha v0 model.

We are continuously training and improving this model, and we plan to release further versions built on LLMs of various sizes and on additional datasets.

  • Developed by: LumaticAI
  • Shared by: LumaticAI
  • Model type: Language model
  • Language(s) (NLP): en, bn
  • License: MIT
  • Parent Model: TinyLlama/TinyLlama-1.1B-Chat-v1.0

Uses

Direct Use

  • Base model for further fine-tuning (see the sketch after this list)
  • Getting an overview of how an Indic LLM performs on a specific language
  • Experimentation and fun
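
For the fine-tuning use case, below is a minimal sketch of loading BongLlama as a base model with parameter-efficient adapters. It assumes the peft library is installed; the LoRA settings shown are illustrative placeholders, not the recipe used to train this model.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "lumatic-ai/BongLlama-1.1B-Chat-alpha-v0"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.float16)

# Attach lightweight LoRA adapters instead of updating all 1.1B parameters
# (r, lora_alpha and target_modules here are illustrative defaults)
lora_config = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable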

Downstream Use

  • Can be deployed behind an API
  • Can power a web app or mobile app for demos (see the sketch after this list)
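
As a sketch of the demo use case, the snippet below wraps the same text-generation pipeline used later in this card in a small Gradio interface. Gradio is an assumed extra dependency, and the prompt format mirrors the one used throughout this card.

import torch
import gradio as gr
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="lumatic-ai/BongLlama-1.1B-Chat-alpha-v0",
    torch_dtype=torch.float16,
    device_map="auto",
)

def chat(question):
    prompt = f"<|im_start|>user\n{question}<|im_end|>\n<|im_start|>assistant:"
    out = pipe(prompt, do_sample=True, temperature=0.1, top_p=0.9, max_new_tokens=256)
    # Strip the echoed prompt so only the assistant's reply is shown
    return out[0]["generated_text"][len(prompt):]

gr.Interface(fn=chat, inputs="text", outputs="text", title="BongLlama Demo").launch()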

Out-of-Scope Use

  • Not intended for production use
  • Not intended for generating text for research or academic purposes

Bias, Risks, and Limitations

Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups.

How to Get Started with the Model

Use the code below to get started with the model.


Pipeline

import torch
from time import perf_counter
from transformers import AutoTokenizer, pipeline

def formatted_prompt(question) -> str:
    # Wrap a user question in the chat format the model was trained on
    return f"<|im_start|>user\n{question}<|im_end|>\n<|im_start|>assistant:"

hub_model_name = "lumatic-ai/BongLlama-1.1B-Chat-alpha-v0"

tokenizer = AutoTokenizer.from_pretrained(hub_model_name)
pipe = pipeline(
    "text-generation",
    model=hub_model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

start_time = perf_counter()

prompt = formatted_prompt('হ্যালো')
sequences = pipe(
    prompt,
    do_sample=True,
    temperature=0.1,
    top_p=0.9,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=256,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")

output_time = perf_counter() - start_time
print(f"Time taken for inference: {round(output_time, 2)} seconds")

Streaming Response (ChatGPT/Bard-style)

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

def formatted_prompt(question) -> str:
    return f"<|im_start|>user\n{question}<|im_end|>\n<|im_start|>assistant:"

hub_model_name = "lumatic-ai/BongLlama-1.1B-Chat-alpha-v0"

tokenizer = AutoTokenizer.from_pretrained(hub_model_name)
model = AutoModelForCausalLM.from_pretrained(hub_model_name)

prompt = formatted_prompt('prompt here')
inputs = tokenizer([prompt], return_tensors="pt")

# TextStreamer prints tokens to stdout as soon as they are generated
streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, eos_token_id=tokenizer.eos_token_id, streamer=streamer, max_new_tokens=256)

Using Generation Config

import torch
from time import perf_counter
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

def formatted_prompt(question) -> str:
    return f"<|im_start|>user\n{question}<|im_end|>\n<|im_start|>assistant:"

hub_model_name = "lumatic-ai/BongLlama-1.1B-Chat-alpha-v0"

tokenizer = AutoTokenizer.from_pretrained(hub_model_name)
model = AutoModelForCausalLM.from_pretrained(hub_model_name)

prompt = formatted_prompt('হ্যালো')

# Move the model and inputs to the GPU if one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
inputs = tokenizer(prompt, return_tensors="pt").to(device)

generation_config = GenerationConfig(
    penalty_alpha=0.6,
    do_sample=True,
    top_k=5,
    temperature=0.5,
    repetition_penalty=1.2,
    max_new_tokens=256,
    pad_token_id=tokenizer.eos_token_id,
)

start_time = perf_counter()
outputs = model.generate(**inputs, generation_config=generation_config)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
output_time = perf_counter() - start_time
print(f"Time taken for inference: {round(output_time, 2)} seconds")

Training Details

Training Data

We used our dataset of 10k examples, each consisting of a question and a response: lumatic-ai/BongChat-v0-10k.
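
The dataset can be inspected with the datasets library. A minimal sketch, assuming the dataset exposes a train split:

from datasets import load_dataset

ds = load_dataset("lumatic-ai/BongChat-v0-10k")
print(ds)              # available splits and their sizes
print(ds["train"][0])  # one question/response example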

Example Data

  • <|im_start|>user এটি কি ইতিবাচক পর্যালোচনা? এই ফ্র্যাঞ্চাইজি কখনও হার্ট স্টপ স্টান্টস, ফাইট সিকোয়েন্স এবং তারকা শক্তি বন্ধ করে হতাশ করে না।এবার মূল চারটি ছাড়াও নতুন মুখগুলি হলেন আবুয়েলিতা চরিত্রে রিতা মোরেনো এবং দান্তে জেসন মোমোয়া যিনি শোটি চুরি করেছেন।<|im_end|> <|im_start|>assistant এটি একটি খুব ইতিবাচক পর্যালোচনা, এটি ইঙ্গিত করে যে ফ্র্যাঞ্চাইজি শ্রোতাদের জন্য আরও পছন্দসই চরিত্রগুলি যুক্ত করার পাশাপাশি অতীতের প্রত্যাশার ভিত্তিতে বিতরণ করেছে।<|im_end|>
    (English: user: "Is this a positive review? This franchise never disappoints when it comes to heart-stopping stunts, fight sequences, and star power. This time, besides the original four, the new faces are Rita Moreno as Abuelita and Jason Momoa as Dante, who steals the show." assistant: "This is a very positive review; it indicates that the franchise has delivered on past expectations while also adding characters the audience will like even more.")

Training Procedure

Preprocessing

  • Dataset format: <|im_start|>user <question><|im_end|> <|im_start|>assistant <response><|im_end|> (see the sketch below)
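
As a sketch, one question/response pair can be serialized into this format as follows. The newline placement follows the inference prompt shown earlier, and the field names question and response are assumptions about the dataset schema.

def to_chat_format(question: str, response: str) -> str:
    # Serialize one example into the <|im_start|>/<|im_end|> chat format
    return (f"<|im_start|>user\n{question}<|im_end|>\n"
            f"<|im_start|>assistant\n{response}<|im_end|>")

print(to_chat_format("হ্যালো", "হ্যালো! আমি কীভাবে সাহায্য করতে পারি?"))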

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.03
  • num_epochs: 3
  • mixed_precision_training: Native AMP
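
For reference, these settings correspond roughly to the following TrainingArguments. This is a sketch, not the published training script; the output_dir is hypothetical, and the Adam betas/epsilon listed above are the transformers defaults.

from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="bongllama-1.1b-chat",   # hypothetical output path
    learning_rate=2e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,      # total train batch size 4 x 2 = 8
    seed=42,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    num_train_epochs=3,
    fp16=True,                          # Native AMP mixed precision
)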

Framework versions

  • Transformers 4.35.2
  • Pytorch 2.1.0+cu121
  • Datasets 2.16.1
  • Tokenizers 0.15.0

Evaluation

Metrics

  • train/loss
  • steps

Results

Loss and learning rate were logged every 100 global steps; _runtime is wall-clock seconds since the start of the run.

| _step | _runtime (s) | train/epoch | train/global_step | train/loss | train/learning_rate |
|---|---|---|---|---|---|
| 0 | 205.8 | 0.08 | 100 | 1.2865 | 1.87e-4 |
| 1 | 406.9 | 0.17 | 200 | 1.0698 | 2.00e-4 |
| 2 | 607.6 | 0.25 | 300 | 1.0457 | 1.98e-4 |
| 3 | 809.0 | 0.34 | 400 | 1.0131 | 1.96e-4 |
| 4 | 1012.8 | 0.42 | 500 | 1.0000 | 1.94e-4 |
| 5 | 1217.8 | 0.51 | 600 | 0.9913 | 1.90e-4 |
| 6 | 1422.7 | 0.59 | 700 | 0.9904 | 1.86e-4 |
| 7 | 1625.0 | 0.67 | 800 | 0.9705 | 1.81e-4 |
| 8 | 1827.2 | 0.76 | 900 | 0.9661 | 1.75e-4 |
| 9 | 2033.6 | 0.84 | 1000 | 0.9588 | 1.69e-4 |
| 10 | 2241.6 | 0.93 | 1100 | 0.9469 | 1.62e-4 |
| 11 | 2446.8 | 1.01 | 1200 | 0.9453 | 1.55e-4 |
| 12 | 2648.4 | 1.09 | 1300 | 0.9329 | 1.47e-4 |
| 13 | 2850.0 | 1.18 | 1400 | 0.9299 | 1.38e-4 |
| 14 | 3050.3 | 1.26 | 1500 | 0.9181 | 1.30e-4 |
| 15 | 3252.7 | 1.35 | 1600 | 0.9170 | 1.21e-4 |
| 16 | 3456.4 | 1.43 | 1700 | 0.9190 | 1.12e-4 |
| 17 | 3658.4 | 1.52 | 1800 | 0.9156 | 1.03e-4 |
| 18 | 3860.9 | 1.60 | 1900 | 0.9074 | 9.40e-5 |
| 19 | 4063.9 | 1.68 | 2000 | 0.9072 | 8.50e-5 |
| 20 | 4266.3 | 1.77 | 2100 | 0.9061 | 7.60e-5 |
| 21 | 4468.8 | 1.85 | 2200 | 0.9104 | 6.73e-5 |
| 22 | 4671.1 | 1.94 | 2300 | 0.9016 | 5.89e-5 |
| 23 | 4875.2 | 2.02 | 2400 | 0.8957 | 5.08e-5 |
| 24 | 5077.6 | 2.11 | 2500 | 0.8948 | 4.31e-5 |
| 25 | 5281.0 | 2.19 | 2600 | 0.8833 | 3.58e-5 |
| 26 | 5483.9 | 2.27 | 2700 | 0.9019 | 2.91e-5 |
| 27 | 5684.5 | 2.36 | 2800 | 0.8921 | 2.30e-5 |
| 28 | 5885.3 | 2.44 | 2900 | 0.8897 | 1.75e-5 |
| 29 | 6089.5 | 2.53 | 3000 | 0.8765 | 1.27e-5 |
| 30 | 6291.3 | 2.61 | 3100 | 0.8890 | 8.66e-6 |
| 31 | 6494.6 | 2.69 | 3200 | 0.8846 | 5.34e-6 |
| 32 | 6695.2 | 2.78 | 3300 | 0.8908 | 2.80e-6 |
| 33 | 6898.2 | 2.86 | 3400 | 0.8850 | 1.07e-6 |
| 34 | 7100.0 | 2.95 | 3500 | 0.8871 | 1.54e-7 |

At the end of training (_step 35): train/epoch 3.0, train/global_step 3561, train/train_loss 0.9398, train/total_flos 8.3572e+16, train/train_runtime 7259.06 s, train/train_steps_per_second 0.491, train/train_samples_per_second 3.926.

Model Examination

We will further fine-tune this model on a larger dataset to see how it performs.

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

  • Hardware Type: 1 X Tesla T4
  • Hours used: 2.21
  • Cloud Provider: Google Colab
  • Compute Region: India
  • Carbon Emitted: 0.14 kg of CO2eq

Technical Specifications

Model Architecture and Objective

A 1.1B-parameter, decoder-only Llama-architecture model, fine-tuned from TinyLlama/TinyLlama-1.1B-Chat-v1.0 with a causal language modeling objective.

Hardware

1 X Tesla T4

Citation

BibTeX:

@misc{BongLlama-1.1B-Chat-alpha-v0,
      url={https://huggingface.co/lumatic-ai/BongLlama-1.1B-Chat-alpha-v0},
      title={BongLlama 1.1B Chat Alpha V0},
      author={LumaticAI, Rohan Shaw, Vivek Kushal, Jeet Ghosh},
      year={2024}, month={Jan}
}

Model Card Authors

lumatic-ai

Model Card Contact

email : contact@lumaticai.com
