BanglishSentiment-Llama3-8B

Fine-tuned Llama 3 for Sentiment Analysis of Banglish (Bengali-English Code-Switched Texts)

Introduction

This model is a fine-tuned version of Meta Llama 3 (8B), adapted specifically for sentiment analysis of Banglish, a code-switched mix of Bengali and English widely used in informal online communication in Bangladesh. Fine-tuning was performed on 11,673 manually curated Banglish sentences with balanced sentiment classes (positive, negative, and neutral). The fine-tuned model outperforms the zero-shot performance of other state-of-the-art LLMs, making it a strong candidate for multilingual, low-resource, and code-mixed language tasks.

Model Description

  • Base Model: Meta Llama 3 (8B-Instruct)
  • Fine-tuned on: 11,673 Banglish-labeled sentences
  • Dataset Source: Bengali_Banglish_80K_Dataset
  • Fine-Tuning Method: Parameter-Efficient Fine-Tuning (PEFT) with Low-Rank Adaptation (LoRA)
  • Quantization: 4-bit (bnb-4bit)
  • Training Framework: PyTorch
  • Device: High-performance GPU, Mixed Precision (FP16/BF16)
  • Max Sequence Length: 2048 tokens
  • Optimizer: AdamW with weight decay (0.01)
  • Learning Rate: 5e-5
  • Batch Size: 4 (gradient accumulation over 4 steps)
  • Training Steps: 1,000 (with early stopping)
  • Sampling Parameters:
      • Max New Tokens: 256
      • Top-p Sampling: 0.9
      • Temperature: 0.0 (deterministic outputs)
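
For reference, the configuration listed above corresponds roughly to the following PEFT/LoRA setup with 4-bit quantization. This is an illustrative sketch using the Hugging Face transformers, peft, and bitsandbytes libraries; the actual fine-tuning used the Unsloth framework, and the LoRA rank/alpha/target-module values below are assumptions not stated in this card.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base = "meta-llama/Meta-Llama-3-8B-Instruct"

# 4-bit (bnb-4bit) quantization, as listed above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, quantization_config=bnb_config, device_map="auto")
model = prepare_model_for_kbit_training(model)

# LoRA adapters (rank, alpha, dropout, and target modules are illustrative assumptions)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Training hyperparameters taken from the list above
training_args = TrainingArguments(
    output_dir="banglish-sentiment-llama3",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=5e-5,
    weight_decay=0.01,
    max_steps=1000,
    bf16=True,
    optim="adamw_torch",
)
# These arguments would then be passed to a trainer (e.g. trl's SFTTrainer)
# together with the 11,673-sentence Banglish dataset.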

Performance Metrics

| Method | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| Zero-Shot (Llama 3) | 43.80% | 46.68% | 30.00% | 41.67% |
| Zero-Shot (GPT-3.5) | 55.90% | 61.00% | 56.00% | 48.00% |
| Zero-Shot (GPT-4) | 65.15% | 68.00% | 65.00% | 61.00% |
| Zero-Shot (Claude 3.5) | 47.68% | 73.00% | 48.00% | 43.00% |
| Few-Shot (Llama 3) | 49.14% | 52.01% | 49.00% | 48.00% |
| Dual-Phase (Translation + Sentiment) | 48.07% | 49.34% | 48.00% | 48.00% |
| Fine-Tuned (Ours) | 66.87% | 67.00% | 67.00% | 67.00% |
| Ensemble Approach | 66.78% | 66.75% | 66.78% | 66.45% |

The fine-tuned Llama 3 model achieved the highest accuracy (66.87%) and balanced performance across all key metrics, surpassing GPT-4 (65.15%) and other models in code-switched sentiment analysis.
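
The exact evaluation script is not included in this card; as a minimal sketch, macro-averaged metrics like those above could be computed with scikit-learn, assuming gold and predicted labels encoded as -1/0/1 (the example labels below are hypothetical).

from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Hypothetical gold and predicted labels (-1 = negative, 0 = neutral, 1 = positive)
y_true = [1, -1, 0, 1, 0]
y_pred = [1, -1, 1, 1, 0]

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="macro")
print(f"Accuracy: {accuracy:.2%}  Precision: {precision:.2%}  Recall: {recall:.2%}  F1: {f1:.2%}")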

How to Use

This model can be used for Banglish sentiment classification in three categories:

  • Positive (1)
  • Negative (-1)
  • Neutral (0)

Example Code for Inference:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("samiur-r/BanglishSentiment-Llama3-8B")
model = AutoModelForCausalLM.from_pretrained(
    "samiur-r/BanglishSentiment-Llama3-8B",
    torch_dtype=torch.float16,
    device_map="auto",
)

# Example Banglish text
text = "Ami onek happy, this is the best day ever!"

# Tokenize input and move it to the model's device
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Generate output deterministically (matches the temperature-0.0 setting above)
output = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt
sentiment = tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print("Predicted Sentiment:", sentiment)

Limitations & Future Improvements

  • The model may struggle with sarcasm, idiomatic expressions, and highly ambiguous Banglish sentences.
  • Further improvements could be achieved with more training data and context-aware learning strategies.
  • Expanding datasets with other code-mixed variations (e.g., Hindi-English, Urdu-English) could enhance adaptability.

Acknowledgment

  • This work builds on Unsloth's Llama 3 (8B-BNB-4bit) checkpoint and Meta's Llama 3, and the model is hosted on Hugging Face.
  • Special thanks to the Bengali_Banglish_80K_Dataset contributors for providing a valuable resource for Banglish NLP research.
  • We also acknowledge the Unsloth fine-tuning framework for enabling efficient adaptation of large language models with LoRA and 4-bit quantization.