---
language:
- bn
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
base_model: unsloth/llama-3-8b-bnb-4bit
inference: false
---

# LLama-3 Bangla 4 bit

<div align="center">
    <img src="https://cdn-uploads.huggingface.co/production/uploads/65ca6f0098a46a56261ac3ac/O1ATwhQt_9j59CSIylrVS.png" width="300"/>

</div>

- **Developed by:** KillerShoaib
- **License:** apache-2.0
- **Finetuned from model :** unsloth/llama-3-8b-bnb-4bit
- **Datset used for fine-tuning :** iamshnoo/alpaca-cleaned-bengali


# 4-bit Quantization
**This is 4-bit quantization of Llama-3 8b model.**


# Llama-3 Bangla Different Formats

- `LoRA Adapters only` - [**KillerShoaib/llama-3-8b-bangla-lora**](https://huggingface.co/KillerShoaib/llama-3-8b-bangla-lora)
- `GGUF q4_k_m` - [**KillerShoaib/llama-3-8b-bangla-GGUF-Q4_K_M**](https://huggingface.co/KillerShoaib/llama-3-8b-bangla-GGUF-Q4_K_M)

# Model Details

**Llama 3 8 billion** model was finetuned using **unsloth** package on a **cleaned Bangla alpaca** dataset. After that the model was quantized in **4-bit**. The model is finetuned for **2 epoch** on a single T4 GPU.


# Pros & Cons of the Model

## Pros

- **The model can comprehend the Bangla language, including its semantic nuances**
- **Given context model can answer the question based on the context**

## Cons
- **Model is unable to do creative or complex work. i.e: creating a poem or solving a math problem in Bangla**
- **Since the size of the dataset was small, the model lacks lot of general knowledge in Bangla**


# Run The Model

## FastLanguageModel from unsloth for 2x faster inference

```python

from unsloth import FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "KillerShoaib/llama-3-8b-bangla-4bit",
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)

# alpaca_prompt for the model
alpaca_prompt = """Below is an instruction in bangla that describes a task, paired with an input also in bangla that provides further context. Write a response in bangla that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

# input with instruction and input
inputs = tokenizer(
[
    alpaca_prompt.format(
        "সুস্থ থাকার তিনটি উপায় বলুন", # instruction
        "", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

# generating the output and decoding it
outputs = model.generate(**inputs, max_new_tokens = 2048, use_cache = True)
tokenizer.batch_decode(outputs)
```

## AutoModelForCausalLM from Hugginface

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "KillerShoaib/llama-3-8b-bangla-4bit"  # YOUR MODEL YOU USED FOR TRAINING either hf hub name or local folder name.
tokenizer_name = model_name

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
# Load model
model = AutoModelForCausalLM.from_pretrained(model_name)

alpaca_prompt = """Below is an instruction in bangla that describes a task, paired with an input also in bangla that provides further context. Write a response in bangla that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

inputs = tokenizer(
[
    alpaca_prompt.format(
        "সুস্থ থাকার তিনটি উপায় বলুন", # instruction
        "", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 1024, use_cache = True)
tokenizer.batch_decode(outputs)
```

# Inference Script & Github Repo

- `Google Colab` - [**Llama-3 8b Bangla Inference Script**](https://colab.research.google.com/drive/1jZaDmmamOoFiy-ZYRlbfwU0HaP3S48ER?usp=sharing)
- `Github Repo` - [**Llama-3 Bangla**](https://github.com/KillerShoaib/Llama-3-Bangla)