---
language:
- bn
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
base_model: unsloth/llama-3-8b-bnb-4bit
---


# Llama-3 Bangla LoRA

<div align="center">
    <img src="https://cdn-uploads.huggingface.co/production/uploads/65ca6f0098a46a56261ac3ac/O1ATwhQt_9j59CSIylrVS.png" width="300"/>

</div>

- **Developed by:** KillerShoaib
- **License:** apache-2.0
- **Finetuned from model:** unsloth/llama-3-8b-bnb-4bit
- **Dataset used for fine-tuning:** iamshnoo/alpaca-cleaned-bengali


# LoRA Adapter
**This is not the entire model, but rather only the LoRA adapter.**
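
If you prefer not to use unsloth, here is a minimal sketch of attaching this adapter to its base model by hand with `peft` (it assumes `transformers`, `peft`, and `bitsandbytes` are installed; the exact loading path in the full examples below is the one shown in this card):

```python
# Minimal sketch: load the 4-bit base model, then attach this LoRA adapter.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "unsloth/llama-3-8b-bnb-4bit",   # base model this adapter was trained from
    device_map = "auto",
)
model = PeftModel.from_pretrained(base, "KillerShoaib/llama-3-8b-bangla-lora")
tokenizer = AutoTokenizer.from_pretrained("KillerShoaib/llama-3-8b-bangla-lora")
```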

# Llama-3 Bangla Different Formats

- `4-bit quantized(QLoRA)` - [**KillerShoaib/llama-3-8b-bangla-4bit**](https://huggingface.co/KillerShoaib/llama-3-8b-bangla-4bit)
- `GGUF q4_k_m` - [**KillerShoaib/llama-3-8b-bangla-GGUF-Q4_K_M**](https://huggingface.co/KillerShoaib/llama-3-8b-bangla-GGUF-Q4_K_M)
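
For the GGUF build, a hedged sketch using `llama-cpp-python` (the `filename` glob is an assumption; check the GGUF repo for the exact file name):

```python
# Hedged sketch: run the Q4_K_M GGUF build with llama-cpp-python.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id = "KillerShoaib/llama-3-8b-bangla-GGUF-Q4_K_M",
    filename = "*Q4_K_M.gguf",   # glob; assumes one matching file in the repo
    n_ctx = 2048,
)
out = llm(
    "### Instruction:\nসুস্থ থাকার তিনটি উপায় বলুন\n\n### Response:\n",
    max_tokens = 256,
)
print(out["choices"][0]["text"])
```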

# Model Details

The Llama 3 8-billion-parameter model was fine-tuned with the **unsloth** package on a **cleaned Bangla Alpaca** dataset. The model was fine-tuned for **2 epochs** on a single T4 GPU.
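
Below is a hedged sketch of that fine-tuning setup. The LoRA hyperparameters, batch size, and learning rate are illustrative assumptions, not the exact values used to train this checkpoint:

```python
# Hedged sketch of the fine-tuning setup (hyperparameters are assumptions).
from unsloth import FastLanguageModel
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit",
    max_seq_length = 2048,
    load_in_4bit = True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,                                  # LoRA rank (assumed)
    lora_alpha = 16,                         # LoRA scaling (assumed)
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
)

dataset = load_dataset("iamshnoo/alpaca-cleaned-bengali", split = "train")

def to_text(example):
    # Flatten instruction/input/output columns into one Alpaca-style string
    # (column names assumed from the standard Alpaca schema).
    return {"text": f"### Instruction:\n{example['instruction']}\n\n"
                    f"### Input:\n{example['input']}\n\n"
                    f"### Response:\n{example['output']}" + tokenizer.eos_token}

dataset = dataset.map(to_text)

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = 2048,
    args = TrainingArguments(
        per_device_train_batch_size = 2,     # assumed; fits a single T4
        gradient_accumulation_steps = 4,     # assumed
        num_train_epochs = 2,                # matches the card
        learning_rate = 2e-4,                # assumed
        output_dir = "outputs",
    ),
)
trainer.train()
```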


# Pros & Cons of the Model

## Pros

- **The model can comprehend the Bangla language, including its semantic nuances**
- **Given a context, the model can answer questions based on that context** (see the context QA example under *Run The Model*)

## Cons
- **The model is unable to do creative or complex work, e.g. writing a poem or solving a math problem in Bangla**
- **Since the dataset was small, the model lacks much general knowledge in Bangla**


# Run The Model

## FastLanguageModel from unsloth for 2x faster inference

```python
from unsloth import FastLanguageModel

# load the LoRA adapter together with the 4-bit base model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "KillerShoaib/llama-3-8b-bangla-lora",
    max_seq_length = 2048,
    dtype = None,          # auto-detect (float16 on T4, bfloat16 on Ampere+)
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)  # enable unsloth's faster inference path

# alpaca_prompt for the model
alpaca_prompt = """Below is an instruction in bangla that describes a task, paired with an input also in bangla that provides further context. Write a response in bangla that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

# tokenize the formatted prompt
inputs = tokenizer(
[
    alpaca_prompt.format(
        "সুস্থ থাকার তিনটি উপায় বলুন", # instruction ("Tell three ways to stay healthy")
        "", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

# generate the output and decode it
outputs = model.generate(**inputs, max_new_tokens = 2048, use_cache = True)
tokenizer.batch_decode(outputs)
```
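
As noted under Pros, the model can answer from a given context. Here is a hedged follow-up that fills the `### Input:` slot with a passage; it reuses `model`, `tokenizer`, and `alpaca_prompt` from the block above, and the Bangla passage and question are illustrative examples, not from the training data:

```python
# Follow-up to the block above: context-grounded QA via the Input slot.
# Reuses model, tokenizer, and alpaca_prompt defined earlier.
context = "ঢাকা বাংলাদেশের রাজধানী এবং বৃহত্তম শহর।"  # "Dhaka is the capital and largest city of Bangladesh."
question = "বাংলাদেশের রাজধানীর নাম কী?"              # "What is the name of the capital of Bangladesh?"

inputs = tokenizer(
    [alpaca_prompt.format(question, context, "")],
    return_tensors = "pt",
).to("cuda")
outputs = model.generate(**inputs, max_new_tokens = 256, use_cache = True)
print(tokenizer.batch_decode(outputs)[0])
```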

## AutoPeftModelForCausalLM from Hugging Face

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# peft reads the adapter config and pulls in the base model automatically
model = AutoPeftModelForCausalLM.from_pretrained(
    "KillerShoaib/llama-3-8b-bangla-lora",
    load_in_4bit = True,
)
tokenizer = AutoTokenizer.from_pretrained("KillerShoaib/llama-3-8b-bangla-lora")

alpaca_prompt = """Below is an instruction in bangla that describes a task, paired with an input also in bangla that provides further context. Write a response in bangla that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

inputs = tokenizer(
[
    alpaca_prompt.format(
        "সুস্থ থাকার তিনটি উপায় বলুন", # instruction ("Tell three ways to stay healthy")
        "", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 1024, use_cache = True)
tokenizer.batch_decode(outputs)
```


**`AutoPeftModelForCausalLM` can be hopelessly slow, since `4bit` model downloading is not supported. Use it only if you don't have unsloth installed.**

# Inference Script & Github Repo

- `Google Colab` - [**Llama-3 8b Bangla Inference Script**](https://colab.research.google.com/drive/1jZaDmmamOoFiy-ZYRlbfwU0HaP3S48ER?usp=sharing)
- `Github Repo` - [**Llama-3 Bangla**](https://github.com/KillerShoaib/Llama-3-Bangla)