---
language:
- bn
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
base_model: unsloth/llama-3-8b-bnb-4bit
---


# Llama-3 Bangla LoRA

<div align="center">
    <img src="https://cdn-uploads.huggingface.co/production/uploads/65ca6f0098a46a56261ac3ac/O1ATwhQt_9j59CSIylrVS.png" width="300"/>

</div>

- **Developed by:** KillerShoaib
- **License:** apache-2.0
- **Finetuned from model:** unsloth/llama-3-8b-bnb-4bit
- **Dataset used for fine-tuning:** iamshnoo/alpaca-cleaned-bengali


# LoRA Adapter
**This is not the entire model, but rather only the LoRA adapter.**
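
If you prefer not to use unsloth, here is a minimal sketch of attaching this adapter to its base model by hand with `peft` (it assumes `transformers`, `peft`, and `bitsandbytes` are installed; the exact loading path in the full examples below is the one shown in this card):

```python
# Minimal sketch: load the 4-bit base model, then attach this LoRA adapter.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "unsloth/llama-3-8b-bnb-4bit",   # base model this adapter was trained from
    device_map = "auto",
)
model = PeftModel.from_pretrained(base, "KillerShoaib/llama-3-8b-bangla-lora")
tokenizer = AutoTokenizer.from_pretrained("KillerShoaib/llama-3-8b-bangla-lora")
```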

# Llama-3 Bangla Different Formats

- `4-bit quantized(QLoRA)` - [**KillerShoaib/llama-3-8b-bangla-4bit**](https://huggingface.co/KillerShoaib/llama-3-8b-bangla-4bit)
- `GGUF q4_k_m` - [**KillerShoaib/llama-3-8b-bangla-GGUF-Q4_K_M**](https://huggingface.co/KillerShoaib/llama-3-8b-bangla-GGUF-Q4_K_M)
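
For the GGUF build, a hedged sketch using `llama-cpp-python` (the `filename` glob is an assumption; check the GGUF repo for the exact file name):

```python
# Hedged sketch: run the Q4_K_M GGUF build with llama-cpp-python.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id = "KillerShoaib/llama-3-8b-bangla-GGUF-Q4_K_M",
    filename = "*Q4_K_M.gguf",   # glob; assumes one matching file in the repo
    n_ctx = 2048,
)
out = llm(
    "### Instruction:\nসুস্থ থাকার তিনটি উপায় বলুন\n\n### Response:\n",
    max_tokens = 256,
)
print(out["choices"][0]["text"])
```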

# Model Details

The Llama 3 8-billion-parameter model was fine-tuned with the **unsloth** package on a **cleaned Bangla Alpaca** dataset. The model was fine-tuned for **2 epochs** on a single T4 GPU.
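
Below is a hedged sketch of that fine-tuning setup. The LoRA hyperparameters, batch size, and learning rate are illustrative assumptions, not the exact values used to train this checkpoint:

```python
# Hedged sketch of the fine-tuning setup (hyperparameters are assumptions).
from unsloth import FastLanguageModel
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit",
    max_seq_length = 2048,
    load_in_4bit = True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,                                  # LoRA rank (assumed)
    lora_alpha = 16,                         # LoRA scaling (assumed)
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
)

dataset = load_dataset("iamshnoo/alpaca-cleaned-bengali", split = "train")

def to_text(example):
    # Flatten instruction/input/output columns into one Alpaca-style string
    # (column names assumed from the standard Alpaca schema).
    return {"text": f"### Instruction:\n{example['instruction']}\n\n"
                    f"### Input:\n{example['input']}\n\n"
                    f"### Response:\n{example['output']}" + tokenizer.eos_token}

dataset = dataset.map(to_text)

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = 2048,
    args = TrainingArguments(
        per_device_train_batch_size = 2,     # assumed; fits a single T4
        gradient_accumulation_steps = 4,     # assumed
        num_train_epochs = 2,                # matches the card
        learning_rate = 2e-4,                # assumed
        output_dir = "outputs",
    ),
)
trainer.train()
```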


# Pros & Cons of the Model

## Pros

- **The model can comprehend the Bangla language, including its semantic nuances**
- **Given a context, the model can answer questions based on that context** (see the context QA example under *Run The Model*)

## Cons
- **The model is unable to do creative or complex work, e.g. writing a poem or solving a math problem in Bangla**
- **Since the dataset was small, the model lacks much general knowledge in Bangla**


# Run The Model

## FastLanguageModel from unsloth for 2x faster inference

```python
from unsloth import FastLanguageModel

# load the LoRA adapter together with the 4-bit base model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "KillerShoaib/llama-3-8b-bangla-lora",
    max_seq_length = 2048,
    dtype = None,          # auto-detect (float16 on T4, bfloat16 on Ampere+)
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)  # enable unsloth's faster inference path

# alpaca_prompt for the model
alpaca_prompt = """Below is an instruction in bangla that describes a task, paired with an input also in bangla that provides further context. Write a response in bangla that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

# tokenize the formatted prompt
inputs = tokenizer(
[
    alpaca_prompt.format(
        "সুস্থ থাকার তিনটি উপায় বলুন", # instruction ("Tell three ways to stay healthy")
        "", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

# generate the output and decode it
outputs = model.generate(**inputs, max_new_tokens = 2048, use_cache = True)
tokenizer.batch_decode(outputs)
```
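
As noted under Pros, the model can answer from a given context. Here is a hedged follow-up that fills the `### Input:` slot with a passage; it reuses `model`, `tokenizer`, and `alpaca_prompt` from the block above, and the Bangla passage and question are illustrative examples, not from the training data:

```python
# Follow-up to the block above: context-grounded QA via the Input slot.
# Reuses model, tokenizer, and alpaca_prompt defined earlier.
context = "ঢাকা বাংলাদেশের রাজধানী এবং বৃহত্তম শহর।"  # "Dhaka is the capital and largest city of Bangladesh."
question = "বাংলাদেশের রাজধানীর নাম কী?"              # "What is the name of the capital of Bangladesh?"

inputs = tokenizer(
    [alpaca_prompt.format(question, context, "")],
    return_tensors = "pt",
).to("cuda")
outputs = model.generate(**inputs, max_new_tokens = 256, use_cache = True)
print(tokenizer.batch_decode(outputs)[0])
```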

## AutoPeftModelForCausalLM from Hugging Face

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# peft reads the adapter config and pulls in the base model automatically
model = AutoPeftModelForCausalLM.from_pretrained(
    "KillerShoaib/llama-3-8b-bangla-lora",
    load_in_4bit = True,
)
tokenizer = AutoTokenizer.from_pretrained("KillerShoaib/llama-3-8b-bangla-lora")

alpaca_prompt = """Below is an instruction in bangla that describes a task, paired with an input also in bangla that provides further context. Write a response in bangla that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

inputs = tokenizer(
[
    alpaca_prompt.format(
        "সুস্থ থাকার তিনটি উপায় বলুন", # instruction ("Tell three ways to stay healthy")
        "", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 1024, use_cache = True)
tokenizer.batch_decode(outputs)
```


**`AutoPeftModelForCausalLM` can be hopelessly slow, since `4bit` model downloading is not supported. Use it only if you don't have unsloth installed.**

# Inference Script & Github Repo

- `Google Colab` - [**Llama-3 8b Bangla Inference Script**](https://colab.research.google.com/drive/1jZaDmmamOoFiy-ZYRlbfwU0HaP3S48ER?usp=sharing)
- `Github Repo` - [**Llama-3 Bangla**](https://github.com/KillerShoaib/Llama-3-Bangla)