--- language: - bn license: apache-2.0 tags: - text-generation-inference - transformers - unsloth - llama - trl base_model: unsloth/llama-3-8b-bnb-4bit --- # LLama-3 Bangla LoRA
- **Developed by:** KillerShoaib - **License:** apache-2.0 - **Finetuned from model :** unsloth/llama-3-8b-bnb-4bit - **Datset used for fine-tuning :** iamshnoo/alpaca-cleaned-bengali # LoRA Adapter **This is not the entire model, but rather only the LoRA adapter.** # Llama-3 Bangla Different Formats - `4-bit quantized(QLoRA)` - [**KillerShoaib/llama-3-8b-bangla-4bit**](https://huggingface.co/KillerShoaib/llama-3-8b-bangla-4bit) - `GGUF q4_k_m` - [**KillerShoaib/llama-3-8b-bangla-GGUF-Q4_K_M**](https://huggingface.co/KillerShoaib/llama-3-8b-bangla-GGUF-Q4_K_M) # Model Details Llama 3 8 billion model was finetuned using **unsloth** package on a **cleaned Bangla alpaca** dataset. The model is finetuned for **2 epoch** on a single T4 GPU. # Pros & Cons of the Model ## Pros - **The model can comprehend the Bangla language, including its semantic nuances** - **Given context model can answer the question based on the context** ## Cons - **Model is unable to do creative or complex work. i.e: creating a poem or solving a math problem in Bangla** - **Since the size of the dataset was small, the model lacks lot of general knowledge in Bangla** # Run The Model ## FastLanguageModel from unsloth for 2x faster inference ```python from unsloth import FastLanguageModel model, tokenizer = FastLanguageModel.from_pretrained( model_name = "KillerShoaib/llama-3-8b-bangla-lora", max_seq_length = 2048, dtype = None, load_in_4bit = True, ) FastLanguageModel.for_inference(model) # alpaca_prompt for the model alpaca_prompt = """Below is an instruction in bangla that describes a task, paired with an input also in bangla that provides further context. Write a response in bangla that appropriately completes the request. ### Instruction: {} ### Input: {} ### Response: {}""" # input with instruction and input inputs = tokenizer( [ alpaca_prompt.format( "সুস্থ থাকার তিনটি উপায় বলুন", # instruction "", # input "", # output - leave this blank for generation! ) ], return_tensors = "pt").to("cuda") # generating the output and decoding it outputs = model.generate(**inputs, max_new_tokens = 2048, use_cache = True) tokenizer.batch_decode(outputs) ``` ## AutoModelForPeftCausalLM from Hugginface ```python from peft import AutoPeftModelForCausalLM from transformers import AutoTokenizer model = AutoPeftModelForCausalLM.from_pretrained( "KillerShoaib/llama-3-8b-bangla-lora", load_in_4bit = True, ) tokenizer = AutoTokenizer.from_pretrained("KillerShoaib/llama-3-8b-bangla-lora") alpaca_prompt = """Below is an instruction in bangla that describes a task, paired with an input also in bangla that provides further context. Write a response in bangla that appropriately completes the request. ### Instruction: {} ### Input: {} ### Response: {}""" inputs = tokenizer( [ alpaca_prompt.format( "সুস্থ থাকার তিনটি উপায় বলুন", # instruction "", # input "", # output - leave this blank for generation! ) ], return_tensors = "pt").to("cuda") outputs = model.generate(**inputs, max_new_tokens = 1024, use_cache = True) tokenizer.batch_decode(outputs) ``` **AutoModelForPeftCausalLM can be hopelessly slow, since `4bit` model downloading is not supported. Use this only if you don't have unsloth installed** # Inference Script & Github Repo - `Google Colab` - [**Llama-3 8b Bangla Inference Script**](https://colab.research.google.com/drive/1jZaDmmamOoFiy-ZYRlbfwU0HaP3S48ER?usp=sharing) - `Github Repo` - [**Llama-3 Bangla**](https://github.com/KillerShoaib/Llama-3-Bangla)