--- language: - bn license: apache-2.0 tags: - text-generation-inference - transformers - unsloth - llama - trl base_model: unsloth/llama-3-8b-bnb-4bit inference: false --- # LLama-3 Bangla 4 bit
- **Developed by:** KillerShoaib - **License:** apache-2.0 - **Finetuned from model :** unsloth/llama-3-8b-bnb-4bit - **Datset used for fine-tuning :** iamshnoo/alpaca-cleaned-bengali # 4-bit Quantization **This is 4-bit quantization of Llama-3 8b model.** # Model Details **Llama 3 8 billion** model was finetuned using **unsloth** package on a **cleaned Bangla alpaca** dataset. After that the model was quantized in **4-bit**. The model is finetuned for **2 epoch** on a single T4 GPU. # Pros & Cons of the Model ## Pros - **The model can comprehend the Bangla language, including its semantic nuances** - **Given context model can answer the question based on the context** ## Cons - **Model is unable to do creative or complex work. i.e: creating a poem or solving a math problem in Bangla** - **Since the size of the dataset was small, the model lacks lot of general knowledge in Bangla** # Run The Model ## FastLanguageModel from unsloth for 2x faster inference ```python from unsloth import FastLanguageModel model, tokenizer = FastLanguageModel.from_pretrained( model_name = "KillerShoaib/llama-3-8b-bangla-4bit", max_seq_length = 2048, dtype = None, load_in_4bit = True, ) FastLanguageModel.for_inference(model) # alpaca_prompt for the model alpaca_prompt = """Below is an instruction in bangla that describes a task, paired with an input also in bangla that provides further context. Write a response in bangla that appropriately completes the request. ### Instruction: {} ### Input: {} ### Response: {}""" # input with instruction and input inputs = tokenizer( [ alpaca_prompt.format( "সুস্থ থাকার তিনটি উপায় বলুন", # instruction "", # input "", # output - leave this blank for generation! ) ], return_tensors = "pt").to("cuda") # generating the output and decoding it outputs = model.generate(**inputs, max_new_tokens = 2048, use_cache = True) tokenizer.batch_decode(outputs) ``` ## AutoModelForCausalLM from Hugginface ```python from transformers import AutoTokenizer, AutoModelForCausalLM model_name = "KillerShoaib/llama-3-8b-bangla-4bit" # YOUR MODEL YOU USED FOR TRAINING either hf hub name or local folder name. tokenizer_name = model_name # Load tokenizer tokenizer = AutoTokenizer.from_pretrained(tokenizer_name) # Load model model = AutoModelForCausalLM.from_pretrained(model_name) alpaca_prompt = """Below is an instruction in bangla that describes a task, paired with an input also in bangla that provides further context. Write a response in bangla that appropriately completes the request. ### Instruction: {} ### Input: {} ### Response: {}""" inputs = tokenizer( [ alpaca_prompt.format( "সুস্থ থাকার তিনটি উপায় বলুন", # instruction "", # input "", # output - leave this blank for generation! ) ], return_tensors = "pt").to("cuda") outputs = model.generate(**inputs, max_new_tokens = 1024, use_cache = True) tokenizer.batch_decode(outputs) ``` # Inference Script & Github Repo - `Google Colab` - [**Llama-3 8b Bangla Inference Script**](https://colab.research.google.com/drive/1jZaDmmamOoFiy-ZYRlbfwU0HaP3S48ER?usp=sharing) - `Github Repo` - [**Llama-3 Bangla**](https://github.com/KillerShoaib/Llama-3-Bangla)