llama-3-8b-bangla-4bit / README.md

KillerShoaib

Update README.md

bcdb43f verified 5 months ago

preview code

raw

history blame contribute delete

No virus

4.01 kB

	---
	language:
	- bn
	license: apache-2.0
	tags:
	- text-generation-inference
	- transformers
	- unsloth
	- llama
	- trl
	base_model: unsloth/llama-3-8b-bnb-4bit
	inference: false
	---

	# LLama-3 Bangla 4 bit

	<div align="center">
	<img src="https://cdn-uploads.huggingface.co/production/uploads/65ca6f0098a46a56261ac3ac/O1ATwhQt_9j59CSIylrVS.png" width="300"/>

	</div>

	- Developed by: KillerShoaib
	- License: apache-2.0
	- Finetuned from model : unsloth/llama-3-8b-bnb-4bit
	- Datset used for fine-tuning : iamshnoo/alpaca-cleaned-bengali


	# 4-bit Quantization
	This is 4-bit quantization of Llama-3 8b model.


	# Llama-3 Bangla Different Formats

	- `LoRA Adapters only` - [KillerShoaib/llama-3-8b-bangla-lora](https://huggingface.co/KillerShoaib/llama-3-8b-bangla-lora)
	- `GGUF q4_k_m` - [KillerShoaib/llama-3-8b-bangla-GGUF-Q4_K_M](https://huggingface.co/KillerShoaib/llama-3-8b-bangla-GGUF-Q4_K_M)

	# Model Details

	Llama 3 8 billion model was finetuned using unsloth package on a cleaned Bangla alpaca dataset. After that the model was quantized in 4-bit. The model is finetuned for 2 epoch on a single T4 GPU.


	# Pros & Cons of the Model

	## Pros

	- The model can comprehend the Bangla language, including its semantic nuances
	- Given context model can answer the question based on the context

	## Cons
	- Model is unable to do creative or complex work. i.e: creating a poem or solving a math problem in Bangla
	- Since the size of the dataset was small, the model lacks lot of general knowledge in Bangla


	# Run The Model

	## FastLanguageModel from unsloth for 2x faster inference

	```python

	from unsloth import FastLanguageModel
	model, tokenizer = FastLanguageModel.from_pretrained(
	model_name = "KillerShoaib/llama-3-8b-bangla-4bit",
	max_seq_length = 2048,
	dtype = None,
	load_in_4bit = True,
	)
	FastLanguageModel.for_inference(model)

	# alpaca_prompt for the model
	alpaca_prompt = """Below is an instruction in bangla that describes a task, paired with an input also in bangla that provides further context. Write a response in bangla that appropriately completes the request.

	### Instruction:
	{}

	### Input:
	{}

	### Response:
	{}"""

	# input with instruction and input
	inputs = tokenizer(
	[
	alpaca_prompt.format(
	"সুস্থ থাকার তিনটি উপায় বলুন", # instruction
	"", # input
	"", # output - leave this blank for generation!
	)
	], return_tensors = "pt").to("cuda")

	# generating the output and decoding it
	outputs = model.generate(**inputs, max_new_tokens = 2048, use_cache = True)
	tokenizer.batch_decode(outputs)
	```

	## AutoModelForCausalLM from Hugginface

	```python
	from transformers import AutoTokenizer, AutoModelForCausalLM

	model_name = "KillerShoaib/llama-3-8b-bangla-4bit" # YOUR MODEL YOU USED FOR TRAINING either hf hub name or local folder name.
	tokenizer_name = model_name

	# Load tokenizer
	tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
	# Load model
	model = AutoModelForCausalLM.from_pretrained(model_name)

	alpaca_prompt = """Below is an instruction in bangla that describes a task, paired with an input also in bangla that provides further context. Write a response in bangla that appropriately completes the request.

	### Instruction:
	{}

	### Input:
	{}

	### Response:
	{}"""

	inputs = tokenizer(
	[
	alpaca_prompt.format(
	"সুস্থ থাকার তিনটি উপায় বলুন", # instruction
	"", # input
	"", # output - leave this blank for generation!
	)
	], return_tensors = "pt").to("cuda")

	outputs = model.generate(**inputs, max_new_tokens = 1024, use_cache = True)
	tokenizer.batch_decode(outputs)
	```

	# Inference Script & Github Repo

	- `Google Colab` - [Llama-3 8b Bangla Inference Script](https://colab.research.google.com/drive/1jZaDmmamOoFiy-ZYRlbfwU0HaP3S48ER?usp=sharing)
	- `Github Repo` - [Llama-3 Bangla](https://github.com/KillerShoaib/Llama-3-Bangla)