---
base_model: unsloth/meta-llama-3.1-8b-bnb-4bit
language:
- en
- yo
- zu
- xh
- wo
- fr
- ig
- ha
- am
- ar
- so
- sw
- sn
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
datasets:
- vutuka/aya_african_alpaca
pipeline_tag: text-generation
---
# Llama-3.1-8B-african-aya
- **Developed by:** vutuka
- **License:** apache-2.0
- **Finetuned from model:** unsloth/meta-llama-3.1-8b-bnb-4bit
This Llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's [TRL](https://github.com/huggingface/trl) library.
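For context, here is a minimal sketch of the kind of Unsloth + TRL fine-tuning setup this implies. The LoRA rank, target modules, and trainer hyperparameters below are illustrative assumptions, not the actual training configuration, and the `text` column name is assumed from the Alpaca-style dataset format.

```py
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/meta-llama-3.1-8b-bnb-4bit",
    max_seq_length=4096,
    load_in_4bit=True,
)

# Attach LoRA adapters (rank and targets here are assumptions, not the released config).
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

dataset = load_dataset("vutuka/aya_african_alpaca", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",  # assumes a pre-formatted text column
    max_seq_length=4096,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()
```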
## Unsloth Inference (2x Faster)
```sh
%%capture
# Installs Unsloth, Xformers (Flash Attention) and all other packages!
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps "xformers<0.0.27" "trl<0.9.0" peft accelerate bitsandbytes
```
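These commands use IPython notebook syntax (`%%capture`, `!pip`). Since 4-bit bitsandbytes inference requires a CUDA GPU, a quick sanity check before loading the model can save a confusing error later:

```py
import torch

# 4-bit (bitsandbytes) inference needs a CUDA device.
assert torch.cuda.is_available(), "No CUDA GPU found; 4-bit inference will not work"
```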
```py
max_seq_length = 4096
dtype = None  # None = auto-detect (float16 on T4/V100, bfloat16 on Ampere+)
load_in_4bit = True  # Use 4-bit quantization to reduce memory usage.

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""
```
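To make the template concrete, formatting it fills the three slots in order, with the response slot left empty for the model to complete (the instruction and input here are illustrative only):

```py
# Illustrative: build the final prompt string from the Alpaca template.
print(alpaca_prompt.format(
    "Translate the following sentence to Yoruba.",  # instruction
    "Good morning, how are you?",                   # input
    "",                                             # response, left empty for generation
))
```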
```py
## Load the quantized model
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="vutuka/Llama-3.1-8B-african-aya",
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)
FastLanguageModel.for_inference(model)  # enable Unsloth's fast inference path
```
```py
from transformers import TextStreamer

def llama_african_aya(input: str = "", instruction: str = ""):
    inputs = tokenizer(
        [
            alpaca_prompt.format(
                instruction,  # instruction
                input,        # input
                "",           # response slot, left empty for generation
            )
        ],
        return_tensors="pt",
    ).to("cuda")

    # Optional: stream tokens to stdout as they are generated.
    # text_streamer = TextStreamer(tokenizer)
    # _ = model.generate(**inputs, streamer=text_streamer, max_new_tokens=800)

    # Generate the response
    output = model.generate(**inputs, max_new_tokens=1024)

    # Decode the generated tokens
    generated_text = tokenizer.decode(output[0], skip_special_tokens=True)

    # Keep only the part after "### Response:"
    response_start = generated_text.find("### Response:") + len("### Response:")
    return generated_text[response_start:].strip()
```
```py
llama_african_aya(
    instruction="",
    # Yoruba: "Two kidnappers were arrested in Supare Akoko; explain the story."
    input="Àwọn ajínigbé méjì ni wọ́n mú ní Supare Akoko, ṣàlàyé ìtàn náà."
)
```
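If you prefer to watch the answer appear token by token instead of waiting for the full decode, a streaming variant is a small change (a sketch reusing `model`, `tokenizer`, and `alpaca_prompt` from above):

```py
from transformers import TextStreamer

# Stream the response to stdout as it is generated; skip_prompt hides the echoed prompt.
text_streamer = TextStreamer(tokenizer, skip_prompt=True)
inputs = tokenizer(
    [alpaca_prompt.format("", "Àwọn ajínigbé méjì ni wọ́n mú ní Supare Akoko, ṣàlàyé ìtàn náà.", "")],
    return_tensors="pt",
).to("cuda")
_ = model.generate(**inputs, streamer=text_streamer, max_new_tokens=1024)
```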
## llama.cpp Inference
```sh
CMAKE_ARGS="-DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS" \
pip install llama-cpp-python
```
```py
from huggingface_hub import hf_hub_download
from llama_cpp import Llama
## Download the GGUF model
model_name = "vutuka/Llama-3.1-8B-african-aya"
model_file = "llama-3.1-8B-african-aya.Q8_0.gguf"
model_path = hf_hub_download(model_name, filename=model_file)

## Instantiate the model from the downloaded file
llm = Llama(
    model_path=model_path,
    n_ctx=4096,       # context window
    n_gpu_layers=-1,  # offload all layers to the GPU (set 0 for CPU-only)
    n_batch=512,
    verbose=False,
)
## Run inference
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""
prompt = alpaca_prompt.format(
    "",  # instruction (empty here; the question goes in the input slot)
    "Àwọn ajínigbé méjì ni wọ́n mú ní Supare Akoko, ṣàlàyé ìtàn náà.",
    "",  # response slot, left empty for generation
)

res = llm(prompt, max_tokens=1024)  # res is a completion dictionary

## Print the generated text from the response dictionary
print(res["choices"][0]["text"])
```
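llama-cpp-python can also stream the completion as it is produced; a minimal sketch using the same `llm` and `prompt` as above:

```py
# Streaming: with stream=True the call returns an iterator of partial chunks.
for chunk in llm(prompt, max_tokens=1024, stream=True):
    print(chunk["choices"][0]["text"], end="", flush=True)
```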
[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)