---
library_name: peft
license: mit
---
# Chinkara 7B (Improved)

_Chinkara_ is a Large Language Model fine-tuned on the [timdettmers/openassistant-guanaco](https://huggingface.co/datasets/timdettmers/openassistant-guanaco) dataset. It is based on Meta's LLaMA-2 with 7 billion parameters and was trained with the QLoRA technique, optimized for small consumer-grade GPUs.
![logo](chinkara-logo.png)

## Information

For more information about the model, please visit [prp-e/chinkara](https://github.com/prp-e/chinkara) on GitHub.

## Inference Guide

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)]()

_NOTE: This section covers loading and running inference with the model on your local machine. You still need at least 8 GB of VRAM on your GPU; an RTX 2080 or better is recommended._
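
Before installing anything, you can optionally confirm that your GPU has enough memory. This is a minimal sanity check, assuming a CUDA-capable GPU and an existing `torch` installation:

```python
import torch

# Report the total VRAM of the first GPU; Chinkara in 4-bit needs roughly 8 GB.
if torch.cuda.is_available():
    total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"GPU 0: {torch.cuda.get_device_name(0)}, {total_gb:.1f} GB VRAM")
else:
    print("No CUDA GPU detected; this guide requires one.")
```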
### Installing libraries

```bash
pip install -U bitsandbytes
pip install -U git+https://github.com/huggingface/transformers.git
pip install -U git+https://github.com/huggingface/peft.git
pip install -U git+https://github.com/huggingface/accelerate.git
pip install -U datasets
pip install -U einops
```
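
As an optional extra check, importing the packages and printing their versions confirms the installs succeeded:

```python
# Optional sanity check: these imports fail fast if an install went wrong.
import bitsandbytes, transformers, peft, accelerate

print("transformers", transformers.__version__)
print("peft", peft.__version__)
print("accelerate", accelerate.__version__)
```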
### Loading the model

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Base model (a sharded LLaMA-2 7B chat checkpoint) and the Chinkara LoRA adapter.
model_name = "Trelis/Llama-2-7b-chat-hf-sharded-bf16"
adapters_name = 'MaralGPT/chinkara-7b-improved'

# 4-bit loading is configured through quantization_config below.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    max_memory={i: '24000MB' for i in range(torch.cuda.device_count())},
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type='nf4'
    ),
)
# Attach the Chinkara adapter weights to the quantized base model.
model = PeftModel.from_pretrained(model, adapters_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```
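
Optionally, `get_memory_footprint()` on the loaded model gives a rough idea of how much memory the 4-bit weights occupy; this extra check is only a convenience and assumes a recent `transformers` version:

```python
# Rough size of the quantized weights in GB; it should fit well under 8 GB.
print(f"Model footprint: {model.get_memory_footprint() / 1024**3:.2f} GB")
```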
### Setting the model up

The following lines repeat the last step of the loading code; run them only if you skipped attaching the adapter and loading the tokenizer above:

```python
from peft import PeftModel

model = PeftModel.from_pretrained(model, adapters_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```
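
Optionally, you can also switch the model to evaluation mode before generating, so training-only behaviour such as dropout is disabled; this small addition is not required by the guide:

```python
# Evaluation mode disables training-only behaviour such as dropout.
model.eval()
```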
### Prompt and inference

```python
prompt = "What is the answer to life, universe and everything?"

# Chinkara expects the Guanaco-style prompt format.
prompt = f"###Human: {prompt} ###Assistant:"

inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
# do_sample=True is needed for the temperature setting to take effect.
outputs = model.generate(inputs=inputs.input_ids, max_new_tokens=50, do_sample=True, temperature=0.5, repetition_penalty=1.0)
answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(answer)
```
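
The decoded string contains the prompt as well as the completion. A small helper function (here called `extract_reply`, introduced purely for illustration) can trim it down to just the assistant's reply:

```python
def extract_reply(decoded: str) -> str:
    # Keep only the text after the last "###Assistant:" marker.
    return decoded.split("###Assistant:")[-1].strip()

print(extract_reply(answer))
```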
## Training procedure