Muhammadreza committed on
Commit
e6d6d72
1 Parent(s): 0a7bb41

Update README.md

Files changed (1)
  1. README.md +77 -0
README.md CHANGED
 
---
library_name: peft
license: mit
---

# Chinkara 7B (Improved)

_Chinkara_ is a Large Language Model fine-tuned on the [timdettmers/openassistant-guanaco](https://huggingface.co/datasets/timdettmers/openassistant-guanaco) dataset on top of Meta's LLaMA-2 with 7 billion parameters, using the QLoRA technique and optimized for small consumer-grade GPUs.

![logo](chinkara-logo.png)

## Information

For more information about the model, please visit [prp-e/chinkara](https://github.com/prp-e/chinkara) on GitHub.

## Inference Guide

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)]()

_NOTE: This section is for when you want to load the model and run inference on your local machine. You still need 8 GB of VRAM on your GPU; the recommended GPU is at least an RTX 2080!_

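Before installing anything, you can check that your GPU actually has that much memory. This is a minimal sketch, not part of the original guide, and it assumes PyTorch with CUDA support is already installed:

```python
import torch

# Sketch: report the VRAM of the first CUDA device so you can confirm
# it meets the ~8 GB requirement mentioned above.
if torch.cuda.is_available():
    total_gib = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"GPU: {torch.cuda.get_device_name(0)} with {total_gib:.1f} GiB of VRAM")
else:
    print("No CUDA-capable GPU detected.")
```
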
### Installing libraries

```
pip install -U bitsandbytes
pip install -U git+https://github.com/huggingface/transformers.git
pip install -U git+https://github.com/huggingface/peft.git
pip install -U git+https://github.com/huggingface/accelerate.git
pip install -U datasets
pip install -U einops
```

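To verify the installation, a quick check like the one below (a sketch using only the standard library and `torch`) prints the versions that are actually importable:

```python
from importlib.metadata import version

import torch

# Sketch: confirm the freshly installed packages are visible to Python
# and that a CUDA device is available for 4-bit inference.
for pkg in ("transformers", "peft", "accelerate", "bitsandbytes"):
    print(pkg, version(pkg))
print("CUDA available:", torch.cuda.is_available())
```
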
### Loading the model

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "Trelis/Llama-2-7b-chat-hf-sharded-bf16"
adapters_name = "MaralGPT/chinkara-7b-improved"

# Load the base LLaMA-2 model in 4-bit NF4 quantization so it fits on a small GPU
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    max_memory={i: "24000MB" for i in range(torch.cuda.device_count())},
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
    ),
)
```

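As a rough sanity check, and not part of the original guide, you can ask `transformers` how much memory the quantized base model occupies before attaching the adapter:

```python
# Sketch: report the memory footprint of the quantized base model.
footprint_gib = model.get_memory_footprint() / 1024**3
print(f"Base model footprint: {footprint_gib:.2f} GiB")
```
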
### Setting the model up

```python
# Attach the Chinkara LoRA adapter to the quantized base model and load the tokenizer
model = PeftModel.from_pretrained(model, adapters_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```

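Optionally, as a quick sketch (not part of the original guide), you can switch the combined model to inference mode and confirm the adapter really was loaded:

```python
# Sketch: disable dropout for inference and inspect the attached adapter config.
model = model.eval()
print(model.peft_config)
```
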
### Prompt and inference

```python
prompt = "What is the answer to life, universe and everything?"

# Wrap the question in the ###Human / ###Assistant format the model was trained with
prompt = f"###Human: {prompt} ###Assistant:"

inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
outputs = model.generate(
    inputs=inputs.input_ids,
    max_new_tokens=50,
    do_sample=True,  # sampling must be enabled for temperature to take effect
    temperature=0.5,
    repetition_penalty=1.0,
)
answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(answer)
```

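For repeated use, the prompt formatting and decoding can be wrapped in a small helper. This is a sketch rather than part of the original guide; `ask` is a hypothetical name, and it assumes the `model` and `tokenizer` objects from the steps above:

```python
def ask(question: str, max_new_tokens: int = 50) -> str:
    """Format a question, generate a reply, and return only the assistant's part."""
    prompt = f"###Human: {question} ###Assistant:"
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
    outputs = model.generate(
        inputs=inputs.input_ids,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.5,
        repetition_penalty=1.0,
    )
    text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    # Everything after the last "###Assistant:" marker is the model's answer.
    return text.split("###Assistant:")[-1].strip()

print(ask("What is the answer to life, universe and everything?"))
```
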
## Training procedure
