---
license: apache-2.0
pipeline_tag: text-generation
---
### KUETLLM_zyphyr7b_gguf
KUETLLM is a [zephyr7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) finetune trained on a dataset of prompts and answers about Khulna University of Engineering and Technology (KUET).
The base model was loaded in 8-bit quantization using [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), and [LoRA](https://huggingface.co/docs/diffusers/main/en/training/lora) was used to finetune an adapter, which was later merged with the unquantized base model.
The finetuned unquantized model can be found [here](https://huggingface.co/shahidul034/KUETLLM_zephyr_base). It was then quantized and converted to GGUF format using [llama.cpp](https://github.com/ggerganov/llama.cpp).
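As a minimal sketch of the adapter-merge step described above (assuming a peft-trained adapter; the adapter path and output directory are illustrative, and only the base model id comes from this card):

```python
# Sketch: merge the trained LoRA adapter back into the unquantized base model.
# "kuetllm-lora-adapter" is an illustrative local path, not a published repo.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("HuggingFaceH4/zephyr-7b-beta", torch_dtype="auto")
merged = PeftModel.from_pretrained(base, "kuetllm-lora-adapter").merge_and_unload()

merged.save_pretrained("KUETLLM_zephyr_base")  # merged full-precision model, ready for GGUF conversion
AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta").save_pretrained("KUETLLM_zephyr_base")
```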
## Below are the training configurations for the finetuning process:
```
LoraConfig:
r=16,
lora_alpha=16,
target_modules=["q_proj", "v_proj", "k_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
lora_dropout=0.05,
bias="none",
task_type="CAUSAL_LM"
```
```
TrainingArguments:
per_device_train_batch_size=12,
gradient_accumulation_steps=1,
optim='paged_adamw_8bit',
learning_rate=5e-06,
fp16=True,
logging_steps=10,
num_train_epochs=1,
output_dir=zephyr_lora_output,
remove_unused_columns=False,
```
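For reference, a sketch of how these settings would plug into peft and transformers; this is not the original training script, and the dataset pipeline is omitted:

```python
# Sketch: the configs above as peft/transformers objects (not the original script).
from transformers import AutoModelForCausalLM, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 8-bit loading via bitsandbytes, as described above.
base = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceH4/zephyr-7b-beta", load_in_8bit=True, device_map="auto"
)
base = prepare_model_for_kbit_training(base)

lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)

training_args = TrainingArguments(
    per_device_train_batch_size=12,
    gradient_accumulation_steps=1,
    optim="paged_adamw_8bit",
    learning_rate=5e-06,
    fp16=True,
    logging_steps=10,
    num_train_epochs=1,
    output_dir="zephyr_lora_output",  # directory name assumed from the listing above
    remove_unused_columns=False,
)
```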
```
Llama.cpp quantization parameter = q4_k_m
```
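The conversion and quantization step can be sketched with llama.cpp's tools roughly as follows; the file names are illustrative, and the exact script name and flags depend on the llama.cpp revision:

```shell
# Sketch: convert the merged HF checkpoint to GGUF, then quantize to q4_k_m.
python convert.py KUETLLM_zephyr_base --outfile kuetllm-f16.gguf
./quantize kuetllm-f16.gguf zephyr_q4km_kuetllm.gguf q4_k_m
```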
## Inferencing using the llama.cpp command line
Download the GGUF file manually or with `huggingface_hub` (a download sketch follows these steps), then set up llama.cpp.
Make sure you are using `llama.cpp` from commit [d0cee0d](https://github.com/ggerganov/llama.cpp/commit/d0cee0d36d5be95a0d9088b674dbb27354107221) or later.
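For the download step, a minimal sketch using `huggingface_hub`; the repo id is assumed from this model page, so verify it and the filename against the repository's file list:

```python
# Sketch: fetch the GGUF file; repo_id and filename are assumed, not verified.
from huggingface_hub import hf_hub_download

gguf_path = hf_hub_download(
    repo_id="arbitropy/KUETLLM_zyphyr7b_gguf",
    filename="zephyr_q4km_kuetllm.gguf",
)
print(gguf_path)  # pass this path to llama.cpp via -m
```

The `./main` invocation below then runs inference against the downloaded file: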
```shell
./main -ngl 35 -m zephyr_q4km_kuetllm.gguf --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "<|system|>\nYou are a KUET authority managed chatbot, help users by answering their queries about KUET.\n<|user|>\nTell me about KUET.\n<|assistant|>\n"
```
Change `-ngl 35` to the number of layers to offload to the GPU. Remove it if you don't have GPU acceleration.
47
+
48
+ Change `-c 2048` to the desired sequence length. For extended sequence models - eg 8K, 16K, 32K - the necessary RoPE scaling parameters are read from the GGUF file and set by llama.cpp automatically. Note that longer sequence lengths require much more resources, so you may need to reduce this value.
49
+
50
+ If you want to have a chat-style conversation, replace the `-p <PROMPT>` argument with `-i -ins`
For other parameters and how to use them, please refer to [the llama.cpp documentation](https://github.com/ggerganov/llama.cpp/blob/master/examples/main/README.md).