PEFT
adriantheuma committed
Commit 63231c1
1 Parent(s): 5babecf

Update README.md

Files changed (1)
  1. README.md +28 -13
README.md CHANGED
@@ -1,21 +1,36 @@
  ---
  library_name: peft
  ---
- ## Training procedure
-
- The following `bitsandbytes` quantization config was used during training:
- - load_in_8bit: True
- - load_in_4bit: False
- - llm_int8_threshold: 6.0
- - llm_int8_skip_modules: None
- - llm_int8_enable_fp32_cpu_offload: False
- - llm_int8_has_fp16_weight: False
- - bnb_4bit_quant_type: fp4
- - bnb_4bit_use_double_quant: False
- - bnb_4bit_compute_dtype: float32
-
- ### Framework versions
-
- - PEFT 0.4.0
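For context on the section removed above: those `bitsandbytes` settings differ from the library defaults only in `load_in_8bit`, and would typically be expressed through `transformers`' `BitsAndBytesConfig` as in the minimal sketch below. The base-model id and loading call are assumptions for illustration, not the card author's script.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Sketch of the removed quantization settings; only load_in_8bit deviates
# from the BitsAndBytesConfig defaults.
bnb_config = BitsAndBytesConfig(
    load_in_8bit=True,
    load_in_4bit=False,
    llm_int8_threshold=6.0,
    llm_int8_skip_modules=None,
    llm_int8_enable_fp32_cpu_offload=False,
    llm_int8_has_fp16_weight=False,
    bnb_4bit_quant_type="fp4",
    bnb_4bit_use_double_quant=False,
    bnb_4bit_compute_dtype=torch.float32,
)

# Assumed base checkpoint, consistent with the updated card below.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-chat-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
```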
  ---
  library_name: peft
+ license: apache-2.0
+ datasets:
+ - adriantheuma/raven-data
+ language:
+ - en
  ---
+ ### Training details
+
+ * Prompt tokenisation: [LlamaTokenizer](https://huggingface.co/docs/transformers/model_doc/llama2#transformers.LlamaTokenizer)
+ * Maximum context length: 1,204 tokens
+ * Per-device train batch size: 1
+ * Gradient accumulation: 128 steps (for an effective batch size of 128)
+ * Quantisation: 8-bit
+ * Optimiser: AdamW
+ * Learning rate: 3 × 10⁻⁴
+ * Warmup steps: 100
+ * Epochs: 5 (these settings are sketched in code after this list)
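A minimal sketch of how the hyperparameters above map onto a standard `transformers` training setup is shown below. The dataset field name, output directory, and the choice of `adamw_torch` are assumptions for illustration, not details published on this card.

```python
from transformers import LlamaTokenizer, TrainingArguments

base_model = "meta-llama/Llama-2-13b-chat-hf"  # assumed base checkpoint

tokenizer = LlamaTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token  # Llama 2 ships without a pad token

def tokenize(example):
    # Truncate prompts to the maximum context length stated above.
    # The "text" field is an assumption about the dataset schema.
    return tokenizer(example["text"], truncation=True, max_length=1204)

training_args = TrainingArguments(
    output_dir="raven-lora-13b-chat",   # assumed output path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=128,    # effective batch size of 128
    learning_rate=3e-4,
    warmup_steps=100,
    num_train_epochs=5,
    optim="adamw_torch",                # "adamw" per the card; torch AdamW assumed
    logging_steps=10,                   # assumption, not stated on the card
)
```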
+
+ * Low-Rank Adaptation (LoRA)
+   * rank: 16
+   * alpha: 16
+   * dropout: 0.05
+   * target modules: q_proj, k_proj, v_proj, and o_proj
+
+ This setup reduces the trainable parameters to 26,214,400, roughly 0.2% of the base [Llama 2 13B Chat](https://huggingface.co/docs/transformers/model_doc/llama2) model.
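Below is a sketch of this LoRA setup using the `peft` API; the base checkpoint id and 8-bit loading call are assumptions consistent with the card, not the author's published code. The parameter count also checks out by hand: Llama 2 13B has 40 decoder layers with hidden size 5120, and each targeted projection gains two rank-16 matrices, so 4 modules × 40 layers × 2 × 16 × 5120 = 26,214,400 trainable parameters.

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Assumed base checkpoint and 8-bit loading, matching the details above.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-chat-hf",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
# A real 8-bit training run would typically also call
# peft.prepare_model_for_kbit_training(model) before wrapping.

lora_config = LoraConfig(
    r=16,                     # LoRA rank
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# Expected to report roughly: trainable params: 26,214,400 (~0.2% of all params)
```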
+
+ ### Training hardware
+
+ The model was trained on commodity hardware equipped with:
+ * a 13th Gen Intel(R) Core(TM) i7-13700KF CPU at 3.40 GHz
+ * 64 GB of installed RAM
+ * an NVIDIA GeForce RTX 4090 GPU with 24 GB of onboard RAM
+
+ Training consumed 100 GPU hours.