adriantheuma committed on
Commit 05c26a8
1 Parent(s): 68caf14

Update README.md

Files changed (1)
README.md +23 -18
README.md CHANGED
@@ -1,31 +1,36 @@
  ---
  library_name: peft
+ license: apache-2.0
+ datasets:
+ - adriantheuma/raven-data
+ language:
+ - en
  ---
  ### Training details

- - Prompt tokenisation: [LlamaTokenizer](https://huggingface.co/docs/transformers/model_doc/llama2#transformers.LlamaTokenizer).
- - The maximum context length is limited to 1,204.
- - Per device train batch: 1
- - Gradient accumulation: 128 steps (achieving the equivalent batch_size of 128)
- - Quantisation: 8-bit (
- - Optimiser: adamw
- - Learning_rate: 3 × 10⁻⁴
- - warmup_steps: 100
- - epochs: 5
+ * Prompt tokenisation: [LlamaTokenizer](https://huggingface.co/docs/transformers/model_doc/llama2#transformers.LlamaTokenizer).
+ * Maximum context length: 1,204 tokens
+ * Per device train batch: 1
+ * Gradient accumulation: 128 steps (achieving the equivalent batch_size of 128)
+ * Quantisation: 8-bit
+ * Optimiser: adamw
+ * Learning_rate: 3 × 10⁻⁴
+ * warmup_steps: 100
+ * epochs: 5

- - Low Rank Adaptation (LoRA)
- - rank: 16
- - alpha: 16
- - dropout: 0.05
- - target modules: q_proj, k_proj, v_proj, and o_proj
+ * Low Rank Adaptation (LoRA)
+ * rank: 16
+ * alpha: 16
+ * dropout: 0.05
+ * target modules: q_proj, k_proj, v_proj, and o_proj

  This setup reduces the trainable parameters to 26,214,400 or 0.2% of the base [Llama 2 13B Chat](https://huggingface.co/docs/transformers/model_doc/llama2) model.

  ### Training hardware

  This model is trained on commodity hardware equipped with a:
- - 13th Gen Intel(R) Core(TM) i7-13700KF CPU at 3.40 GHz
- - 64 GB installed RAM
- - NVIDIA GeForce RTX 4090 GPU with 24 GB onboard RAM.
+ * 13th Gen Intel(R) Core(TM) i7-13700KF CPU at 3.40 GHz
+ * 64 GB installed RAM
+ * NVIDIA GeForce RTX 4090 GPU with 24 GB onboard RAM.

- The trained model consumed 100 GPU hours during training.
+ The trained model consumed 100 GPU hours during training.
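For reference, the hyperparameters listed in the updated card correspond to a standard 8-bit LoRA fine-tune with PEFT and transformers. The sketch below is an illustration only, not the author's training script: the exact base checkpoint (`meta-llama/Llama-2-13b-chat-hf`), the use of bitsandbytes for 8-bit loading, and the output directory name are assumptions; the numeric values are taken from the card.

```python
# Minimal sketch of the configuration described in the card (not the original training script).
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_model = "meta-llama/Llama-2-13b-chat-hf"  # assumption: exact checkpoint not stated in the diff

tokenizer = AutoTokenizer.from_pretrained(base_model)  # resolves to the Llama tokenizer

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # "Quantisation: 8-bit"
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,                      # rank: 16
    lora_alpha=16,             # alpha: 16
    lora_dropout=0.05,         # dropout: 0.05
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # expect roughly 26.2M trainable parameters

training_args = TrainingArguments(
    output_dir="raven-lora",          # placeholder name
    per_device_train_batch_size=1,    # per device train batch: 1
    gradient_accumulation_steps=128,  # effective batch size 1 x 128 = 128
    learning_rate=3e-4,               # 3 × 10⁻⁴
    warmup_steps=100,
    num_train_epochs=5,
    optim="adamw_torch",              # "Optimiser: adamw"
)
```

Prompt tokenisation and truncation to the stated maximum context length would happen in the dataset preparation step, which the card does not detail.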
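The 26,214,400 figure quoted in the card is consistent with rank-16 LoRA adapters on the four attention projections of Llama 2 13B (40 decoder layers, hidden size 5120, each projection a 5120×5120 matrix). A quick check:

```python
# Back-of-the-envelope check of the trainable-parameter count quoted in the card.
hidden, layers, rank, projections = 5120, 40, 16, 4           # Llama 2 13B attention shapes
lora_params = layers * projections * rank * (hidden + hidden)  # A (r x d_in) plus B (d_out x r) per projection
print(lora_params)                                             # 26214400
print(round(lora_params / 13_015_864_320, 4))                  # ~0.002 of the ~13.0B base parameters, i.e. 0.2%
```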