adriantheuma/raven-lora

adriantheuma committed on Jan 25

Commit 05c26a8 (1 parent: 68caf14)

Update README.md

Files changed (1)

README.md +23 -18

@@ -1,31 +1,36 @@
 ---
 library_name: peft
 ---
 ### Training details
-- Prompt tokenisation: [LlamaTokenizer](https://huggingface.co/docs/transformers/model_doc/llama2#transformers.LlamaTokenizer).
-- The maximum context length is limited to 1,204.
-- Per device train batch: 1
-- Gradient accumulation: 128 steps (achieving the equivalent batch_size of 128)
-- Quantisation: 8-bit (
-- Optimiser: adamw
-- Learning_rate: 3 × 10−4
-- warmup_steps: 100
-- epochs: 5
-- Low Rank Adaptation (LoRA)
-  - rank: 16
-  - alpha: 16
-  - dropout: 0.05
-  - target modules:  q_proj, k_proj, v_proj, and o_proj
 This setup reduces the trainable parameters to 26,214,400 or 0.2% of the base [Llama 2 13B Chat](https://huggingface.co/docs/transformers/model_doc/llama2) model.
 ### Training hardware
 This model is trained on commodity hardware equipped with a:
-- 13th Gen Intel(R) Core(TM) i7-13700KF CPU at 3.40 GHz
-- 64 GB installed RAM
-- NVIDIA GeForce RTX 4090 GPU with 24 GB onboard RAM.
-The trained model consumed 100 GPU hours during training.

 ---
 library_name: peft
+license: apache-2.0
+datasets:
+- adriantheuma/raven-data
+language:
+- en
 ---
 ### Training details
+* Prompt tokenisation: [LlamaTokenizer](https://huggingface.co/docs/transformers/model_doc/llama2#transformers.LlamaTokenizer).
+* Maximum context length: 1,204 tokens
+* Per device train batch: 1
+* Gradient accumulation: 128 steps (achieving the equivalent batch_size of 128)
+* Quantisation: 8-bit
+* Optimiser: adamw
+* Learning_rate: 3 × 10−4
+* warmup_steps: 100
+* epochs: 5
+* Low Rank Adaptation (LoRA)
+  * rank: 16
+  * alpha: 16
+  * dropout: 0.05
+  * target modules:  q_proj, k_proj, v_proj, and o_proj
 This setup reduces the trainable parameters to 26,214,400 or 0.2% of the base [Llama 2 13B Chat](https://huggingface.co/docs/transformers/model_doc/llama2) model.
 ### Training hardware
 This model is trained on commodity hardware equipped with a:
+* 13th Gen Intel(R) Core(TM) i7-13700KF CPU at 3.40 GHz
+* 64 GB installed RAM
+* NVIDIA GeForce RTX 4090 GPU with 24 GB onboard RAM.
+The trained model consumed 100 GPU hours during training.
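
The trainable-parameter figure quoted in the README can be reproduced with a little arithmetic. The sketch below is not part of the commit. It assumes the usual Llama 2 13B shape (hidden size 5120, 40 decoder layers, square attention projections) and a nominal 13 billion base parameters, none of which the README states. Each LoRA adapter adds two low-rank matrices per target module, so a module with `in_features` inputs and `out_features` outputs contributes `rank * (in_features + out_features)` weights.

```python
# Assumptions (not stated in the README): Llama 2 13B architecture values.
HIDDEN_SIZE = 5120            # assumed width of q_proj/k_proj/v_proj/o_proj
NUM_LAYERS = 40               # assumed number of decoder layers
BASE_PARAMS = 13_000_000_000  # nominal size of the base model

# Values taken from the README's training details.
LORA_RANK = 16
TARGET_MODULES = ("q_proj", "k_proj", "v_proj", "o_proj")
PER_DEVICE_BATCH = 1
GRAD_ACCUM_STEPS = 128


def lora_params_per_module(in_features: int, out_features: int, rank: int) -> int:
    """Weights added by one adapter: A is (rank x in), B is (out x rank)."""
    return rank * (in_features + out_features)


def lora_trainable_params(hidden_size: int, num_layers: int,
                          rank: int, num_target_modules: int) -> int:
    """Total adapter weights when every target module is hidden x hidden."""
    per_module = lora_params_per_module(hidden_size, hidden_size, rank)
    return per_module * num_target_modules * num_layers


trainable = lora_trainable_params(
    HIDDEN_SIZE, NUM_LAYERS, LORA_RANK, len(TARGET_MODULES)
)
fraction = trainable / BASE_PARAMS
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS

print(f"{trainable:,}")    # 26,214,400
print(f"{fraction:.1%}")   # 0.2%
print(effective_batch)     # 128
```

Under these assumptions the result matches the README: 163,840 weights per adapted module, times 4 modules, times 40 layers. The same calculation shows why a batch of 1 with 128 accumulation steps is described as equivalent to a batch size of 128.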