Update README.md
README.md CHANGED

@@ -30,27 +30,6 @@ During the pretraining phase of our large language model, the model was exposed

Our model was pretrained on a single A100 80GB GPU on the QBlocks platform. We chose bfloat16 as the training precision because of stability issues we encountered with float16.
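
For reference, a minimal sketch of loading the base model in bfloat16, assuming the Hugging Face `transformers` stack and the `meta-llama/Llama-2-7b-hf` checkpoint name (neither is stated explicitly in this section):

```python
# Minimal sketch: load the base model in bfloat16 on a single GPU.
# The library choice and checkpoint name are assumptions, not taken from the README.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # assumed Hub identifier for the base model

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # bfloat16 instead of float16, which was unstable
    device_map="auto",           # place the model on the single A100 80GB GPU
)
```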

We used parameter-efficient fine-tuning (PEFT) with Low-Rank Adaptation (LoRA) for this pretraining run, reaching a training loss of approximately 2.8 after almost two days of training. The LoRA configuration is shown below, followed by a sketch of how it maps onto code.

```yaml
# LoRA config
peft:
  r: 64
  lora_alpha: 128
  target_modules: [
    "q_proj", "v_proj",
    "k_proj", "o_proj",
    "gate_proj", "up_proj",
    "down_proj",
  ]
  lora_dropout: 0.05
  bias: "none"
  task_type: "CAUSAL_LM"
  modules_to_save: ["embed_tokens", "lm_head"]
```
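
The YAML above maps onto the Hugging Face `peft` API roughly as follows. This is a sketch under the assumption that `peft` and `transformers` were the underlying libraries, which this section does not state; only the hyperparameter values come from the config block.

```python
# Sketch: applying the LoRA configuration above with the Hugging Face `peft` library.
# Library choice and checkpoint name are assumptions; hyperparameters match the YAML.
import torch
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # assumed checkpoint name for the base model
    torch_dtype=torch.bfloat16,
)

lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules=[
        "q_proj", "v_proj", "k_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.CAUSAL_LM,
    modules_to_save=["embed_tokens", "lm_head"],  # also train embeddings and LM head
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # reports LoRA adapters plus embed_tokens/lm_head as trainable
```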

## License

The model inherits the license from meta-llama/Llama-2-7b.