Update README.md
README.md CHANGED
@@ -10,6 +10,8 @@ language:
 - en
 pipeline_tag: text-generation
 base_model: mistralai/Mistral-7B-v0.1
+datasets:
+- teknium/OpenHermes-2.5
 ---
 
 # cosmosage
@@ -55,18 +57,16 @@ textbooks, rather than just on synthetically generated QA pairs. However, it con
 _reliability_. While many of its answers are factually accurate, some are not. The outputs of cosmosage
 (or any LLM) should not be trusted to be factual.
 
-### Training
+### Training details
 
-The following hyperparameters were used during continued pretraining:
+cosmosage_v2 was trained on 4xA100 (80 GB) at the Center for Computational Astrophysics (CfCA), National Astronomical Observatory of Japan (NAOJ).
+
+The following parameters were used during continued pretraining:
 - learning_rate: 1e-05
-- max_grad_norm: 3.0
 - train_batch_size: 4
-- eval_batch_size: 4
-- seed: 701
-- distributed_type: multi-GPU
+- max_grad_norm: 3.0
 - num_devices: 4
 - total_train_batch_size: 16
-- total_eval_batch_size: 16
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 100
@@ -75,14 +75,10 @@ The following hyperparameters were used during continued pretraining:
 
 The following hyperparameters were used during QA tuning:
 - learning_rate: 2e-06
-- max_grad_norm: 3.0
 - train_batch_size: 4
-- eval_batch_size: 4
-- seed: 702
-- distributed_type: multi-GPU
+- max_grad_norm: 3.0
 - num_devices: 4
 - total_train_batch_size: 16
-- total_eval_batch_size: 16
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_steps: 100
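
For context on the metadata added in the first hunk, the sketch below shows how the artifacts it names can be pulled from the Hugging Face Hub. Only the repository IDs (`mistralai/Mistral-7B-v0.1`, `teknium/OpenHermes-2.5`) and the `text-generation` task come from the front matter; the prompt and generation settings are illustrative and not part of this commit.

```python
# Minimal sketch, assuming standard transformers/datasets usage.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

base_model_id = "mistralai/Mistral-7B-v0.1"  # base_model from the front matter
qa_dataset_id = "teknium/OpenHermes-2.5"     # datasets entry added in this commit

# Base model that cosmosage continues pretraining from
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(base_model_id, device_map="auto")

# pipeline_tag: text-generation
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(generator("The cosmic microwave background is", max_new_tokens=50)[0]["generated_text"])

# QA pairs referenced by the added datasets entry
openhermes = load_dataset(qa_dataset_id, split="train")
print(openhermes[0])
```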
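
The two hyperparameter lists in the diff are reported settings rather than code. As a rough illustration, they map onto Hugging Face `TrainingArguments` as sketched below; the actual training script is not part of this commit, and the `output_dir` values are hypothetical.

```python
# Rough mapping of the reported hyperparameters onto TrainingArguments.
# This illustrates the listed values; it is not the training code used for cosmosage.
from transformers import TrainingArguments

# Continued pretraining: 4 devices x per-device batch 4 = total_train_batch_size 16
pretrain_args = TrainingArguments(
    output_dir="continued_pretraining",  # hypothetical
    learning_rate=1e-05,
    per_device_train_batch_size=4,
    max_grad_norm=3.0,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_steps=100,
)

# QA tuning: same batch layout, lower learning rate, linear schedule
qa_args = TrainingArguments(
    output_dir="qa_tuning",  # hypothetical
    learning_rate=2e-06,
    per_device_train_batch_size=4,
    max_grad_norm=3.0,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="linear",
    warmup_steps=100,
)
```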