Tijmen2 committed on
Commit c7e3ab1
1 Parent(s): ba71829

Update README.md

Files changed (1)
  1. README.md +8 -12
README.md CHANGED
@@ -10,6 +10,8 @@ language:
 - en
 pipeline_tag: text-generation
 base_model: mistralai/Mistral-7B-v0.1
+datasets:
+- teknium/OpenHermes-2.5
 ---
 
 # cosmosage
@@ -55,18 +57,16 @@ textbooks, rather than just on synthetically generated QA pairs. However, it con
 _reliability_. While many of its answers are factually accurate, some are not. The outputs of cosmosage
 (or any LLM) should not be trusted to be factual.
 
-### Training hyperparameters
+### Training details
 
-The following hyperparameters were used during continued pretraining:
+cosmosage_v2 was trained on 4xA100 (80 GB) at the Center for Computational Astrophysics (CfCA), National Astronomical Observatory of Japan (NAOJ).
+
+The following parameters were used during continued pretraining:
 - learning_rate: 1e-05
-- max_grad_norm: 3.0
 - train_batch_size: 4
-- eval_batch_size: 4
-- seed: 701
-- distributed_type: multi-GPU
+- max_grad_norm: 3.0
 - num_devices: 4
 - total_train_batch_size: 16
-- total_eval_batch_size: 16
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 100
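As a rough illustration only (the commit does not say which training framework produced these values), the continued-pretraining settings in the hunk above could be expressed as Hugging Face `transformers.TrainingArguments` along the following lines; the `output_dir` and anything not named in the README are assumptions.

```python
# Illustrative sketch only: the continued-pretraining hyperparameters from the
# README expressed as transformers.TrainingArguments. The training framework,
# output_dir, and any setting not listed in the diff are assumptions.
from transformers import TrainingArguments

pretrain_args = TrainingArguments(
    output_dir="cosmosage_v2_pretrain",  # hypothetical path
    learning_rate=1e-5,                  # learning_rate: 1e-05
    per_device_train_batch_size=4,       # train_batch_size: 4
    max_grad_norm=3.0,                   # max_grad_norm: 3.0
    adam_beta1=0.9,                      # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,                   # epsilon=1e-08
    lr_scheduler_type="cosine",          # lr_scheduler_type: cosine
    warmup_steps=100,                    # lr_scheduler_warmup_steps: 100
)

# With num_devices: 4 and no gradient accumulation, the effective batch size is
# 4 per device * 4 devices = 16, matching total_train_batch_size: 16.
```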
@@ -75,14 +75,10 @@ The following hyperparameters were used during continued pretraining:
 
 The following hyperparameters were used during QA tuning:
 - learning_rate: 2e-06
-- max_grad_norm: 3.0
 - train_batch_size: 4
-- eval_batch_size: 4
-- seed: 702
-- distributed_type: multi-GPU
+- max_grad_norm: 3.0
 - num_devices: 4
 - total_train_batch_size: 16
-- total_eval_batch_size: 16
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_steps: 100
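Under the same assumptions as the sketch above, the QA-tuning pass differs mainly in its lower learning rate and linear scheduler.

```python
# Illustrative sketch only, under the same assumptions as above: the QA-tuning
# hyperparameters from the README as transformers.TrainingArguments.
from transformers import TrainingArguments

qa_args = TrainingArguments(
    output_dir="cosmosage_v2_qa",        # hypothetical path
    learning_rate=2e-6,                  # learning_rate: 2e-06
    per_device_train_batch_size=4,       # train_batch_size: 4
    max_grad_norm=3.0,                   # max_grad_norm: 3.0
    adam_beta1=0.9,                      # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,                   # epsilon=1e-08
    lr_scheduler_type="linear",          # lr_scheduler_type: linear
    warmup_steps=100,                    # lr_scheduler_warmup_steps: 100
)
```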
 