Update README.md
README.md CHANGED
@@ -10,6 +10,8 @@ language:
 - en
 pipeline_tag: text-generation
 base_model: mistralai/Mistral-7B-v0.1
+datasets:
+- teknium/OpenHermes-2.5
 ---
 
 # cosmosage
@@ -55,18 +57,16 @@ textbooks, rather than just on synthetically generated QA pairs. However, it con
 _reliability_. While many of its answers are factually accurate, some are not. The outputs of cosmosage
 (or any LLM) should not be trusted to be factual.
 
-### Training
+### Training details
 
-The following hyperparameters were used during continued pretraining:
+cosmosage_v2 was trained on 4xA100 (80 GB) at the Center for Computational Astrophysics (CfCA), National Astronomical Observatory of Japan (NAOJ).
+
+The following parameters were used during continued pretraining:
 - learning_rate: 1e-05
-- max_grad_norm: 3.0
 - train_batch_size: 4
-- eval_batch_size: 4
-- seed: 701
-- distributed_type: multi-GPU
+- max_grad_norm: 3.0
 - num_devices: 4
 - total_train_batch_size: 16
-- total_eval_batch_size: 16
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 100
@@ -75,14 +75,10 @@ The following hyperparameters were used during continued pretraining:
 
 The following hyperparameters were used during QA tuning:
 - learning_rate: 2e-06
-- max_grad_norm: 3.0
 - train_batch_size: 4
-- eval_batch_size: 4
-- seed: 702
-- distributed_type: multi-GPU
+- max_grad_norm: 3.0
 - num_devices: 4
 - total_train_batch_size: 16
-- total_eval_batch_size: 16
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_steps: 100
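
For context on the metadata added in the first hunk, the sketch below shows how the artifacts it names can be pulled from the Hugging Face Hub. Only the repository IDs (`mistralai/Mistral-7B-v0.1`, `teknium/OpenHermes-2.5`) and the `text-generation` task come from the front matter; the prompt and generation settings are illustrative and not part of this commit.

```python
# Minimal sketch, assuming standard transformers/datasets usage.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

base_model_id = "mistralai/Mistral-7B-v0.1"  # base_model from the front matter
qa_dataset_id = "teknium/OpenHermes-2.5"     # datasets entry added in this commit

# Base model that cosmosage continues pretraining from
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(base_model_id, device_map="auto")

# pipeline_tag: text-generation
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(generator("The cosmic microwave background is", max_new_tokens=50)[0]["generated_text"])

# QA pairs referenced by the added datasets entry
openhermes = load_dataset(qa_dataset_id, split="train")
print(openhermes[0])
```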
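
The two hyperparameter lists in the diff are reported settings rather than code. As a rough illustration, they map onto Hugging Face `TrainingArguments` as sketched below; the actual training script is not part of this commit, and the `output_dir` values are hypothetical.

```python
# Rough mapping of the reported hyperparameters onto TrainingArguments.
# This illustrates the listed values; it is not the training code used for cosmosage.
from transformers import TrainingArguments

# Continued pretraining: 4 devices x per-device batch 4 = total_train_batch_size 16
pretrain_args = TrainingArguments(
    output_dir="continued_pretraining",  # hypothetical
    learning_rate=1e-05,
    per_device_train_batch_size=4,
    max_grad_norm=3.0,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_steps=100,
)

# QA tuning: same batch layout, lower learning rate, linear schedule
qa_args = TrainingArguments(
    output_dir="qa_tuning",  # hypothetical
    learning_rate=2e-06,
    per_device_train_batch_size=4,
    max_grad_norm=3.0,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="linear",
    warmup_steps=100,
)
```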