Tijmen2 committed on
Commit 396f8e9
1 Parent(s): 98a9e82

Upload folder using huggingface_hub

Files changed (3):
  1. README.md +65 -146
  2. config.json +1 -1
  3. pytorch_model.bin +1 -1
README.md CHANGED
@@ -1,156 +1,75 @@
  ---
  tags:
- - generated_from_trainer
  model-index:
- - name: home/005/th5351/output
  results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
- <details><summary>See axolotl config</summary>

- axolotl version: `0.4.1`
- ```yaml
- base_model: /home/005/th5351/models/cosmosage-llama3-8b-base/
- model_type: LlamaForCausalLM
- tokenizer_type: AutoTokenizer
-
- load_in_8bit: false
- load_in_4bit: false
- strict: false
-
- chat_template: llama3
- datasets:
-   - path: /home/005/th5351/datasets/combined_sft.jsonl
-     type: chat_template
-     chat_template: llama3
-     field_messages: conversations
-     message_field_role: from
-     message_field_content: value
-     roles:
-       system:
-         - system
-       user:
-         - human
-       assistant:
-         - gpt
-
- dataset_prepared_path: /home/005/th5351/output/last_run_prepared
- val_set_size: 0.001
- eval_sample_packing: false
- output_dir: /home/005/th5351/output
-
- sequence_len: 8192
- sample_packing: true
- pad_to_sequence_len: true
-
- wandb_project:
- wandb_entity:
- wandb_watch:
- wandb_name:
- wandb_log_model:
-
- gradient_accumulation_steps: 8
- micro_batch_size: 1
- num_epochs: 2
- optimizer: paged_adamw_8bit
- lr_scheduler: cosine
- learning_rate: 5e-5
- cosine_min_lr_ratio: 0.2
- cosine_constant_lr_ratio: 0.8
- max_grad_norm: 3.0
-
- seed: 42
-
- train_on_inputs: false
- group_by_length: false
- bf16: auto
- fp16:
- tf32: false
-
- gradient_checkpointing: true
- early_stopping_patience:
- resume_from_checkpoint:
- logging_steps: 1
- xformers_attention:
- flash_attention: true
-
- warmup_steps: 100
- evals_per_epoch: 5
- eval_table_size:
- saves_per_epoch: 1
- debug:
- deepspeed: /home/005/th5351/packages/axolotl/deepspeed_configs/zero2.json
- ddp_timeout: 3600000
- weight_decay: 0.0
- fsdp:
- fsdp_config:
-
- ```
-
- </details><br>
-
- # home/005/th5351/output
-
- This model was trained from scratch on the None dataset.
- It achieves the following results on the evaluation set:
- - Loss: nan
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 5e-05
- - train_batch_size: 1
- - eval_batch_size: 1
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 4
- - gradient_accumulation_steps: 8
- - total_train_batch_size: 32
- - total_eval_batch_size: 4
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_steps: 100
- - num_epochs: 2
-
- ### Training results
-
- | Training Loss | Epoch | Step | Validation Loss |
- |:-------------:|:------:|:----:|:---------------:|
- | 1.3757 | 0.0005 | 1 | nan |
- | 0.8083 | 0.1999 | 388 | nan |
- | 0.8005 | 0.3998 | 776 | nan |
- | 0.7389 | 0.5998 | 1164 | nan |
- | 0.7269 | 0.7997 | 1552 | nan |
- | 0.7069 | 0.9996 | 1940 | nan |
- | 0.5786 | 1.1613 | 2328 | nan |
- | 0.5385 | 1.3613 | 2716 | nan |
- | 0.5381 | 1.5612 | 3104 | nan |
- | 0.5273 | 1.7611 | 3492 | nan |
- | 0.527 | 1.9610 | 3880 | nan |
-
- ### Framework versions
-
- - Transformers 4.41.1
- - Pytorch 2.3.0+cu121
- - Datasets 2.19.1
- - Tokenizers 0.19.1
  ---
  tags:
+ - physics
+ - cosmology
  model-index:
+ - name: cosmosage-v3
  results: []
+ license: mit
+ language:
+ - en
+ pipeline_tag: text-generation
+ base_model: meta-llama/meta-llama-3-8b
+ datasets:
+ - teknium/OpenHermes-2.5
  ---

+ # cosmosage
+
+ cosmosage is a natural-language assistant that answers questions about cosmology.
+
+ cosmosage-v3 is the latest iteration in the cosmosage series, trained from the LLAMA-3-8B base
+ model. We started with continued pretraining on thousands of papers and textbooks. The next step
+ was fine-tuning on synthetically generated question-answer pairs. In addition, the OpenHermes 2.5
+ dataset was used to improve instruction following and general conversational capability.
+
+ cosmosage-v3 is a full chat model, though it excels in Q&A mode, where the model gives a single
+ answer in response to a single question.
+
+ The code used to generate cosmosage is available at https://github.com/tijmen/cosmosage.
+
+ ## Usage
+
+ cosmosage-v3 uses the Llama-3 prompt template. Sampling parameters are up to you, but I like {'temperature': 0.7, 'smoothing_factor': 1, 'smoothing_curve': 1.5, 'repetition_penalty': 1.1}.
+
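+ The Llama-3 prompt template mentioned above can be sketched as plain string formatting. This is a minimal illustration only: the special-token names come from the Llama-3 chat template, while the helper name, system message, and question are hypothetical, not part of this model card.

```python
# Minimal sketch of the Llama-3 prompt template for single-turn Q&A.
# Special tokens follow the Llama-3 chat template; the default system
# message and the example question are illustrative assumptions.
def build_llama3_prompt(question: str,
                        system: str = "You are a helpful cosmology assistant.") -> str:
    """Format one user question into a Llama-3-style prompt string."""
    return (
        "<|begin_of_text|>"
        f"<|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|>"
        f"<|start_header_id|>user<|end_header_id|>\n\n{question}<|eot_id|>"
        # Trailing assistant header cues the model to start its answer.
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = build_llama3_prompt("What is the ISW effect?")
```

+ In practice, `tokenizer.apply_chat_template(messages, add_generation_prompt=True)` from `transformers` builds this string for you, and `temperature` and `repetition_penalty` can be passed to `model.generate`; the `smoothing_factor`/`smoothing_curve` settings are quadratic-sampling parameters offered by some inference frontends rather than by `transformers` itself.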
+ ## Comparison to cosmosage_v2
+
+ cosmosage-v3 was trained on the same data with nearly the same hyperparameters as cosmosage_v2.
+ The primary difference is the use of the LLAMA-3-8B base model rather than the Mistral-v0.1 base
+ model.
+
+ ## Training details
+
+ cosmosage-v3 was trained on 4xA100 (40 GB) at Gadi (NCI, Australia). A big thanks goes out to Yuan-Seng Ting for providing these resources.
+
+ ## Example output
+
+ **User:**
+ > What is "stacking"?
+
+ **Assistant:**
+ > Stacking is a technique used in astronomy to combine multiple images or data sets of the same object into a single image or dataset. This can be done by aligning and co-adding the individual images or data points, which helps to improve the signal-to-noise ratio and reveal faint features that may not be visible in a single image or dataset. Stacking is often used to study the properties of galaxies, galaxy clusters, and other astronomical objects.
+
+ **User:**
+ > Explain the ISW effect.
+
+ **Assistant:**
+ > The Integrated Sachs-Wolfe (ISW) effect is a phenomenon in modern cosmology that describes the gravitational interaction between cosmic microwave background (CMB) photons and large-scale structures in the universe. It occurs when CMB photons pass through regions of space where the gravitational potential has changed due to the expansion of the universe or the presence of matter overdensities.
+
+ > When a photon moves into an area with a higher gravitational potential, it gains energy as it falls into the potential well. As the photon climbs out of this well on the other side, it loses energy. If the change in the gravitational potential is gradual over time, the photon's energy gain and loss will be integrated, leading to a net temperature shift in the CMB radiation. This effect can either cause a blueshift (increased energy) if the gravitational potential is decreasing, or a redshift (decreased energy) if the gravitational potential is increasing.
+
+ > The ISW effect is particularly important for understanding the evolution of the universe and the role of dark energy. The detection of the ISW effect has provided evidence for the existence of dark energy and has helped constrain its properties. By analyzing the CMB temperature fluctuations and cross-correlating them with tracers of large-scale structure such as galaxy surveys, researchers can study the ISW effect and learn more about the dynamics of the universe.
+
+ **User:**
+ > What is Digital Active Nulling?
+
+ **Assistant:**
+ > Digital Active Nulling (DAN) is a technique used in the readout of Transition Edge Sensor (TES) bolometers. It works by dynamically nulling the current flowing through the SQUID input coil while digitally demodulating the carrier voltage and the current signal. This method helps to minimize non-linearity effects due to the nonlinear response of the SQUID, improving the overall performance of the system.
+
+ **User:**
+ > Explain why the time of matter-radiation equality affects the CMB damping tail.
+
+ **Assistant:**
+ > The time of matter-radiation equality is when the energy densities of matter and radiation were equal. This has an impact on the Cosmic Microwave Background (CMB) damping tail because it determines the amount of photon diffusion that occurs before decoupling. A later time of matter-radiation equality results in more photon diffusion, leading to a smaller amplitude in the damping tail.
+
+ [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
config.json CHANGED
@@ -1,5 +1,5 @@
  {
- "_name_or_path": "/home/005/th5351/models/cosmosage-llama3-8b-base/",
  "architectures": [
  "LlamaForCausalLM"
  ],

  {
+ "_name_or_path": "/home/005/th5351/models/cosmosage-llama3-8b/",
  "architectures": [
  "LlamaForCausalLM"
  ],
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:0db3e5f497f3fa124ca527820461bf9624c4503bf87935b7a2bf36d4befab67a
  size 16060563132

  version https://git-lfs.github.com/spec/v1
+ oid sha256:1d8d275ba30cd97e1bf6280c8ceacf4bf23d647760b7258d55a16642a14d9a8c
  size 16060563132