---
tags:
- physics
- cosmology
model-index:
- name: cosmosage_qa
  results: []
license: mit
language:
- en
pipeline_tag: text-generation
base_model: mistralai/Mistral-7B-v0.1
datasets:
- teknium/OpenHermes-2.5
---
# cosmosage
Cosmosage is a natural-language assistant that answers questions about cosmology.
cosmosage_v2 first underwent continued pretraining on thousands of papers and textbooks,
and was then fine-tuned on synthetically generated question-answer pairs. It is a full
chat model, but it performs best in Q&A mode, where the model gives a single answer to
a single question.
The code used to generate cosmosage_v2 is available at https://github.com/tijmen/cosmosage
## Usage
After downloading cosmosage_v2, the following example code can be used to ask questions:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "models/cosmosage_v2/"  # local directory containing the downloaded model
device = torch.device("cuda")

# Load the weights directly in bfloat16, then move them to the GPU
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_path)

def ask_cosmosage(question, answer_start=''):
    # System instruction followed by the USER/ASSISTANT turns; answer_start pre-fills the reply
    prompt = f"You are cosmosage, an AI programmed to be a cosmology expert. You answer the USER's question clearly in long form, always providing context. When appropriate, provide a reference.USER: {question}ASSISTANT: {answer_start}"
    input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)
    # Sample with temperature 0.7; max_length counts prompt + generated tokens
    generated_ids = model.generate(input_ids, max_length=1024, do_sample=True, temperature=0.7, top_k=None, pad_token_id=tokenizer.eos_token_id)
    generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
    # Keep only the text after the last "ASSISTANT:" marker
    answer = generated_text.split("ASSISTANT:")[-1]
    return answer
```
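The prompt template inside `ask_cosmosage` can be inspected without loading the model. The sketch below factors it into two plain-Python helpers; `SYSTEM_PROMPT`, `build_prompt`, and `extract_answer` are hypothetical names for illustration and are not part of the cosmosage code.

```python
# Hypothetical helpers that reproduce the prompt string used by ask_cosmosage()
SYSTEM_PROMPT = (
    "You are cosmosage, an AI programmed to be a cosmology expert. "
    "You answer the USER's question clearly in long form, always providing context. "
    "When appropriate, provide a reference."
)

def build_prompt(question, answer_start=""):
    """Return the exact string that ask_cosmosage() passes to the tokenizer."""
    return f"{SYSTEM_PROMPT}USER: {question}ASSISTANT: {answer_start}"

def extract_answer(generated_text):
    """Keep only the text after the last 'ASSISTANT:' marker, trimmed of whitespace."""
    return generated_text.split("ASSISTANT:")[-1].strip()

print(build_prompt("What is the CMB?"))
```

A non-empty `answer_start` pre-fills the beginning of the reply, so `build_prompt(question, "1.")` would likely nudge the model toward a numbered answer. Unlike the original function, `extract_answer` also strips surrounding whitespace.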
## Comparison to cosmosage_v1
cosmosage_v2 is more knowledgeable than cosmosage_v1 because it was pretrained on the papers and
textbooks rather than only on synthetically generated QA pairs. However, it still struggles with
_reliability_: while many of its answers are factually accurate, some are not. The outputs of cosmosage
(or any LLM) should not be assumed to be factual.
## Training details
cosmosage_v2 was trained on 4× A100 (80 GB) GPUs at the Center for Computational Astrophysics (CfCA), National Astronomical Observatory of Japan (NAOJ).
The following hyperparameters were used during continued pretraining:
- learning_rate: 1e-05
- train_batch_size: 4
- max_grad_norm: 3.0
- num_devices: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- num_epochs: 3.0
- weight_decay: 1e-04
The following hyperparameters were used during QA tuning:
- learning_rate: 2e-06
- train_batch_size: 4
- max_grad_norm: 3.0
- num_devices: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 100
- num_epochs: 2.0
- weight_decay: 0.0
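The two learning-rate schedules listed above (linear warmup, then cosine or linear decay) can be sketched in plain Python. This is an illustration, not the training code: the formulas are modeled on the warmup-plus-decay schedules in Hugging Face `transformers` (`get_cosine_schedule_with_warmup`, `get_linear_schedule_with_warmup`), `TOTAL_STEPS` is a hypothetical value because the card does not give the number of optimizer steps, and a gradient-accumulation factor of 1 is inferred from 4 × 4 = 16 rather than stated.

```python
import math

def warmup_cosine(step, peak_lr, warmup_steps, total_steps):
    """Linear warmup to peak_lr, then half-period cosine decay to 0 at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

def warmup_linear(step, peak_lr, warmup_steps, total_steps):
    """Linear warmup to peak_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

# Effective batch size = per-device batch x devices x gradient accumulation
PER_DEVICE_BATCH = 4
NUM_DEVICES = 4
GRAD_ACCUM = 1  # assumption: not listed in the card, implied by 4 x 4 = 16
EFFECTIVE_BATCH = PER_DEVICE_BATCH * NUM_DEVICES * GRAD_ACCUM

TOTAL_STEPS = 1000  # hypothetical; the real step count is not given
WARMUP_STEPS = 100

# Continued pretraining: peak 1e-05, cosine decay
pretrain_lr = [warmup_cosine(s, 1e-5, WARMUP_STEPS, TOTAL_STEPS) for s in range(TOTAL_STEPS + 1)]
# QA tuning: peak 2e-06, linear decay
qa_lr = [warmup_linear(s, 2e-6, WARMUP_STEPS, TOTAL_STEPS) for s in range(TOTAL_STEPS + 1)]
```

Both schedules start at 0, reach their peak after the 100 warmup steps, and decay back to 0 by the final step; only the shape of the decay differs.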
## Versions
This repository contains:
- pytorch_model.bin: standard version (bfloat16)
- model.safetensors: same as pytorch_model.bin but in safetensors format
- gptq_model-8bit-128g.safetensors: 8-bit quantized version for inference speedup and low-VRAM GPUs
- gptq_model-4bit-128g.safetensors: 4-bit quantized version for even faster inference, lower VRAM requirements, lower quality
When using one of the quantized versions, make sure to pass the quantization configuration:
```json
{
"bits": <4 or 8 depending on the version>,
"group_size": 128,
"damp_percent": 0.01,
"desc_act": false,
"static_groups": false,
"sym": true,
"true_sequential": true,
"model_name_or_path": null,
"model_file_base_name": null
}
```
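The configuration above can also be generated programmatically. A minimal sketch, assuming the quantized weights sit in a local directory and that your GPTQ loader (for example AutoGPTQ) reads a file named `quantize_config.json` from that directory; `GPTQ_FILES`, `quantize_config`, and `write_quantize_config` are hypothetical names.

```python
import json
from pathlib import Path

# Quantized files shipped in this repository; only "bits" differs between them
GPTQ_FILES = {
    4: "gptq_model-4bit-128g.safetensors",
    8: "gptq_model-8bit-128g.safetensors",
}

def quantize_config(bits):
    """Return the quantization settings from the card for the 4- or 8-bit file."""
    if bits not in GPTQ_FILES:
        raise ValueError(f"bits must be 4 or 8, got {bits!r}")
    return {
        "bits": bits,
        "group_size": 128,
        "damp_percent": 0.01,
        "desc_act": False,
        "static_groups": False,
        "sym": True,
        "true_sequential": True,
        "model_name_or_path": None,
        "model_file_base_name": None,
    }

def write_quantize_config(model_dir, bits):
    """Write quantize_config.json into model_dir (json.dumps maps False/None to false/null)."""
    path = Path(model_dir) / "quantize_config.json"
    path.write_text(json.dumps(quantize_config(bits), indent=2))
    return path
```

With AutoGPTQ, the same values can alternatively be passed as a `BaseQuantizeConfig` object through the `quantize_config` argument of `from_quantized`, instead of writing a file.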
## Example output
**User:**
> What is Digital Active Nulling?
**Assistant:**
> Digital Active Nulling (DAN) is a technique used to improve the performance of Transition Edge Sensor (TES) bolometers in millimeter-wave astrophysics. It nulls the current flowing through the SQUID input coil, which is used to read out the TES, by injecting a current at the SQUID input that is the negative of the current through the TES. This results in a significant reduction in the effective input impedance of the SQUID, which increases the dynamic range of the readout system and reduces non-linearity. In addition, DAN replaces the flux-locked loop (FLL), removing the wiring length and bandwidth limitations of the FLL. DAN is used in the Simons Array and SPT-3G experiments, and is the baseline readout technology for the upcoming LiteBIRD satellite.
**User:**
> Explain the ISW effect.
**Assistant:**
> 1. The Integrated Sachs-Wolfe (ISW) effect is a phenomenon in cosmology that describes the contribution of gravitational potential fluctuations to the Cosmic Microwave Background (CMB) anisotropy. It is named after Rainer K. Sachs and Arthur M. Wolfe, who first described the effect in their 1967 paper.
> 2. The ISW effect occurs when CMB photons travel through a time-varying gravitational potential due to the expansion of the Universe. As the photons pass through regions where the gravitational potential is changing, their energy is altered, leading to a change in their temperature.
> 3. There are two main phases in which the ISW effect can occur: the early-time effect and the late-time effect.
> 4. The early-time ISW effect, also known as the Rees-Sciama effect, occurs during the transition from the radiation-dominated era to the matter-dominated era in the early Universe. As the Universe expands and the density of radiation decreases, the gravitational potentials begin to decay, causing CMB photons to gain energy and contribute to the anisotropy.
> 5. The late-time ISW effect occurs when the Universe transitions to the current dark energy-dominated era. As the Universe expands and the dark energy starts to dominate, the gravitational potentials decay again, causing CMB photons to gain energy and contribute to the anisotropy.
> 6. The ISW effect is challenging to detect because it is a secondary anisotropy in the CMB, meaning it is superimposed on the primary anisotropies that formed during the early Universe. Additionally, the ISW effect has a similar angular scale as other CMB anisotropies, making it difficult to isolate from other sources.
> 7. The detection of the ISW effect has been achieved through cross-correlating the CMB with large-scale structure (LSS) surveys, such as galaxy surveys. These cross-correlations can reveal the ISW signal by comparing the distribution of galaxies with the CMB anisotropies.
> 8. The ISW effect is an important tool for studying the large-scale structure of the Universe, the properties of dark energy, and the evolution of cosmic structures. It also provides a way to test and constrain cosmological models, such as the ΛCDM model.
**User:**
> Explain how one would calculate the helium fraction at the surface of last scattering.
**Assistant:**
> 1. Determine the redshift of the surface of last scattering (zLS) using the CMB temperature of 2.7255 K and the current temperature of the universe T0 = 2.7255 K. The surface of last scattering is approximately 1380 K, which corresponds to a redshift of zLS = 1089.9.
> 2. Calculate the scale factor at the surface of last scattering (aLS) using the redshift. The scale factor at the surface of last scattering is approximately aLS = 1/(1 + zLS) = 1/(1 + 1089.9) ≈ 9.143 × 10^-4.
> 3. Determine the Hubble parameter at the surface of last scattering (HLS). The Hubble parameter is related to the Hubble constant (H0) and the scale factor (a) by the equation H(z) = H0 * a^(-3/2). Since the radiation density is much larger than the matter density at the surface of last scattering, we can use the following approximation for the Hubble parameter: H(z) ≈ H0 * sqrt(1 + z) ≈ H0 * sqrt(1 + 1089.9) ≈ 1.06 × 10^4 * H0.
> 4. Calculate the age of the universe at the surface of last scattering (tLS). The age of the universe can be calculated using the Hubble parameter and the scale factor: t = 1/H. At the surface of last scattering, the age of the universe is approximately tLS ≈ 1/(1.06 × 10^4 * H0) ≈ 9.4 × 10^-10 * H0.
> 5. Determine the neutron-to-proton ratio at the surface of last scattering (n/p ratio). The neutron-to-proton ratio can be calculated using the equation n/p = exp(-Q/kT), where Q is the neutron-proton mass difference (1.293 MeV), k is the Boltzmann constant, and T is the temperature at the surface of last scattering. At the surface of last scattering, the neutron-to-proton ratio is approximately n/p ≈ exp(-1.293/0.801) ≈ 0.198.
> 6. Calculate the helium fraction (Yp) at the surface of last scattering (YpLS). The helium fraction can be calculated using the following equation: Yp = 2 * n/p, where n is the number of neutrons and p is the number of protons. At the surface of last scattering, the helium fraction is approximately YpLS ≈ 2 * 0.198 ≈ 0.396.
> 7. Determine the helium fraction at the surface of last scattering (YpLS). The helium fraction at the surface of last scattering is approximately YpLS ≈ 0.396.
**User:**
> How does the time of matter-radiation equality affect the damping tail?
**Assistant:**
> The time of matter-radiation equality affects the damping tail by influencing the amount of damping that occurs in the CMB power spectrum. The damping tail is the region of the CMB power spectrum that is suppressed due to the diffusion of photons in the early universe. This diffusion is caused by the interaction of photons with free electrons, which scatter the photons and reduce their energy. The amount of damping that occurs is determined by the amount of time that the photons spend in this diffusive state, which is influenced by the time of matter-radiation equality.
>
> Matter-radiation equality is the point in the history of the universe when the energy density of matter becomes equal to the energy density of radiation. This occurs when the universe is approximately 50,000 years old. The time of matter-radiation equality affects the damping tail because it determines the amount of time that photons spend in the diffusive state before they are able to decouple from the matter and travel freely through the universe.
>
> If matter-radiation equality occurs later in the history of the universe, then the photons spend more time in the diffusive state, which results in more damping of the CMB power spectrum. This is because the longer the photons spend in the diffusive state, the more time they have to lose energy through scattering with free electrons. Conversely, if matter-radiation equality occurs earlier in the history of the universe, then the photons spend less time in the diffusive state, which results in less damping of the CMB power spectrum.
>
> In summary, the time of matter-radiation equality affects the damping tail of the CMB power spectrum by influencing the amount of time that photons spend in the diffusive state before they are able to decouple from the matter and travel freely through the universe. The longer the photons spend in the diffusive state, the more damping occurs, and the earlier matter-radiation equality occurs, the less damping occurs.