	---
	tags:
	- physics
	- cosmology
	model-index:
	- name: cosmosage_qa
	  results: []
	license: mit
	language:
	- en
	pipeline_tag: text-generation
	base_model: mistralai/Mistral-7B-v0.1
	datasets:
	- teknium/OpenHermes-2.5
	---

	# cosmosage

	Cosmosage is a natural-language assistant that answers questions about cosmology.

	cosmosage_v2 first underwent continued pretraining based on thousands of papers and textbooks,
	and was subsequently fine-tuned on synthetically-generated question-answer pairs. It is a full
	chat model, though it excels in Q&A mode, where the model gives a single answer in response to
	a single question.

	The code used to generate cosmosage_v2 is available at https://github.com/tijmen/cosmosage

	## Usage

	After downloading cosmosage_v2, the following example code can be used to ask questions:

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer
	import torch

	path_to_model = 'cosmosage_v2/'
	device = "cuda"

	# Load the model and tokenizer from the local download directory.
	model = AutoModelForCausalLM.from_pretrained(path_to_model).to(device)
	tokenizer = AutoTokenizer.from_pretrained(path_to_model)

	SYSTEM_PROMPT = "You are cosmosage, an AI programmed to be a cosmology expert. You answer the USER's question clearly in long form, always providing context. When appropriate, provide a reference."

	def ask_cosmosage(question):
	    # Assemble the prompt: system prompt, "USER:" + question, then "ASSISTANT:".
	    # Token id 28705 is inserted as a separator between the segments.
	    input_ids = torch.cat([
	        tokenizer.encode(SYSTEM_PROMPT, return_tensors="pt"),
	        torch.tensor([[28705]]),
	        tokenizer.encode("USER:", add_special_tokens=False, return_tensors="pt"),
	        tokenizer.encode(question, add_special_tokens=False, return_tensors="pt"),
	        torch.tensor([[28705]]),
	        tokenizer.encode("ASSISTANT:", add_special_tokens=False, return_tensors="pt")
	    ], dim=-1).to(device)
	    # Sample up to 1000 tokens beyond the prompt.
	    generated_ids = model.generate(input_ids, max_length=input_ids.shape[1] + 1000, do_sample=True, temperature=0.4)
	    # The decoded string contains the prompt followed by the generated answer.
	    return tokenizer.decode(generated_ids[0], skip_special_tokens=True)
	```
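
Because `ask_cosmosage` decodes the full generated sequence, the returned string starts with the prompt and only then contains the answer. The helper below is a minimal sketch for keeping just the answer. `extract_answer` is a hypothetical name that is not part of the cosmosage code, and the sketch assumes the question itself does not contain the text `ASSISTANT:`.

```python
def extract_answer(decoded: str, marker: str = "ASSISTANT:") -> str:
    """Return the text that follows the first `marker` in `decoded`.

    Hypothetical helper: if the marker is missing, the input is
    returned unchanged apart from surrounding whitespace.
    """
    _, found, answer = decoded.partition(marker)
    return answer.strip() if found else decoded.strip()


# Example with a hand-written string standing in for real model output.
example = "You are cosmosage... USER: What is the CMB? ASSISTANT: Relic radiation."
print(extract_answer(example))
```

With the model loaded, this would be used as `extract_answer(ask_cosmosage("What is the CMB?"))`.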

	## Comparison to cosmosage_v1

	cosmosage_v2 is more knowledgeable than cosmosage_v1 because it underwent continued pretraining on
	papers and textbooks, rather than being trained only on synthetically generated QA pairs. However, it
	continues to struggle with _reliability_. While many of its answers are factually accurate, some are not.
	The outputs of cosmosage (or any LLM) should not be trusted to be factual.

	### Training details

	cosmosage_v2 was trained on 4xA100 (80 GB) at the Center for Computational Astrophysics (CfCA), National Astronomical Observatory of Japan (NAOJ).

	The following hyperparameters were used during continued pretraining:
	- learning_rate: 1e-05
	- train_batch_size: 4
	- max_grad_norm: 3.0
	- num_devices: 4
	- total_train_batch_size: 16
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_steps: 100
	- num_epochs: 3.0
	- weight_decay: 1e-04
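
To illustrate what a cosine schedule with 100 warmup steps looks like at a peak learning rate of 1e-05, here is a small pure-Python sketch. It follows the common "linear warmup, then half-cosine decay to zero" shape. The total number of optimizer steps is not stated in this card, so `total_steps=1000` is a placeholder, and the actual training code may implement the schedule differently.

```python
import math


def cosine_lr(step: int, peak_lr: float = 1e-5,
              warmup_steps: int = 100, total_steps: int = 1000) -> float:
    """Learning rate at `step`: linear warmup, then cosine decay to zero.

    `total_steps` is a hypothetical value chosen for illustration only.
    """
    if step < warmup_steps:
        # Ramp linearly from 0 up to peak_lr over the warmup phase.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for s in (0, 50, 100, 550, 1000):
    print(s, cosine_lr(s))
```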

	The following hyperparameters were used during QA tuning:
	- learning_rate: 2e-06
	- train_batch_size: 4
	- max_grad_norm: 3.0
	- num_devices: 4
	- total_train_batch_size: 16
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_steps: 100
	- num_epochs: 2.0
	- weight_decay: 0.0
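
The two hyperparameter lists above can be collected into a small config object, which also makes the batch-size arithmetic explicit: 4 samples per device on 4 devices gives a total batch size of 16. This is only a sketch. `StageConfig` is a hypothetical name, and `gradient_accumulation_steps=1` is an assumption inferred from 4 × 4 = 16 rather than something stated in this card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageConfig:
    learning_rate: float
    lr_scheduler_type: str
    num_epochs: float
    weight_decay: float
    train_batch_size: int = 4             # per-device batch size
    num_devices: int = 4
    max_grad_norm: float = 3.0
    lr_scheduler_warmup_steps: int = 100
    gradient_accumulation_steps: int = 1  # assumption, not listed in the card

    @property
    def total_train_batch_size(self) -> int:
        # Samples contributing to each optimizer step across all devices.
        return (self.train_batch_size * self.num_devices
                * self.gradient_accumulation_steps)


PRETRAINING = StageConfig(learning_rate=1e-5, lr_scheduler_type="cosine",
                          num_epochs=3.0, weight_decay=1e-4)
QA_TUNING = StageConfig(learning_rate=2e-6, lr_scheduler_type="linear",
                        num_epochs=2.0, weight_decay=0.0)

print(PRETRAINING.total_train_batch_size, QA_TUNING.total_train_batch_size)
```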

	### Examples

	User:
	> What is Digital Active Nulling?

	Assistant:
	> Digital Active Nulling (DAN) is a technique used to improve the performance of Transition Edge Sensor (TES) bolometers in millimeter-wave astrophysics. It nulls the current flowing through the SQUID input coil, which is used to read out the TES, by injecting a current at the SQUID input that is the negative of the current through the TES. This results in a significant reduction in the effective input impedance of the SQUID, which increases the dynamic range of the readout system and reduces non-linearity. In addition, DAN replaces the flux-locked loop (FLL), removing the wiring length and bandwidth limitations of the FLL. DAN is used in the Simons Array and SPT-3G experiments, and is the baseline readout technology for the upcoming LiteBIRD satellite.


	User:
	> Explain the ISW effect.

	Assistant:
	> 1. The Integrated Sachs-Wolfe (ISW) effect is a phenomenon in cosmology that describes the contribution of gravitational potential fluctuations to the Cosmic Microwave Background (CMB) anisotropy. It is named after Rainer K. Sachs and Arthur M. Wolfe, who first described the effect in their 1967 paper.
	>
	> 2. The ISW effect occurs when CMB photons travel through a time-varying gravitational potential due to the expansion of the Universe. As the photons pass through regions where the gravitational potential is changing, their energy is altered, leading to a change in their temperature.
	>
	> 3. There are two main phases in which the ISW effect can occur: the early-time effect and the late-time effect.
	>
	> 4. The early-time ISW effect, also known as the Rees-Sciama effect, occurs during the transition from the radiation-dominated era to the matter-dominated era in the early Universe. As the Universe expands and the density of radiation decreases, the gravitational potentials begin to decay, causing CMB photons to gain energy and contribute to the anisotropy.
	>
	> 5. The late-time ISW effect occurs when the Universe transitions to the current dark energy-dominated era. As the Universe expands and the dark energy starts to dominate, the gravitational potentials decay again, causing CMB photons to gain energy and contribute to the anisotropy.
	>
	> 6. The ISW effect is challenging to detect because it is a secondary anisotropy in the CMB, meaning it is superimposed on the primary anisotropies that formed during the early Universe. Additionally, the ISW effect has a similar angular scale as other CMB anisotropies, making it difficult to isolate from other sources.
	>
	> 7. The detection of the ISW effect has been achieved through cross-correlating the CMB with large-scale structure (LSS) surveys, such as galaxy surveys. These cross-correlations can reveal the ISW signal by comparing the distribution of galaxies with the CMB anisotropies.
	>
	> 8. The ISW effect is an important tool for studying the large-scale structure of the Universe, the properties of dark energy, and the evolution of cosmic structures. It also provides a way to test and constrain cosmological models, such as the ΛCDM model.


	User:
	> How does the time of matter-radiation equality affect the damping tail?

	Assistant:
	> The time of matter-radiation equality affects the damping tail by influencing the amount of damping that occurs in the CMB power spectrum. The damping tail is the region of the CMB power spectrum that is suppressed due to the diffusion of photons in the early universe. This diffusion is caused by the interaction of photons with free electrons, which scatter the photons and reduce their energy. The amount of damping that occurs is determined by the amount of time that the photons spend in this diffusive state, which is influenced by the time of matter-radiation equality.
	>
	> Matter-radiation equality is the point in the history of the universe when the energy density of matter becomes equal to the energy density of radiation. This occurs when the universe is approximately 50,000 years old. The time of matter-radiation equality affects the damping tail because it determines the amount of time that photons spend in the diffusive state before they are able to decouple from the matter and travel freely through the universe.
	>
	> If matter-radiation equality occurs later in the history of the universe, then the photons spend more time in the diffusive state, which results in more damping of the CMB power spectrum. This is because the longer the photons spend in the diffusive state, the more time they have to lose energy through scattering with free electrons. Conversely, if matter-radiation equality occurs earlier in the history of the universe, then the photons spend less time in the diffusive state, which results in less damping of the CMB power spectrum.
	>
	> In summary, the time of matter-radiation equality affects the damping tail of the CMB power spectrum by influencing the amount of time that photons spend in the diffusive state before they are able to decouple from the matter and travel freely through the universe. The longer the photons spend in the diffusive state, the more damping occurs, and the earlier matter-radiation equality occurs, the less damping occurs.