Tijmen2/cosmosage_v2

	---
	tags:
	- physics
	- cosmology
	model-index:
	- name: cosmosage_qa
	  results: []
	license: mit
	language:
	- en
	pipeline_tag: text-generation
	base_model: mistralai/Mistral-7B-v0.1
	datasets:
	- teknium/OpenHermes-2.5
	---

	# cosmosage

	Cosmosage is a natural-language assistant that answers questions about cosmology.

	cosmosage_v2 first underwent continued pretraining on thousands of papers and textbooks,
	and was subsequently fine-tuned on synthetically generated question-answer pairs. It is a full
	chat model, though it excels in Q&A mode, where the model gives a single answer in response to
	a single question.

	The code used to generate cosmosage_v2 is available at https://github.com/tijmen/cosmosage

	## Usage

	After downloading cosmosage_v2, the following example code can be used to ask questions:

	```python
	import torch
	from transformers import AutoModelForCausalLM, AutoTokenizer

	path_to_model = "cosmosage_v2/"  # local directory containing the downloaded model
	device = "cuda"

	model = AutoModelForCausalLM.from_pretrained(path_to_model).to(device)
	tokenizer = AutoTokenizer.from_pretrained(path_to_model)

	SYSTEM_PROMPT = (
	    "You are cosmosage, an AI programmed to be a cosmology expert. "
	    "You answer the USER's question clearly in long form, always providing context. "
	    "When appropriate, provide a reference."
	)

	def ask_cosmosage(question):
	    # Assemble the prompt from token ids: system prompt, USER turn, ASSISTANT cue.
	    # Token id 28705 is inserted as a separator between the segments.
	    separator = torch.tensor([[28705]])
	    input_ids = torch.cat([
	        tokenizer.encode(SYSTEM_PROMPT, return_tensors="pt"),
	        separator,
	        tokenizer.encode("USER:", add_special_tokens=False, return_tensors="pt"),
	        tokenizer.encode(question, add_special_tokens=False, return_tensors="pt"),
	        separator,
	        tokenizer.encode("ASSISTANT:", add_special_tokens=False, return_tensors="pt"),
	    ], dim=-1).to(device)
	    # Sample up to 1000 tokens beyond the prompt at a low temperature.
	    generated_ids = model.generate(
	        input_ids,
	        max_length=input_ids.shape[1] + 1000,
	        do_sample=True,
	        temperature=0.4,
	    )
	    # The decoded string contains the prompt followed by the model's answer.
	    return tokenizer.decode(generated_ids[0], skip_special_tokens=True)
	```
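
Because `ask_cosmosage` decodes the whole generated sequence, the returned string includes the system prompt and the question as well as the answer. The helper below is a minimal sketch for keeping only the answer. `extract_answer` is a hypothetical name that is not part of the cosmosage code, and it assumes the decoded text contains the literal marker `ASSISTANT:` as it was written into the prompt.

```python
def extract_answer(decoded_text: str, marker: str = "ASSISTANT:") -> str:
    """Return the text after the first occurrence of `marker`.

    Hypothetical helper (not part of cosmosage). If the marker is
    missing, the input is returned unchanged apart from stripping.
    """
    _, found, answer = decoded_text.partition(marker)
    return answer.strip() if found else decoded_text.strip()


if __name__ == "__main__":
    # Stand-in string shaped like the decoded output of ask_cosmosage.
    sample = "You are cosmosage ... USER:What is dark energy? ASSISTANT: An answer."
    print(extract_answer(sample))
```

With a loaded model this would be called as `extract_answer(ask_cosmosage("What is dark energy?"))`. The helper splits on the first marker, so a question that itself contains `ASSISTANT:` would be cut in the wrong place. A token-level alternative is to drop the first `input_ids.shape[1]` tokens of `generated_ids[0]` before decoding.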

	## Comparison to cosmosage_v1

	cosmosage_v2 is more knowledgeable than cosmosage_v1 because it was pretrained on papers and
	textbooks, rather than only on synthetically generated QA pairs. However, it continues to struggle with
	_reliability_. While many of its answers are factually accurate, some are not. The outputs of cosmosage
	(or any LLM) should not be trusted to be factual.

	## Training details

	cosmosage_v2 was trained on four A100 (80 GB) GPUs at the Center for Computational Astrophysics (CfCA), National Astronomical Observatory of Japan (NAOJ).

	The following hyperparameters were used during continued pretraining:
	- learning_rate: 1e-05
	- train_batch_size: 4
	- max_grad_norm: 3.0
	- num_devices: 4
	- total_train_batch_size: 16
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_steps: 100
	- num_epochs: 3.0
	- weight_decay: 1e-04
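
The batch-size figures in the list above are consistent with a per-device batch of 4 replicated across 4 devices. The sketch below spells out that arithmetic. The function name is hypothetical, and the gradient-accumulation factor of 1 is an assumption: it is not stated in the list, but it is the value that reproduces the reported total of 16.

```python
def effective_batch_size(per_device: int, num_devices: int, grad_accum_steps: int = 1) -> int:
    """Number of sequences contributing to a single optimizer step."""
    return per_device * num_devices * grad_accum_steps


if __name__ == "__main__":
    # train_batch_size=4 and num_devices=4, with accumulation assumed to be 1
    print(effective_batch_size(per_device=4, num_devices=4))
```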

	The following hyperparameters were used during QA tuning:
	- learning_rate: 2e-06
	- train_batch_size: 4
	- max_grad_norm: 3.0
	- num_devices: 4
	- total_train_batch_size: 16
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_steps: 100
	- num_epochs: 2.0
	- weight_decay: 0.0
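
The two stages differ mainly in peak learning rate and scheduler type. As a rough illustration, the sketch below computes the learning rate at a given step for a linear warmup followed by either cosine or linear decay to zero, which is the usual shape of schedules with these names. The function name and the `total_steps` value are hypothetical: the number of optimizer steps is not stated here, and the exact scheduler implementation used in training may differ.

```python
import math


def lr_at_step(step: int, total_steps: int, base_lr: float,
               warmup_steps: int = 100, kind: str = "cosine") -> float:
    """Learning rate after `step` optimizer steps (illustrative sketch)."""
    if step < warmup_steps:
        # Linear warmup from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, capped at 1.
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    if kind == "cosine":
        return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
    if kind == "linear":
        return base_lr * (1.0 - progress)
    raise ValueError(f"unknown scheduler kind: {kind!r}")


if __name__ == "__main__":
    total_steps = 1000  # hypothetical; the real step count is not given
    for step in (0, 50, 100, 550, 1000):
        pretrain_lr = lr_at_step(step, total_steps, base_lr=1e-5, kind="cosine")
        qa_lr = lr_at_step(step, total_steps, base_lr=2e-6, kind="linear")
        print(step, pretrain_lr, qa_lr)
```

Both curves reach their peak at step 100, the end of warmup. The cosine curve stays near the peak for longer before falling, while the linear curve decays at a constant rate.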