File size: 3,115 Bytes
22db95e 2f7faaa 05d984a 2f7faaa 05d984a 2f7faaa 05d984a c7e3ab1 22db95e 2f7faaa 05d984a 75691ae 05d984a ba71829 8ac347d 05d984a 2f7faaa c7e3ab1 2f7faaa c7e3ab1 991a2f2 c7e3ab1 991a2f2 2f7faaa ad6c9ce 2f7faaa c7e3ab1 2f7faaa 991a2f2 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 |
---
tags:
- physics
- cosmology
model-index:
- name: cosmosage_qa
results: []
license: mit
language:
- en
pipeline_tag: text-generation
base_model: mistralai/Mistral-7B-v0.1
datasets:
- teknium/OpenHermes-2.5
---
# cosmosage
Cosmosage is a natural-language cosmology assistant that can answer questions about cosmology.
cosmosage_v2 first underwent continued pretraining based on thousands of papers and textbooks,
and was subsequently fine-tuned on synthetically-generated question-answer pairs. It is a full
chat model, though it excels in Q&A mode, where the model gives a single answer in response to
a single question.
The code used to generate cosmosage_v2 is available at https://github.com/tijmen/cosmosage
## Usage
After downloading cosmosage_v2, the following example code can be used to ask questions:
```python
path_to_model = 'cosmosage_v2/'
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
device = "cuda"
model = AutoModelForCausalLM.from_pretrained(path_to_model).to(device)
tokenizer = AutoTokenizer.from_pretrained(path_to_model)
def ask_cosmosage(question):
input_ids = torch.cat([
tokenizer.encode("You are cosmosage, an AI programmed to be a cosmology expert. You answer the USER's question clearly in long form, always providing context. When appropriate, provide a reference.", return_tensors="pt"),
torch.tensor([[28705]]),
tokenizer.encode("USER:", add_special_tokens=False, return_tensors="pt"),
tokenizer.encode(question, add_special_tokens=False, return_tensors="pt"),
torch.tensor([[28705]]),
tokenizer.encode("ASSISTANT:", add_special_tokens=False, return_tensors="pt")
], dim=-1).to(device)
generated_ids = model.generate(input_ids, max_length=input_ids.shape[1] + 1000, do_sample=True, temperature=0.4)
return tokenizer.decode(generated_ids[0], skip_special_tokens=True)
```
## Comparison to cosmosage_v1
cosmosage_v2 is a more knowledgeable model than cosmosage_v1 due to being pretrained on the papers and
textbooks, rather than just on synthetically generated QA pairs. However, it continues to struggle with
_reliability_. While many of its answers are factually accurate, some are not. The outputs of cosmosage
(or any LLM) should not be trusted to be factual.
### Training details
cosmosage_v2 was trained on 4xA100 (80 GB) at the Center for Computational Astrophysics (CfCA), National Astronomical Observatory of Japan (NAOJ).
The following parameters were used during continued pretraining:
- learning_rate: 1e-05
- train_batch_size: 4
- max_grad_norm: 3.0
- num_devices: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- num_epochs: 3.0
- weight_decay: 1e-04
The following hyperparameters were used during QA tuning:
- learning_rate: 2e-06
- train_batch_size: 4
- max_grad_norm: 3.0
- num_devices: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 100
- num_epochs: 2.0
- weight_decay: 0.0 |