---
language:
- en
pipeline_tag: text-generation
tags:
- llama-3.1
- astronomy
- astrophysics
- cosmology
- arxiv
inference: false
base_model:
- meta-llama/Meta-Llama-3.1-8B
---
# AstroSage-Llama-3.1-8B
Paper: [arXiv:2411.09012](https://arxiv.org/abs/2411.09012)
AstroSage-Llama-3.1-8B is a domain-specialized natural-language AI assistant tailored for research in astronomy, astrophysics, and cosmology. Trained on the complete collection of astronomy-related arXiv papers from 2007-2024, millions of synthetically generated question-answer pairs, and other astronomical literature, AstroSage-Llama-3.1-8B answers a wide range of astronomy questions with high proficiency. This result highlights the potential of domain specialization in AI, suggesting that focused training can yield capabilities exceeding those of much larger, general-purpose models.
## Model Details
- **Base Architecture**: Llama 3.1
- **Base Model**: Meta-Llama-3.1-8B
- **Parameters**: 8 billion
- **Training Focus**: Astronomy, Astrophysics, Cosmology, and Astronomical Instrumentation
- **License**: Llama 3.1 Community License
- **Development Process**:
1. Continued Pre-training (CPT) on astronomical literature
2. Supervised Fine-tuning (SFT) on QA pairs and instruction sets
3. Model merging with Meta-Llama-3.1-8B-Instruct (75% CPT+SFT / 25% Meta-Instruct)
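
The 75% / 25% merge in step 3 amounts to a linear interpolation of model weights. The sketch below illustrates that idea in plain PyTorch; the checkpoint name `cpt-sft-checkpoint` is a placeholder for the domain-tuned model, and the released weights were produced with the authors' own merging pipeline, so treat this only as an illustration of the technique.

```python
import torch
from transformers import AutoModelForCausalLM

# Illustrative linear weight merge (75% CPT+SFT / 25% Meta-Instruct).
# "cpt-sft-checkpoint" is a placeholder, not a released artifact.
domain_model = AutoModelForCausalLM.from_pretrained(
    "cpt-sft-checkpoint", torch_dtype=torch.bfloat16
)
instruct_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B-Instruct", torch_dtype=torch.bfloat16
)

# Interpolate every parameter tensor with fixed weights 0.75 / 0.25.
instruct_state = instruct_model.state_dict()
merged_state = {
    name: 0.75 * param + 0.25 * instruct_state[name]
    for name, param in domain_model.state_dict().items()
}

domain_model.load_state_dict(merged_state)
domain_model.save_pretrained("astrosage-merged")
```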
## Using the model
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained("AstroMLab/AstroSage-8b", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("AstroMLab/AstroSage-8b")
# Function to generate a response
def generate_response(prompt):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=128,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Strip the prompt tokens and decode only the newly generated text
    response = outputs[0][inputs['input_ids'].shape[-1]:]
    decoded = tokenizer.decode(response, skip_special_tokens=True)
    return decoded
# Example usage
prompt = """
You are an expert in general astrophysics. Your task is to answer the following question:
What are the main components of a galaxy?
"""
response = generate_response(prompt)
print(response)
```
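
The example above uses a plain-text prompt. Because the released weights are partly merged from Meta-Llama-3.1-8B-Instruct, the tokenizer may also carry the Llama 3.1 chat template; this is an assumption (check `tokenizer.chat_template`), in which case prompts can instead be formatted with `apply_chat_template`:

```python
# Assumes the tokenizer ships the Llama 3.1 chat template; if it does not,
# fall back to the plain-text prompt shown above.
messages = [
    {"role": "system", "content": "You are an expert in general astrophysics."},
    {"role": "user", "content": "What are the main components of a galaxy?"},
]
chat_inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
chat_outputs = model.generate(
    chat_inputs,
    max_new_tokens=128,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(chat_outputs[0][chat_inputs.shape[-1]:], skip_special_tokens=True))
```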
## Model Improvements and Performance
AstroSage-Llama-3.1-8B shows substantial gains on astronomy multiple-choice question benchmarking:
| Model | Score (%) |
|-------|-----------|
| **<span style="color:green">AstroSage-Llama-3.1-8B</span>** | **<span style="color:green">80.9</span>** |
| GPT-4o | 80.4 |
| LLaMA-3.1-8B | 73.7 |
| Gemma-2-9B | 71.5 |
| Qwen-2.5-7B | 70.4 |
| Yi-1.5-9B | 68.4 |
| InternLM-2.5-7B | 64.5 |
| Mistral-7B-v0.3 | 63.9 |
| ChatGLM3-6B | 50.4 |
The model demonstrates:
- Higher accuracy than all other ~7-9B-parameter models evaluated
- Performance comparable to GPT-4o (80.4%)
- Roughly 1000x better cost-effectiveness than proprietary models
- A ~7 percentage-point improvement over the base Llama-3.1-8B model
## Training Data
- **Continued Pre-training**:
- ~250,000 arXiv preprints (2007-2024) from astro-ph and gr-qc
- Astronomy-related Wikipedia articles
- Selected astronomy textbooks
- Total: 3.3 billion tokens, 19.9 GB plaintext
- **Supervised Fine-tuning**:
- 8.8 million curated QA pairs
- Filtered Infinity-Instruct-7M dataset
- Paper summaries and metadata
- Total: 2.0 billion tokens, 9.8 GB plaintext
## Intended Use
- Curiosity-driven question answering
- Brainstorming new ideas
- Astronomical research assistance
- Educational support in astronomy
- Literature review and summarization
- Scientific explanation of concepts
## Limitations
- Training data cutoff: January 2024
- As with all LLMs, hallucinations are possible
- Limited by 8B parameter size for complex reasoning
- Paper metadata not perfectly memorized
- Performance primarily validated on multiple-choice questions
- Primarily trained for use in English
## Technical Specifications
- Architecture: Based on Meta-Llama 3.1
- Training Infrastructure: ORNL OLCF Frontier
- Hosting: Hugging Face Hub (AstroMLab/AstroSage-8B)
## Ethical Considerations
While this model is designed for scientific use:
- It should not be used as the sole source for critical research decisions
- Its output should be verified against primary sources
- It may reflect biases present in the astronomical literature
## Citation and Contact
- Corresponding author: Tijmen de Haan (tijmen dot dehaan at gmail dot com)
- AstroMLab: astromachinelearninglab at gmail dot com
- Please cite the AstroMLab 3 paper when referencing this model:
```
@misc{dehaan2024astromlab3,
title={AstroMLab 3: Achieving GPT-4o Level Performance in Astronomy with a Specialized 8B-Parameter Large Language Model},
author={Tijmen de Haan and Yuan-Sen Ting and Tirthankar Ghosal and Tuan Dung Nguyen and Alberto Accomazzi and Azton Wells and Nesar Ramachandra and Rui Pan and Zechang Sun},
year={2024},
eprint={2411.09012},
archivePrefix={arXiv},
primaryClass={astro-ph.IM},
url={https://arxiv.org/abs/2411.09012},
}
```