metadata

language:
  - en
tags:
  - physics
  - astronomy
  - astrophysics
  - cosmology
license:
  - llama3.1
base_model:
  - meta-llama/Meta-Llama-3.1-8B
library_name: transformers

AstroSage-Llama-3.1-8B

AstroSage-Llama-3.1-8B is a domain-specialized natural-language AI assistant tailored for research in astronomy, astrophysics, and cosmology. Trained on the complete collection of astronomy-related arXiv papers from 2007-2024 along with millions of synthetically-generated question-answer pairs and other astronomical literature, AstroSage-Llama-3.1-8B demonstrates excellent proficiency on a wide range of questions. AstroSage-Llama-3.1-8B scores 80.9% on the AstroMLab-1 benchmark, greatly outperforming all models---proprietary and open-weight---in the 8-billion parameter class, and performing on par with GPT-4o. This achievement demonstrates the potential of domain specialization in AI, suggesting that focused training can yield capabilities exceeding those of much larger, general-purpose models. AstroSage-Llama-3.1-8B is freely available, enabling widespread access to advanced AI capabilities for astronomical education and research.

Model Details

Model Type: Domain-specialized LLM
Base Model: Meta-Llama-3.1-8B
Parameters: 8 billion
Training Focus: Astronomy, Astrophysics, Cosmology, and Astronomical Instrumentation
License: Llama 3.1 Community License
Development Process:
1. Continued Pre-training (CPT) on astronomical literature
2. Supervised Fine-tuning (SFT) on QA pairs and instruction sets
3. Model merging with Meta-Llama-3.1-8B-Instruct (75% CPT+SFT / 25% Meta-Instruct)

Performance

AstroMLab-1 Benchmark: 80.9% accuracy
- Outperforms all 8B parameter models
- Comparable to GPT-4o (80.4%)
- ~1000x more cost-effective than proprietary models
- 8 percentage-point improvement over base Llama-3.1-8b model on Astronomy Q&A benchmark

General Capabilities: Maintains strong performance on standard benchmarks
- IF-EVAL: 41.4%
- BBH: 52.9%
- MATH: 8.4%
- GPQA: 31.2%
- MUSR: 38.9%
- MMLU-PRO: 34.6%

Training Data

Continued Pre-training:
- ~250,000 arXiv preprints (2007-2024) from astro-ph and gr-qc
- Astronomy-related Wikipedia articles
- Selected astronomy textbooks
- Total: 3.3 billion tokens, 19.9 GB plaintext
Supervised Fine-tuning:
- 8.8 million curated QA pairs
- Filtered Infinity-Instruct-7M dataset
- Paper summaries and metadata
- Total: 2.0 billion tokens, 9.8 GB plaintext

Intended Use

Curiosity-driven question answering
Brainstorming new ideas
Astronomical research assistance
Educational support in astronomy
Literature review and summarization
Scientific explanation of concepts

Limitations

Training data cutoff: January 2024
As with all LLMs, hallucinations are possible
Limited by 8B parameter size for complex reasoning
Paper metadata not perfectly memorized
Performance primarily validated on multiple-choice questions
Primarily trained for use in English

Ethical Considerations

Should not be used as sole source for critical research decisions
Output should be verified against primary sources
May reflect biases present in astronomical literature

Technical Specifications

Architecture: Based on Meta-Llama 3.1
Training Infrastructure: ORNL OLCF Frontier
Hosting: Hugging Face Hub (AstroMLab/AstroSage-8B)

Citation and Contact

Contract: Corresponding author Tijmen de Haan, email: tijmen dot dehaan at gmail dot com and AstroMLab astromachinelearninglab at gmail dot com
Please cite the AstroMLab 3 paper when referencing to this model.