AstroLLaMA-2-70B-Base_AIC

AstroLLaMA-2-70B-Base_AIC is a specialized base language model for astronomy, developed by the AstroMLab team by fine-tuning Meta's LLaMA-2-70b on astronomical literature. To the best of our knowledge, it is the first specialized 70B-parameter LLM in astronomy. It is designed for next-token prediction and is not an instruct/chat model.

Model Details

Base Architecture: LLaMA-2-70b
Training Data: Abstract, Introduction, and Conclusion (AIC) sections from arXiv's astro-ph category papers (from arXiv's inception up to July 2023)
Data Processing: The training data was derived from LaTeX source files using regex-based extraction methods to identify and extract the relevant sections (Abstract, Introduction, and Conclusion).
Fine-tuning Method: Continual Pre-Training (CPT) using the LMFlow framework
Training Details:
- Learning rate: 2 × 10⁻⁵
- Total batch size: 160
- Maximum token length: 2048
- Warmup ratio: 0.03
- Cosine decay schedule for learning rate reduction
- Training duration: 1 epoch (approximately 2,000 A100 GPU hours)
Primary Use: Next token prediction for astronomy-related text generation and analysis
Reference: Pan et al. 2024
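
The Data Processing entry mentions regex-based extraction from LaTeX sources but does not give the expressions used. The following is a minimal sketch of what such an extraction could look like, not the AstroMLab pipeline itself. The patterns, the function name `extract_aic`, and the choice to treat a "Summary" heading as a conclusion are all assumptions made for illustration.

```python
import re

# Hypothetical patterns: the actual expressions used for the training data are not published here.
ABSTRACT_RE = re.compile(r"\\begin\{abstract\}(.*?)\\end\{abstract\}", re.DOTALL)

# A section body runs from its \section{...} heading up to the next \section,
# the start of the bibliography, \end{document}, or the end of the string.
SECTION_RE = re.compile(
    r"\\section\*?\{(?P<title>[^}]*)\}(?P<body>.*?)"
    r"(?=\\section\*?\{|\\begin\{thebibliography\}|\\bibliography\{|\\end\{document\}|\Z)",
    re.DOTALL,
)


def extract_aic(latex_source: str) -> dict:
    """Return the first abstract, introduction, and conclusion found in a LaTeX string."""
    sections = {"abstract": None, "introduction": None, "conclusion": None}

    match = ABSTRACT_RE.search(latex_source)
    if match:
        sections["abstract"] = match.group(1).strip()

    for match in SECTION_RE.finditer(latex_source):
        title = match.group("title").lower()
        body = match.group("body").strip()
        if "introduction" in title and sections["introduction"] is None:
            sections["introduction"] = body
        elif ("conclusion" in title or "summary" in title) and sections["conclusion"] is None:
            sections["conclusion"] = body

    return sections
```

A sketch like this misses papers that define the abstract with `\abstract{...}`, use nested braces in section titles, or split the source across several `\input` files, so a production pipeline would need more cases than shown.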
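
The learning-rate settings listed under Training Details can be illustrated with a short sketch. It assumes a linear warmup followed by a cosine decay to zero, which is a common reading of "warmup ratio" plus "cosine decay" but is not confirmed by the text above. The total step count is hypothetical, since the card does not state it, and the tokens-per-step figure assumes the batch size counts sequences filled to the maximum length.

```python
import math

PEAK_LR = 2e-5        # learning rate from the model card
WARMUP_RATIO = 0.03   # warmup ratio from the model card
BATCH_SIZE = 160      # total batch size from the model card
MAX_TOKENS = 2048     # maximum token length from the model card


def learning_rate(step: int, total_steps: int) -> float:
    """Learning rate at a 0-indexed step: linear warmup, then cosine decay to zero (assumed)."""
    warmup_steps = round(WARMUP_RATIO * total_steps)
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Upper bound on tokens per optimizer step if every sequence is full length (assumption).
tokens_per_step = BATCH_SIZE * MAX_TOKENS

if __name__ == "__main__":
    total_steps = 1000  # hypothetical: the real number of steps is not given
    for step in (0, 15, 30, 515, 1000):
        print(step, f"{learning_rate(step, total_steps):.2e}")
    print("tokens per step (upper bound):", tokens_per_step)
```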

Generating text from a prompt

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
import torch

# Load the model and tokenizer; device_map="auto" spreads the weights
# across the available devices
model_id = "AstroMLab/astrollama-2-70b-base_aic"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Create the pipeline with explicit truncation. The model is already placed
# on its devices, so device_map is not passed a second time here.
# max_length counts the prompt tokens plus the generated tokens.
generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    truncation=True,
    max_length=512,
)

# Example prompt from an astronomy paper
prompt = (
    "In this letter, we report the discovery of the highest redshift, "
    "heavily obscured, radio-loud QSO candidate selected using JWST NIRCam/MIRI, "
    "mid-IR, sub-mm, and radio imaging in the COSMOS-Web field. "
)

# Set seed for reproducibility of the sampled output
torch.manual_seed(42)

# Generate text; the result is a list with one dict per returned sequence
generated_text = generator(prompt, do_sample=True)
print(generated_text[0]["generated_text"])
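
Loading a model of this size is mostly a memory question, so a rough estimate of the space taken by the weights alone may help when planning hardware. The sketch below assumes a nominal 70 billion parameters (taken from the "70B" in the name, not an exact count) and ignores activations, the KV cache, and framework overhead, all of which add to the total.

```python
NOMINAL_PARAMS = 70e9  # assumption: nominal count implied by "70B", not the exact figure

# Storage size of one parameter for a few common dtypes
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate memory needed for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: ~{weight_memory_gb(NOMINAL_PARAMS, dtype):.0f} GB")
```

Passing `torch_dtype=torch.float16` to `from_pretrained` is one common way to load the weights in half precision. Whether reduced precision changes output quality for this model is not addressed here.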

Model Performance and Significance

AstroLLaMA-2-70B-Base_AIC improves on its baseline, LLaMA-2-70B, on the astronomical Q&A benchmark described in Ting et al. 2024 (76.0% versus 70.7%), an important step for specialized astronomical LLMs. The table below compares it with other models on the same benchmark:

Model	Score (%)
AstroSage-LLaMA-3.1-8B (AstroMLab)	80.9
AstroLLaMA-2-70B-Base (AstroMLab)	76.0
LLaMA-3.1-8B	73.7
LLaMA-2-70B	70.7
Gemma-2-9B	71.5
Qwen-2.5-7B	70.4
Yi-1.5-9B	68.4
InternLM-2.5-7B	64.5
Mistral-7B-v0.3	63.9
ChatGLM3-6B	50.4

These results suggest that training specialized LLMs can be effective, especially at larger model scales.
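
The comparison can be made concrete by computing the gain over the baseline directly from the table. The sketch below copies the scores as listed. The helper name `gain_in_points` is hypothetical, and pairing AstroLLaMA-2-70B-Base with LLaMA-2-70B follows the baseline named in the text above.

```python
# Scores (%) copied from the table above
scores = {
    "AstroSage-LLaMA-3.1-8B (AstroMLab)": 80.9,
    "AstroLLaMA-2-70B-Base (AstroMLab)": 76.0,
    "LLaMA-3.1-8B": 73.7,
    "LLaMA-2-70B": 70.7,
    "Gemma-2-9B": 71.5,
    "Qwen-2.5-7B": 70.4,
    "Yi-1.5-9B": 68.4,
    "InternLM-2.5-7B": 64.5,
    "Mistral-7B-v0.3": 63.9,
    "ChatGLM3-6B": 50.4,
}


def gain_in_points(model: str, baseline: str) -> float:
    """Difference between two benchmark scores, in percentage points."""
    return round(scores[model] - scores[baseline], 1)


# Models ordered from highest to lowest score
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)

if __name__ == "__main__":
    gain = gain_in_points("AstroLLaMA-2-70B-Base (AstroMLab)", "LLaMA-2-70B")
    print(f"Gain over LLaMA-2-70B: {gain} percentage points")
    for name, score in ranked:
        print(f"{score:5.1f}  {name}")
```

For the 70B pair this gives a gain of 5.3 percentage points.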

Ethical Considerations

While this model is designed for scientific use, users should be mindful of potential misuse, such as generating misleading scientific content. Always verify model outputs against peer-reviewed sources for critical applications.

Citation

If you use this model in your research, please cite:

@ARTICLE{2024arXiv240919750P,
       author = {{Pan}, Rui and {Dung Nguyen}, Tuan and {Arora}, Hardik and {Accomazzi}, Alberto and {Ghosal}, Tirthankar and {Ting}, Yuan-Sen},
        title = "{AstroMLab 2: AstroLLaMA-2-70B Model and Benchmarking Specialised LLMs for Astronomy}",
      journal = {arXiv e-prints},
     keywords = {Astrophysics - Instrumentation and Methods for Astrophysics, Computer Science - Computation and Language},
         year = 2024,
        month = sep,
          eid = {arXiv:2409.19750},
        pages = {arXiv:2409.19750},
          doi = {10.48550/arXiv.2409.19750},
archivePrefix = {arXiv},
       eprint = {2409.19750},
 primaryClass = {astro-ph.IM},
       adsurl = {https://ui.adsabs.harvard.edu/abs/2024arXiv240919750P},
      adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}