Model Card for nasa-smd-ibm-v0.1

nasa-smd-ibm-v0.1 is a RoBERTa-based, encoder-only transformer model, domain-adapted for NASA Science Mission Directorate (SMD) applications. It is fine-tuned on scientific journals and articles relevant to NASA SMD, with the aim of improving natural language technologies such as information retrieval and intelligent search.

Model Details

Training Data

  • Wikipedia English (Feb 1, 2020)
  • AGU Publications
  • AMS Publications
  • Scientific papers from the Astrophysics Data System (ADS)
  • PubMed abstracts
  • PubMed Central (PMC) (commercial license subset)


Training Procedure

  • Framework: fairseq 0.12.1 with PyTorch 1.9.1
  • transformers version: 4.2.0
  • Strategy: Masked Language Modeling (MLM)
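
To illustrate the MLM strategy listed above, here is a minimal sketch of how one training example can be corrupted. It is not the fairseq implementation used for this model. The 15% selection rate and the 80/10/10 split (mask / random token / unchanged) are assumptions taken from the commonly published BERT/RoBERTa recipe, since this card does not state the rates used. The function name `mask_tokens` and its parameters are hypothetical.

```python
import random


def mask_tokens(token_ids, mask_id, vocab_size, special_ids=frozenset(),
                mask_prob=0.15, rng=None):
    """Build (inputs, labels) for one masked-language-modeling example.

    Assumption: 15% selection with an 80/10/10 split, as in the usual
    BERT/RoBERTa recipe; the rates used for this model are not documented here.
    labels[i] holds the original id at selected positions and -100 elsewhere
    (-100 is the default ignore_index of PyTorch's CrossEntropyLoss).
    """
    if rng is None:
        rng = random.Random()
    inputs = list(token_ids)
    labels = [-100] * len(inputs)
    for i, tok in enumerate(token_ids):
        # Never corrupt special tokens; skip positions that are not selected.
        if tok in special_ids or rng.random() >= mask_prob:
            continue
        labels[i] = tok  # the model must recover the original token here
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = mask_id                    # 80%: replace with the mask token
        elif roll < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: replace with a random token
        # remaining 10%: leave the token unchanged
    return inputs, labels


if __name__ == "__main__":
    ids = [0, 11, 12, 13, 14, 15, 2]  # hypothetical ids; 0 and 2 stand in for <s> and </s>
    print(mask_tokens(ids, mask_id=4, vocab_size=50, special_ids={0, 2},
                      rng=random.Random(0)))
```

The loss is computed only at positions whose label is not -100, so the encoder learns to predict the selected tokens from their surrounding context.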


Evaluation

  • BLURB Benchmark
  • Pruned SQuAD2.0 (SQ2) Benchmark (Amazon Rainforest, Oxygen, Geology and NASA ES QAs)
  • NASA SMD Expert QA Benchmark (WIP)


Pruned SQ2 Benchmark


Uses

  • Named Entity Recognition (NER)
  • Information Retrieval
  • Sentence Transformers
  • Extractive QA

For NASA SMD-related scientific use cases.
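
As a rough illustration of the information retrieval use case, the sketch below embeds texts with the encoder, mean-pools the token vectors, and ranks documents by cosine similarity. Several things here are assumptions rather than facts from this card: the Hub identifier in `MODEL_ID`, the choice of mean pooling, and the helper names (`mean_pool`, `cosine`, `rank`, `embed`) are all hypothetical. Embeddings from a masked-language-model encoder that has not been fine-tuned for retrieval may rank poorly, so treat this as a starting point only.

```python
import math

# Assumed Hugging Face Hub id; verify it before use.
MODEL_ID = "nasa-impact/nasa-smd-ibm-v0.1"


def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose attention-mask entry is non-zero."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query_vec, doc_vecs):
    """Return document indices ordered from most to least similar to the query."""
    scores = [cosine(query_vec, doc) for doc in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


def embed(texts, model_id=MODEL_ID):
    """Embed texts with the encoder (downloads weights; needs torch and transformers)."""
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state  # shape: (batch, tokens, dim)
    return [mean_pool(h.tolist(), m.tolist())
            for h, m in zip(hidden, batch["attention_mask"])]


if __name__ == "__main__":
    docs = ["Sea surface temperature anomalies during El Nino events.",
            "Spectral classification of exoplanet host stars."]
    query_vec = embed(["ocean warming in the tropical Pacific"])[0]
    print(rank(query_vec, embed(docs)))
```

The heavy imports sit inside `embed` so that the pooling and ranking helpers can be reused or tested without downloading the model.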


If you find this work useful, please cite it using the following BibTeX citation:

IBM Research

  • Masayasu Muraoka
  • Bishwaranjan Bhattacharjee
  • Rong Zhang
  • Yousef El Kurdi
  • Bharath Dandala


  • Muthukumaran Ramasubramanian
  • Iksha Gurung
  • Rahul Ramachandran
  • Manil Maskey
  • Kaylin Bugbee
  • Mike Little
  • Elizabeth Fancher
  • Lauren Sanders
  • Sylvain Costes
  • Sergi Blanco-Cuaresma
  • Kelly Lockhart
  • Thomas Allen
  • Felix Grazes
  • Megan Ansdell
  • Alberto Accomazzi
  • Sanaz Vahidinia
  • Ryan McGranaghan
  • Armin Mehrabian
  • Tsendgar Lee


This encoder-only model is currently in an experimental phase. We are working to improve its capabilities and performance, and we invite the community to use the model, provide feedback, and contribute to its development.

