metadata
license: apache-2.0
language:
  - en
library_name: transformers
pipeline_tag: fill-mask
tags:
  - climate
  - biology

Model Card for nasa-smd-ibm-v0.1

nasa-smd-ibm-v0.1 is a RoBERTa-based, encoder-only transformer model, domain-adapted for NASA Science Mission Directorate (SMD) applications. It is fine-tuned on scientific journals and articles relevant to NASA SMD and aims to improve natural language technologies such as information retrieval and intelligent search.

Model Details

  • Base Model: RoBERTa
  • Tokenizer: Custom
  • Parameters: 125M
  • Pretraining Strategy: Masked Language Modeling (MLM)
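Because the model is trained with MLM, it can be queried directly through a fill-mask pipeline. The following is a minimal sketch, not an official usage recipe: the repository name is taken from the citation URL, the example sentence and the helper names `build_prompt` and `top_predictions` are hypothetical, and a reasonably recent `transformers` release is assumed. Since the tokenizer is custom, the mask token is read from the tokenizer rather than hard-coded.

```python
MODEL_ID = "nasa-impact/nasa-smd-ibm-v0.1"  # repository name from the citation URL


def build_prompt(template: str, mask_token: str) -> str:
    """Insert the tokenizer's mask token into a template containing '{mask}'."""
    if template.count("{mask}") != 1:
        raise ValueError("template must contain exactly one '{mask}' placeholder")
    return template.replace("{mask}", mask_token)


def top_predictions(template: str, k: int = 5):
    """Return (token, score) pairs for the masked position (downloads the model)."""
    # Imported lazily so build_prompt works without transformers installed.
    from transformers import pipeline

    fill = pipeline("fill-mask", model=MODEL_ID)
    # The tokenizer is custom, so ask it for its mask token instead of assuming one.
    prompt = build_prompt(template, fill.tokenizer.mask_token)
    return [(p["token_str"].strip(), p["score"]) for p in fill(prompt, top_k=k)]


if __name__ == "__main__":
    # Illustrative sentence only; it is not taken from the model card.
    for token, score in top_predictions("The Hubble Space Telescope orbits the {mask}."):
        print(f"{token}\t{score:.3f}")
```

Running the script prints the top candidate tokens with their scores; the exact predictions depend on the checkpoint revision.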

Training Data

  • Wikipedia English (Feb 1, 2020)
  • AGU Publications
  • AMS Publications
  • Scientific papers from the Astrophysics Data System (ADS)
  • PubMed abstracts
  • PMC (commercial license subset)

[Figure: dataset size chart]

Training Procedure

  • Framework: fairseq 0.12.1 with PyTorch 1.9.1
  • Transformers Version: 4.2.0
  • Strategy: Masked Language Modeling (MLM)
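To make the MLM strategy concrete, here is a small pure-Python sketch of how training inputs are typically corrupted. It assumes the common BERT/RoBERTa recipe (select about 15% of tokens; of those, 80% become the mask token, 10% a random token, 10% stay unchanged). The card does not state the values actually used, and the real training ran inside fairseq, so the function name `mask_tokens` and all of its parameters are illustrative.

```python
import random

IGNORE = -100  # label value conventionally skipped by the cross-entropy loss


def mask_tokens(token_ids, mask_id, vocab_size, special_ids=(), mask_prob=0.15, seed=None):
    """Return (inputs, labels) for one MLM training example.

    labels holds the original token at each selected position and IGNORE elsewhere.
    """
    rng = random.Random(seed)
    inputs = list(token_ids)
    labels = [IGNORE] * len(inputs)
    for i, tok in enumerate(token_ids):
        # Never corrupt special tokens; select other tokens with probability mask_prob.
        if tok in special_ids or rng.random() >= mask_prob:
            continue
        labels[i] = tok  # the model must predict the original token here
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = mask_id  # 80%: replace with the mask token
        elif roll < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: replace with a random token
        # remaining 10%: keep the original token in the input
    return inputs, labels
```

Calling `mask_tokens` with a fresh seed each epoch gives a different corruption pattern every time, which is the dynamic masking idea associated with RoBERTa.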

Evaluation

  • BLURB Benchmark
  • Pruned SQuAD2.0 (SQ2) Benchmark
  • NASA SMD Experts Benchmark (WIP)

[Figures: BLURB benchmark results; SQ2 benchmark results]

Uses

  • Named Entity Recognition (NER)
  • Information Retrieval
  • Sentence Transformers
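For the retrieval and sentence-embedding uses, one possible starting point is to mean-pool the encoder's token vectors and rank documents by cosine similarity. This is a sketch under assumptions, not the authors' method: mean pooling is a generic choice, the helper names are hypothetical, and an MLM checkpoint that has not been further tuned for similarity may rank poorly. The pooling and ranking helpers are plain Python; only `embed` needs `torch` and `transformers`.

```python
import math
from functools import lru_cache

MODEL_ID = "nasa-impact/nasa-smd-ibm-v0.1"  # repository name from the citation URL


def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose attention-mask entry is non-zero."""
    kept = [v for v, m in zip(token_vectors, attention_mask) if m]
    if not kept:
        raise ValueError("attention mask selects no tokens")
    return [sum(v[d] for v in kept) / len(kept) for d in range(len(kept[0]))]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_vec, doc_vecs):
    """Return document indices sorted by descending cosine similarity to the query."""
    scores = [cosine(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


@lru_cache(maxsize=1)
def _load_encoder():
    # Lazy import: the helpers above work without torch/transformers installed.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModel.from_pretrained(MODEL_ID)
    model.eval()
    return tokenizer, model


def embed(text):
    """Embed one text as a list of floats (downloads the model on first call)."""
    import torch

    tokenizer, model = _load_encoder()
    batch = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state[0]
    return mean_pool(hidden.tolist(), batch["attention_mask"][0].tolist())
```

A query could then be ranked against a small corpus with `rank(embed(query), [embed(d) for d in docs])`; for larger collections, batching and a vector index would be more appropriate.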

Citation

If you find this work useful, please cite it using the following BibTeX entry:

@misc{nasa-impact_2023,
  author    = {{NASA-IMPACT}},
  title     = {nasa-smd-ibm-v0.1 (Revision f01d42f)},
  year      = 2023,
  url       = {https://huggingface.co/nasa-impact/nasa-smd-ibm-v0.1},
  doi       = {10.57967/hf/1429},
  publisher = {Hugging Face}
}

Contacts

  • Bishwaranjan Bhattacharjee, IBM Research
  • Muthukumaran Ramasubramanian, NASA-IMPACT (mr0051@uah.edu)