---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: fill-mask
tags:
- earth science
- climate
- biology
datasets:
- nasa-impact/nasa-smd-IR-benchmark
- nasa-impact/nasa-smd-qa-benchmark
- ibm/Climate-Change-NER
---

# Model Card for nasa-smd-ibm-v0.1 (Indus)

nasa-smd-ibm-v0.1 (currently named Indus) is a RoBERTa-based, encoder-only transformer model, domain-adapted for NASA Science Mission Directorate (SMD) applications. It is fine-tuned on scientific journals and articles relevant to NASA SMD and aims to improve natural language technologies such as information retrieval and intelligent search.

## Model Details
- **Base Model**: RoBERTa
- **Tokenizer**: Custom
- **Parameters**: 125M
- **Pretraining Strategy**: Masked Language Modeling (MLM)
- **Distilled Version**: A distilled version of the model (30M parameters) is available here: https://drive.google.com/file/d/19s2Vv9WlmlRhh_AhzdP-s__0spQCG8cQ/view?usp=sharing
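
Because the model is pretrained with MLM and tagged for fill-mask, a quick way to try it is the `transformers` fill-mask pipeline. The following is a minimal sketch, not an official usage recipe: it assumes a recent `transformers` release, uses the model ID from this card's citation URL, and the helper names `insert_mask` and `top_fill_mask` are hypothetical. The mask token is read from the tokenizer rather than hard-coded, since the tokenizer is custom.

```python
# Sketch only: the model ID comes from this card; helper names are hypothetical.


def insert_mask(template: str, mask_token: str) -> str:
    """Replace the single '{mask}' placeholder with the tokenizer's mask token."""
    if template.count("{mask}") != 1:
        raise ValueError("template must contain exactly one '{mask}' placeholder")
    return template.replace("{mask}", mask_token)


def top_fill_mask(template: str,
                  model_id: str = "nasa-impact/nasa-smd-ibm-v0.1",
                  top_k: int = 5):
    """Return (token, score) pairs predicted for the masked position.

    transformers is imported lazily; the weights are downloaded on first use.
    """
    from transformers import pipeline

    fill = pipeline("fill-mask", model=model_id)
    # Ask the tokenizer for its mask token instead of assuming "<mask>".
    text = insert_mask(template, fill.tokenizer.mask_token)
    return [(p["token_str"], p["score"]) for p in fill(text, top_k=top_k)]
```

For example, `top_fill_mask("Solar flares are driven by the Sun's {mask} field.")` would return the five highest-scoring candidate tokens with their probabilities.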

## Training Data
- Wikipedia English (Feb 1, 2020)
- AGU Publications
- AMS Publications
- Scientific papers from the Astrophysics Data System (ADS)
- PubMed abstracts
- PubMed Central (PMC) (commercial license subset)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/61099e5d86580d4580767226/H0-q9N7IwXQqLdEaCCgm-.png)

## Training Procedure
- **Framework**: fairseq 0.12.1 with PyTorch 1.9.1
- **transformers Version**: 4.2.0
- **Strategy**: Masked Language Modeling (MLM)
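
To illustrate what the MLM strategy does to the training text, here is a small sketch of the token-corruption step. It assumes the commonly used BERT/RoBERTa recipe (select about 15% of tokens; replace 80% of those with the mask token, 10% with a random token, and leave 10% unchanged). This card does not state the exact masking settings used for this model, so the probabilities and the function name `mask_tokens` are assumptions for illustration only.

```python
import random


def mask_tokens(tokens, vocab, mask_token="<mask>", mask_prob=0.15, seed=None):
    """Corrupt a token list for MLM and return (corrupted, labels).

    labels[i] holds the original token at positions the model must predict
    and None everywhere else. The 80/10/10 split is the common recipe,
    assumed here rather than taken from this model's training config.
    """
    rng = random.Random(seed)
    corrupted = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = mask_token          # 80%: replace with mask token
        elif roll < 0.9:
            corrupted[i] = rng.choice(vocab)   # 10%: replace with random token
        # remaining 10%: keep the original token in place
    return corrupted, labels
```

The loss is computed only at positions where `labels` is not `None`, which is why unselected positions carry no label.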

## Evaluation
- BLURB Benchmark
- Pruned SQuAD2.0 (SQ2) Benchmark (Amazon Rainforest, Oxygen, Geology and NASA ES QAs)
- NASA SMD Expert QA Benchmark (WIP)


![image/png](https://cdn-uploads.huggingface.co/production/uploads/61099e5d86580d4580767226/EtCC3U_tMCv3bfLqQdqQm.png)

![Pruned SQ2 Benchmark](https://cdn-uploads.huggingface.co/production/uploads/61099e5d86580d4580767226/ruh6-IyiNlUiK21Ej4lDM.png)

Please refer to the following dataset cards for further benchmarks and evaluation:
- NASA IR Benchmark - https://huggingface.co/datasets/nasa-impact/nasa-smd-IR-benchmark
- NASA SMD Expert QA Benchmark - https://huggingface.co/datasets/nasa-impact/nasa-smd-qa-benchmark
- Climate Change Benchmark - https://huggingface.co/datasets/ibm/Climate-Change-NER

## Uses
- Named Entity Recognition (NER)
- Information Retrieval
- Sentence Transformers
- Extractive QA

These uses target NASA SMD-related scientific use cases.
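
As a rough starting point for the information retrieval and sentence-embedding uses listed above, the sketch below mean-pools the encoder's last hidden states and ranks documents by cosine similarity. Several things here are assumptions rather than anything this card prescribes: mean pooling as the pooling method, the usual RoBERTa 512-token limit, and the hypothetical helper names `embed`, `cosine`, and `rank`. An MLM checkpoint that has not been fine-tuned for similarity usually gives only a weak baseline, so treat this as a sanity check rather than a tuned retriever.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_vec, doc_vecs):
    """Return (doc_index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scores, key=lambda s: s[1], reverse=True)


def embed(texts, model_id="nasa-impact/nasa-smd-ibm-v0.1"):
    """Mean-pool last hidden states into one vector per input text.

    torch and transformers are imported lazily; weights download on first use.
    """
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    model.eval()
    # max_length=512 assumes the usual RoBERTa context size.
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=512, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state
    # Zero out padding positions before averaging over the sequence.
    mask = batch["attention_mask"].unsqueeze(-1).to(hidden.dtype)
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)
    return pooled.tolist()
```

A typical call would embed a query and a list of passages with `embed`, then pass the query vector and passage vectors to `rank`. For production retrieval, fine-tuning the encoder on a similarity objective is the more common route.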

## Note

The accompanying paper can be found here: https://arxiv.org/abs/2405.10725


## Citation
If you find this work useful, please cite it using the following BibTeX entry:

```bibtex
@misc {nasa-impact_2023,
	author       = {Masayasu Muraoka and Bishwaranjan Bhattacharjee and Muthukumaran Ramasubramanian and Iksha Gurung and Rahul Ramachandran and Manil Maskey and Kaylin Bugbee and Rong Zhang and Yousef El Kurdi and Bharath Dandala and Mike Little and Elizabeth Fancher and Lauren Sanders and Sylvain Costes and Sergi Blanco-Cuaresma and Kelly Lockhart and Thomas Allen and Felix Grazes and Megan Ansdell and Alberto Accomazzi and Sanaz Vahidinia and Ryan McGranaghan and Armin Mehrabian and Tsendgar Lee},
	title        = { nasa-smd-ibm-v0.1 (Revision f01d42f) },
	year         = 2023,
	url          = { https://huggingface.co/nasa-impact/nasa-smd-ibm-v0.1 },
	doi          = { 10.57967/hf/1429 },
	publisher    = { Hugging Face }
}

```

## Attribution

IBM Research
- Masayasu Muraoka
- Bishwaranjan Bhattacharjee
- Rong Zhang
- Yousef El Kurdi
- Bharath Dandala

NASA SMD
- Muthukumaran Ramasubramanian
- Iksha Gurung
- Rahul Ramachandran
- Manil Maskey
- Kaylin Bugbee
- Mike Little
- Elizabeth Fancher
- Lauren Sanders
- Sylvain Costes
- Sergi Blanco-Cuaresma
- Kelly Lockhart
- Thomas Allen
- Felix Grazes
- Megan Ansdell
- Alberto Accomazzi
- Sanaz Vahidinia
- Ryan McGranaghan
- Armin Mehrabian
- Tsendgar Lee

## Disclaimer

This encoder-only model is currently experimental. We are working to improve its capabilities and performance, and we invite the community to use the model, provide feedback, and contribute to its development.