Fill-Mask
Transformers
PyTorch
English
roberta
earth science
climate
biology
Inference Endpoints
Muthukumaran committed on
Commit 419bbb3
1 Parent(s): f01d42f

Update README.md

Files changed (1): README.md +33 -74
README.md CHANGED
@@ -1,95 +1,54 @@
  ---
  license: apache-2.0
  language:
- - en
  library_name: transformers
  pipeline_tag: fill-mask
  tags:
- - climate
- - biology
  ---
- # Model Card for Model ID
-
- <!-- Provide a quick summary of what the model is/does. -->
-
- This domain-adapted,(RoBERTa)[https://huggingface.co/roberta-base] based, Encoder-only transformer model is finetuned using select scientific journals and articles related to NASA Science Mission Directorate(SMD). It's intended purpose is to aid in NLP efforts within NASA. e.g.: Information retrieval, Intelligent search and discovery.

  ## Model Details
- - RoBERTa as base model
- - Custom tokenizer
- - 125M parameters
- - Masked Language Modeling (MLM) pretraining strategy
-
- ### Model Description
-
- <!-- - **Developed by:** NASA IMPACT and IBM Research
- - **Funded by [optional]:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
- - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed] -->
-
- ## Uses
-
- - Named Entity Recognition (NER), Information revreival, sentence-transformers.
-
- ## Training Details
-
- ### Training Data
-
- The model was trained on the following datasets:
- 1. Wikipedia English dump of February 1, 2020
- 2. NASA own data
- 3. NASA papers
- 4. NASA Earth Science papers
- 5. NASA Astrophysics Data System
- 6. PubMed abstract
- 7. PMC : subset with commercial license
-
- The sizes of the dataset is shown in the following chart.
-
- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/61099e5d86580d4580767226/CTNkn0WHS268hvidFmoqj.png)
-
- <!-- Provide the basic links for the model.
-
- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]
- -->
-
- ### Training Procedure
- The model was trained on fairseq 0.12.1 with PyTorch 1.9.1 on transformer version 4.2.0. Masked Language Modeling (MLM) is the pretraining stragegy used.
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

  ## Evaluation

- ### BLURB Benchmark
-
- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/61099e5d86580d4580767226/K0IpQnTQmrfQJ1JXxn1B6.png)
-
- ### Pruned SQuAD2.0 (SQ2) Benchmark
-
- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/61099e5d86580d4580767226/R4oMJquUz4puah3lvd5Ve.png)
-
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
- ### NASA SMD Experts Benchmark
-
- WIP!

  ## Citation

- Please use the DOI provided by Huggingface to cite the model.
-
- ## Model Card Authors [optional]
-
- Bishwaranjan Bhattacharjee, IBM Research
- Muthukumaran Ramasubramanian, NASA-IMPACT (mr0051@uah.edu)
-
- ## Model Card Contact
-
- Muthukumaran Ramasubramanian (mr0051@uah.edu)

  ---
  license: apache-2.0
  language:
+ - en
  library_name: transformers
  pipeline_tag: fill-mask
  tags:
+ - climate
+ - biology
  ---
 
+ # Model Card for nasa-smd-ibm-v0.1

+ nasa-smd-ibm-v0.1 is a RoBERTa-based, encoder-only transformer model, domain-adapted for NASA Science Mission Directorate (SMD) applications. It is fine-tuned on scientific journals and articles relevant to NASA SMD, aiming to enhance natural-language applications such as information retrieval and intelligent search. A minimal usage sketch follows the details below.

  ## Model Details
+ - **Base Model**: RoBERTa
+ - **Tokenizer**: Custom
+ - **Parameters**: 125M
+ - **Pretraining Strategy**: Masked Language Modeling (MLM)
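
+ A minimal fill-mask sketch with the `transformers` pipeline. The Hub ID `nasa-impact/nasa-smd-ibm-v0.1` is assumed here for illustration, not confirmed by this card:

+ ```python
+ from transformers import pipeline
+
+ # Hypothetical Hub ID; substitute this model's actual repository ID.
+ fill_mask = pipeline("fill-mask", model="nasa-impact/nasa-smd-ibm-v0.1")
+
+ # RoBERTa-style tokenizers use <mask> as the mask token.
+ for pred in fill_mask("The <mask> mission monitors global sea level rise."):
+     print(f"{pred['token_str']!r}: {pred['score']:.3f}")
+ ```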

+ ## Training Data
+ - Wikipedia English dump (February 1, 2020)
+ - NASA datasets
+ - Scientific papers (NASA Earth Science, NASA Astrophysics Data System)
+ - PubMed abstracts
+ - PMC (subset with commercial license)

+ ![Dataset size chart](https://cdn-uploads.huggingface.co/production/uploads/61099e5d86580d4580767226/CTNkn0WHS268hvidFmoqj.png)

+ ## Training Procedure
+ - **Framework**: fairseq 0.12.1 with PyTorch 1.9.1
+ - **Transformers version**: 4.2.0
+ - **Strategy**: Masked Language Modeling (MLM), illustrated below
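
+ MLM randomly masks a fraction of input tokens and trains the model to reconstruct them. A rough sketch of the masking step, assuming the common 15% masking rate (the actual pretraining ran on fairseq, not the Hugging Face trainer):

+ ```python
+ from transformers import AutoTokenizer, DataCollatorForLanguageModeling
+
+ # Illustration only: the 15% rate is the standard RoBERTa default, assumed here.
+ tokenizer = AutoTokenizer.from_pretrained("roberta-base")
+ collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)
+
+ encoding = tokenizer("NASA satellites monitor sea surface temperature.")
+ batch = collator([encoding])  # ~15% of tokens become <mask>; labels keep the originals
+ print(tokenizer.decode(batch["input_ids"][0]))
+ ```
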
  ## Evaluation
+ - BLURB Benchmark
+ - Pruned SQuAD2.0 (SQ2) Benchmark
+ - NASA SMD Experts Benchmark (WIP)

+ ![BLURB Benchmark Results](https://cdn-uploads.huggingface.co/production/uploads/61099e5d86580d4580767226/K0IpQnTQmrfQJ1JXxn1B6.png)
+ ![SQ2 Benchmark Results](https://cdn-uploads.huggingface.co/production/uploads/61099e5d86580d4580767226/R4oMJquUz4puah3lvd5Ve.png)
 
+ ## Uses
+ - Named Entity Recognition (NER)
+ - Information Retrieval
+ - Sentence Transformers (see the embedding sketch below)
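
+ For the retrieval and sentence-transformer uses above, one common recipe (an assumption here, not a method documented by this card) is mean pooling over the encoder's last hidden states; the Hub ID below is hypothetical:

+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModel
+
+ # Hypothetical Hub ID; substitute this model's actual repository ID.
+ model_id = "nasa-impact/nasa-smd-ibm-v0.1"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModel.from_pretrained(model_id)
+
+ sentences = ["Sea surface temperature anomalies", "Mars rover soil samples"]
+ inputs = tokenizer(sentences, padding=True, return_tensors="pt")
+ with torch.no_grad():
+     hidden = model(**inputs).last_hidden_state      # (batch, seq_len, dim)
+ mask = inputs["attention_mask"].unsqueeze(-1)       # zero out padding positions
+ embeddings = (hidden * mask).sum(1) / mask.sum(1)   # mean pooling
+ print(embeddings.shape)                             # e.g. torch.Size([2, 768])
+ ```
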
  ## Citation
+ Refer to the DOI provided by Hugging Face to cite this model.

+ ## Contacts
+ - Bishwaranjan Bhattacharjee, IBM Research
+ - Muthukumaran Ramasubramanian, NASA-IMPACT (mr0051@uah.edu)