CosmicRoBERTa

CosmicRoBERTa is a version of RoBERTa further pre-trained for space science on a domain-specific corpus that includes abstracts from the NTRS library, abstracts from SCOPUS, ECSS requirements, and other sources from the domain. The pre-training corpus totals around 75 million words.
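As a minimal sketch of how the checkpoint might be queried for masked-token prediction, the snippet below wraps the Hugging Face `transformers` fill-mask pipeline. The Hub identifier `icelab/cosmicroberta` is an assumption (it is not stated in this card), and the helper names `load_fill_mask` and `top_tokens` are hypothetical, introduced only for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Assumption: Hub identifier for this checkpoint; it is not stated in the card text.
MODEL_ID = "icelab/cosmicroberta"


def load_fill_mask(model_id: str = MODEL_ID) -> Callable:
    """Build a fill-mask pipeline; the weights are downloaded on first use."""
    # Imported lazily because transformers is a heavy dependency.
    from transformers import pipeline

    return pipeline("fill-mask", model=model_id)


def top_tokens(predictions: List[Dict], k: int = 3) -> List[Tuple[str, float]]:
    """Reduce raw fill-mask output to the k highest-scoring (token, score) pairs."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [(p["token_str"].strip(), round(p["score"], 3)) for p in ranked[:k]]
```

With `transformers` installed, `top_tokens(load_fill_mask()("The <mask> subsystem regulates the spacecraft temperature."))` would return the highest-ranked candidate tokens; RoBERTa-style tokenizers use `<mask>` as the mask token.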

On a subset (60% of the full data set) of the CR task presented in our paper SpaceTransformers: Language Modeling for Space Systems, the model achieves a higher weighted score than both RoBERTa and SpaceRoBERTa:

| Label | RoBERTa | CosmicRoBERTa | SpaceRoBERTa |
|---|---|---|---|
| Parameter | 0.475 | 0.515 | 0.485 |
| GN&C | 0.488 | 0.609 | 0.602 |
| System engineering | 0.523 | 0.559 | 0.555 |
| Propulsion | 0.403 | 0.521 | 0.465 |
| Project Scope | 0.493 | 0.541 | 0.497 |
| OBDH | 0.717 | 0.789 | 0.794 |
| Thermal | 0.432 | 0.509 | 0.491 |
| Quality control | 0.686 | 0.704 | 0.678 |
| Telecom. | 0.360 | 0.614 | 0.557 |
| Measurement | 0.833 | 0.849 | 0.858 |
| Structure & Mechanism | 0.489 | 0.581 | 0.566 |
| Space Environment | 0.543 | 0.681 | 0.605 |
| Cleanliness | 0.616 | 0.621 | 0.651 |
| Project Organisation / Documentation | 0.355 | 0.427 | 0.429 |
| Power | 0.638 | 0.735 | 0.661 |
| Safety / Risk (Control) | 0.647 | 0.727 | 0.676 |
| Materials / EEEs | 0.585 | 0.642 | 0.639 |
| Nonconformity | 0.365 | 0.333 | 0.419 |
| Weighted | 0.584 | 0.652 (+7%) | 0.633 (+5%) |
| Validation loss | 0.605 | 0.505 | 0.542 |
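The gains in parentheses on the weighted row can be checked with a little arithmetic. The sketch below recomputes them from the three weighted scores. It suggests the reported +7% and +5% are absolute differences in score points, not relative improvements, although the card does not say so explicitly. The function name `gains_over_baseline` is hypothetical.

```python
# Weighted scores copied from the "weighted" row of the results table.
weighted = {"RoBERTa": 0.584, "CosmicRoBERTa": 0.652, "SpaceRoBERTa": 0.633}


def gains_over_baseline(scores, baseline="RoBERTa"):
    """Return {model: (absolute gain in points, relative gain in percent)}."""
    base = scores[baseline]
    result = {}
    for name, score in scores.items():
        if name == baseline:
            continue
        absolute_points = round((score - base) * 100, 1)
        relative_percent = round((score - base) / base * 100, 1)
        result[name] = (absolute_points, relative_percent)
    return result


gains = gains_over_baseline(weighted)
for name, (points, percent) in gains.items():
    print(f"{name}: +{points} points absolute, +{percent}% relative")
```

Under this reading, CosmicRoBERTa gains 6.8 points (about 11.6% relative) and SpaceRoBERTa 4.9 points (about 8.4% relative) over the RoBERTa baseline.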

BibTeX entry and citation info

@ARTICLE{9548078,
  author={Berquand, Audrey and Darm, Paul and Riccardi, Annalisa},
  journal={IEEE Access},
  title={SpaceTransformers: Language Modeling for Space Systems},
  year={2021},
  volume={9},
  pages={133111-133122},
  doi={10.1109/ACCESS.2021.3115659}
}