Muthukumaran committed (verified)
Commit dabf2ec · 1 Parent(s): fd9de18

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -10,9 +10,9 @@ tags:
 - biology
 ---
 
-# Model Card for nasa-smd-ibm-distil-v0.1
+# Model Card for nasa-smd-ibm-distil-v0.1 (INDUS-Small)
 
-nasa-smd-ibm-distil-v0.1 is a distilled version of the RoBERTa-based, Encoder-only transformer model Indus (nasa-impact/nasa-smd-ibm-v0.1), domain-adapted for NASA Science Mission Directorate (SMD) applications. It's fine-tuned on scientific journals and articles relevant to NASA SMD, aiming to enhance natural language technologies like information retrieval and intelligent search.
+nasa-smd-ibm-distil-v0.1 (INDUS-Small) is a distilled version of the RoBERTa-based, Encoder-only transformer model Indus (nasa-impact/nasa-smd-ibm-v0.1), domain-adapted for NASA Science Mission Directorate (SMD) applications. It's fine-tuned on scientific journals and articles relevant to NASA SMD, aiming to enhance natural language technologies like information retrieval and intelligent search.
 
 We trained the smaller model, INDUS_SMALL, with 38M parameters through knowledge distillation techniques by using INDUS as the teacher. INDUS_SMALL follows a 4-layer architecture recommended by the Neural Architecture Search engine (Trivedi et al., 2023) with an optimal trade-off between performance and latency. We adopted the distillation objective proposed in MiniLMv2 (Wang et al., 2021) to transfer fine-grained self-attention relations, which has been shown to be the current state-of-the-art (Udagawa et al., 2023). Using this objective, we trained the model for 500K steps with an effective batch size of 480 on 30 V100 GPUs.
 
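
For readers unfamiliar with the MiniLMv2 objective the card refers to, the sketch below illustrates what a relation-distillation loss of that kind looks like. It is not the training code used for INDUS-Small; the function name, tensor shapes, and the choice of showing a single relation type (e.g. query-query) are assumptions made purely for illustration.

```python
import torch
import torch.nn.functional as F

def relation_distillation_loss(teacher_states, student_states, num_relation_heads):
    """Illustrative MiniLMv2-style loss for one relation type (e.g. Q-Q).

    teacher_states / student_states: (batch, seq_len, hidden) projections
    (queries, keys, or values) taken from the chosen teacher / student layer.
    The two hidden sizes may differ; only num_relation_heads must divide both.
    """
    def relations(x):
        b, s, h = x.shape
        d = h // num_relation_heads
        # Split into relation heads: (batch, heads, seq_len, head_dim)
        x = x.view(b, s, num_relation_heads, d).transpose(1, 2)
        # Scaled dot-product self-relations: (batch, heads, seq_len, seq_len)
        scores = x @ x.transpose(-1, -2) / d ** 0.5
        return F.log_softmax(scores, dim=-1)

    with torch.no_grad():                       # the teacher provides fixed targets
        teacher_rel = relations(teacher_states)
    student_rel = relations(student_states)
    # KL(teacher || student), averaged over the batch
    return F.kl_div(student_rel, teacher_rel, log_target=True, reduction="batchmean")
```

In MiniLMv2 the full objective sums a term of this form over the query-query, key-key, and value-value relations of a single chosen teacher layer; the number of relation heads is a hyperparameter of that setup and is not specified in this card.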
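
Since the card targets information retrieval and intelligent search, a minimal usage sketch for extracting sentence embeddings from the distilled encoder may also help. The repository id, the example text, and the mean-pooling step are assumptions, not part of the commit; adjust the model id to the actual Hugging Face repo.

```python
# Minimal sketch: encode text with the distilled encoder via Hugging Face transformers.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "nasa-impact/nasa-smd-ibm-distil-v0.1"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
model.eval()

texts = ["Measurements of stratospheric aerosol optical depth from satellite instruments."]
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**batch).last_hidden_state   # (batch, seq_len, hidden)

# Mean-pool over non-padding tokens to get one vector per input text.
mask = batch["attention_mask"].unsqueeze(-1).float()
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)
```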