This model was obtained by continuing to pre-train RoBERTa-base on discharge summaries from the MIMIC-III dataset.
Details can be found in the following paper:

Xiang Dai, Ilias Chalkidis, Sune Darkner, and Desmond Elliott. 2022. Revisiting Transformer-based Models for Long Document Classification. (https://arxiv.org/abs/2204.06683)
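
Because RoBERTa-style models are pre-trained with a masked-language-modelling objective, a natural way to try the model is to mask a word in a clinical sentence and ask for the most likely replacements. The following is a minimal sketch, not an official usage recipe: it assumes the Hugging Face `transformers` library is installed, and `model_id` is a placeholder that you must replace with this model's actual repository name. The helper names `mask_word` and `predict_masked` are hypothetical and exist only for this illustration.

```python
MASK_TOKEN = "<mask>"  # mask token used by RoBERTa tokenizers


def mask_word(sentence: str, word: str, mask_token: str = MASK_TOKEN) -> str:
    """Replace the first occurrence of `word` in `sentence` with the mask token."""
    if word not in sentence:
        raise ValueError(f"{word!r} not found in sentence")
    return sentence.replace(word, mask_token, 1)


def predict_masked(model_id: str, prompt: str, top_k: int = 5):
    """Return the top_k fill-mask predictions for `prompt`.

    `model_id` is a placeholder (assumption): pass this model's repository
    name on the Hugging Face Hub or a local checkpoint directory.
    """
    # Imported lazily so that the prompt helper works without transformers.
    from transformers import pipeline

    fill = pipeline("fill-mask", model=model_id)
    return fill(prompt, top_k=top_k)


# Build a masked prompt from an invented, non-patient example sentence.
prompt = mask_word("The patient was discharged home in stable condition.", "stable")
print(prompt)
```

Calling `predict_masked("<this-model-id>", prompt)` downloads the weights on first use and returns a list of dictionaries, each holding a candidate token and its score.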

Important hyper-parameters:

Hyper-parameter	Value
Max sequence length	4096
Batch size	8
Learning rate	5e-5
Training epochs	6
Training time	130 GPU-hours
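
To make these settings easier to reuse in a script, they can be collected in a small configuration object. This is only an illustrative sketch: the class name `PretrainingConfig` and its field names are hypothetical, and the code does not reproduce the authors' training setup. It also assumes that the batch size of 8 counts sequences per optimisation step, which the table does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PretrainingConfig:
    """Hyper-parameters copied from the table above (field names are illustrative)."""

    max_seq_length: int = 4096   # "Max sequence"
    batch_size: int = 8          # assumed to mean sequences per step
    learning_rate: float = 5e-5
    num_epochs: int = 6

    def max_tokens_per_batch(self) -> int:
        """Upper bound on tokens in one batch if every sequence is full length."""
        return self.max_seq_length * self.batch_size


config = PretrainingConfig()
print(config.max_tokens_per_batch())  # 4096 * 8 = 32768
```

Under that assumption, one batch holds at most 32,768 tokens.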
