---
datasets:
- wikimovies
language:
- English
thumbnail:
tags:
- roberta
- roberta-base
- masked-language-modeling
license: cc-by-4.0
---

# roberta-base for MLM

```
from transformers import pipeline

model_name = "thatdramebaazguy/roberta-base-wikimovies"
fill_mask = pipeline(task="fill-mask", model=model_name, tokenizer=model_name, revision="v1.0")
```

## Overview
**Language model:** roberta-base  
**Language:** English  
**Downstream task:** Fill-Mask  
**Training data:** wikimovies  
**Eval data:** wikimovies  
**Infrastructure:** 2x Tesla V100  
**Code:** See [example](https://github.com/adityaarunsinghal/Domain-Adaptation/blob/master/shell_scripts/train_movie_roberta.sh)

## Hyperparameters
```
num_examples = 4346
batch_size = 16
n_epochs = 3
base_LM_model = "roberta-base"
learning_rate = 5e-05
max_query_length = 64
gradient_accumulation_steps = 1
total_optimization_steps = 816
evaluation_strategy = IntervalStrategy.NO
prediction_loss_only = False
per_device_train_batch_size = 8
per_device_eval_batch_size = 8
adam_beta1 = 0.9
adam_beta2 = 0.999
adam_epsilon = 1e-08
max_grad_norm = 1.0
lr_scheduler_type = SchedulerType.LINEAR
warmup_ratio = 0.0
seed = 42
eval_steps = 500
metric_for_best_model = None
greater_is_better = None
label_smoothing_factor = 0.0
```

## Performance
perplexity = 4.3808

Some of my work:

- [Domain-Adaptation Project](https://github.com/adityaarunsinghal/Domain-Adaptation/)

---
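
## Usage example

As a quick illustration of the Fill-Mask task, here is a minimal sketch of querying the model through the pipeline shown above; the query sentence and the printed fields are illustrative and not taken from the original card or the training data.

```
from transformers import pipeline

model_name = "thatdramebaazguy/roberta-base-wikimovies"
fill_mask = pipeline(task="fill-mask", model=model_name, tokenizer=model_name, revision="v1.0")

# RoBERTa's mask token is "<mask>"; the sentence below is a made-up movie-domain query.
for prediction in fill_mask("The Godfather is widely considered a classic <mask> film."):
    print(f"{prediction['token_str'].strip()}\t{prediction['score']:.3f}")
```

Each prediction is a dict containing the filled-in token (`token_str`), its probability (`score`), and the completed sentence (`sequence`).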
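
## Training configuration sketch

For reference, a sketch (not from the original card) of how the hyperparameters listed above map onto `transformers.TrainingArguments`; the actual run used the shell script linked in the Overview, and `output_dir` here is only a placeholder.

```
from transformers import TrainingArguments

# Mirrors the values listed under "Hyperparameters"; a per-device batch size of 8
# on 2x V100 gives the effective batch size of 16.
training_args = TrainingArguments(
    output_dir="roberta-base-wikimovies",  # placeholder path
    num_train_epochs=3,
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    max_grad_norm=1.0,
    lr_scheduler_type="linear",
    warmup_ratio=0.0,
    label_smoothing_factor=0.0,
    evaluation_strategy="no",  # renamed eval_strategy in newer transformers releases
    eval_steps=500,
    seed=42,
)
```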