RoBERTa-base trained with a linearly increasing alpha for alpha-entmax attention, annealed from 1.0 (softmax) to 2.0 (sparsemax) over the course of training.
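
For intuition, alpha controls how sparse the attention distribution is. A minimal sketch using the standalone `entmax` package (an assumed dependency for illustration, not necessarily what this repo uses internally):

```python
# pip install torch entmax
import torch
from entmax import entmax_bisect

scores = torch.tensor([[2.0, 1.0, 0.2, -1.0]])

# alpha slightly above 1.0 behaves like softmax (dense weights);
# alpha = 2.0 is sparsemax (some weights become exactly zero).
for alpha in (1.001, 1.5, 2.0):
    probs = entmax_bisect(scores, alpha=alpha, dim=-1)
    print(f"alpha={alpha}: {probs}")
```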

To load the tokenizer and the model:

```python
from transformers import AutoTokenizer
from sparse_roberta import get_custom_model

# Load the tokenizer (shared with the original roberta-base)
tokenizer = AutoTokenizer.from_pretrained('roberta-base')

# Load the model with the final alpha value (2.0 = sparsemax)
model = get_custom_model(
    'mtreviso/sparsemax-roberta',
    initial_alpha=2.0,
    use_triton_entmax=False,
    from_scratch=False,
)
```
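
Once loaded, the model can be called like a standard transformers module. A minimal forward-pass sketch (the exact output layout is an assumption about how `get_custom_model` wraps the checkpoint):

```python
import torch

inputs = tokenizer("Sparse attention can drop irrelevant tokens.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Assumption: the wrapper follows the usual transformers convention,
# so the first output holds the last-layer hidden states.
print(outputs[0].shape)
```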

To fine-tune on GLUE tasks, you can use the `run_glue.py` script. For example:

```bash
python run_glue.py \
  --model_name_or_path mtreviso/sparsemax-roberta \
  --config_name roberta-base \
  --tokenizer_name roberta-base \
  --task_name rte \
  --output_dir output-rte \
  --do_train \
  --do_eval \
  --max_seq_length 512 \
  --per_device_train_batch_size 32 \
  --learning_rate 3e-5 \
  --num_train_epochs 3 \
  --save_steps 1000 \
  --logging_steps 100 \
  --save_total_limit 1 \
  --overwrite_output_dir
```