rubert_ria_headlines

Description

A bert2bert (BERT encoder + BERT decoder) model initialized with the DeepPavlov/rubert-base-cased pretrained weights and fine-tuned on the first 99% of the "Rossiya Segodnya" news dataset for 2 epochs.
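In Hugging Face Transformers, a bert2bert model is an `EncoderDecoderModel` whose encoder and decoder are both BERT, with cross-attention added to the decoder. The sketch below builds a tiny, randomly initialized stand-in using hypothetical small `BertConfig` sizes so that it runs offline; it is not the author's training code. Initializing from the real checkpoint would instead go through `EncoderDecoderModel.from_encoder_decoder_pretrained("DeepPavlov/rubert-base-cased", "DeepPavlov/rubert-base-cased")`. The `tie_encoder_decoder` option corresponds to the `--tie_encoder_decoder` training flag in this card, on the assumption that the training script passes that flag through to this config setting.

```python
# Minimal bert2bert sketch: a tiny, randomly initialized stand-in for the real model.
# The sizes below are hypothetical and are NOT the rubert-base-cased dimensions.
from transformers import BertConfig, EncoderDecoderConfig, EncoderDecoderModel


def tiny_bert_config() -> BertConfig:
    # Small hypothetical dimensions so the sketch builds quickly on CPU
    return BertConfig(
        vocab_size=100,
        hidden_size=32,
        num_hidden_layers=2,
        num_attention_heads=2,
        intermediate_size=64,
    )


def build_bert2bert(tie: bool) -> EncoderDecoderModel:
    # from_encoder_decoder_configs marks the decoder config as a decoder and
    # enables cross-attention over the encoder outputs
    config = EncoderDecoderConfig.from_encoder_decoder_configs(
        tiny_bert_config(), tiny_bert_config(), tie_encoder_decoder=tie
    )
    return EncoderDecoderModel(config=config)


def count_params(model) -> int:
    # parameters() yields each shared (tied) tensor only once
    return sum(p.numel() for p in model.parameters())


tied = build_bert2bert(tie=True)
untied = build_bert2bert(tie=False)
print(count_params(tied), count_params(untied))
```

With tying enabled, the decoder reuses the encoder's BERT weights, so the tied model has fewer unique parameters than the untied one.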

Usage example

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "dmitry-vorobiev/rubert_ria_headlines"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

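# Paste the text of an article / news item here (the model expects Russian input)
text = "Скопируйте текст статьи / новости"

# Tokenize the article, truncating it to the 512-token source length used in training
encoded_batch = tokenizer(
    [text],
    return_tensors="pt",
    padding="max_length",
    truncation=True,
    max_length=512)

# Generate a headline with beam search; the attention mask keeps padding tokens out of attention
output_ids = model.generate(
    input_ids=encoded_batch["input_ids"],
    attention_mask=encoded_batch["attention_mask"],
    max_length=36,
    no_repeat_ngram_size=3,
    num_beams=5,
    top_k=0
)

headline = tokenizer.decode(output_ids[0],
                            skip_special_tokens=True,
                            clean_up_tokenization_spaces=False)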
print(headline)

Datasets

How was it trained?

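I used a free TPU v3 on Kaggle. The model was trained for 3 epochs with an effective batch size of 192 (24 per core × 8 TPU cores) and soft restarts: at the start of each epoch a fresh optimizer state was created and the learning rate was warmed up again, over 1500, 500, and 500 steps respectively.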
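A minimal sketch of this soft-restart schedule, written as a function from the global optimizer step to the learning rate. The peak rate of 5e-4 comes from the `--learning_rate` flag; the linear decay to zero after each warmup and the 10,000-step epoch length are assumptions, since the card states neither the decay shape nor the dataset size. Resetting the optimizer state itself (for example, creating a new optimizer instance each epoch) is not modeled here.

```python
def soft_restart_lr(step: int, steps_per_epoch: int, warmups: list,
                    peak_lr: float = 5e-4) -> float:
    """Learning rate at a global optimizer step with per-epoch warmup restarts.

    Assumed shape: within each epoch, linear warmup from 0 to peak_lr over that
    epoch's warmup steps, then linear decay back to 0 by the end of the epoch.
    """
    epoch, local_step = divmod(step, steps_per_epoch)
    if epoch >= len(warmups):
        raise ValueError("step lies beyond the last epoch")
    warmup = warmups[epoch]
    if local_step < warmup:
        # Warmup phase: restarts from 0 at the beginning of every epoch
        return peak_lr * local_step / warmup
    # Decay phase: reaches 0 at the end of the epoch
    return peak_lr * (steps_per_epoch - local_step) / (steps_per_epoch - warmup)


# Hypothetical epoch length; warmups follow the 1500 / 500 / 500 schedule
STEPS_PER_EPOCH = 10_000
WARMUPS = [1500, 500, 500]

for s in (0, 750, 1500, 9_999, 10_000, 10_500, 20_500):
    print(s, soft_restart_lr(s, STEPS_PER_EPOCH, WARMUPS))
```

The learning rate drops back to 0 at steps 10,000 and 20,000, which is what makes each epoch start with a fresh warmup.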

Common training parameters:

# Convert float tensors to bfloat16 on the TPU (PyTorch/XLA)
export XLA_USE_BF16=1
export XLA_TENSOR_ALLOCATOR_MAXSIZE=100000000

python nlp_headline_rus/src/train_seq2seq.py \
    --do_train \
    --tie_encoder_decoder \
    --max_source_length 512 \
    --max_target_length 32 \
    --val_max_target_length 48 \
    --tpu_num_cores 8 \
    --per_device_train_batch_size 24 \
    --gradient_accumulation_steps 1 \
    --learning_rate 5e-4 \
    --adam_epsilon 1e-6 \
    --weight_decay 1e-5 \

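The effective batch size of 192 quoted in the training description follows from these flags: per-device batch size × number of TPU cores × gradient-accumulation steps. A small sketch of that arithmetic, plus the approximate number of optimizer steps per epoch. The dataset size is hypothetical, and whether a final incomplete batch is dropped depends on data-loader settings the card does not state; per-core sharding can also shift the count by one.

```python
import math


def effective_batch_size(per_device: int, num_cores: int, grad_accum: int) -> int:
    # Examples consumed per optimizer step across all TPU cores
    return per_device * num_cores * grad_accum


def steps_per_epoch(num_examples: int, batch: int, drop_last: bool = False) -> int:
    # drop_last=True discards a final incomplete batch (a data-loader setting)
    if drop_last:
        return num_examples // batch
    return math.ceil(num_examples / batch)


# Values from the command: --per_device_train_batch_size 24,
# --tpu_num_cores 8, --gradient_accumulation_steps 1
batch = effective_batch_size(24, 8, 1)
print(batch)  # 192

# Hypothetical dataset size, for illustration only
print(steps_per_epoch(1_000_000, batch))
print(steps_per_epoch(1_000_000, batch, drop_last=True))
```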
Validation results

Model size: 207M params (Safetensors; tensor types: I64, F32)