rubert_ria_headlines
Description
bert2bert model, initialized with the DeepPavlov/rubert-base-cased
pretrained weights and
fine-tuned on the first 99% of "Rossiya Segodnya" news dataset for 2 epochs.
Usage example
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
MODEL_NAME = "dmitry-vorobiev/rubert_ria_headlines"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)
text = "Скопируйте текст статьи / новости"
encoded_batch = tokenizer.prepare_seq2seq_batch(
[text],
return_tensors="pt",
padding="max_length",
truncation=True,
max_length=512)
output_ids = model.generate(
input_ids=encoded_batch["input_ids"],
max_length=36,
no_repeat_ngram_size=3,
num_beams=5,
top_k=0
)
headline = tokenizer.decode(output_ids[0],
skip_special_tokens=True,
clean_up_tokenization_spaces=False)
print(headline)
Datasets
How it was trained?
I used free TPUv3 on kaggle. The model was trained for 3 epochs with effective batch size 192 and soft restarts (warmup steps 1500 / 500 / 500 with new optimizer state on each epoch start).
Common train params:
export XLA_USE_BF16=1
export XLA_TENSOR_ALLOCATOR_MAXSIZE=100000000
python nlp_headline_rus/src/train_seq2seq.py \
--do_train \
--tie_encoder_decoder \
--max_source_length 512 \
--max_target_length 32 \
--val_max_target_length 48 \
--tpu_num_cores 8 \
--per_device_train_batch_size 24 \
--gradient_accumulation_steps 1 \
--learning_rate 5e-4 \
--adam_epsilon 1e-6 \
--weight_decay 1e-5 \
Validation results
- Using last 1% of ria dataset
- Using gazeta_ru test split
- Using gazeta_ru val split
- Downloads last month
- 42
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social
visibility and check back later, or deploy to Inference Endpoints (dedicated)
instead.