Fine-Tuned BERT2BERT Summarization Model

This model is a fine-tuned version of the original BERT2BERT Indonesian Summarization model.

Fine-Tuning Dataset:

This model was fine-tuned on the Liputan6_ID dataset, which contains Indonesian news articles. It is optimized for summarizing news text from the Liputan6 domain.

Code Sample

from transformers import BertTokenizer, EncoderDecoderModel

# Load the fine-tuned tokenizer and encoder-decoder model
tokenizer = BertTokenizer.from_pretrained("rowjak/bert-indonesian-news-summarization")
# BERT has no dedicated BOS/EOS tokens, so reuse [CLS] and [SEP]
tokenizer.bos_token = tokenizer.cls_token
tokenizer.eos_token = tokenizer.sep_token
model = EncoderDecoderModel.from_pretrained("rowjak/bert-indonesian-news-summarization")

# paste the Indonesian news article to summarize here
ARTICLE = ""

# generate summary
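# truncate long articles; assumes the standard 512-token BERT input limit
input_ids = tokenizer.encode(ARTICLE, return_tensors='pt',
                             truncation=True, max_length=512)
summary_ids = model.generate(
    input_ids,
    max_length=125,            # upper bound on summary length, in tokens
    num_beams=2,               # beam search width
    repetition_penalty=2.5,    # penalize tokens that were already generated
    length_penalty=1.0,
    early_stopping=True,
    no_repeat_ngram_size=2,    # never repeat the same bigram
    use_cache=True,
)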

summary_text = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print(summary_text)
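
The steps in the sample can also be wrapped in a small reusable function. The following is only a sketch, not part of the original model card: the names `summarize`, `GENERATION_KWARGS`, and `max_input_tokens` are hypothetical, the generation settings are copied from the sample, and the 512-token input cap assumes the standard BERT position limit.

```python
# Generation settings copied from the code sample; adjust as needed.
GENERATION_KWARGS = dict(
    max_length=125,
    num_beams=2,
    repetition_penalty=2.5,
    length_penalty=1.0,
    early_stopping=True,
    no_repeat_ngram_size=2,
    use_cache=True,
)


def summarize(article, tokenizer, model, max_input_tokens=512):
    """Return a summary of `article`, or "" if the article is empty.

    `tokenizer` and `model` are expected to be the objects loaded in the
    code sample (BertTokenizer and EncoderDecoderModel).
    `max_input_tokens` is an assumption based on the usual BERT limit.
    """
    # Collapse newlines and repeated spaces left over from copy-pasting.
    article = " ".join(article.split())
    if not article:
        return ""
    input_ids = tokenizer.encode(
        article,
        return_tensors="pt",
        truncation=True,
        max_length=max_input_tokens,
    )
    summary_ids = model.generate(input_ids, **GENERATION_KWARGS)
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)
```

With `tokenizer` and `model` loaded as in the sample, calling `summarize(ARTICLE, tokenizer, model)` would return the summary string. Returning an empty string for an empty article avoids running beam search on input that contains only special tokens.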

Output:

---
Model size: 250M params (Safetensors, tensor type F32)