---
license: mit
datasets:
- id_liputan6
language:
- id
metrics:
- rouge
pipeline_tag: summarization
tags:
- bart
---

# indobart-small

This model is a fine-tuned version of [bart-large-cnn](https://huggingface.co/facebook/bart-large-cnn) on the [Liputan6](https://paperswithcode.com/dataset/liputan6) dataset. A demo of the model is available in this [notebook](https://colab.research.google.com/drive/1bcqS42M3e5IySPYtAa-S4UeyJczg9DXh?usp=sharing).

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a `Seq2SeqTrainingArguments` sketch mirroring these values appears at the end of this card):

- learning_rate: 0.0001
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 1

### Training results

The scores below are ROUGE-1, ROUGE-2, and ROUGE-L precision, recall, and F-measure after one epoch (a sketch of computing these metrics appears at the end of this card):

| Training Loss | Epoch | R1 Precision | R1 Recall | R1 F-measure | R2 Precision | R2 Recall | R2 F-measure | RL Precision | RL Recall | RL F-measure |
|:-------------:|:-----:|:------------:|:---------:|:------------:|:------------:|:---------:|:------------:|:------------:|:---------:|:------------:|
| 0.3064 | 1.0 | 0.3487 | 0.6043 | 0.4375 | 0.1318 | 0.2613 | 0.1723 | 0.3349 | 0.5833 | 0.4208 |

## Framework versions

- Transformers 4.40.0
- Pytorch 2.2.1+cu121
- Datasets 2.19.0
- Tokenizers 0.19.1

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load model and tokenizer
model = AutoModelForSeq2SeqLM.from_pretrained("gaduhhartawan/indobart-base")
tokenizer = AutoTokenizer.from_pretrained("gaduhhartawan/indobart-base")

# Input article for summarization
ARTICLE_TO_SUMMARIZE = "lorem ipsum..."

# Generate the summary (beam search combined with sampling)
input_ids = tokenizer.encode(ARTICLE_TO_SUMMARIZE, return_tensors="pt")
summary_ids = model.generate(
    input_ids,
    min_length=30, max_length=150,
    num_beams=2, repetition_penalty=2.0, length_penalty=0.8,
    early_stopping=True, no_repeat_ngram_size=2, use_cache=True,
    do_sample=True, temperature=0.7, top_k=50, top_p=0.95,
)

# Decode the generated token IDs back into text
summary_text = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print("Summary:", summary_text)
```
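The same checkpoint can also be driven through the Transformers `summarization` pipeline, which wraps tokenization, generation, and decoding in one call. A minimal sketch; the generation settings here are illustrative rather than the tuned values above:

```python
from transformers import pipeline

# The pipeline handles tokenization and decoding internally.
summarizer = pipeline("summarization", model="gaduhhartawan/indobart-base")

result = summarizer("lorem ipsum...", min_length=30, max_length=150)
print(result[0]["summary_text"])
```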
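For reference, the hyperparameters listed under "Training hyperparameters" map directly onto `Seq2SeqTrainingArguments`. This is a minimal sketch of that mapping, not the original training script; `output_dir` and every argument not listed in the card are assumptions left at library defaults:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="indobart-small",    # assumed; not stated in the card
    learning_rate=1e-4,             # learning_rate: 0.0001
    per_device_train_batch_size=4,  # train_batch_size: 4
    per_device_eval_batch_size=4,   # eval_batch_size: 4
    seed=42,
    adam_beta1=0.9,                 # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,              # epsilon: 1e-08
    lr_scheduler_type="linear",
    num_train_epochs=1,
)
```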
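The precision/recall/F-measure columns in the training-results table can be produced per example with the `rouge_score` package and then averaged over the evaluation set. A minimal sketch; the two strings are placeholders, not items from Liputan6:

```python
from rouge_score import rouge_scorer

# Scores a single (reference, prediction) pair for ROUGE-1/2/L.
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=False)
scores = scorer.score(
    "ringkasan rujukan dari Liputan6",  # placeholder reference summary
    "ringkasan yang dihasilkan model",  # placeholder model output
)
for name, s in scores.items():
    print(name, s.precision, s.recall, s.fmeasure)
```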