metadata

language: en
base_model: google/flan-t5-base
library_name: peft
tags:
  - base_model:adapter:google/flan-t5-base
  - lora
  - transformers
  - summarization
  - abstractive-summarization
  - generated_from_trainer
model-index:
  - name: Abstractive Style Summarizer
    results: []
datasets:
  - xsum
  - cnn_dailymail
  - multi_news
license: mit

Abstractive Style Summarizer

This model is a fine-tuned version of google/flan-t5-base using PEFT (LoRA). It is designed to generate abstractive summaries in three distinct styles: Harsh (concise), Balanced (standard), and Detailed (comprehensive).

Model Details

Model Description

Model type: Sequence-to-Sequence Transformer (T5)
Language(s): English
License: MIT
Finetuned from model: google/flan-t5-base
Training Method: PEFT (LoRA)

Model Sources

Repository: Flatten
Base Model: google/flan-t5-base

Uses

Direct Use

The model interprets a prefixed prompt to determine the style of the summary.

Harsh: Generates very short, punchy summaries (approx. 35% of input length).
Balanced: Generates standard news summaries (approx. 50% of input length).
Detailed: Generates in-depth summaries (approx. 70% of input length).
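The approximate ratios above can be turned into a generation budget. The sketch below is only an illustration: it assumes "length" is measured in tokens, and the names `STYLE_RATIOS`, `target_summary_tokens`, and the `floor` parameter are hypothetical, not part of the model.

```python
# Approximate output/input length ratios quoted in this card.
STYLE_RATIOS = {"Harsh": 0.35, "Balanced": 0.50, "Detailed": 0.70}


def target_summary_tokens(style: str, input_tokens: int, floor: int = 8) -> int:
    """Return a rough token budget for a summary in the given style.

    Assumption: the percentages in the card refer to token counts.
    `floor` keeps very short inputs from producing a near-zero budget.
    """
    ratio = STYLE_RATIOS[style]
    return max(floor, round(ratio * input_tokens))
```

A value computed this way could be passed as `max_new_tokens` at generation time, although the model was trained to pick up the style from the prompt prefix rather than from a length limit.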

Prompt Format

The input text should be prefixed with the desired style:

Summarize {Style}: {Input Text}

Example: Summarize Harsh: The Walt Disney Co. announced...
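A minimal inference sketch, assuming the adapter is loaded on top of google/flan-t5-base with `peft`. The `adapter_id` argument is a placeholder to replace with the actual adapter repository or local path, and the 512-token input limit and `max_new_tokens` default are assumptions, not documented settings.

```python
STYLES = ("Harsh", "Balanced", "Detailed")


def build_prompt(style: str, text: str) -> str:
    """Prefix the input text with the style, following the prompt format above."""
    if style not in STYLES:
        raise ValueError(f"style must be one of {STYLES}, got {style!r}")
    return f"Summarize {style}: {text.strip()}"


def summarize(text: str, style: str = "Balanced",
              adapter_id: str = "path/to/adapter",  # placeholder, not a real repo id
              max_new_tokens: int = 128) -> str:
    """Load the base model plus the LoRA adapter and generate one summary."""
    # Imported lazily so build_prompt can be used without these packages.
    from peft import PeftModel
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
    base = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()

    inputs = tokenizer(build_prompt(style, text), return_tensors="pt",
                       truncation=True, max_length=512)  # assumed input limit
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

For repeated calls, load the tokenizer and model once outside the function; the sketch reloads them on each call only to stay self-contained.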

Training Details

Training Data

The model was trained on a combined dataset of 12,000 samples, split 80% / 10% / 10% into train (9,600), validation (1,200), and test (1,200) sets.

| Style | Source Dataset | Size |
| --- | --- | --- |
| Harsh | XSum | 4000 |
| Balanced | CNN/DailyMail | 4000 |
| Detailed | Multi-News | 4000 |
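The style-to-dataset mapping and the 80/10/10 split could be assembled roughly as follows. This is a sketch under assumptions: the card does not say how records were shuffled or whether the split was stratified by style, and the field names, the seed, and the helper names are hypothetical.

```python
import random

# Each source dataset supplies the examples for one style (from the table above).
STYLE_BY_SOURCE = {
    "xsum": "Harsh",
    "cnn_dailymail": "Balanced",
    "multi_news": "Detailed",
}


def to_example(source: str, document: str, summary: str) -> dict:
    """Attach the style prefix that matches the record's source dataset."""
    style = STYLE_BY_SOURCE[source]
    return {"input": f"Summarize {style}: {document}", "target": summary}


def split_80_10_10(examples, seed: int = 42):
    """Shuffle and split into train/validation/test (seed is an assumption)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 8 // 10   # integer arithmetic avoids float rounding
    n_val = n // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test
```

With 4,000 records per source, this yields 9,600 training, 1,200 validation, and 1,200 test examples.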

Training Procedure

Training Hyperparameters

Learning Rate: 5e-4
Batch Size: 4 per device
Gradient Accumulation Steps: 2
Num Epochs: 5
Optimizer: AdamW
LR Scheduler: Linear with warmup (ratio 0.05)
Mixed Precision: BF16
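As a sanity check, the hyperparameters above imply the following schedule. The sketch assumes a single training device and a 9,600-sample training split (80% of 12,000); the function name and the rounding of warmup steps are illustrative only.

```python
import math


def training_schedule(train_samples: int, per_device_batch: int, grad_accum: int,
                      epochs: int, warmup_ratio: float, num_devices: int = 1) -> dict:
    """Derive optimizer-step counts from the batch and epoch settings."""
    effective_batch = per_device_batch * grad_accum * num_devices
    steps_per_epoch = math.ceil(train_samples / effective_batch)
    total_steps = steps_per_epoch * epochs
    warmup_steps = round(total_steps * warmup_ratio)
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


schedule = training_schedule(train_samples=9600, per_device_batch=4,
                             grad_accum=2, epochs=5, warmup_ratio=0.05)
print(schedule)
```

Under these assumptions the effective batch size is 8, giving 1,200 optimizer steps per epoch and 6,000 in total, which is consistent with the step reported in the evaluation section.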

LoRA Configuration

r: 32
lora_alpha: 64
lora_dropout: 0.05
target_modules: ["q", "k", "v", "o"]
bias: "none"
task_type: "SEQ_2_SEQ_LM"
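The same settings can be written as keyword arguments for `peft.LoraConfig`. This is a sketch of how the configuration might be constructed, not the author's training script; `LORA_SETTINGS`, `lora_scaling`, and `make_peft_config` are hypothetical names.

```python
# LoRA settings copied from the list above.
LORA_SETTINGS = {
    "r": 32,
    "lora_alpha": 64,
    "lora_dropout": 0.05,
    "target_modules": ["q", "k", "v", "o"],  # T5 attention projections
    "bias": "none",
    "task_type": "SEQ_2_SEQ_LM",
}


def lora_scaling(settings: dict) -> float:
    """Standard LoRA scaling factor applied to the low-rank update: alpha / r."""
    return settings["lora_alpha"] / settings["r"]


def make_peft_config(settings: dict = LORA_SETTINGS):
    """Build a peft LoraConfig from the settings (requires peft to be installed)."""
    from peft import LoraConfig  # imported lazily so the rest runs without peft
    return LoraConfig(**settings)
```

With `r = 32` and `lora_alpha = 64`, the low-rank update is scaled by 2.0.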

Evaluation Results

Evaluated on the held-out test set (1,200 samples) at Step 6000.

| Metric | Score |
| --- | --- |
| ROUGE-1 | 0.3925 |
| ROUGE-2 | 0.1608 |
| ROUGE-L | 0.2776 |
| Validation Loss | 0.7824 |
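For readers unfamiliar with the metric, ROUGE-N measures n-gram overlap between a generated summary and a reference. The function below is a simplified illustration using whitespace tokenization and F1; it is not the scorer used to produce the numbers above, and the card does not state which ROUGE variant or aggregation was reported.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(reference: str, candidate: str, n: int = 1) -> float:
    """Simplified ROUGE-N F1: lowercase, whitespace tokens, clipped overlap."""
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    overlap = sum((ref & cand).values())  # clipped n-gram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

For the reference "the cat sat on the mat" and the candidate "the cat on mat", this gives a ROUGE-1 F1 of 0.8 and a ROUGE-2 F1 of 0.25.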

Environmental Impact

Hardware Type: CUDA-enabled GPU
Compute: LoRA fine-tuning (about 7M trainable parameters out of 254M total)
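The trainable-parameter figure can be reproduced from the LoRA settings. The sketch assumes the standard flan-t5-base dimensions (hidden size 768, 12 heads of size 64, 12 encoder and 12 decoder layers) and that adapters are attached to every `q`, `k`, `v`, and `o` projection, including decoder cross-attention.

```python
def lora_param_count(r: int, d_in: int, d_out: int, n_modules: int) -> int:
    """Each adapted linear layer adds A (r x d_in) and B (d_out x r)."""
    return r * (d_in + d_out) * n_modules


D_MODEL = 768            # assumed flan-t5-base hidden size
INNER_DIM = 12 * 64      # assumed num_heads * d_kv
ENCODER_LAYERS = 12
DECODER_LAYERS = 12

# One self-attention block per encoder layer; self- plus cross-attention per decoder layer.
attention_blocks = ENCODER_LAYERS + 2 * DECODER_LAYERS
adapted_modules = attention_blocks * 4   # q, k, v, o in each block

trainable = lora_param_count(r=32, d_in=D_MODEL, d_out=INNER_DIM,
                             n_modules=adapted_modules)
print(f"{trainable:,} trainable LoRA parameters")
```

Under these assumptions the count is 7,077,888, which matches the roughly 7M trainable parameters stated above.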

Framework Versions

Datasets==3.6.0
PyTorch>=2.5.1
Transformers>=4.36.0
PEFT>=0.8.0