Fine-tuned OpenCALM-7B Adapters for Meeting Summarization

Description

These are weights for LoRA adapters fine-tuned on the OpenCALM-7B (Andonian et al., 2021) model for Japanese meeting summarization.

Usage

Load model and tokenizer

Loading the model in the 4-bit quantized format is recommended to get reliable results since these LoRA adapters were trained by using QLoRA (Dettmers et al., 2023).

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16
)

tokenizer = AutoTokenizer.from_pretrained("cyberagent/open-calm-7b")
model = AutoModelForCausalLM.from_pretrained(
    "cyberagent/open-calm-7b", 
    quantization_config=bnb_config, 
    device_map="auto"
)

model = PeftModel.from_pretrained(model, "haih2/open-calm-7b-summarizer-lora")

Generate summary

In the prompt provided to the model:

  • The first part is the length of the summary to be generated,
  • and The second part is the source meeting to be summarized.
prompt = "この段落の要約50字以内生成:次に、私立高校の生徒に対する留学支援についてでございますが、都内の私立高校は、それぞれの学校における教育方針に基づきまして、生徒の留学先として海外の学校と提携するなど、既にさまざまな独自の取り組みを進めております。\\nこうした状況等を踏まえ、私立高校を対象とした留学支援のあり方について、今後検討してまいります。\\n\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    tokens = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.7,
        top_k=32,
        top_p=0.9,
        repetition_penalty=1.0,
        no_repeat_ngram_size=0,
        pad_token_id=tokenizer.pad_token_id,
    )
    
output = tokenizer.decode(tokens[0], skip_special_tokens=True)
print(output)

Prompt Format

Any prompt is fine, but it is suggested to have length and source parts as follows:

"この段落を{length}に要約しなさい:{source}\n要約:"

or

"この段落の要約{length}生成:{source}\n"

Fine-tuning Details

Dataset

Fine-tuning procedure

The OpenCALM-7B model was fine-tuned on the above dataset using the QLoRA method with prompt この段落の要約{length}生成:{source}\n. We outline the following hyperparameters:

Optimizer
  beta_1
  beta_2
  weight decay
AdamW
0.9
0.999
0.01
Learning rate
  scheduler type
2e-5
linear
LoRA
  target modules
  r
  alpha
  dropout

query_key_value, dense
4
64
0.05
Quantization (for QLoRA)
  compute dtype
  storage dtype
  quantization strategy

float16
nf4
double quantization
Sequence length 1536
Batch size 4
Gradient accumulation steps 2
Epochs 10
Warmup steps 200

Evaluation

Testing data & Metric

We evaluated the model on two sets: one for multi-topic summarization and the other for single-topic summarization. ROUGE-L (F1-score-based) with the Japanese Mecab tokenizer was used as the evaluation metric.

Results

Solution/Model ROUGE-L
(multi-topic)
ROUGE-L
(single-topic)
1st place solution* 34.12 34.44
2nd place solution* 32.79 33.65
OpenCALM-7B (QLoRA) 36.75 33.31

* These scores are extracted from this leaderboard for the summarization task.

Downloads last month
3
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.