Model Card
This is a LoRA fine-tune of facebook/bart-large-cnn for dialogue summarization. The LoRA weights were trained on an augmented DialogSum dataset.
Model Details
```python
from types import SimpleNamespace
from peft import LoraConfig, TaskType

cfg = SimpleNamespace()  # simple config holder
cfg.lora_params = {
    # BART attention projections and feed-forward layers
    'target_modules': ['out_proj', 'v_proj', 'q_proj', 'fc1', 'fc2'],
    'r': 8,
    'lora_alpha': 16,
}

lora_conf = LoraConfig(
    **cfg.lora_params,
    lora_dropout=0.05,
    bias='none',
    task_type=TaskType.CAUSAL_LM,  # note: BART is an encoder-decoder; TaskType.SEQ_2_SEQ_LM is the conventional choice
    init_lora_weights='gaussian',
)
```
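The card does not show how this config is attached to the base model. As a minimal sketch, this is typically done with peft's `get_peft_model` before fine-tuning:

```python
from transformers import AutoModelForSeq2SeqLM
from peft import get_peft_model

# load the base summarization model and wrap it with the LoRA adapters
base_model = AutoModelForSeq2SeqLM.from_pretrained('facebook/bart-large-cnn')
peft_model = get_peft_model(base_model, lora_conf)
peft_model.print_trainable_parameters()  # only the LoRA matrices are trainable
```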
Model Description
This is the model card of a 🤗 Transformers model that has been pushed to the Hub by doublecringe.
- Developed by: doublecringe
- Model type: LoRA (PEFT)
- Language(s) (NLP): English
- Finetuned from model: facebook/bart-large-cnn
Uses
Use cases where this LoRA model can be useful:
- Summarizing dialogues
- Summarizing news articles, etc.
Direct Use
The model was developed to summarize dialogues for chatbots, so that they do not lose the meaning of the earliest messages as a conversation grows.
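As an illustrative sketch (not part of the card), older turns can be replaced by their summary before the history is handed back to the chatbot. Here `summarize` stands in for any function that maps a dialogue string to a summary string, such as a thin wrapper around the model shown in the next section:

```python
def compress_history(turns, summarize, keep_last=4):
    # keep the most recent turns verbatim and summarize everything older
    if len(turns) <= keep_last:
        return turns
    summary = summarize("\n".join(turns[:-keep_last]))
    return [f"Summary of earlier conversation: {summary}"] + turns[-keep_last:]
```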
How to Get Started with the Model
Example inference code:
```python
from peft import PeftConfig, PeftModel
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
import torch


class SumModel:
    def __init__(self, model_preset, **generation_parameters) -> None:
        # requires the transformers and peft libraries to be installed
        self.model_preset = model_preset
        self.generation_params = generation_parameters

        # load the adapter config, the base model, and its tokenizer
        config = PeftConfig.from_pretrained(self.model_preset)
        model = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path)
        self.tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

        # attach the LoRA weights to the base model
        self.lora_model = PeftModel.from_pretrained(model, self.model_preset)
        self.lora_model.print_trainable_parameters()

        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        self.lora_model.to(self.device)

    def __call__(self, text, **generation_params):
        tokens = self.tokenizer(text, return_tensors='pt', truncation=True, padding=True).to(self.device)
        # generation parameters passed at call time override the defaults from __init__
        if generation_params:
            gen = self.lora_model.generate(**tokens, **generation_params)
        else:
            gen = self.lora_model.generate(**tokens, **self.generation_params)
        return self.tokenizer.batch_decode(gen, skip_special_tokens=True)


model = SumModel(model_preset='doublecringe123/bardt-large-cnn-dialoguesum-booksum-lora',
                 max_length=96,
                 min_length=26,
                 do_sample=True,
                 temperature=0.9,
                 num_beams=8,
                 repetition_penalty=2.)
```
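A usage example; the dialogue text here is made up for illustration:

```python
dialogue = (
    "#Person1#: Hi, did you finish the report for tomorrow's meeting?\n"
    "#Person2#: Almost, I just need to add the sales figures.\n"
    "#Person1#: Great, send it to me tonight so I can review it."
)
summary = model(dialogue)[0]  # __call__ returns a list of decoded summaries
print(summary)
```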
Training Hyperparameters
- fp16 = True
- learning rate = 2e-5
- weight decay = 0.01
- batch size = 8
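A minimal sketch of how these hyperparameters might map onto transformers' `Seq2SeqTrainingArguments`; the output directory and the non-listed fields are assumptions, not values from the card:

```python
from transformers import Seq2SeqTrainingArguments

# hypothetical mapping of the listed hyperparameters; other fields are assumed
training_args = Seq2SeqTrainingArguments(
    output_dir='bart-large-cnn-dialoguesum-lora',  # assumed name
    fp16=True,
    learning_rate=2e-5,
    weight_decay=0.01,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=12,  # per the Speeds, Sizes, Times section
)
```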
Speeds, Sizes, Times
The model was trained for roughly 18 hours (12 epochs) in a Kaggle notebook with an A100 GPU. Training was split into two runs: the first 6 epochs and the remaining epochs up to 12.
Results
A comparison of the model revisions on the test dataset is available in the accompanying notebook.
Model Architecture and Objective
LoRA
Hardware
GPU A100
Software
Python, Transformers, PEFT