# falcon-7b-cnn-dailymail
This model is a fine-tuned version of ybelkada/falcon-7b-sharded-bf16 on the cnn_dailymail dataset.
## Model description
The model inherits the architecture and tokenizer from falcon-7b, but was fine-tuned with 4-bit quantization from bitsandbytes and QLoRA from the peft library. The SFTTrainer class from the Hugging Face trl library drove the fine-tuning process, as sketched below.
The resulting model was fine-tuned on a single NVIDIA L4 GPU (24 GB VRAM) on a Google Cloud Platform instance.
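A minimal sketch of that setup is shown below. The LoRA hyperparameters (`r`, `lora_alpha`, `lora_dropout`, `target_modules`) are illustrative assumptions, since this card does not record them; `dataset` and `training_args` refer to the formatted dataset and the `TrainingArguments` sketched later in this card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTTrainer

base = "ybelkada/falcon-7b-sharded-bf16"

# 4-bit quantization (bitsandbytes) so the 7B model fits in 24 GB of VRAM
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    base,
    quantization_config=bnb_config,
    trust_remote_code=True,  # Falcon shipped custom modeling code at the time
)
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token

# QLoRA adapter config -- illustrative values, not recorded in this card
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["query_key_value"],  # Falcon's fused attention projection
)

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,      # formatted cnn_dailymail train split (see below)
    peft_config=peft_config,
    dataset_text_field="text",  # column holding the combined article + summary
    tokenizer=tokenizer,
    args=training_args,         # see "Training hyperparameters" below
)
trainer.train()
```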
## Intended uses & limitations
The model is intended for summarizing news articles. Since the fine-tuning dataset is cnn_dailymail, it is best limited to shorter articles from CNN and the Daily Mail. The model is not intended for other summarization purposes, although it would be interesting to see whether its summarization ability extends to other short-form text.
## Training and evaluation data
The model was fine-tuned on the cnn_dailymail dataset (specifically the train split), with articles serving as the "prompts" and highlights as the "responses." Prior to training, the two columns were combined into a single text field for the causal LM task.
Each observation was formatted as follows:

```
### Article
Article goes here...

### Summary
Highlights go here...
```
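A formatting function along these lines (a sketch; the column names `article` and `highlights` come from the cnn_dailymail dataset) would produce that layout:

```python
from datasets import load_dataset

def format_example(example):
    """Combine an article and its highlights into one causal-LM training string."""
    return {
        "text": f"### Article\n{example['article']}\n\n### Summary\n{example['highlights']}"
    }

dataset = load_dataset("cnn_dailymail", "3.0.0", split="train").map(format_example)
```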
For inference, format the article the same way and end the prompt with the `### Summary` tag to indicate that the model should generate a summary.
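For example, a generation sketch using the model and tokenizer loaded above (the `max_new_tokens` value is an arbitrary choice):

```python
article = "..."  # the news article to summarize
prompt = f"### Article\n{article}\n\n### Summary\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=120)

# Decode only the tokens generated after the prompt
summary = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(summary)
```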
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 5
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 5
- total_train_batch_size: 25
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- lr_scheduler_warmup_ratio: 0.03
- training_steps: 500
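Expressed as a `transformers.TrainingArguments` sketch (the `output_dir` is a placeholder, and the Adam betas and epsilon are the library defaults, written out explicitly):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./falcon-7b-cnn-dailymail",  # placeholder path
    learning_rate=2e-5,
    per_device_train_batch_size=5,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=5,  # effective train batch size: 5 * 5 = 25
    lr_scheduler_type="constant",
    warmup_ratio=0.03,              # no effect with a constant scheduler
    max_steps=500,
    adam_beta1=0.9,                 # library defaults, stated explicitly
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```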
### Training results
No evaluation results have been collected yet. Note also that the hyperparameters above are largely arbitrary, since no hyperparameter tuning was performed.
### Framework versions
- Transformers 4.30.0.dev0
- Pytorch 2.0.1
- Datasets 2.12.0
- Tokenizers 0.13.3