
falcon-7b-cnn-dailymail

This model is a fine-tuned version of ybelkada/falcon-7b-sharded-bf16 on the cnn_dailymail dataset.

Model description

The model inherits the architecture and tokenizer from falcon-7b, but was fine-tuned with 4-bit quantization from bitsandbytes and QLoRA from the peft library. The Hugging Face trl library's SFTTrainer class managed the fine-tuning process (see the sketch below).
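
As a rough sketch of that setup (not the exact training script), loading the base model in 4-bit with bitsandbytes and defining a LoRA adapter with peft might look like the following. The LoRA settings shown are assumptions, since the card does not list them:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization via bitsandbytes
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "ybelkada/falcon-7b-sharded-bf16",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("ybelkada/falcon-7b-sharded-bf16")

# QLoRA adapter config; r/alpha/dropout values are assumed, not from the card
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["query_key_value"],  # Falcon's fused attention projection
)
```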

The model was fine-tuned on a single NVIDIA L4 instance (24 GB VRAM) on Google Cloud Platform.

Intended uses & limitations

The model is intended for summarizing news articles. Since the fine-tuning dataset is cnn_dailymail, it is best suited to shorter articles from CNN and the Daily Mail. The model is not intended for other summarization purposes, although it would be interesting to see whether its summarization capabilities extend to other short-form text.

Training and evaluation data

The model was fine-tuned on the cnn_dailymail dataset (specifically the train split), where articles served as the "prompts" and highlights as the "responses." Prior to training, the two columns were combined into a single text field for the causal LM task.

Each observation was formatted as follows:

```
### Article
Article goes here...

### Summary
Highlights go here...
```
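
A minimal sketch of how the train split might be combined into that template using datasets; the exact preprocessing code is not included in this card, so the `text` field name and the use of `map` are assumptions:

```python
from datasets import load_dataset

dataset = load_dataset("cnn_dailymail", "3.0.0", split="train")

def format_example(example):
    # Combine article and highlights into one causal-LM training string
    example["text"] = (
        f"### Article\n{example['article']}\n\n### Summary\n{example['highlights']}"
    )
    return example

dataset = dataset.map(format_example)
```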

For inference, formatting the article the same way and ending the prompt with the summary tag signals that the model should generate a summary.
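
For example, a minimal inference sketch along those lines, reusing the `model` and `tokenizer` from the sketch above (generation settings are illustrative):

```python
def build_prompt(article: str) -> str:
    # End with the summary tag so the model continues with a summary
    return f"### Article\n{article}\n\n### Summary\n"

prompt = build_prompt("Some CNN/Daily Mail-style article text...")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)

# Decode only the newly generated tokens, skipping the prompt
summary = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(summary)
```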

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the trainer sketch after this list):

  • learning_rate: 2e-05
  • train_batch_size: 5
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 5
  • total_train_batch_size: 25
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant
  • lr_scheduler_warmup_ratio: 0.03
  • training_steps: 500
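
Put together, a hedged reconstruction of the trainer setup using these hyperparameters might look like this; `output_dir`, `dataset_text_field`, and `max_seq_length` are assumptions not stated in the card:

```python
from transformers import TrainingArguments
from trl import SFTTrainer

training_args = TrainingArguments(
    output_dir="falcon-7b-cnn-dailymail",  # assumed
    per_device_train_batch_size=5,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=5,   # effective train batch size of 25
    learning_rate=2e-5,
    lr_scheduler_type="constant",
    warmup_ratio=0.03,
    max_steps=500,
    seed=42,
)

trainer = SFTTrainer(
    model=model,                     # 4-bit model from the sketch above
    args=training_args,
    train_dataset=dataset,           # formatted dataset from the sketch above
    peft_config=peft_config,
    dataset_text_field="text",       # assumed field name
    max_seq_length=1024,             # assumed
    tokenizer=tokenizer,
)
trainer.train()
```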

Training results

Training results have not been evaluated yet. Note also that the hyperparameters above are essentially arbitrary, as no hyperparameter tuning was performed.

Framework versions

  • Transformers 4.30.0.dev0
  • Pytorch 2.0.1
  • Datasets 2.12.0
  • Tokenizers 0.13.3