---
language:
- en
tags:
- summarization
library_name: peft
datasets:
- scientific_papers
metrics:
- rouge
model-index:
- name: flan-t5-base-finetuned-arxiv
  results:
  - task:
      type: summarization
      name: Summarization
    dataset:
      name: scientific_papers
      type: scientific_papers
      args: arxiv
    metrics:
    - name: Rouge1
      type: rouge
      value: 12.032000
    - name: Rouge2
      type: rouge
      value: 4.384100
    - name: Rougel
      type: rouge
      value: 9.842600
    - name: Rougelsum
      type: rouge
      value: 11.139600
---

## flan-t5-base-finetuned-arxiv

This model is a PEFT adapter fine-tuned from [google/flan-t5-base](https://huggingface.co/google/flan-t5-base) on the `arxiv` configuration of the scientific_papers dataset.
It achieves the following results on the evaluation set:

- Loss: 2.485082
- Rouge1: 12.032000
- Rouge2: 4.384100
- Rougel: 9.842600
- Rougelsum: 11.139600
- Gen Len: 19.000000

## Training procedure

The following `bitsandbytes` quantization config was used during training:

- quant_method: bitsandbytes
- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: True
- bnb_4bit_compute_dtype: bfloat16

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 2e-4
- weight_decay: 0.01
- train_batch_size: 32
- optimizer: paged_adamw_8bit (paged 8-bit AdamW)
- num_epochs: 4.47
- fp16: False

### Framework versions

- PEFT 0.5.0
- Transformers 4.35.0
- Pytorch 1.10.1+cu111
- Datasets 2.14.7
- Tokenizers 0.14.1
- bitsandbytes 0.41.2.post2
- accelerate 0.24.0
- evaluate 0.4.1
- rouge-score 0.1.2
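The ROUGE scores in the evaluation results measure n-gram overlap between generated and reference abstracts, and appear to be reported on a 0 to 100 scale. For readers unfamiliar with the metric, here is a minimal, simplified sketch of ROUGE-N F1. It makes several assumptions: lowercase whitespace tokenization, no stemming, and one reference per prediction. It is not the code that produced the numbers in this card. Those presumably come from the `rouge-score` package listed under framework versions, likely called through `evaluate`, which uses its own tokenizer. ROUGE-L and ROUGE-Lsum use longest-common-subsequence matching, which this sketch does not cover. The helper names `ngrams` and `rouge_n_f1` are illustrative only.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token list; empty if len(tokens) < n."""
    return Counter(tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(prediction: str, reference: str, n: int = 1) -> float:
    """Simplified ROUGE-N F1 on lowercase whitespace tokens (no stemming).

    Hypothetical helper for illustration; not the rouge-score implementation.
    """
    pred = ngrams(prediction.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count per n-gram (clipped matches).
    overlap = sum((pred & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    pred = "the model summarizes arxiv papers"
    ref = "the model summarizes scientific papers from arxiv"
    # Multiply by 100 to match the 0-100 scale used in this card.
    print(f"ROUGE-1: {100 * rouge_n_f1(pred, ref, 1):.2f}")
    print(f"ROUGE-2: {100 * rouge_n_f1(pred, ref, 2):.2f}")
```

On the example pair, ROUGE-2 comes out lower than ROUGE-1 because only two of the prediction's four bigrams also occur in the reference, while every one of its unigrams does.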