---
datasets:
- bakhitovd/data_science_arxiv
metrics:
- rouge
license: cc0-1.0
pipeline_tag: summarization
---

# Fine-tuned Longformer for Summarization of Machine Learning Articles

## Model Details

- GitHub: https://github.com/Bakhitovd/led-base-7168-ml
- Model name: bakhitovd/led-base-7168-ml
- Model type: Longformer (allenai/led-base-16384)
- Model description: This Longformer model has been fine-tuned on a focused subset of the arXiv part of the scientific papers dataset, specifically targeting articles about Machine Learning. It aims to generate accurate and consistent summaries of machine learning research papers.

## Intended Use

This model is intended for text summarization, specifically for summarizing machine learning research papers.

## How to Use

```python
import torch
from transformers import LEDTokenizer, LEDForConditionalGeneration

tokenizer = LEDTokenizer.from_pretrained("bakhitovd/led-base-7168-ml")
model = LEDForConditionalGeneration.from_pretrained("bakhitovd/led-base-7168-ml")
```

## Use the model for summarization

```python
article = "... long document ..."

# Use the GPU when one is available; the model and its inputs must share a device.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# Calling the tokenizer returns both input_ids and attention_mask.
inputs_dict = tokenizer(article, padding="max_length", max_length=16384, return_tensors="pt", truncation=True)
input_ids = inputs_dict.input_ids.to(device)
attention_mask = inputs_dict.attention_mask.to(device)

# Put global attention on the first token only.
global_attention_mask = torch.zeros_like(attention_mask)
global_attention_mask[:, 0] = 1

predicted_abstract_ids = model.generate(input_ids, attention_mask=attention_mask, global_attention_mask=global_attention_mask, max_length=512)

# generate() returns a batch of sequences, so decode the first one.
summary = tokenizer.decode(predicted_abstract_ids[0], skip_special_tokens=True)
print(summary)
```

## Training Data

Dataset name: bakhitovd/data_science_arxiv\
This dataset is a subset of the 'Scientific papers' dataset. It contains the articles that are semantically and structurally closest to articles describing machine learning. The subset was obtained by running K-means clustering on embeddings generated by SciBERT.

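The clustering step can be illustrated with a minimal sketch. This is not the pipeline used to build the dataset: the card does not state the number of clusters, the initialization, or how the machine learning cluster was chosen. The toy 2-D vectors stand in for SciBERT embeddings, and `init_centroids`, `kmeans`, and `ml_reference` are hypothetical names introduced only for this example.

```python
import math


def init_centroids(points, k):
    """Deterministic farthest-first initialization (an assumption, not from the card)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Pick the point whose nearest existing centroid is farthest away.
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, iters=50):
    """Plain Lloyd's K-means over a list of equal-length float vectors."""
    centroids = init_centroids(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


# Toy stand-ins for document embeddings: two well-separated topics.
embeddings = [
    [0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
    [5.0, 5.1], [5.2, 4.9], [4.9, 5.0],
]
labels, centroids = kmeans(embeddings, k=2)

# Hypothetical selection rule: keep the cluster nearest to a reference
# embedding of a known machine learning paper.
ml_reference = [5.0, 5.0]
ml_cluster = min(range(2), key=lambda c: math.dist(ml_reference, centroids[c]))
subset = [i for i, lab in enumerate(labels) if lab == ml_cluster]
print(subset)
```

In a real run, the vectors would be high-dimensional SciBERT embeddings, and a library implementation of K-means would be the more practical choice.
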
## Evaluation Results

The model's performance was evaluated using ROUGE metrics, and it showed improved performance over the baseline models.

![image.png](https://s3.amazonaws.com/moonup/production/uploads/63fb9a520aa18292d5c1027a/19mfKrjHkiCFDAL557Vsu.png)
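
For readers unfamiliar with ROUGE, the sketch below shows what an n-gram overlap score measures. It is a simplified illustration, not the evaluation code behind the results above: the card does not say which ROUGE variants or implementation were used, and this version assumes lowercased whitespace tokens with no stemming. The function name `rouge_n` and the example sentences are made up for the illustration.

```python
from collections import Counter


def rouge_n(candidate, reference, n=1):
    """Simplified ROUGE-N: lowercased whitespace tokens, no stemming."""
    def ngrams(text):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    # Counter intersection clips each n-gram count to the smaller of the two.
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


generated = "the model summarizes machine learning papers"
reference = "the model generates summaries of machine learning papers"
scores = rouge_n(generated, reference, n=1)
print(scores)
```

Here five of the six generated unigrams appear in the eight-word reference, so precision is 5/6 and recall is 5/8. Published ROUGE numbers normally come from an established implementation with stemming and ROUGE-L support, so scores from this sketch are not comparable to them.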