---
datasets:
- bakhitovd/data_science_arxiv
metrics:
- rouge
license: cc0-1.0
pipeline_tag: summarization
---
# Fine-tuned Longformer for Summarization of Machine Learning Articles

## Model Details
- GitHub: https://github.com/Bakhitovd/led-base-7168-ml
- Model name: bakhitovd/led-base-7168-ml
- Model type: Longformer Encoder-Decoder (LED), fine-tuned from allenai/led-base-16384
- Model description: This Longformer model was fine-tuned on a subset of the arXiv portion of the Scientific Papers dataset that contains only articles about machine learning. It is designed to generate accurate and consistent summaries of machine learning research papers.
## Intended Use
This model is intended to be used for text summarization tasks, specifically for summarizing machine learning research papers.
## How to Use
```python
import torch
from transformers import LEDTokenizer, LEDForConditionalGeneration
tokenizer = LEDTokenizer.from_pretrained("bakhitovd/led-base-7168-ml")
model = LEDForConditionalGeneration.from_pretrained("bakhitovd/led-base-7168-ml")
```

## Use the model for summarization
```python
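article = "... long document ..."
# Run on the GPU if one is available; the model and the inputs must be on the same device
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
# Call the tokenizer itself (not tokenizer.encode) so it returns both input_ids and attention_mask
inputs_dict = tokenizer(article, padding="max_length", max_length=16384, return_tensors="pt", truncation=True)
input_ids = inputs_dict.input_ids.to(device)
attention_mask = inputs_dict.attention_mask.to(device)
# Give global attention to the first (<s>) token, as is standard for LED summarization
global_attention_mask = torch.zeros_like(attention_mask)
global_attention_mask[:, 0] = 1
predicted_abstract_ids = model.generate(input_ids, attention_mask=attention_mask, global_attention_mask=global_attention_mask, max_length=512)
# generate() returns a batch of sequences; decode the first (and only) one
summary = tokenizer.decode(predicted_abstract_ids[0], skip_special_tokens=True)
print(summary)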
```
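The `global_attention_mask` line in the example reflects how Longformer attention works: each encoder token attends only to a local sliding window of neighbours, while tokens marked as global attend to, and are attended by, every position. The NumPy sketch below builds that pattern as an explicit boolean matrix for a toy sequence. It is purely illustrative: the Hugging Face implementation never materializes the full `seq_len x seq_len` matrix, which is what keeps 16,384-token inputs tractable. The function name is hypothetical, and `window` is assumed to follow the convention of the model config's `attention_window`, where each token sees `window // 2` neighbours on each side.

```python
import numpy as np


def longformer_attention_pattern(seq_len, window, global_positions):
    """Boolean matrix where allowed[i, j] is True if token i may attend to token j.

    Illustrative only (hypothetical helper): real Longformer kernels use banded
    operations and never build this dense matrix.
    """
    idx = np.arange(seq_len)
    # Local sliding-window attention: window // 2 neighbours on each side
    allowed = np.abs(idx[:, None] - idx[None, :]) <= window // 2
    for g in global_positions:
        allowed[g, :] = True  # a global token attends to every position
        allowed[:, g] = True  # and every position attends to the global token
    return allowed


# Toy example: 8 tokens, one neighbour on each side, global attention on token 0
print(longformer_attention_pattern(8, window=2, global_positions=[0]).astype(int))
```

Setting `global_attention_mask[:, 0] = 1` corresponds to `global_positions=[0]`: the first token becomes a hub that can gather information from the whole document, while every other token stays on a cheap local window.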
## Training Data
Dataset name: bakhitovd/data_science_arxiv\
This dataset is a subset of the arXiv portion of the Scientific Papers dataset, consisting of the articles that are semantically and structurally closest to articles describing machine learning. The subset was obtained by K-means clustering of article embeddings generated with SciBERT.
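The card names only the technique (K-means on SciBERT embeddings), not the exact pipeline, so the following is a minimal sketch under stated assumptions rather than the author's code. It assumes articles are embedded by mean-pooling the last hidden state of the public `allenai/scibert_scivocab_uncased` checkpoint over their first 512 tokens, clustered with a plain NumPy K-means (farthest-point initialization), and that the kept cluster is the one whose centroid is most cosine-similar to a reference embedding built from known machine learning articles. The helper names, the pooling choice, and the reference-embedding criterion are all hypothetical.

```python
import numpy as np


def embed_with_scibert(texts, model_name="allenai/scibert_scivocab_uncased"):
    """Hypothetical helper: mean-pooled SciBERT embeddings (pooling is an assumption).

    Batching is omitted for brevity; a real corpus must be embedded in chunks.
    """
    # Imported lazily so the clustering helpers below work without transformers installed
    import torch
    from transformers import AutoModel, AutoTokenizer

    tok = AutoTokenizer.from_pretrained(model_name)
    encoder = AutoModel.from_pretrained(model_name)
    batch = tok(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state  # (batch, seq, dim)
    mask = batch["attention_mask"].unsqueeze(-1).float()
    # Average over real tokens only, ignoring padding
    return ((hidden * mask).sum(1) / mask.sum(1)).numpy()


def _assign(x, centroids):
    # Index of the nearest centroid (squared Euclidean distance) for every row of x
    return ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1).argmin(axis=1)


def kmeans(x, k, n_iter=100, seed=0):
    """Lloyd's algorithm with farthest-point initialization; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = [x[rng.integers(len(x))]]
    while len(centroids) < k:
        # Next initial centroid: the point farthest from all centroids chosen so far
        d = np.min([((x - c) ** 2).sum(1) for c in centroids], axis=0)
        centroids.append(x[d.argmax()])
    centroids = np.array(centroids, dtype=float)
    for _ in range(n_iter):
        labels = _assign(x, centroids)
        # Move each centroid to the mean of its points; keep it in place if its cluster is empty
        new = np.array([x[labels == j].mean(0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return _assign(x, centroids), centroids


def select_ml_cluster(embeddings, ml_reference, k):
    """Hypothetical criterion: indices of articles in the cluster closest (cosine) to ml_reference."""
    labels, centroids = kmeans(embeddings, k)
    ref = ml_reference / np.linalg.norm(ml_reference)
    sims = centroids @ ref / np.linalg.norm(centroids, axis=1)
    return np.flatnonzero(labels == int(sims.argmax()))
```

A possible call would be `select_ml_cluster(embed_with_scibert(texts), embed_with_scibert(ml_examples).mean(0), k=10)`, where `texts`, `ml_examples`, and `k=10` are placeholders. In practice scikit-learn's `KMeans` would normally replace the hand-written `kmeans`; it is spelled out here only to keep the sketch dependency-light.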
## Evaluation Results
The model was evaluated with ROUGE metrics and outperformed the baseline models.

![ROUGE evaluation results](https://s3.amazonaws.com/moonup/production/uploads/63fb9a520aa18292d5c1027a/19mfKrjHkiCFDAL557Vsu.png)
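The card does not state which ROUGE variants or which implementation produced these results. To show what the metric measures, here is a simplified pure-Python sketch of ROUGE-N and ROUGE-L F1 scores. It only lowercases and splits on whitespace, with none of the stemming or tokenization rules of standard implementations, so its numbers will not match published ROUGE scores exactly; the function names are illustrative.

```python
from collections import Counter


def _tokens(text):
    # Simplified tokenization: lowercase + whitespace split (no stemming, no punctuation handling)
    return text.lower().split()


def _f1(overlap, n_pred, n_ref):
    if overlap == 0:
        return 0.0
    precision, recall = overlap / n_pred, overlap / n_ref
    return 2 * precision * recall / (precision + recall)


def rouge_n(prediction, reference, n=1):
    """ROUGE-N F1: clipped n-gram overlap between a predicted and a reference summary."""
    def ngrams(toks):
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))

    pred, ref = ngrams(_tokens(prediction)), ngrams(_tokens(reference))
    overlap = sum((pred & ref).values())  # Counter & keeps the minimum count of each n-gram
    return _f1(overlap, sum(pred.values()), sum(ref.values()))


def rouge_l(prediction, reference):
    """ROUGE-L F1 based on the longest common subsequence (LCS) of tokens."""
    p, r = _tokens(prediction), _tokens(reference)
    # Classic dynamic-programming table: dp[i][j] = LCS length of p[:i] and r[:j]
    dp = [[0] * (len(r) + 1) for _ in range(len(p) + 1)]
    for i, pt in enumerate(p, 1):
        for j, rt in enumerate(r, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if pt == rt else max(dp[i - 1][j], dp[i][j - 1])
    return _f1(dp[-1][-1], len(p), len(r))


pred = "the model summarizes long papers"
ref = "the model summarizes machine learning papers"
print(rouge_n(pred, ref, 1), rouge_n(pred, ref, 2), rouge_l(pred, ref))
```

For this pair, ROUGE-1 and ROUGE-L are both 8/11 (4 shared tokens, in order, out of 5 predicted and 6 reference tokens) and ROUGE-2 is 4/9. For reported results, a standard implementation such as Google's `rouge_score` package, which Hugging Face `evaluate` wraps, should be used instead.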