metadata
annotations_creators:
- expert-generated
- machine-generated
language_creators:
- expert-generated
- found
languages:
- en
license: cc-by-4.0
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
source_datasets:
- original
task_categories:
- text-generation
- text-classification
task_ids:
- future-work-generation
- scientific-section-classification
pretty_name: ACL Future Work Dataset (2023–2024)
tags:
- scientific-articles
- future-work
- NLP
- ACL
- NeurIPS
- LLM-evaluation
language:
- en
# ACL Future Work Dataset (2023–2024)
This dataset contains structured data from papers in the ACL 2023 and ACL 2024 proceedings. Each paper is parsed into sections (e.g., Introduction, Related Work, Conclusion). A "Future Work" section is then extracted, either automatically or manually, by scanning the parsed sections in reverse order for future-oriented sentences.
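
The reverse-order search described above could look roughly like the following. This is a minimal sketch, not the pipeline used to build the dataset: the cue phrases in `FUTURE_CUES`, the sentence splitter, and the function name `extract_future_work` are all assumptions made for illustration.

```python
import re

# Hypothetical cue phrases; the keyword list actually used for this dataset is not documented.
FUTURE_CUES = re.compile(
    r"\b(future work|in the future|we plan to|we intend to)\b", re.IGNORECASE
)


def extract_future_work(sections):
    """Scan sections from last to first and return the future-oriented
    sentences of the first section (in that order) that contains any."""
    for section in reversed(sections):
        # Naive sentence split: end punctuation followed by whitespace.
        sentences = re.split(r"(?<=[.!?])\s+", section["text"])
        hits = [s for s in sentences if FUTURE_CUES.search(s)]
        if hits:
            return " ".join(hits)
    return None  # no future-oriented sentence found in any section


# Made-up example sections, shaped like the "sections" list of a paper entry.
sections = [
    {"heading": "1 Introduction", "text": "We study parsing. In the future, parsers may improve."},
    {"heading": "5 Conclusion", "text": "We presented a parser. We plan to extend it to more languages."},
]
print(extract_future_work(sections))
```

Because the scan starts from the end of the paper, the sentence in the conclusion is picked up before the incidental mention in the introduction.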
## Dataset Structure
Each JSON file (`acl23_future_cleaned_final.json` and `acl24_future_cleaned_final.json`) has the following format:

    {
      "ACL23_1.pdf": {
        "abstractText": "Abstract of the paper...",
        "sections": [
          {
            "heading": "1 Introduction",
            "text": "..."
          },
          ...
          {
            "heading": "Future Work",
            "text": "We plan to extend this method by..."
          }
        ],
        "title": "Paper Title",
        "year": 2023
      },
      ...
    }

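
As a usage illustration, the snippet below collects the "Future Work" text for each paper from a file in this format. It is a sketch under two assumptions: that the file sits in the working directory, and that the extracted section always carries the exact heading `"Future Work"`, as in the example above. The helper names `future_work_by_paper` and `load_future_work` are hypothetical.

```python
import json


def future_work_by_paper(papers):
    """Map each paper ID to the text of its "Future Work" section, if present."""
    result = {}
    for paper_id, paper in papers.items():
        for section in paper.get("sections", []):
            # Assumes the heading is exactly "Future Work", as in the format example.
            if section.get("heading") == "Future Work":
                result[paper_id] = section.get("text", "")
                break
    return result


def load_future_work(path):
    """Read one of the dataset's JSON files and extract the Future Work texts."""
    with open(path, encoding="utf-8") as f:
        return future_work_by_paper(json.load(f))


# Small in-memory sample with the same shape as the JSON files (contents are made up).
sample = {
    "ACL23_1.pdf": {
        "abstractText": "Abstract of the paper...",
        "sections": [
            {"heading": "1 Introduction", "text": "..."},
            {"heading": "Future Work", "text": "We plan to extend this method by..."},
        ],
        "title": "Paper Title",
        "year": 2023,
    }
}
print(future_work_by_paper(sample))
```

With the real data, a call such as `load_future_work("acl23_future_cleaned_final.json")` would return the same kind of dictionary for the ACL 2023 file.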
## License
This dataset is licensed under the [Creative Commons Attribution 4.0 International License (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/).
You are free to use, share, and adapt the dataset as long as you give appropriate credit.
### Curated by
Ibrahim Al Azher, Northern Illinois University, DATALab