annotations_creators:
  - expert-generated
  - machine-generated
language_creators:
  - expert-generated
  - found
languages:
  - en
license: cc-by-4.0
multilinguality:
  - monolingual
size_categories:
  - 1K<n<10K
source_datasets:
  - original
task_categories:
  - text-generation
  - text-classification
task_ids:
  - future-work-generation
  - scientific-section-classification
pretty_name: ACL Future Work Dataset (2023–2024)
tags:
  - scientific-articles
  - future-work
  - NLP
  - ACL
  - NeurIPS
  - LLM-evaluation
language:
  - en

# 🧠 ACL Future Work Dataset (2023–2024)

This dataset contains structured scientific paper data from the ACL 2023 and ACL 2024 proceedings. Each paper is parsed into sections (e.g., Introduction, Related Work, Conclusion). A "Future Work" section is then extracted, either automatically or manually, by searching the parsed sections in reverse order for future-oriented sentences.
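
The reverse-order search described above might look roughly like the following. This is an illustrative sketch only, not the pipeline used to build the dataset: the cue phrases, the function name `find_future_sentences`, and the naive sentence splitter are all assumptions. It expects each section as a dict with `heading` and `text` keys, matching the format shown under Dataset Structure.

```python
import re

# Hypothetical cue phrases; the dataset's actual extraction criteria
# are not documented in this card.
FUTURE_CUES = re.compile(
    r"\b(future work|in the future|we plan to|we intend to|we leave)\b",
    re.IGNORECASE,
)


def find_future_sentences(sections):
    """Scan sections from last to first and return (heading, sentences)
    for the first section, counting from the end, that contains any
    future-oriented sentence. Returns (None, []) if nothing matches."""
    for section in reversed(sections):
        # Naive split: break after ., ! or ? when followed by whitespace.
        sentences = re.split(r"(?<=[.!?])\s+", section["text"].strip())
        hits = [s for s in sentences if FUTURE_CUES.search(s)]
        if hits:
            return section["heading"], hits
    return None, []
```

Searching from the end reflects the observation that future-work statements tend to sit in or near the concluding sections, so the last matching section is usually the most relevant one.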

## 📁 Dataset Structure

Each JSON file (`acl23_future_cleaned_final.json` and `acl24_future_cleaned_final.json`) has the following format:

{
  "ACL23_1.pdf": {
    "abstractText": "Abstract of the paper...",
    "sections": [
      {
        "heading": "1 Introduction",
        "text": "..."
      },
      ...
      {
        "heading": "Future Work",
        "text": "We plan to extend this method by..."
      }
    ],
    "title": "Paper Title",
    "year": 2023
  },
  ...
}
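
Given the format above, a file can be read with the standard `json` module. The snippet below is a minimal sketch under stated assumptions: the helper names `load_papers` and `get_future_work` are hypothetical, and it assumes the extracted section is always headed exactly "Future Work", as in the example.

```python
import json


def load_papers(path):
    """Load one dataset file into a dict keyed by PDF file name."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def get_future_work(paper):
    """Return the text of the section headed "Future Work",
    or None if the paper has no such section."""
    for section in paper.get("sections", []):
        # Assumption: the heading is literally "Future Work";
        # compare case-insensitively to be slightly more tolerant.
        if section.get("heading", "").strip().lower() == "future work":
            return section.get("text")
    return None
```

For example, after downloading a file to the working directory, `papers = load_papers("acl23_future_cleaned_final.json")` followed by `get_future_work(papers["ACL23_1.pdf"])` would return that paper's future-work text, if present.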

## 📜 License

This dataset is licensed under the [Creative Commons Attribution 4.0 International License (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/).  
You are free to use, share, and adapt the dataset as long as you give appropriate credit.

### ✍️ Curated by 
Ibrahim Al Azher, Northern Illinois University, DATALab