IbrahimAlAzhar/FutureGen_v2_dataset

Commit 4795cbc (verified; parent b1a2c1b) by IbrahimAlAzhar, committed on Aug 23: Create README.md

Files changed (1): `README.md` (new file, +67 -0)

---
annotations_creators:
- expert-generated
- machine-generated
language_creators:
- expert-generated
- found
license: cc-by-4.0
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
source_datasets:
- original
task_categories:
- text-generation
- text-classification
task_ids:
- future-work-generation
- scientific-section-classification
pretty_name: ACL Future Work Dataset (2023–2024)
tags:
- scientific-articles
- future-work
- NLP
- ACL
- NeurIPS
- LLM-evaluation
language:
- en
---
# 🧠 ACL Future Work Dataset (2023–2024)

This dataset contains structured scientific-paper data from the ACL 2023 and ACL 2024 proceedings. Each paper is parsed into sections (e.g., Introduction, Related Work, Conclusion). A **"Future Work"** section is then extracted from the parsed text, automatically or manually, by searching the sections in reverse order (from the last section backward) for future-oriented sentences.
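
The reverse-order search described above can be sketched in Python as follows. This is a minimal illustration, not the authors' extraction code: the cue-phrase list (`FUTURE_CUES`), the naive sentence splitter, and the rule of stopping at the first section (counting from the end) that contains a match are all assumptions, and the function names are hypothetical.

```python
import re
from typing import List

# Hypothetical cue phrases; the dataset's actual extraction criteria are not documented here.
FUTURE_CUES = re.compile(
    r"\b(future work|in the future|we plan to|we intend to|future research|"
    r"future directions?|we will explore|remains? to be explored)\b",
    re.IGNORECASE,
)


def split_sentences(text: str) -> List[str]:
    # Naive splitter: break after ".", "!" or "?" followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_future_work(sections: List[dict]) -> str:
    """Return the cue-matching sentences of the last section that contains any.

    Sections are scanned from the end of the paper backward; the first section
    met in that order with at least one matching sentence is used (an assumed
    stopping rule). Returns "" when no section matches.
    """
    for section in reversed(sections):
        hits = [
            sentence
            for sentence in split_sentences(section.get("text", ""))
            if FUTURE_CUES.search(sentence)
        ]
        if hits:
            return " ".join(hits)
    return ""
```

Because the search runs backward, a future-oriented sentence in the Conclusion is returned before a similar sentence in the Introduction is ever examined.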

## 📁 Dataset Structure

Each JSON file (`acl23_future_cleaned_final.json` and `acl24_future_cleaned_final.json`) maps a PDF filename to its parsed paper, in the following format:
```json
{
  "ACL23_1.pdf": {
    "abstractText": "Abstract of the paper...",
    "sections": [
      {
        "heading": "1 Introduction",
        "text": "..."
      },
      ...
      {
        "heading": "Future Work",
        "text": "We plan to extend this method by..."
      }
    ],
    "title": "Paper Title",
    "year": 2023
  },
  ...
}
```
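
A minimal Python sketch for loading one of these files and pulling out each paper's "Future Work" text. The helper names are hypothetical, and the code assumes the extracted section is always headed exactly `Future Work` (matched case-insensitively), as in the example above; papers without a non-empty section of that name are skipped.

```python
import json
from typing import Iterator, Optional, Tuple


def load_papers(path: str) -> dict:
    """Load one file (e.g. acl23_future_cleaned_final.json) into a dict keyed by PDF name."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def future_work_text(paper: dict) -> Optional[str]:
    """Return the text of the section headed 'Future Work', or None if absent."""
    for section in paper.get("sections", []):
        if section.get("heading", "").strip().lower() == "future work":
            return section.get("text", "")
    return None


def iter_future_work(papers: dict) -> Iterator[Tuple[str, str, str]]:
    """Yield (pdf_name, title, future_work_text) for papers with a non-empty Future Work section."""
    for pdf_name, paper in papers.items():
        text = future_work_text(paper)
        if text:
            yield pdf_name, paper.get("title", ""), text
```

For example, `for name, title, fw in iter_future_work(load_papers("acl24_future_cleaned_final.json")): print(name, title, fw[:80])` prints the first 80 characters of each paper's Future Work text.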

## 📜 License

This dataset is licensed under the [Creative Commons Attribution 4.0 International License (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/). You are free to use, share, and adapt the dataset as long as you give appropriate credit.