IbrahimAlAzhar committed 4795cbc (verified; 1 parent: b1a2c1b)

Create README.md

---
annotations_creators:
- expert-generated
- machine-generated
language_creators:
- expert-generated
- found
language:
- en
license: cc-by-4.0
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
source_datasets:
- original
task_categories:
- text-generation
- text-classification
task_ids:
- future-work-generation
- scientific-section-classification
pretty_name: ACL Future Work Dataset (2023–2024)
tags:
- scientific-articles
- future-work
- NLP
- ACL
- NeurIPS
- LLM-evaluation
---

# 🧠 ACL Future Work Dataset (2023–2024)

This dataset contains structured data from papers in the ACL 2023 and ACL 2024 proceedings. Each paper is parsed into sections (e.g., Introduction, Related Work, Conclusion), and a **"Future Work"** section is extracted from the parsed text, automatically or manually, by scanning the sections in reverse order for future-oriented sentences.

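The extraction procedure is described here only in outline. As a rough illustration, the sketch below shows one way a reverse-order search could work. It assumes a hypothetical `extract_future_work` helper, a small hand-picked list of cue phrases, and a naive regex sentence splitter; none of these come from the dataset's actual pipeline, which is not documented in this card.

```python
import re

# Hypothetical cue phrases; the dataset's real extraction rules are not documented.
FUTURE_CUES = re.compile(
    r"\b(?:future work|in the future|we plan to|we intend to|"
    r"future research|future directions?)\b",
    re.IGNORECASE,
)


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_future_work(sections):
    """Hypothetical helper: scan sections from last to first and return the
    future-oriented sentences of the first (i.e. latest) section that has any.

    `sections` is a list of {"heading": ..., "text": ...} dicts, as in the JSON files.
    """
    for section in reversed(sections):
        hits = [s for s in split_sentences(section.get("text", ""))
                if FUTURE_CUES.search(s)]
        if hits:
            return " ".join(hits)
    return ""  # no future-oriented sentence found in any section
```

Scanning backwards reflects the fact that future work is typically discussed near the end of a paper, such as in the Conclusion; a production pipeline would likely need a richer cue list and a proper sentence tokenizer.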
## 📁 Dataset Structure

Each JSON file (`acl23_future_cleaned_final.json` and `acl24_future_cleaned_final.json`) maps a PDF filename to a paper record with the following format:

```json
{
  "ACL23_1.pdf": {
    "abstractText": "Abstract of the paper...",
    "sections": [
      {
        "heading": "1 Introduction",
        "text": "..."
      },
      ...
      {
        "heading": "Future Work",
        "text": "We plan to extend this method by..."
      }
    ],
    "title": "Paper Title",
    "year": 2023
  },
  ...
}
```

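A minimal Python sketch for reading one of these files and pairing each paper's remaining sections with its extracted Future Work text, for example as input/target pairs for the `future-work-generation` task listed in the metadata. It assumes the extracted section's heading is exactly "Future Work" (matched case-insensitively), as in the example above; the helpers `load_papers`, `future_work_text`, and `to_generation_pairs` are hypothetical and not part of any published loader.

```python
import json


def load_papers(path):
    """Load one JSON file into a dict keyed by PDF filename (e.g. "ACL23_1.pdf")."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def future_work_text(paper):
    """Return the text of the section headed "Future Work", or None if absent."""
    for section in paper.get("sections", []):
        if section.get("heading", "").strip().lower() == "future work":
            return section.get("text", "")
    return None


def to_generation_pairs(papers):
    """Hypothetical helper: pair each paper's other sections (input)
    with its Future Work text (target)."""
    pairs = []
    for name, paper in papers.items():
        target = future_work_text(paper)
        if target is None:
            continue  # skip papers without an extracted Future Work section
        context = "\n\n".join(
            f"{s.get('heading', '')}\n{s.get('text', '')}"
            for s in paper.get("sections", [])
            if s.get("heading", "").strip().lower() != "future work"
        )
        pairs.append({"id": name, "title": paper.get("title", ""),
                      "input": context, "target": target})
    return pairs
```

For example, `to_generation_pairs(load_papers("acl23_future_cleaned_final.json"))` would return a list of `{"id", "title", "input", "target"}` records, skipping any paper that has no Future Work section.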
## 📜 License

This dataset is licensed under the [Creative Commons Attribution 4.0 International License (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/).
You are free to use, share, and adapt the dataset, provided you give appropriate credit, provide a link to the license, and indicate if changes were made.