---
language: [en]
tags:
  - Earthwork
  - Quantity
  - Estimation
  - CAD
  - deep-learning
  - LLM
  - MLP
  - LSTM
  - Transformers
  - BERT
license: mit
datasets:
  - custom
  - cad-drawings
metrics:
  - accuracy
  - loss
library_name: transformers
model_name: EarthworkNet
widget:
  - example_input: "Cross-sectional data for earthwork quantity prediction"
  - example_output: "Predicted earthwork quantity (e.g., cut, refill): 1234 m³"
---

# Earthwork Network Architecture (ENA)

## Overview

The Earthwork Network Architecture (ENA) is a deep learning architecture developed to improve and compare methods for accurate estimation of earthwork quantities. This repository includes four distinct deep learning models—MLP, LSTM, Transformer, and LLM-based (BERT) architectures—tailored for automating and enhancing earthwork quantity estimation from CAD-based cross-sectional drawings.

### Key Features:
1. **Multi-Model Approach**:
   - **MLP-Based Model**: Lightweight and efficient for smaller datasets.
   - **LSTM-Based Model**: Ideal for sequential dependencies in time-series data.
   - **Transformer-Based Model**: Handles complex relationships and large datasets.
   - **LLM-Based Model (BERT)**: Excels in processing contextual and unstructured data.
   
2. **Automated Data Processing**:
   - Converts CAD cross-sectional drawings into structured datasets.
   - Employs a Half-Edge Topology Structure to tokenize and preprocess geometrical features (a minimal sketch appears after this list).

3. **Enhanced Performance**:
   - Provides superior accuracy in Quantity Takeoff Classification (QTC) with reduced loss metrics.
   - Demonstrates robust generalization for unseen datasets, validated through a real-world road construction project.
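
The repository's actual half-edge implementation is not reproduced here; the following minimal Python sketch only illustrates the general idea of a half-edge structure for one closed cross-section loop. The class and function names (`Vertex`, `HalfEdge`, `build_loop`) and the layer name are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# Minimal half-edge structure for a closed 2D cross-section loop.
# Names are illustrative, not the repository's API.

@dataclass
class Vertex:
    x: float
    y: float

@dataclass
class HalfEdge:
    origin: Vertex                      # vertex this half-edge starts from
    next: Optional["HalfEdge"] = None   # next half-edge around the face
    twin: Optional["HalfEdge"] = None   # oppositely oriented partner edge
    layer: str = ""                     # CAD layer (e.g., an earthwork item)

def build_loop(points: List[Vertex], layer: str) -> List[HalfEdge]:
    """Link a closed loop of half-edges around one cross-section face."""
    edges = [HalfEdge(origin=p, layer=layer) for p in points]
    for i, edge in enumerate(edges):
        edge.next = edges[(i + 1) % len(edges)]
    return edges

# Example: a triangular cut area on a layer named "EW_CUT" (name assumed).
loop = build_loop([Vertex(0, 0), Vertex(10, 0), Vertex(10, 2)], layer="EW_CUT")
print(len(loop), loop[0].next.origin)   # 3 Vertex(x=10, y=0)
```

From loops like this, per-edge features (coordinates, lengths, layer labels, adjacency) can be serialized into the token sequences consumed by the models.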

### Research Basis:
The ENA is detailed in the paper *Earthwork Network Architecture (ENA): Research for Earthwork Quantity Estimation Method Improvement with Large Language Model*. It showcases a comparative analysis of the ENA models and demonstrates the advantages of LLM-based approaches in construction engineering.

## Usage

### Prerequisites
- **Programming Language**: Python 3.8 or above.
- **Framework**: PyTorch, installed for your CUDA version (for example, `pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118` for a CUDA 11.8 build).
- **Libraries**: Install the required packages using `pip install`; detailed dependencies are listed in the code files. Note that `json`, `os`, `re`, and `logging` are part of the Python standard library and do not need to be installed separately.
   ```bash
   pip install torch numpy matplotlib seaborn transformers scikit-learn tqdm
   pip install pandas scipy trimesh laspy open3d pyautocad pywin32
   ```
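
As a quick sanity check of the environment (not part of the repository), you can confirm that the GPU build of PyTorch and the main libraries import correctly:

```python
# Quick environment check; illustrative only, not part of the repository.
import torch
import transformers
import sklearn

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("transformers:", transformers.__version__)
print("scikit-learn:", sklearn.__version__)
```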
   
### Data Preparation
1. **Prepare Training Dataset**:
   - Prepare CAD cross-sectional drawings as input files and open them in AutoCAD. Run the program below to extract the entities of each cross-section in the drawing. The layer names of the earthwork items can be defined in config.json (a hedged usage sketch follows this list).
   ```bash
   python create_earthwork_dataset.py --config config.json --output output/ --view output/chain_chunk_6.json
   ```
   - Note that we assume the entities on each earthwork item's layer have already been segmented (please refer to the paper cited below).
   - Use the provided scripts to preprocess and tokenize geometrical features.
   ```bash
   python prepare_dataset.py --input output/ --output dataset/
   ```

2. **Training Data (TBD)**:
   - Features are tokenized into sequences for the MLP, LSTM, Transformer, and LLM models (the sketch after this list illustrates the idea). We will upload the training source files once they are organized.
   ```bash
   python train_ena_model.py --model_type [MLP|LSTM|Transformer|LLM]
   ```

3. **Run and Test the ENA Models**:
   - Run the program below to run and test each ENA model. It generates log files and graph images for checking performance.
   ```bash
   python ena_run_model.py
   ```     
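
The exact schema of config.json and the tokenized feature format are defined by the repository's scripts, and the training code is still to be released. The sketch below only illustrates, under assumed names and formats, how configured layer names could be used to group extracted entities per cross-section and how such a section could be turned into a padded feature sequence; the `earthwork_layers` key, the entity-record layout, and the item labels are all assumptions.

```python
import json
from collections import defaultdict

import torch

# Hypothetical config; the real schema is defined by create_earthwork_dataset.py.
# config.json is assumed to map earthwork items to CAD layer names, e.g.:
#   {"earthwork_layers": {"cut": "EW_CUT", "fill": "EW_FILL"}}
with open("config.json", encoding="utf-8") as f:
    config = json.load(f)
layer_to_item = {v: k for k, v in config["earthwork_layers"].items()}

# Assume extraction produced one record per entity: station, layer, polyline points.
entities = [
    {"station": "0+020", "layer": "EW_CUT",  "points": [(0, 0), (10, 0), (10, 2)]},
    {"station": "0+020", "layer": "EW_FILL", "points": [(10, 0), (18, 0), (18, -1)]},
]

# Group entities by cross-section (station), keeping only configured layers.
sections = defaultdict(list)
for ent in entities:
    item = layer_to_item.get(ent["layer"])
    if item is not None:
        sections[ent["station"]].append({"item": item, "points": ent["points"]})

# Turn one section into a fixed-length (x, y, item-id) sequence for the models.
ITEM_IDS = {"cut": 0, "fill": 1}   # assumed label set

def section_to_sequence(section, max_len=16):
    tokens = [[float(x), float(y), float(ITEM_IDS[ent["item"]])]
              for ent in section for x, y in ent["points"]][:max_len]
    tokens += [[0.0, 0.0, -1.0]] * (max_len - len(tokens))   # -1 marks padding
    return torch.tensor(tokens)                              # shape: (max_len, 3)

for station, section in sections.items():
    print(station, section_to_sequence(section).shape)       # 0+020 torch.Size([16, 3])
```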
   
### Training and Evaluation
1. Select the model architecture (`MLP`, `LSTM`, `Transformer`, or `LLM`).
2. Configure hyperparameters (batch size, learning rate, etc.) as required; a hedged example configuration appears after this list.
3. Run the training script:
   ```bash
   python train_ena_model.py --model_type [MLP|LSTM|Transformer|LLM]
   ```
4. Evaluate the model using the test dataset:
   ```bash
   python evaluate_ena_model.py --model_path [path/to/trained/model]
   ```
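
Purely as an illustration of the hyperparameters mentioned in step 2, the sketch below shows an assumed configuration dictionary and a single PyTorch training step for a small MLP regressor over sequences shaped like the ones above. The names and values are assumptions, not the repository defaults.

```python
import torch
from torch import nn

# Assumed hyperparameters; the repository scripts define their own arguments.
hparams = {
    "model_type": "MLP",        # one of MLP | LSTM | Transformer | LLM
    "batch_size": 32,
    "learning_rate": 1e-3,
    "epochs": 50,
}

# Minimal MLP regressor over flattened section sequences (16 tokens x 3 features).
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(16 * 3, 64),
    nn.ReLU(),
    nn.Linear(64, 1),           # predicted earthwork quantity (e.g., in m³)
)
optimizer = torch.optim.Adam(model.parameters(), lr=hparams["learning_rate"])
loss_fn = nn.MSELoss()

# One dummy training step on random data, just to show the loop structure.
x = torch.randn(hparams["batch_size"], 16, 3)
y = torch.randn(hparams["batch_size"], 1)
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
print("training loss:", loss.item())
```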

## Results
- **Best Model**: LLM-based ENA achieved a QTC accuracy of **97.17%**, outperforming other architectures in accuracy and stability.
- **Performance Trade-Offs**: LLMs provide high accuracy but require significant computational resources compared to other models.

## Coming Soon
- Source code for the ENA models.
- Step-by-step tutorials for dataset preparation and model training.

## License
This project is licensed under the MIT License.

## Citation
If you use this repository, please cite:
Kang, T.; Kang, K. [Earthwork Network Architecture (ENA): Research for Earthwork Quantity Estimation Method Improvement with Large Language Models](https://www.mdpi.com/2076-3417/14/22/10517). Appl. Sci. 2024, 14, 10517.
https://doi.org/10.3390/app142210517