先坤 committed
Commit f407e0e
1 Parent(s): 9ee1a60
README.md CHANGED
@@ -7,9 +7,11 @@ tags:
  - Reinforcement Learning
  - Vehicle Routing Problem
  ---
- ![LOGO.png](./images/GREEDRL-Logo-Original-320x320.png)

- # GreedRL

  # Introduction

@@ -17,32 +19,130 @@ tags:

  ## Architecture design
  The entire architecture is divided into three layers:

- * High-performance Env framework

- The constraints and optimization objectives for the problems to be solved are defined in the RL Env.
- Based on performance and ease of use considerations, the Env framework provides two implementations: one based on pytorch and one based on CUDA C++.
- To facilitate the definition of problems for developers, the framework abstracts multiple variables to represent the environment's state, which are automatically generated after being declared by the user. When defining constraints and optimization objectives, developers can directly refer to the declared variables.
- Currently, various VRP variants such as CVRP, VRPTW and PDPTW, as well as problems such as Batching and Online Assignment, are supported.

- * Pluggable NN components

- The framework provides certain neural network components, and developers can also implement custom neural network components.

- * High-performance NN operators

- In order to achieve the ultimate performance, the framework implements some high-performance operators specifically for OR scenarios to replace pytorch operators, such as masked addition attention and masked softmax sampling.

- ![Architecture](./images/GREEDRL-Framwork.png)

  ## Network design
- The network structure adopts the seq2seq architecture commonly used in NLP, with the Transformer used in the encoding part and RNN used in the decoding part, as shown in the diagram below.

- ![network.png](./images/GREEDRL-Network.png)

  ## 🏆Award

- # GreedRL-VRP-pretrained model

  ## Model description

@@ -53,9 +153,7 @@ The network structure adopts the seq2seq architecture commonly used in NLP, with

  You can use these model for solving the vehicle routing problems (VRPs) with reinforcement learning (RL).

- ## Usage
-
- You can use this model directly with a pipeline for masked language modeling:

  ### Requirements
  This library requires Python == 3.8. [Miniconda](https://docs.conda.io/en/latest/miniconda.html#system-requirements) / [Anaconda](https://docs.anaconda.com/anaconda/install/) is our recommended Python distribution.
@@ -64,25 +162,41 @@ This library requires Python == 3.8. [Miniconda](https://docs.conda.io/en/latest
  pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
  ```

  ### Training

  1. Training data

  We use the generated data for the training phase, the customers and depot locations are randomly generated in the unit square [0,1] X [0,1].

- For the Capacitated VRP(CVRP), we assume that the demand of each node is a discrete number in {1,...,9}, chosen uniformly at random.

  2. Start training
  ```python
- python train.py
  ```

  ### Evaluation

  ```python
- python solve.py
  ```

  ## Support
 
  - Reinforcement Learning
  - Vehicle Routing Problem
  ---
+ <div align="center">
+ <img src="./images/GREEDRL-Logo-Original-640.png" width = "515" height = "380"/>
+ </div>

+ # ✊GreedRL

  # Introduction

  ## Architecture design
  The entire architecture is divided into three layers:

+ * **High-performance Env framework**

+ The constraints and optimization objectives for the problem to be solved are defined in the Reinforcement Learning (RL) Environment (Env).
+ For performance and ease of use, the Env framework provides two implementations: one based on **PyTorch** and one based on **CUDA C++**.
+
+ To make problem definition easier for developers, the framework abstracts a set of variables that represent the environment's state; they are generated automatically once declared by the user. When defining constraints and optimization objectives, developers can refer to the declared variables directly.
+ Currently, various VRP variants such as CVRP, VRPTW and PDPTW, as well as problems such as Batching, are supported.

+ * **Pluggable NN components**

+ The framework provides a set of neural network (NN) components, and developers can also implement custom NN components.

+ * **High-performance NN operators**

+ To achieve the best possible performance, the framework implements high-performance operators tailored to Combinatorial Optimization (CO) problems, such as Masked Addition Attention and Masked Softmax Sampling, which replace the corresponding PyTorch operators (a minimal sketch follows the architecture figure below).

+ <div align="center">
+ <img src="./images/GREEDRL-Framwork.png" width = "515" height = "380"/>
+ </div>
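+ The fused CUDA kernels themselves are not shown in this README. As a rough reference only, the PyTorch snippet below sketches the computation that such operators replace; the function names, tensor shapes and the additive (Bahdanau-style) scoring are illustrative assumptions, not the library's API.
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+
+ def masked_addition_attention(query, keys, mask):
+     # Additive attention over candidate tasks.
+     # query: (batch, dim), keys: (batch, n, dim), mask: (batch, n) with True = infeasible.
+     scores = torch.tanh(query.unsqueeze(1) + keys).sum(-1)    # (batch, n)
+     scores = scores.masked_fill(mask, float('-inf'))          # infeasible tasks get zero weight
+     return F.softmax(scores, dim=-1)
+
+ def masked_softmax_sample(logits, mask):
+     # Sample one action per instance from a softmax restricted to feasible actions.
+     probs = F.softmax(logits.masked_fill(mask, float('-inf')), dim=-1)
+     return torch.multinomial(probs, 1).squeeze(-1)            # (batch,)
+ ```
+
+ A fused kernel can avoid materializing the intermediate tensors above and the extra kernel launches, which is the usual motivation for replacing the stock PyTorch operators.
+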
 
  ## Network design
+ The neural network adopts the Seq2Seq architecture commonly used in Natural Language Processing (NLP): a Transformer is used in the encoding part and an RNN in the decoding part, as shown in the diagram below.
+
+ <div align="center">
+ <img src="./images/GREEDRL-Network.png" width = "515" height = "380"/>
+ </div>
+
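+ Purely as an illustration of this layout (not GreedRL's actual network or hyperparameters), a pointer-style skeleton with a Transformer encoder and a GRU decoder step might look as follows:
+
+ ```python
+ import torch
+ from torch import nn
+
+ class Seq2SeqSketch(nn.Module):
+     # Encode all nodes once with a Transformer; at each decoding step an RNN
+     # (here a GRUCell) produces a query that points to the next node.
+     def __init__(self, input_dim, hidden_dim=128, num_layers=3, num_heads=8):
+         super().__init__()
+         self.embed = nn.Linear(input_dim, hidden_dim)
+         layer = nn.TransformerEncoderLayer(hidden_dim, num_heads, batch_first=True)
+         self.encoder = nn.TransformerEncoder(layer, num_layers)
+         self.decoder = nn.GRUCell(hidden_dim, hidden_dim)
+
+     def forward(self, nodes, mask):
+         # nodes: (batch, n, input_dim); mask: (batch, n), True = infeasible
+         memory = self.encoder(self.embed(nodes))               # encoded once
+         state = memory.mean(dim=1)                             # initial decoder state
+         state = self.decoder(memory.mean(dim=1), state)        # one decoding step
+         scores = torch.bmm(memory, state.unsqueeze(-1)).squeeze(-1)
+         probs = torch.softmax(scores.masked_fill(mask, float('-inf')), dim=-1)
+         return probs                                           # distribution over the next node
+ ```
+
+ A full decoder would be unrolled step by step, feeding back the embedding of the previously selected node; the sketch shows a single step only.
+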
+ ## Modeling examples
+
+ ### VRP with Time Windows (VRPTW)
+ <details>
+ <summary>VRPTW</summary>
+
+ ```python
+ import torch
+
+ from greedrl import Problem, Solution, Solver
+ from greedrl.feature import *
+ from greedrl.variable import *
+ from greedrl.function import *
+ from greedrl.model import runner
+ from greedrl.myenv import VrptwEnv
+
+ features = [continuous_feature('worker_weight_limit'),
+             continuous_feature('worker_ready_time'),
+             continuous_feature('worker_due_time'),
+             continuous_feature('worker_basic_cost'),
+             continuous_feature('worker_distance_cost'),
+             continuous_feature('task_demand'),
+             continuous_feature('task_weight'),
+             continuous_feature('task_ready_time'),
+             continuous_feature('task_due_time'),
+             continuous_feature('task_service_time'),
+             continuous_feature('distance_matrix')]
+ ```
+
+ ```python
+ variables = [task_demand_now('task_demand_now', feature='task_demand'),
+              task_demand_now('task_demand_this', feature='task_demand', only_this=True),
+              feature_variable('task_weight'),
+              feature_variable('task_due_time'),
+              feature_variable('task_ready_time'),
+              feature_variable('task_service_time'),
+              worker_variable('worker_weight_limit'),
+              worker_variable('worker_due_time'),
+              worker_variable('worker_basic_cost'),
+              worker_variable('worker_distance_cost'),
+              worker_used_resource('worker_used_weight', task_require='task_weight'),
+              worker_used_resource('worker_used_time', 'distance_matrix', 'task_service_time', 'task_ready_time',
+                                   'worker_ready_time'),
+              edge_variable('distance_last_to_this', feature='distance_matrix', last_to_this=True),
+              edge_variable('distance_this_to_task', feature='distance_matrix', this_to_task=True),
+              edge_variable('distance_task_to_end', feature='distance_matrix', task_to_end=True)]
+
+
+ class Constraint:
+
+     def do_task(self):
+         return self.task_demand_this
+
+     def mask_task(self):
+         # tasks whose demand is already fully served
+         mask = self.task_demand_now <= 0
+         # vehicle capacity constraint
+         worker_weight_limit = self.worker_weight_limit - self.worker_used_weight
+         mask |= self.task_demand_now * self.task_weight > worker_weight_limit[:, None]
+
+         # arriving later than the task's due time is infeasible
+         worker_used_time = self.worker_used_time[:, None] + self.distance_this_to_task
+         mask |= worker_used_time > self.task_due_time
+
+         # wait for the ready time, serve the task, then return before the vehicle's due time
+         worker_used_time = torch.max(worker_used_time, self.task_ready_time)
+         worker_used_time += self.task_service_time
+         worker_used_time += self.distance_task_to_end
+         mask |= worker_used_time > self.worker_due_time[:, None]
+
+         return mask
+
+     def finished(self):
+         return torch.all(self.task_demand_now <= 0, 1)
+
+
+ class Objective:
+
+     def step_worker_start(self):
+         return self.worker_basic_cost
+
+     def step_worker_end(self):
+         return self.distance_last_to_this * self.worker_distance_cost
+
+     def step_task(self):
+         return self.distance_last_to_this * self.worker_distance_cost
+ ```
+
+ </details>
+
+ ### Pickup and Delivery Problem with Time Windows (PDPTW)

  ## 🏆Award

+ # 🤠GreedRL-VRP-pretrained model

  ## Model description

  You can use these models for solving vehicle routing problems (VRPs) with reinforcement learning (RL).

+ ## How to use

  ### Requirements
  This library requires Python == 3.8. [Miniconda](https://docs.conda.io/en/latest/miniconda.html#system-requirements) / [Anaconda](https://docs.anaconda.com/anaconda/install/) is our recommended Python distribution.

  pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
  ```

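+ If you use the recommended Miniconda / Anaconda, a dedicated environment can be created before running the `pip install` command above (the environment name below is illustrative):
+
+ ```shell
+ conda create -n greedrl python=3.8
+ conda activate greedrl
+ ```
+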
+ You first need to compile the project and add the resulting `greedrl_c` library to the `PYTHONPATH`:
+
+ ```shell
+ python setup.py build
+
+ export PYTHONPATH={root_path}/greedrl/build/lib.linux-x86_64-cpython-38/
+ ```
+

  ### Training

+ We provide an example of the Capacitated VRP (CVRP) for training and inference.
+
  1. Training data

  We use generated data for the training phase: the customer and depot locations are randomly generated in the unit square [0,1] x [0,1].

+ For the CVRP, we assume that the demand of each node is a discrete number in {1,...,9}, chosen uniformly at random. An illustrative generation sketch follows the training command below.

  2. Start training
  ```shell
+ cd examples/cvrp
+
+ python train.py --model_filename cvrp_5000.pt --problem_size 5000
  ```
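+ The repository's own data generator is not reproduced here; purely as an illustration of the instance distribution described above (uniform coordinates in the unit square, integer demands in {1,...,9}), a generator might look like the sketch below. The function name and return layout are assumptions.
+
+ ```python
+ import torch
+
+ def generate_cvrp_instances(batch_size, problem_size, device='cpu'):
+     # depot and customer coordinates, uniform in the unit square [0, 1] x [0, 1]
+     depot = torch.rand(batch_size, 1, 2, device=device)
+     customers = torch.rand(batch_size, problem_size, 2, device=device)
+     # integer demands drawn uniformly from {1, ..., 9}
+     demands = torch.randint(1, 10, (batch_size, problem_size), device=device)
+     return depot, customers, demands
+ ```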
 
  ### Evaluation

+ We provide pretrained models for several CVRP problem sizes, such as `cvrp_100`, `cvrp_1000`, `cvrp_2000` and `cvrp_5000`, which you can use directly for inference.
+
  ```shell
+ cd examples/cvrp
+
+ python solve.py --device cuda --model_name cvrp_5000.pt --problem_size 5000
  ```

  ## Support
images/GREEDRL-Logo-Original-320x320.png DELETED
Binary file (17.4 kB)
 
images/GREEDRL-Logo-Original-640.png ADDED