Update README.md
README.md
CHANGED
@@ -52,11 +52,11 @@ output # 'CN1CCC=C(CO)C1'
 ### Training Procedure
 
 <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-We used the Open Reaction Database (ORD) dataset for model training.
-The command used for training is the following. For more information, please refer to the paper and GitHub repository.
+We used the [Open Reaction Database (ORD) dataset](https://drive.google.com/file/d/1fa2MyLdN1vcA7Rysk8kLQENE92YejS9B/view?usp=drive_link) for model training. In addition, we used [USPTO_MIT dataset](https://yzhang.hpc.nyu.edu/T5Chem/index.html)'s test split to prevent data leakage.
+The command used for training is the following. For more information about data preprocessing and training, please refer to the paper and GitHub repository.
 
 ```python
-python
+python train.py \
 --model='t5' \
 --epochs=100 \
 --lr=1e-3 \
@@ -67,12 +67,12 @@ python train_without_duplicates.py \
 --evaluation_strategy='epoch' \
 --save_strategy='epoch' \
 --logging_strategy='epoch' \
---train_data_path='
---valid_data_path='
---test_data_path='
---USPTO_test_data_path='
+--train_data_path='../data/preprocessed_ord_train.csv' \
+--valid_data_path='../data/preprocessed_ord_valid.csv' \
+--test_data_path='../data/preprocessed_ord_test.csv' \
+--USPTO_test_data_path='../data/USPTO_MIT/MIT_separated/test.csv' \
 --disable_tqdm \
---pretrained_model_name_or_path='sagawa/
+--pretrained_model_name_or_path='sagawa/CompoundT5'
 ```
 
 ### Results
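
The updated README text says the USPTO_MIT test split is used to prevent data leakage, and the command passes it through `--USPTO_test_data_path`. The diff does not show how the training script applies it. One plausible reading is that training rows which also occur in that test split are dropped before training. The sketch below illustrates that idea only. It is not the repository's code: the function names, the `REACTANT` and `PRODUCT` column names, and the toy data are all hypothetical, and it matches rows by exact string equality, whereas a real pipeline might compare canonicalized SMILES instead.

```python
import csv
import io


def load_rows(csv_text):
    """Parse CSV text into a list of dicts, one per reaction."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def drop_leaked_rows(train_rows, test_rows, key_columns):
    """Return the training rows whose key does not occur in the test rows.

    Rows are compared by exact string match on key_columns (an assumption).
    """
    test_keys = {tuple(row[col] for col in key_columns) for row in test_rows}
    return [
        row
        for row in train_rows
        if tuple(row[col] for col in key_columns) not in test_keys
    ]


# Hypothetical toy data; the column names are assumptions, not the real schema.
train_csv = "REACTANT,PRODUCT\nCCO,CC=O\nCCBr,CCO\nc1ccccc1,c1ccccc1Br\n"
test_csv = "REACTANT,PRODUCT\nCCBr,CCO\n"

train_rows = load_rows(train_csv)
test_rows = load_rows(test_csv)
clean_rows = drop_leaked_rows(train_rows, test_rows, ["REACTANT", "PRODUCT"])

# The second training row also appears in the test data, so it is removed.
print(len(train_rows), len(clean_rows))
```

With real files, the same function could be applied to rows read from the CSV paths given in the command, provided the key columns are adjusted to the actual headers.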