Sophia Tang committed
Commit · b72ad57
1 Parent(s): 08c1a46
env update
Files changed:
- .gitignore +3 -1
- README.md +5 -9
- tr2d2-pep/environment.yml +14 -1
- tr2d2-pep/finetune.sh +1 -2
- tr2d2-pep/run_mcts.sh +0 -1
.gitignore
CHANGED

@@ -6,6 +6,7 @@ __pycache__/
 tr2d2-pep/wandb/
 tr2d2-pep/pretrained/
 tr2d2-pep/logs/
+tr2d2-pep/results/
 tr2d2-pep/__pycache__/
 tr2d2-pep/scoring/__pycache__/
 tr2d2-pep/tokenizer/__pycache__/
@@ -15,4 +16,5 @@ tr2d2-pep/utils/__pycache__/
 *.ipynb
 *.pt
 tr2d2-pep/scoring/functions/classifiers/best_model.pt
-.gitignore
+.gitignore
+.gitattributes
README.md
CHANGED

@@ -6,25 +6,21 @@
 
 
 
-This is the repository for **[TR2-D2: Tree Search Guided Trajectory-Aware Fine-Tuning for Discrete Diffusion](https://arxiv.org/abs/2509.25171)** 🤖🌳. It is partially built on the **[PepTune repo](https://
+This is the repository for **[TR2-D2: Tree Search Guided Trajectory-Aware Fine-Tuning for Discrete Diffusion](https://arxiv.org/abs/2509.25171)** 🤖🌳. It is partially built on the **[PepTune repo](https://huggingface.co/ChatterjeeLab/PepTune)** ([Tang et al. 2024](https://arxiv.org/abs/2412.17780)) and **MDNS** ([Zhu et al. 2025](https://arxiv.org/abs/2508.10684)).
 
 Inspired by the incredible success of off-policy reinforcement learning (RL), **TR2-D2** introduces a general framework that enhances the performance of off-policy RL with tree search for discrete diffusion fine-tuning.
 
-🤖 Off-policy RL enables learning from diffusion trajectories from the non-gradient tracking policy model by storing
+🤖 Off-policy RL enables learning from diffusion trajectories from the non-gradient tracking policy model by storing samples in a replay buffer for repeated use.
 
-🌳 Tree search
+🌳 Tree search efficiently explores high-dimensional discrete sequence spaces to find the (often sparse) subspace of high-reward sequences and leverages the structural similarities of optimal sequences to exploit optimal sampling paths in the next iteration.
 
 We use this framework to develop an efficient discrete diffusion fine-tuning strategy that leverages **Monte-Carlo Tree Search (MCTS)** to curate a replay buffer of optimal trajectories combined with an **off-policy control-based RL algorithm grounded in stochastic optimal control theory**, yielding theoretically guaranteed convergence to the optimal distribution. 🌟
 
-
-
----
+## Regulatory DNA Sequence Design 🧬
 
 In this experiment, we fine-tune the pre-trained **DNA enhancer MDM from DRAKES** (Wang et al. 2025) trained on **~700k HepG2 sequences** to optimize the measured enhancer activity using the reward oracles from DRAKES. Code and instructions to reproduce our results are provided in `/tr2d2-dna`.
 
-
-
----
+## Multi-Objective Therapeutic Peptide Design 🧫
 
 In this experiment, we fine-tune the pre-trained **unconditional peptide SMILES MDM from PepTune** ([Tang et al. 2024](https://arxiv.org/abs/2412.17780)) to optimize **multiple therapeutic properties**, including target protein binding affinity, solubility, non-hemolysis, non-fouling, and permeability. We show that one-shot generation from the fine-tuned policy outperforms inference-time multi-objective guidance, marking a significant advance over prior fine-tuning methods. Code and instructions to reproduce our results are provided in `/tr2d2-pep`.
 
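The updated README describes the framework at a high level; the remaining changes in this commit touch only the peptide setup. A minimal sketch of how the files changed here fit together, assuming the repository is cloned locally and conda is available (this is not the repo's documented workflow, and `HOME_LOC`/`ENV_PATH` inside the shell scripts still need to be edited to match your machine):

```bash
# Sketch only: build the peptide environment from the updated spec, then
# launch the two entry-point scripts touched by this commit.
conda env create -f tr2d2-pep/environment.yml

bash tr2d2-pep/run_mcts.sh    # MCTS sampling run (PepTune-style baseline, per its SPECIAL_PREFIX)
bash tr2d2-pep/finetune.sh    # TR2-D2 fine-tuning; wraps finetune.py
```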
tr2d2-pep/environment.yml
CHANGED

@@ -17,5 +17,18 @@ dependencies:
   - pytorch-cuda=12.4
   - pip:
     - pytorch-lightning==2.5.5
+    - lightning==2.5.5
     - fair-esm==2.0.0
-    - transformers==4.56.2
+    - transformers==4.56.2
+    - SmilesPE==0.0.3
+    - scipy==1.13.1
+    - wandb==0.22.0
+    - hydra-core==1.3.2
+    - hydra-submitit-launcher==1.2.0
+    - pathos==0.3.4
+    - matplotlib==3.10.1
+    - pandas==2.2.2
+    - seaborn==0.13.2
+    - timm==1.0.20
+    - xgboost==3.0.5
+    - loguru==0.7.3
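Since the commit message is "env update", an existing environment only needs to pick up the newly pinned pip packages rather than be rebuilt from scratch. A hedged example, where the environment name `tr2d2` is a placeholder for whatever name environment.yml actually declares:

```bash
# Refresh an existing conda environment from the revised spec; --prune removes
# packages no longer listed. "tr2d2" is an assumed environment name.
conda env update --name tr2d2 --file tr2d2-pep/environment.yml --prune

# Spot-check a few of the newly pinned pip packages.
conda run --name tr2d2 python -c "import lightning, wandb, xgboost; print(lightning.__version__)"
```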
tr2d2-pep/finetune.sh
CHANGED

@@ -6,7 +6,6 @@ SCRIPT_LOC=$HOME_LOC/TR2-D2/tr2d2-pep
 LOG_LOC=$HOME_LOC/TR2-D2/tr2d2-pep/logs
 DATE=$(date +%m_%d)
 SPECIAL_PREFIX='tr2d2-finetune-tfr'
-# set 3 have skip connection
 PYTHON_EXECUTABLE=$ENV_PATH/bin/python
 
 # ===================================================================
@@ -18,7 +17,7 @@ conda activate $ENV_PATH
 
 $PYTHON_EXECUTABLE $SCRIPT_LOC/finetune.py \
 --base_path $HOME_LOC \
---device "cuda:
+--device "cuda:0" \
 --noise_removal \
 --wdce_num_replicates 16 \
 --buffer_size 20 \
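The script now pins `--device "cuda:0"`. On a multi-GPU node the flag can stay as-is and a different card can be selected by masking device visibility; this is standard CUDA behaviour rather than anything provided by the repo:

```bash
# With CUDA_VISIBLE_DEVICES set, "cuda:0" inside finetune.py maps to the first
# visible device, i.e. physical GPU 1 here. The relative path is an assumption
# about where the command is run from.
CUDA_VISIBLE_DEVICES=1 bash tr2d2-pep/finetune.sh
```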
tr2d2-pep/run_mcts.sh
CHANGED

@@ -6,7 +6,6 @@ SCRIPT_LOC=$HOME_LOC/tr2d2/peptides
 LOG_LOC=$HOME_LOC/tr2d2/peptides/logs
 DATE=$(date +%m_%d)
 SPECIAL_PREFIX='tfr-peptune-baseline'
-# set 3 have skip connection
 PYTHON_EXECUTABLE=$ENV_PATH/bin/python
 
 # ===================================================================
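Note that run_mcts.sh still derives `SCRIPT_LOC`/`LOG_LOC` from `$HOME_LOC/tr2d2/peptides`, while finetune.sh uses `$HOME_LOC/TR2-D2/tr2d2-pep`, so the paths are worth verifying against the actual checkout before launching. A small, hedged sanity check (`HOME_LOC=$HOME` is only an assumption; the script defines it near the top, outside this diff):

```bash
# Check that the directories run_mcts.sh expects actually exist; if not, edit
# SCRIPT_LOC and LOG_LOC in the script to match your checkout.
HOME_LOC=$HOME   # assumed prefix; replace with the value used in the script
for d in "$HOME_LOC/tr2d2/peptides" "$HOME_LOC/tr2d2/peptides/logs"; do
  [ -d "$d" ] || echo "missing: $d"
done
```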