Sophia Tang committed
Commit b72ad57 · 1 Parent(s): 08c1a46

env update

.gitignore CHANGED
@@ -6,6 +6,7 @@ __pycache__/
 tr2d2-pep/wandb/
 tr2d2-pep/pretrained/
 tr2d2-pep/logs/
+tr2d2-pep/results/
 tr2d2-pep/__pycache__/
 tr2d2-pep/scoring/__pycache__/
 tr2d2-pep/tokenizer/__pycache__/
@@ -15,4 +16,5 @@ tr2d2-pep/utils/__pycache__/
 *.ipynb
 *.pt
 tr2d2-pep/scoring/functions/classifiers/best_model.pt
-.gitignore
+.gitignore
+.gitattributes
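
The new ignore rules can be sanity-checked locally with `git check-ignore`; a minimal sketch, where the results file path is a hypothetical example:

# -v reports which .gitignore line matched the given path
git check-ignore -v tr2d2-pep/results/example_run.csv
git check-ignore -v .gitattributes
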
README.md CHANGED
@@ -6,25 +6,21 @@

 ![TR2-D2](assets/tr2d2-anim.gif)

-This is the repository for **[TR2-D2: Tree Search Guided Trajectory-Aware Fine-Tuning for Discrete Diffusion](https://arxiv.org/abs/2509.25171)** 🤖🌳. It is partially built on the **[PepTune repo](https://github.com/programmablebio/peptune)** ([Tang et al. 2024](https://arxiv.org/abs/2412.17780)) and **MDNS** ([Zhu et al. 2025](https://arxiv.org/abs/2508.10684)).
+This is the repository for **[TR2-D2: Tree Search Guided Trajectory-Aware Fine-Tuning for Discrete Diffusion](https://arxiv.org/abs/2509.25171)** 🤖🌳. It is partially built on the **[PepTune repo](https://huggingface.co/ChatterjeeLab/PepTune)** ([Tang et al. 2024](https://arxiv.org/abs/2412.17780)) and **MDNS** ([Zhu et al. 2025](https://arxiv.org/abs/2508.10684)).

 Inspired by the incredible success of off-policy reinforcement learning (RL), **TR2-D2** introduces a general framework that enhances the performance of off-policy RL with tree search for discrete diffusion fine-tuning.

-🤖 Off-policy RL enables learning from diffusion trajectories from the non-gradient tracking policy model by storing sampling trajectories in a replay buffer for repeated use.
+🤖 Off-policy RL enables learning from diffusion trajectories from the non-gradient tracking policy model by storing samples in a replay buffer for repeated use.

-🌳 Tree search balances exploration and exploitation to generate optimal diffusion trajectories, and stores the optimal sequences in the buffer.
+🌳 Tree search efficiently explores high-dimensional discrete sequence spaces to find the (often sparse) subspace of high-reward sequences and leverages the structural similarities of optimal sequences to exploit optimal sampling paths in the next iteration.

 We use this framework to develop an efficient discrete diffusion fine-tuning strategy that leverages **Monte-Carlo Tree Search (MCTS)** to curate a replay buffer of optimal trajectories combined with an **off-policy control-based RL algorithm grounded in stochastic optimal control theory**, yielding theoretically guaranteed convergence to the optimal distribution. 🌟

-### Regulatory DNA Sequence Design 🧬
-
----
+## Regulatory DNA Sequence Design 🧬

 In this experiment, we fine-tune the pre-trained **DNA enhancer MDM from DRAKES** (Wang et al. 2025) trained on **~700k HepG2 sequences** to optimize the measured enhancer activity using the reward oracles from DRAKES. Code and instructions to reproduce our results are provided in `/tr2d2-dna`.

-### Multi-Objective Therapeutic Peptide Design 🧫
-
----
+## Multi-Objective Therapeutic Peptide Design 🧫

 In this experiment, we fine-tune the pre-trained **unconditional peptide SMILES MDM from PepTune** ([Tang et al. 2024](https://arxiv.org/abs/2412.17780)) to optimize **multiple therapeutic properties**, including target protein binding affinity, solubility, non-hemolysis, non-fouling, and permeability. We show that one-shot generation from the fine-tuned policy outperforms inference-time multi-objective guidance, marking a significant advance over prior fine-tuning methods. Code and instructions to reproduce our results are provided in `/tr2d2-pep`.

tr2d2-pep/environment.yml CHANGED
@@ -17,5 +17,18 @@ dependencies:
   - pytorch-cuda=12.4
   - pip:
     - pytorch-lightning==2.5.5
+    - lightning==2.5.5
     - fair-esm==2.0.0
-    - transformers==4.56.2
+    - transformers==4.56.2
+    - SmilesPE==0.0.3
+    - scipy==1.13.1
+    - wandb==0.22.0
+    - hydra-core==1.3.2
+    - hydra-submitit-launcher==1.2.0
+    - pathos==0.3.4
+    - matplotlib==3.10.1
+    - pandas==2.2.2
+    - seaborn==0.13.2
+    - timm==1.0.20
+    - xgboost==3.0.5
+    - loguru==0.7.3
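
With the expanded pip section, the environment can be built fresh or synced in place using standard conda commands; the environment name comes from the `name:` field of the file, which is outside this hunk:

# Fresh install: creates the environment named in environment.yml
conda env create -f tr2d2-pep/environment.yml

# Existing install: apply the new pins and drop packages no longer listed
conda env update -f tr2d2-pep/environment.yml --prune
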
tr2d2-pep/finetune.sh CHANGED
@@ -6,7 +6,6 @@ SCRIPT_LOC=$HOME_LOC/TR2-D2/tr2d2-pep
 LOG_LOC=$HOME_LOC/TR2-D2/tr2d2-pep/logs
 DATE=$(date +%m_%d)
 SPECIAL_PREFIX='tr2d2-finetune-tfr'
-# set 3 have skip connection
 PYTHON_EXECUTABLE=$ENV_PATH/bin/python

 # ===================================================================
@@ -18,7 +17,7 @@ conda activate $ENV_PATH

 $PYTHON_EXECUTABLE $SCRIPT_LOC/finetune.py \
     --base_path $HOME_LOC \
-    --device "cuda:6" \
+    --device "cuda:0" \
     --noise_removal \
     --wdce_num_replicates 16 \
     --buffer_size 20 \
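
The launch script now hard-codes `--device "cuda:0"` instead of a machine-specific index. As a hedged usage note, a different physical GPU can still be targeted without editing the script by remapping which device is visible as `cuda:0`:

# Run on the default visible GPU
bash tr2d2-pep/finetune.sh

# Expose physical GPU 3 to the script as cuda:0 (standard CUDA behavior)
CUDA_VISIBLE_DEVICES=3 bash tr2d2-pep/finetune.sh
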
tr2d2-pep/run_mcts.sh CHANGED
@@ -6,7 +6,6 @@ SCRIPT_LOC=$HOME_LOC/tr2d2/peptides
 LOG_LOC=$HOME_LOC/tr2d2/peptides/logs
 DATE=$(date +%m_%d)
 SPECIAL_PREFIX='tfr-peptune-baseline'
-# set 3 have skip connection
 PYTHON_EXECUTABLE=$ENV_PATH/bin/python

 # ===================================================================
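
The header variables (LOG_LOC, DATE, SPECIAL_PREFIX) mirror those in finetune.sh. A minimal launch sketch; the log filename convention here is an assumption, not something shown in this hunk:

# Capture stdout/stderr of the MCTS baseline under an assumed naming scheme
mkdir -p logs
bash tr2d2-pep/run_mcts.sh > "logs/tfr-peptune-baseline_$(date +%m_%d).log" 2>&1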