Sophia Tang committed
Commit b72ad57 · 1 Parent(s): 08c1a46

env update

.gitignore CHANGED
@@ -6,6 +6,7 @@ __pycache__/
 tr2d2-pep/wandb/
 tr2d2-pep/pretrained/
 tr2d2-pep/logs/
+tr2d2-pep/results/
 tr2d2-pep/__pycache__/
 tr2d2-pep/scoring/__pycache__/
 tr2d2-pep/tokenizer/__pycache__/
@@ -15,4 +16,5 @@ tr2d2-pep/utils/__pycache__/
 *.ipynb
 *.pt
 tr2d2-pep/scoring/functions/classifiers/best_model.pt
-.gitignore
+.gitignore
+.gitattributes
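
The new ignore rules can be sanity-checked locally with `git check-ignore`; a minimal sketch, where the results file path is a hypothetical example:

# -v reports which .gitignore line matched the given path
git check-ignore -v tr2d2-pep/results/example_run.csv
git check-ignore -v .gitattributes
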
README.md CHANGED
@@ -6,25 +6,21 @@

 ![TR2-D2](assets/tr2d2-anim.gif)

-This is the repository for **[TR2-D2: Tree Search Guided Trajectory-Aware Fine-Tuning for Discrete Diffusion](https://arxiv.org/abs/2509.25171)** 🤖🌳. It is partially built on the **[PepTune repo](https://github.com/programmablebio/peptune)** ([Tang et al. 2024](https://arxiv.org/abs/2412.17780)) and **MDNS** ([Zhu et al. 2025](https://arxiv.org/abs/2508.10684)).
+This is the repository for **[TR2-D2: Tree Search Guided Trajectory-Aware Fine-Tuning for Discrete Diffusion](https://arxiv.org/abs/2509.25171)** 🤖🌳. It is partially built on the **[PepTune repo](https://huggingface.co/ChatterjeeLab/PepTune)** ([Tang et al. 2024](https://arxiv.org/abs/2412.17780)) and **MDNS** ([Zhu et al. 2025](https://arxiv.org/abs/2508.10684)).

 Inspired by the incredible success of off-policy reinforcement learning (RL), **TR2-D2** introduces a general framework that enhances the performance of off-policy RL with tree search for discrete diffusion fine-tuning.

-🤖 Off-policy RL enables learning from diffusion trajectories from the non-gradient tracking policy model by storing sampling trajectories in a replay buffer for repeated use.
+🤖 Off-policy RL enables learning from diffusion trajectories from the non-gradient tracking policy model by storing samples in a replay buffer for repeated use.

-🌳 Tree search balances exploration and exploitation to generate optimal diffusion trajectories, and stores the optimal sequences in the buffer.
+🌳 Tree search efficiently explores high-dimensional discrete sequence spaces to find the (often sparse) subspace of high-reward sequences and leverages the structural similarities of optimal sequences to exploit optimal sampling paths in the next iteration.

 We use this framework to develop an efficient discrete diffusion fine-tuning strategy that leverages **Monte-Carlo Tree Search (MCTS)** to curate a replay buffer of optimal trajectories combined with an **off-policy control-based RL algorithm grounded in stochastic optimal control theory**, yielding theoretically guaranteed convergence to the optimal distribution. 🌟

-### Regulatory DNA Sequence Design 🧬
-
----
+## Regulatory DNA Sequence Design 🧬

 In this experiment, we fine-tune the pre-trained **DNA enhancer MDM from DRAKES** (Wang et al. 2025) trained on **~700k HepG2 sequences** to optimize the measured enhancer activity using the reward oracles from DRAKES. Code and instructions to reproduce our results are provided in `/tr2d2-dna`.

-### Multi-Objective Therapeutic Peptide Design 🧫
-
----
+## Multi-Objective Therapeutic Peptide Design 🧫

 In this experiment, we fine-tune the pre-trained **unconditional peptide SMILES MDM from PepTune** ([Tang et al. 2024](https://arxiv.org/abs/2412.17780)) to optimize **multiple therapeutic properties**, including target protein binding affinity, solubility, non-hemolysis, non-fouling, and permeability. We show that one-shot generation from the fine-tuned policy outperforms inference-time multi-objective guidance, marking a significant advance over prior fine-tuning methods. Code and instructions to reproduce our results are provided in `/tr2d2-pep`.

tr2d2-pep/environment.yml CHANGED
@@ -17,5 +17,18 @@ dependencies:
   - pytorch-cuda=12.4
   - pip:
     - pytorch-lightning==2.5.5
+    - lightning==2.5.5
     - fair-esm==2.0.0
-    - transformers==4.56.2
+    - transformers==4.56.2
+    - SmilesPE==0.0.3
+    - scipy==1.13.1
+    - wandb==0.22.0
+    - hydra-core==1.3.2
+    - hydra-submitit-launcher==1.2.0
+    - pathos==0.3.4
+    - matplotlib==3.10.1
+    - pandas==2.2.2
+    - seaborn==0.13.2
+    - timm==1.0.20
+    - xgboost==3.0.5
+    - loguru==0.7.3
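
With the expanded pip section, the environment can be built fresh or synced in place using standard conda commands; the environment name comes from the `name:` field of the file, which is outside this hunk:

# Fresh install: creates the environment named in environment.yml
conda env create -f tr2d2-pep/environment.yml

# Existing install: apply the new pins and drop packages no longer listed
conda env update -f tr2d2-pep/environment.yml --prune
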
tr2d2-pep/finetune.sh CHANGED
@@ -6,7 +6,6 @@ SCRIPT_LOC=$HOME_LOC/TR2-D2/tr2d2-pep
 LOG_LOC=$HOME_LOC/TR2-D2/tr2d2-pep/logs
 DATE=$(date +%m_%d)
 SPECIAL_PREFIX='tr2d2-finetune-tfr'
-# set 3 have skip connection
 PYTHON_EXECUTABLE=$ENV_PATH/bin/python

 # ===================================================================
@@ -18,7 +17,7 @@ conda activate $ENV_PATH

 $PYTHON_EXECUTABLE $SCRIPT_LOC/finetune.py \
     --base_path $HOME_LOC \
-    --device "cuda:6" \
+    --device "cuda:0" \
     --noise_removal \
     --wdce_num_replicates 16 \
     --buffer_size 20 \
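
The launch script now hard-codes `--device "cuda:0"` instead of a machine-specific index. As a hedged usage note, a different physical GPU can still be targeted without editing the script by remapping which device is visible as `cuda:0`:

# Run on the default visible GPU
bash tr2d2-pep/finetune.sh

# Expose physical GPU 3 to the script as cuda:0 (standard CUDA behavior)
CUDA_VISIBLE_DEVICES=3 bash tr2d2-pep/finetune.sh
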
tr2d2-pep/run_mcts.sh CHANGED
@@ -6,7 +6,6 @@ SCRIPT_LOC=$HOME_LOC/tr2d2/peptides
 LOG_LOC=$HOME_LOC/tr2d2/peptides/logs
 DATE=$(date +%m_%d)
 SPECIAL_PREFIX='tfr-peptune-baseline'
-# set 3 have skip connection
 PYTHON_EXECUTABLE=$ENV_PATH/bin/python

 # ===================================================================
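
The header variables (LOG_LOC, DATE, SPECIAL_PREFIX) mirror those in finetune.sh. A minimal launch sketch; the log filename convention here is an assumption, not something shown in this hunk:

# Capture stdout/stderr of the MCTS baseline under an assumed naming scheme
mkdir -p logs
bash tr2d2-pep/run_mcts.sh > "logs/tfr-peptune-baseline_$(date +%m_%d).log" 2>&1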