ALPaCA: Adapting Llama for Pathology Context Analysis
Welcome to ALPaCA, a multimodal training framework tailored for slide-level question answering in computational pathology. ALPaCA integrates Llama3.1-8B-Instruct as the language backbone and CONCH as the vision encoder. This repository aims to provide a straightforward reproduction of the ALPaCA framework. The model trained using this framework is named Llama-slideQA.
To run ALPaCA, please first download Llama3.1-8B-Instruct as the base model.
For data from TCGA and GTEx, you can visit the GDC Data Portal and the GTEx Portal to download the slides and extract patch features yourself with CONCH. The data processing code is available at https://github.com/ZeyuGaoAi/SMMILe.
Alternatively, you can use the features we have already extracted with CONCH: `CNX-PathLLM/GTEx-TCGA-Embeddings`, `CNX-PathLLM/GTEx-TCGA-KMeans-Embeddings`, and `CNX-PathLLM/GMM_Embeddings`. After downloading, please unzip them into the respective `TCGA-Embedding` and `GMM_Embedding` folders.
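For example, here is a minimal download sketch using the `huggingface-cli` tool from `huggingface_hub` (our assumption; any download method works, and the local directory names are placeholders for the folders above):

```bash
# Fetch the pre-extracted CONCH features from the Hugging Face Hub and
# unzip them into the expected folders; local paths are placeholders.
huggingface-cli download CNX-PathLLM/GTEx-TCGA-Embeddings --repo-type dataset --local-dir ./TCGA-Embedding
huggingface-cli download CNX-PathLLM/GMM_Embeddings --repo-type dataset --local-dir ./GMM_Embedding
for z in ./TCGA-Embedding/*.zip ./GMM_Embedding/*.zip; do
  unzip -o "$z" -d "$(dirname "$z")"
done
```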
Please ensure you have access to all the datasets.
After completing all the setups mentioned above and configuring the correct Python environment, you can start training with the provided shell scripts (e.g., `run_wsi_stage*.sh`) or follow the instructions in the Train Step sections below. Do not forget to adjust the TCGA and GMM embedding paths to reflect your own file locations.
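For instance (a sketch only; the stage-script name follows the `run_wsi_stage*.sh` pattern above, and the `/data/...` replacement paths are hypothetical):

```bash
# Rewrite the placeholder embedding paths in the stage-1 script to your own
# locations (the /data/... targets are hypothetical), then launch it.
sed -i 's|/path/to/CNX-PathLLM/GTEx-TCGA-Embeddings|/data/GTEx-TCGA-Embeddings|' run_wsi_stage1.sh
sed -i 's|/path/to/GMM_Embeddings|/data/GMM_Embeddings|' run_wsi_stage1.sh
bash run_wsi_stage1.sh
```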
Settings
Different Aggregation Strategies
You can change the aggregation strategy with the `--agg_strategy` flag; supported values are `sample`, `kmeans`, `gmm`, `abmil`, `qformer`, and `longnet`. You can also reproduce the `hybrid` method described in our paper by setting `--agg_strategy gmm,longnet` in the `.sh` script (a combined example is sketched at the end of this section).
Configurable Settings
`--vision_adaptor False`: vision-query-question interaction
`--vision_adaptor True`: vision-query interaction
`--hierachical_adaptor False`: same adaptor for all levels
`--hierachical_adaptor True`: different adaptors for different levels
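Put together, an illustrative combination of these flags (hybrid aggregation, vision-query-question interaction, and a separate adaptor per level) looks like the following; all other required arguments are omitted here, so see the full commands in the Train Step sections:

```bash
python run_wsi.py \
  --agg_strategy gmm,longnet \
  --vision_adaptor False \
  --hierachical_token True \
  --hierachical_adaptor True
```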
Train Step 1
```bash
# Stage 1: train the aggregation modules and adaptor on WSI description data;
# the LLM itself stays frozen (--llm_requires_grad False).
accelerate launch --config_file=./accelerate_configs/deepspeed_zero2.yaml run_wsi.py \
  --learning_rate 1e-4 --max_steps 10000 --warmup_steps 100 \
  --gpu 2 --train_batch_size 4 --eval_batch_size 2 --max_seq_length 512 \
  --agg_strategy gmm,longnet --embed_dim 512 --vision_adaptor False --hierachical_token True --hierachical_adaptor True \
  --n_heads 32,16,8 --llm_requires_grad False --resume_from_checkpoint False \
  --llm_name /data_local/pxb/LLM_models/llama3/llama3.1-8b-instruct \
  --dataset_name_list CNX-PathLLM/TCGA-WSI-Description-4onew,CNX-PathLLM/TCGA-WSI-Description-4omini,CNX-PathLLM/GTEx-WSI-Description \
  --data_cache_dir /data_local/pxb/CNX-PathLLM/.cache \
  --fea_root /path/to/CNX-PathLLM/GTEx-TCGA-Embeddings \
  --gmm_root /path/to/GMM_Embeddings \
  --output_dir path/to/output/of/step1
```
Train Step 2
```bash
# Stage 2: unfreeze the LLM (--llm_requires_grad True) and fine-tune on
# closed- and open-ended slide QA, starting from the Stage 1 checkpoint.
accelerate launch --config_file=./accelerate_configs/deepspeed_zero2.yaml run_wsi.py \
  --max_steps 20000 --warmup_steps 10 \
  --gpu 2 --train_batch_size 8 --eval_batch_size 2 --max_seq_length 256 \
  --agg_strategy gmm,longnet --embed_dim 512 --vision_adaptor False --hierachical_token True --hierachical_adaptor True \
  --n_heads 32,16,8 --llm_requires_grad True --resume_from_checkpoint False \
  --llm_name /data_local/pxb/LLM_models/llama3/llama3.1-8b-instruct \
  --dataset_name_list CNX-PathLLM/TCGA-WSI-CloseQA-Balanced,CNX-PathLLM/GTEx-WSI-CloseQA-Balanced,CNX-PathLLM/TCGA-WSI-OpenQA,CNX-PathLLM/GTEx-WSI-OpenQA \
  --data_cache_dir /data_local/pxb/CNX-PathLLM/.cache \
  --fea_root /path/to/CNX-PathLLM/GTEx-TCGA-Embeddings \
  --gmm_root /path/to/GMM_Embeddings \
  --output_dir path/to/output/of/step2 \
  --ckpt_path path/to/ckpt.bin/of/step1
```
Train Step 3
You can continue training (`--ckpt_path path/to/ckpt.bin/of/step2`) with the detailed TCGA-BRCA datasets (`CNX-PathLLM/TCGA-BRCA-Details-CloseQA`, `CNX-PathLLM/TCGA-BRCA-Details-OpenQA`). You can also continue training with the morphological descriptions generated by PathChat for TCGA-STAD, TCGA-KIRC, and TCGA-OV, using `CNX-PathLLM/PathChat_CloseQA_Balanced` and `CNX-PathLLM/PathChat_OpenQA`.
Make sure you have access to these datasets, and adjust the commands with the datasets you want; an illustrative Step 3 command is sketched below.
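Train Step 3 is not written out as a full command above, so the following is a sketch assembled from the Train Step 2 command: only the dataset list, the checkpoint path, and the output directory change, and every other flag is carried over verbatim.

```bash
# Stage 3 (sketch): continue from the Stage 2 checkpoint on the detailed
# TCGA-BRCA QA sets; swap the dataset list for the PathChat sets if desired.
accelerate launch --config_file=./accelerate_configs/deepspeed_zero2.yaml run_wsi.py \
  --max_steps 20000 --warmup_steps 10 \
  --gpu 2 --train_batch_size 8 --eval_batch_size 2 --max_seq_length 256 \
  --agg_strategy gmm,longnet --embed_dim 512 --vision_adaptor False --hierachical_token True --hierachical_adaptor True \
  --n_heads 32,16,8 --llm_requires_grad True --resume_from_checkpoint False \
  --llm_name /data_local/pxb/LLM_models/llama3/llama3.1-8b-instruct \
  --dataset_name_list CNX-PathLLM/TCGA-BRCA-Details-CloseQA,CNX-PathLLM/TCGA-BRCA-Details-OpenQA \
  --data_cache_dir /data_local/pxb/CNX-PathLLM/.cache \
  --fea_root /path/to/CNX-PathLLM/GTEx-TCGA-Embeddings \
  --gmm_root /path/to/GMM_Embeddings \
  --output_dir path/to/output/of/step3 \
  --ckpt_path path/to/ckpt.bin/of/step2
```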
Checkpoints:
`Llama-slideQA.bin`: trained with general QA following Train Step 2.
`Llama-slideQA-morphology.bin`: trained with detailed morphological QA generated by PathChat following Train Step 3.
`Llama-slideQA-BRCA.bin`: trained with the detailed TCGA-BRCA dataset following Train Step 3.
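If you only want to evaluate, the released checkpoints can be fetched directly; here is a sketch assuming the `.bin` files are hosted in this repository (`CNX-PathLLM/Llama-slideQA`) and a `./checkpoints` target directory:

```bash
# Download a released checkpoint from the Hub (repo layout assumed).
huggingface-cli download CNX-PathLLM/Llama-slideQA Llama-slideQA.bin --local-dir ./checkpoints
```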
Test of Step 2 General QA
```bash
# Evaluate the Stage 2 checkpoint on the general closed- and open-ended QA sets.
python test_wsi.py --max_seq_length 128 --batch_size 1 --select_data_num -1 --eval_sample_size -1 \
  --n_heads 32,16,8 --llm_name /data_local/pxb/LLM_models/llama3/llama3.1-8b-instruct \
  --vision_adaptor False --hierachical_token True --hierachical_adaptor True \
  --shuffle False --data_cache_dir /data_local/pxb/CNX-PathLLM/.cache \
  --dataset_name_list CNX-PathLLM/TCGA-WSI-CloseQA-Balanced,CNX-PathLLM/GTEx-WSI-CloseQA-Balanced,CNX-PathLLM/TCGA-WSI-OpenQA,CNX-PathLLM/GTEx-WSI-OpenQA \
  --agg_strategy gmm,longnet --embed_dim 512 \
  --fea_root /path/to/CNX-PathLLM/GTEx-TCGA-Embeddings \
  --gmm_root /path/to/GMM_Embeddings \
  --ckpt_path path/to/ckpt.bin/of/step2 \
  --results_save_path /path/to/the/output.csv \
  --use_peft False
```
Test of Step 3 Detailed QA
```bash
# Evaluate a Stage 3 checkpoint. Use the TCGA-BRCA datasets for the BRCA model;
# for the PathChat-trained model, replace the dataset list with
# CNX-PathLLM/PathChat_CloseQA_Balanced,CNX-PathLLM/PathChat_OpenQA.
python test_wsi.py --max_seq_length 128 --batch_size 1 --select_data_num -1 --eval_sample_size -1 \
  --n_heads 32,16,8 --llm_name /data_local/pxb/LLM_models/llama3/llama3.1-8b-instruct \
  --vision_adaptor False --hierachical_token True --hierachical_adaptor True \
  --shuffle False --data_cache_dir /data_local/pxb/CNX-PathLLM/.cache \
  --dataset_name_list CNX-PathLLM/TCGA-BRCA-Details-CloseQA,CNX-PathLLM/TCGA-BRCA-Details-OpenQA \
  --agg_strategy gmm,longnet --embed_dim 512 \
  --fea_root /path/to/CNX-PathLLM/GTEx-TCGA-Embeddings \
  --gmm_root /path/to/GMM_Embeddings \
  --ckpt_path path/to/ckpt.bin/of/step3 \
  --results_save_path /path/to/the/output.csv \
  --use_peft False
```
Disclaimer
This repository and all associated models are intended solely for academic research and non-commercial use. The model involves medical data (e.g., TCGA, GTEx) and pathology-related tasks, but is not approved for clinical diagnosis or medical decision-making. The developers are not responsible for any misuse of this code or model in medical or commercial contexts.
License
This model is developed using Meta's Llama 3.1 model as part of its architecture and is distributed under the Llama 3.1 license.