---
license: other
language:
- en
tags:
- histology
- pathology
- vision
- pytorch
- self-supervised
- vit
metrics:
- accuracy
- roc_auc
- f1
pipeline_tag: image-feature-extraction
library_name: transformers
---

# Model Card for Phikon-v2
Phikon-v2 is a Vision Transformer Large pre-trained with the DINOv2 self-supervised method on PANCAN-XL, a dataset of 450M 20× magnification histology images sampled from 60K whole slide images. PANCAN-XL only incorporates publicly available datasets: CPTAC (6,193 WSIs) and TCGA (29,502 WSIs) for malignant tissue, and GTEx (13,302 WSIs) for normal tissue.

Phikon-v2 improves upon Phikon, our previous foundation model pre-trained with iBOT on 40M histology images from TCGA (6K WSIs), on a large variety of weakly-supervised tasks tailored for biomarker discovery. Phikon-v2 is evaluated on external cohorts to avoid any data contamination with the PANCAN-XL pre-training dataset, and is benchmarked against an exhaustive panel of representation learning and foundation models.
## Model Description
- Developed by: Owkin, Inc.
- Model type: Pretrained vision backbone (ViT-L/16 via DINOv2)
- Pretraining dataset: PANCAN-XL, sourced from public histology collections (TCGA, CPTAC, GTEx, TCIA and others).
- Paper: to be released
- License: Owkin non-commercial license
## How To Use (Feature Extraction)
The following code snippet extracts features from histology images using Phikon-v2 (CLS token). These features can then be used for downstream applications such as ROI classification (via linear or k-NN probing), slide classification (via multiple instance learning), or segmentation (via ViT-Adapter, for instance).
```python
import requests
from PIL import Image

import torch
from transformers import AutoImageProcessor, AutoModel

# Load an example image
image = Image.open(
    requests.get(
        "https://github.com/owkin/HistoSSLscaling/blob/main/assets/example.tif?raw=true",
        stream=True,
    ).raw
)

# Load Phikon-v2
processor = AutoImageProcessor.from_pretrained("owkin/phikon-v2")
model = AutoModel.from_pretrained("owkin/phikon-v2")
model.eval()

# Process the image
inputs = processor(image, return_tensors="pt")

# Extract the CLS token features
with torch.inference_mode():
    outputs = model(**inputs)
    features = outputs.last_hidden_state[:, 0, :]  # (1, 1024) shape
    assert features.shape == (1, 1024)
```
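For ROI-level classification, the extracted features can be used to train a lightweight probe without touching the backbone. Below is a minimal linear-probing sketch using scikit-learn; the `features` and `labels` arrays are hypothetical stand-ins for your own pre-extracted Phikon-v2 embeddings and tile annotations.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins: (N, 1024) Phikon-v2 CLS features and binary ROI labels.
features = np.random.randn(1000, 1024).astype(np.float32)
labels = np.random.randint(0, 2, size=1000)

# Train a logistic-regression probe on the frozen features.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=0
)
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("ROC AUC:", roc_auc_score(y_test, probe.predict_proba(X_test)[:, 1]))
```

A k-NN probe (e.g. `sklearn.neighbors.KNeighborsClassifier`) can be swapped in the same way.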
## Direct Use (with Pre-Extracted and Frozen Features)
Phikon-v2 can be used with or without fine-tuning for different downstream applications, most notably slide-level classification with multiple instance learning algorithms (such as ABMIL) applied on top of frozen, pre-extracted tile features; a minimal sketch follows.
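As an illustration, here is a minimal sketch of attention-based MIL pooling in the spirit of ABMIL (Ilse et al., 2018), operating on a bag of frozen 1,024-dimensional Phikon-v2 tile features; the architecture and hyperparameters are assumptions, not the exact head used in our benchmarks.

```python
import torch
import torch.nn as nn

class ABMIL(nn.Module):
    """Attention-based MIL pooling over a bag of frozen tile embeddings."""

    def __init__(self, in_dim: int = 1024, hidden_dim: int = 128, n_classes: int = 2):
        super().__init__()
        self.attention = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )
        self.classifier = nn.Linear(in_dim, n_classes)

    def forward(self, bag: torch.Tensor) -> torch.Tensor:
        # bag: (n_tiles, in_dim) tile features from one slide
        attn = torch.softmax(self.attention(bag), dim=0)  # (n_tiles, 1)
        slide_embedding = (attn * bag).sum(dim=0)         # (in_dim,)
        return self.classifier(slide_embedding)           # slide-level logits

# Usage on a hypothetical slide of 500 tiles:
logits = ABMIL()(torch.randn(500, 1024))
```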
## Downstream Use (Fine-tuning)
You can fine-tune the model on tile-level downstream tasks. This Colab notebook allows you to fine-tune Phikon and Phikon-v2 using LoRA through the Hugging Face API.
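For reference, a minimal LoRA fine-tuning setup with the `peft` library might look as follows; the classification head, rank, and target modules are assumptions and may differ from the notebook.

```python
from transformers import AutoModelForImageClassification
from peft import LoraConfig, get_peft_model

# Hypothetical tile-level classification task with 2 classes.
model = AutoModelForImageClassification.from_pretrained("owkin/phikon-v2", num_labels=2)

# Inject LoRA adapters into the ViT attention projections (assumed targets).
lora_config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.1, target_modules=["query", "value"])
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only adapters + head remain trainable

# The wrapped model can now be trained with your usual loop or the Trainer API.
```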
## Training Details
- Training data: PANCAN-XL, a pre-training dataset composed of 456,060,584 [224×224] histology images at 20× magnification, sampled from 60K H&E WSIs.
- Training regime: fp16 using PyTorch-FSDP mixed-precision.
- Training objective: DINOv2 SSL recipe with the following losses:
- DINO self-distillation loss with multi-crop
- iBOT masked-image modeling loss
- KoLeo regularization on [CLS] tokens (see the sketch after this list)
- Training length: 100,000 iterations with a batch size of 4,096
- Model architecture: ViT-Large (0.3B params): Patch size 16, embedding dimension 1024, 16 heads, MLP FFN
- Hardware used: 32×4 NVIDIA V100 32GB GPUs (128 GPUs total)
- Training time: approx. 4,300 GPU hours (33 hours of wall-clock time)
- Platform: French supercluster Jean-Zay
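For intuition, the KoLeo regularizer encourages [CLS] embeddings to spread uniformly over the batch by maximizing the log-distance between each embedding and its nearest neighbor. A minimal PyTorch sketch, following the formulation in the DINOv2 paper (not the exact training code):

```python
import torch
import torch.nn.functional as F

def koleo_loss(cls_tokens: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # L2-normalize the batch of [CLS] embeddings: (batch, dim)
    x = F.normalize(cls_tokens, dim=-1, eps=eps)
    # Find each sample's nearest neighbor via cosine similarity (no gradient needed).
    with torch.no_grad():
        sims = x @ x.T
        sims.fill_diagonal_(-1.0)  # exclude self-matches
        nn_idx = sims.argmax(dim=-1)
    # Penalize small nearest-neighbor distances: -(1/n) * sum(log d_i).
    nn_dist = (x - x[nn_idx]).norm(dim=-1)
    return -torch.log(nn_dist + eps).mean()
```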
## Software Dependencies

### Python Packages
- torch>=2.0.0: https://pytorch.org
- torchvision>=0.15.0: https://pytorch.org/vision/stable/index.html
- xformers>=0.0.18: https://github.com/facebookresearch/xformers
### Repositories
- DINOv2 (self-supervised learning): https://github.com/facebookresearch/dinov2
## Contact
For any additional questions or comments, contact Alexandre Filiot (alexandre.filiot@owkin.com).
## Acknowledgements
We thank the DINOv2 authors for their amazing contribution. This work was granted access to the HPC resources of IDRIS under the allocation 2023-A0141012519 made by GENCI.