---
license: other
language:
- en
tags:
- histology
- pathology
- vision
- pytorch
- self-supervised
- vit
metrics:
- accuracy
- roc_auc
- f1
pipeline_tag: image-feature-extraction
library_name: transformers
---

# Model Card for Phikon-v2
Phikon-v2 is a Vision Transformer Large pre-trained with the DINOv2 self-supervised method on PANCAN-XL, a dataset of 450M 20× magnification histology images sampled from 60K whole slide images. PANCAN-XL only incorporates publicly available datasets: CPTAC (6,193 WSIs) and TCGA (29,502 WSIs) for malignant tissue, and GTEx (13,302 WSIs) for normal tissue.

Phikon-v2 improves upon Phikon, our previous foundation model pre-trained with iBOT on 40M histology images from TCGA (6K WSIs), on a large variety of weakly-supervised tasks tailored for biomarker discovery. Phikon-v2 is evaluated on external cohorts to avoid any data contamination with the PANCAN-XL pre-training dataset, and is benchmarked against an exhaustive panel of representation learning and foundation models.
## Model Description
- Developed by: Owkin, Inc.
- Model type: Pretrained vision backbone (ViT-L/16 via DINOv2)
- Pretraining dataset: PANCAN-XL, sourced from public histology collections (TCGA, CPTAC, GTEx, TCIA and others).
- Paper: to be released
- License: Owkin non-commercial license
## How To Use (Feature Extraction)
The following code snippet extracts features from histology images using Phikon-v2 (CLS token). These features can then be used for downstream applications such as ROI classification (via linear or k-NN probing), slide classification (via multiple instance learning), or segmentation (via ViT-Adapter, for instance).
```python
import requests
from PIL import Image

import torch
from transformers import AutoImageProcessor, AutoModel

# Load an example image
image = Image.open(
    requests.get(
        "https://github.com/owkin/HistoSSLscaling/blob/main/assets/example.tif?raw=true",
        stream=True,
    ).raw
)

# Load Phikon-v2
processor = AutoImageProcessor.from_pretrained("owkin/phikon-v2")
model = AutoModel.from_pretrained("owkin/phikon-v2")
model.eval()

# Process the image
inputs = processor(image, return_tensors="pt")

# Extract the CLS token features
with torch.inference_mode():
    outputs = model(**inputs)
    features = outputs.last_hidden_state[:, 0, :]  # (1, 1024) shape
    assert features.shape == (1, 1024)
```
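For ROI-level classification, the extracted features can be used to train a lightweight probe without touching the backbone. Below is a minimal linear-probing sketch using scikit-learn; the `features` and `labels` arrays are hypothetical stand-ins for your own pre-extracted Phikon-v2 embeddings and tile annotations.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins: (N, 1024) Phikon-v2 CLS features and binary ROI labels.
features = np.random.randn(1000, 1024).astype(np.float32)
labels = np.random.randint(0, 2, size=1000)

# Train a logistic-regression probe on the frozen features.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=0
)
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("ROC AUC:", roc_auc_score(y_test, probe.predict_proba(X_test)[:, 1]))
```

A k-NN probe (e.g. `sklearn.neighbors.KNeighborsClassifier`) can be swapped in the same way.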
## Direct Use (with Pre-Extracted and Frozen Features)
Phikon-v2 can be used with or without fine-tuning for different downstream applications, most notably slide-level classification with multiple instance learning algorithms (such as ABMIL) applied on top of frozen, pre-extracted tile features; a minimal sketch follows.
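As an illustration, here is a minimal sketch of attention-based MIL pooling in the spirit of ABMIL (Ilse et al., 2018), operating on a bag of frozen 1,024-dimensional Phikon-v2 tile features; the architecture and hyperparameters are assumptions, not the exact head used in our benchmarks.

```python
import torch
import torch.nn as nn

class ABMIL(nn.Module):
    """Attention-based MIL pooling over a bag of frozen tile embeddings."""

    def __init__(self, in_dim: int = 1024, hidden_dim: int = 128, n_classes: int = 2):
        super().__init__()
        self.attention = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )
        self.classifier = nn.Linear(in_dim, n_classes)

    def forward(self, bag: torch.Tensor) -> torch.Tensor:
        # bag: (n_tiles, in_dim) tile features from one slide
        attn = torch.softmax(self.attention(bag), dim=0)  # (n_tiles, 1)
        slide_embedding = (attn * bag).sum(dim=0)         # (in_dim,)
        return self.classifier(slide_embedding)           # slide-level logits

# Usage on a hypothetical slide of 500 tiles:
logits = ABMIL()(torch.randn(500, 1024))
```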
## Downstream Use (Fine-tuning)
You can fine-tune the model on tile-level downstream tasks. This Colab notebook allows you to fine-tune Phikon and Phikon-v2 using LoRA through the Hugging Face API.
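For reference, a minimal LoRA fine-tuning setup with the `peft` library might look as follows; the classification head, rank, and target modules are assumptions and may differ from the notebook.

```python
from transformers import AutoModelForImageClassification
from peft import LoraConfig, get_peft_model

# Hypothetical tile-level classification task with 2 classes.
model = AutoModelForImageClassification.from_pretrained("owkin/phikon-v2", num_labels=2)

# Inject LoRA adapters into the ViT attention projections (assumed targets).
lora_config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.1, target_modules=["query", "value"])
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only adapters + head remain trainable

# The wrapped model can now be trained with your usual loop or the Trainer API.
```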
## Training Details
- Training data: PANCAN-XL, a pre-training dataset composed of 456,060,584 [224×224] histology images at 20× magnification, sampled from 60K H&E WSIs.
- Training regime: fp16 using PyTorch-FSDP mixed-precision.
- Training objective: DINOv2 SSL recipe with the following losses:
- DINO self-distillation loss with multi-crop
- iBOT masked-image modeling loss
- KoLeo regularization on [CLS] tokens (see the sketch after this list)
- Training length: 100,000 iterations with a batch size of 4,096
- Model architecture: ViT-Large (0.3B params): Patch size 16, embedding dimension 1024, 16 heads, MLP FFN
- Hardware used: 32×4 NVIDIA V100 32GB GPUs (128 GPUs total)
- Training time: approx. 4,300 GPU hours (33 hours of wall-clock time)
- Platform: French supercluster Jean-Zay
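For intuition, the KoLeo regularizer encourages [CLS] embeddings to spread uniformly over the batch by maximizing the log-distance between each embedding and its nearest neighbor. A minimal PyTorch sketch, following the formulation in the DINOv2 paper (not the exact training code):

```python
import torch
import torch.nn.functional as F

def koleo_loss(cls_tokens: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # L2-normalize the batch of [CLS] embeddings: (batch, dim)
    x = F.normalize(cls_tokens, dim=-1, eps=eps)
    # Find each sample's nearest neighbor via cosine similarity (no gradient needed).
    with torch.no_grad():
        sims = x @ x.T
        sims.fill_diagonal_(-1.0)  # exclude self-matches
        nn_idx = sims.argmax(dim=-1)
    # Penalize small nearest-neighbor distances: -(1/n) * sum(log d_i).
    nn_dist = (x - x[nn_idx]).norm(dim=-1)
    return -torch.log(nn_dist + eps).mean()
```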
## Software Dependencies

### Python Packages
- torch>=2.0.0: https://pytorch.org
- torchvision>=0.15.0: https://pytorch.org/vision/stable/index.html
- xformers>=0.0.18: https://github.com/facebookresearch/xformers
### Repositories
- DINOv2 (self-supervised learning): https://github.com/facebookresearch/dinov2
## Contact
For any additional questions or comments, contact Alexandre Filiot (alexandre.filiot@owkin.com).
## Acknowledgements
We thank the DINOv2 authors for their amazing contribution. This work was granted access to the HPC resources of IDRIS under the allocation 2023-A0141012519 made by GENCI.