HPAI-BSC/Bony

emilioHugging committed 26 days ago

Commit b14fa26 (verified) · 1 parent: a25567e

Update model card with wavelets option

README.md +68 -0

@@ -76,6 +76,74 @@ The model achieved a classification accuracy of **81%** on the PANDA subset and
 | **Hibou**        | **83.1%**                       | 1.455e-06           | 0.10            |
 | **Histoencoder** | 81.6%                           | **1.003e-06**       | -               |
+## Wavelet Decomposition
+As previously mentioned, histopathology images are highly discontinuous, noisy, and often visually similar to one another. Filtering them before training may help abstract their information and make training more stable and potentially more effective. For this reason, we believe that applying a wavelet decomposition before the forward pass of our XCiT model is a promising approach.
+### Overview of 3D Wavelet Decomposition
+3D wavelet decomposition is well suited to analyzing volumetric data. Here, a \(224 \times 224 \times 3\) RGB image is treated as a three-dimensional array, from which localized information is extracted at different spatial scales.
+Wavelets are oscillating functions localized in both space and frequency, used to decompose a signal \( f(x, y, z) \) into multiple scales and orientations. The 3D wavelet transform is defined as:
+\[
+W_\psi f(j, \theta, x, y, z) = f \ast \psi_{j, \theta}(x, y, z),
+\]
+where \( \psi_{j, \theta} \) is a 3D wavelet with:
+- \( j \): a scale defining the spatial resolution,
+- \( \theta \): a specific spatial orientation,
+- \( \ast \): the 3D convolution operator.
+Common 3D wavelets include Morlet and Haar wavelets, which are effective for capturing directional variations.
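For reference, the 1D Haar mother wavelet \( \psi \) and its scaling function \( \phi \) have the standard closed forms below. A separable 3D Haar wavelet can be built as a product of these 1D functions along each axis, with at least one factor being \( \psi \); whether the experiments in this card use exactly this separable construction is an assumption.

\[
\psi(t) = \begin{cases} 1 & 0 \le t < \tfrac{1}{2}, \\ -1 & \tfrac{1}{2} \le t < 1, \\ 0 & \text{otherwise,} \end{cases}
\qquad
\phi(t) = \begin{cases} 1 & 0 \le t < 1, \\ 0 & \text{otherwise.} \end{cases}
\]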
+### 3D Scattering: Invariant Extension
+3D scattering builds on wavelet decomposition to produce representations that are invariant to transformations such as translation and rotation. The scattering coefficients of a histopathology image therefore remain stable when the image is shifted or rotated, which may improve generalization.
+#### Step 1: Wavelet Decomposition
+A 3D wavelet is applied and the modulus is taken to extract the first-order coefficients:
+\[
+U_1(x, y, z) = |f \ast \psi_{j_1, \theta_1}(x, y, z)|.
+\]
+#### Step 2: Higher-Level Coefficient Extraction
+The coefficients \( U_1 \) are convolved with a second wavelet, again followed by a modulus, to capture second-order information:
+\[
+U_2(x, y, z) = |U_1 \ast \psi_{j_2, \theta_2}(x, y, z)|.
+\]
+This process can be repeated across multiple levels \( m \), forming a hierarchical cascade.
+It is worth noting that this cascade of convolutions and non-linearities (here, the modulus) resembles a CNN, except that the filters are fixed wavelets rather than learned weights. This suggests a close link between wavelet decomposition and CNN-based computer vision.
+#### Step 3: Invariant Aggregation
+At each level, the modulus coefficients are aggregated over space to create invariant representations. One example of such an operation is the integral of the modulus:
+\[
+S_m = \int |U_m| \, dx \, dy \, dz.
+\]
+These \( S_m \) coefficients can then be used for downstream tasks.
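The three steps above can be sketched on a toy example. This is a minimal illustration, not the implementation used for the model: it works on a 1D signal instead of a 3D array, drops the orientation \( \theta \), uses circular convolution and an unnormalized Haar filter, and all function names are hypothetical. It shows that the aggregated coefficients \( S_m \) do not change when the input is shifted.

```python
def circular_convolve(signal, kernel):
    """Circular (periodic) convolution of a 1D signal with a short kernel."""
    n = len(signal)
    return [
        sum(kernel[k] * signal[(i - k) % n] for k in range(len(kernel)))
        for i in range(n)
    ]


def haar_wavelet(scale):
    """Unnormalized Haar filter of width 2**scale (scale >= 1): +1s then -1s."""
    half = 2 ** (scale - 1)
    return [1] * half + [-1] * half


def scattering_coefficients(signal, scales):
    """Cascade U_m = |U_{m-1} * psi_{j_m}| and return S_m = sum(|U_m|) per level."""
    u = list(signal)
    coefficients = []
    for scale in scales:
        # Steps 1 and 2: wavelet convolution followed by a modulus.
        u = [abs(v) for v in circular_convolve(u, haar_wavelet(scale))]
        # Step 3: aggregate over the whole domain (discrete analogue of the integral).
        coefficients.append(sum(u))
    return coefficients


signal = [0, 3, 1, 4, 1, 5, 9, 2]
shifted = signal[3:] + signal[:3]  # same signal, circularly shifted by 3 samples

s = scattering_coefficients(signal, scales=[1, 2])
s_shifted = scattering_coefficients(shifted, scales=[1, 2])
print(s == s_shifted)  # the aggregated coefficients ignore the shift
```

The invariance here is exact only because the convolution is circular and the aggregation sums over the whole domain; on real images with boundaries, or with local averaging, it holds only approximately.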
+Having introduced this idea, further testing is needed.
+### Testing the Idea
+We conducted small-scale experiments using Haar wavelets, with a single decomposition level and keeping only the approximation (low-pass) coefficients of the image.
+Despite these limitations, training revealed some potential. We tested this idea on the PANDA subset benchmark, where **Bony_wave** achieved 83% accuracy on the test set.
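To make the preprocessing concrete, here is a minimal sketch of a single-level Haar approximation. It assumes the transform is applied to each colour channel separately as a 2D transform with orthonormal normalization; the actual preprocessing behind **Bony_wave** may differ, and the function name is hypothetical. Each output value summarizes one non-overlapping 2×2 block, so a 224×224 channel would become 112×112.

```python
def haar_approximation(channel):
    """Return the single-level 2D Haar approximation (LL band) of one image channel.

    `channel` is a list of equal-length rows with even height and width.
    """
    height, width = len(channel), len(channel[0])
    if height % 2 or width % 2:
        raise ValueError("height and width must be even")
    return [
        [
            # Orthonormal Haar low-pass along both axes: block sum divided by 2.
            (channel[i][j] + channel[i][j + 1]
             + channel[i + 1][j] + channel[i + 1][j + 1]) / 2.0
            for j in range(0, width, 2)
        ]
        for i in range(0, height, 2)
    ]


# Toy 4x4 channel standing in for one 224x224 colour plane.
channel = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [3, 3, 4, 4],
    [3, 3, 4, 4],
]
approximation = haar_approximation(channel)
print(approximation)  # [[2.0, 4.0], [6.0, 8.0]]
```

Keeping only this band discards the three detail bands, which acts as a smoothing filter and halves the spatial resolution.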
 ## Limitations and Biases
 Although this model was trained for a specific prostate histopathology analysis task, there are several limitations and biases:
 - Performance may be affected by the quality of input images, particularly in cases of low resolution or noise.