Commit e4a3640 by Paolo-Fraccaro (parent: 66a6937): Update README.md
tags:
- Pytorch
- Geospatial
- Temporal ViT
- Vit
---

### Model and Inputs
Prithvi is a first-of-its-kind temporal Vision Transformer, pretrained by the IBM and NASA team on continental US Harmonised Landsat Sentinel-2 (HLS) data. The model adopts a self-supervised encoder developed with a ViT architecture and a Masked Autoencoder (MAE) learning strategy, with mean squared error (MSE) as the loss function. The model includes spatial attention across multiple patches as well as temporal attention for each patch.

![](Prithvi_training.png)

The model expects remote sensing data in a video format (B, C, T, H, W). Note that the temporal dimension is very important here and not present in most other works around remote sensing modeling. The ability to handle a time series of remote sensing images can benefit a variety of downstream tasks. The model can also handle static imagery, which can simply be fed into the model with T=1.
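A minimal sketch of the expected input layout and the T=1 case for static images. The concrete batch size, band count, and spatial resolution below are illustrative placeholders, not values prescribed by the model card:

```python
import torch

# Illustrative (B, C, T, H, W) video input: batch of 2, 6 spectral bands,
# 3 time steps, 224x224 pixels. Actual sizes depend on the checkpoint.
time_series = torch.randn(2, 6, 3, 224, 224)   # (B, C, T, H, W)

# A static image is handled the same way by adding a singleton time axis (T=1).
static_image = torch.randn(2, 6, 224, 224)     # (B, C, H, W)
static_as_video = static_image.unsqueeze(2)    # -> (B, C, 1, H, W)

print(static_as_video.shape)  # torch.Size([2, 6, 1, 224, 224])
```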

### Pre-training
The model was pre-trained with NASA's HLS2 L30 product (30 m granularity) from the continental United States. The following bands were used:

* Blue
* Green
* Red
* Narrow NIR
* SWIR 1
* SWIR 2
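As a sketch of how per-band rasters map onto the channel axis (C) of the (B, C, T, H, W) input described above. The band subset, array sizes, and stacking order here are illustrative assumptions, not the released preprocessing pipeline:

```python
import torch

# Stack per-band time series (each of shape (T, H, W)) into the channel axis.
# Sizes are placeholders, not actual HLS tile dimensions.
t, h, w = 3, 224, 224
bands = {name: torch.randn(t, h, w) for name in ["Blue", "Green", "SWIR 1", "SWIR 2"]}

x = torch.stack(list(bands.values()), dim=0)  # (C, T, H, W)
x = x.unsqueeze(0)                            # (B, C, T, H, W)
print(x.shape)  # torch.Size([1, 4, 3, 224, 224])
```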

### Code
The model follows the [original mae repo](https://github.com/facebookresearch/mae) with some modifications, including:

1. replacing the 2D patch embed with a 3D patch embed;
2. replacing the 2D positional embed with a 3D positional embed;
3. replacing 2D patchify and unpatchify with their 3D counterparts.
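A hedged sketch of the first listed modification: a 3D patch embed that swaps the 2D convolution of the original MAE patch embed for a `Conv3d` over space and time. The class name and the sizes (`tubelet_size`, `patch_size`, `embed_dim`, band count) are illustrative assumptions, not the released implementation:

```python
import torch
import torch.nn as nn

class PatchEmbed3D(nn.Module):
    """Illustrative 3D patch embed: video (B, C, T, H, W) -> token sequence."""

    def __init__(self, in_chans=6, embed_dim=768, tubelet_size=1, patch_size=16):
        super().__init__()
        # A Conv3d with kernel == stride cuts the input into non-overlapping
        # (tubelet_size x patch_size x patch_size) tubes, one token each.
        self.proj = nn.Conv3d(
            in_chans, embed_dim,
            kernel_size=(tubelet_size, patch_size, patch_size),
            stride=(tubelet_size, patch_size, patch_size),
        )

    def forward(self, x):                    # x: (B, C, T, H, W)
        x = self.proj(x)                     # (B, D, T', H', W')
        return x.flatten(2).transpose(1, 2)  # (B, T'*H'*W', D)

# 3 time steps, 224x224 input, 16x16 patches -> 3 * 14 * 14 = 588 tokens.
tokens = PatchEmbed3D()(torch.randn(1, 6, 3, 224, 224))
print(tokens.shape)  # torch.Size([1, 588, 768])
```

The design mirrors how 2D MAE patch embeds use a strided `Conv2d`; extending the kernel and stride with a temporal extent is the natural 3D analogue.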