Commit 61420aa
1 Parent(s): 9eb104f
Updates to jepa's compatibility with Python
README.md
CHANGED
@@ -1,6 +1,6 @@
|
|
1 |
VJEPA Encoder
|
2 |
|
3 |
-
The VJEPA Encoder
|
4 |
|
5 |
## Installation
|
6 |
|
@@ -23,12 +23,19 @@ from vjepa_encoder.vision_encoder import JepaEncoder
|
|
23 |
To load the pre-trained encoder, you can use the `load_model` function:
|
24 |
|
25 |
```python
|
26 |
encoder = JepaEncoder.load_model(config_file_path, devices)
|
27 |
```
|
28 |
|
29 |
- `config_file_path`: Path to the configuration file (YAML) containing the model settings.
|
30 |
- `devices`: List of devices (e.g., `['cuda:0']`) to use for distributed training. If not provided, the model will be loaded on the CPU.
|
31 |
|
32 |
### Preprocessing Data
|
33 |
|
34 |
The VJEPA Encoder provides a `preprocess_data` function to preprocess input data before feeding it to the encoder:
|
@@ -73,39 +80,6 @@ The VJEPA Encoder is based on the research work conducted by Facebook AI Researc
|
|
73 |
|
74 |
## Contact
|
75 |
|
76 |
-
If you have any questions or suggestions regarding the VJEPA Encoder, please feel free to contact
|
77 |
-
|
78 |
-
|
79 |
-
## Cite
|
80 |
-
|
81 |
-
If you end up using this work in an academic setting ensure you cite the following:
|
82 |
-
|
83 |
-
```
|
84 |
-
@Article{Rebecq19pami,
|
85 |
-
author = {Henri Rebecq and Ren{\'{e}} Ranftl and Vladlen Koltun and Davide Scaramuzza},
|
86 |
-
title = {High Speed and High Dynamic Range Video with an Event Camera},
|
87 |
-
journal = {{IEEE} Trans. Pattern Anal. Mach. Intell. (T-PAMI)},
|
88 |
-
url = {http://rpg.ifi.uzh.ch/docs/TPAMI19_Rebecq.pdf},
|
89 |
-
year = 2019
|
90 |
-
}
|
91 |
-
```
|
92 |
-
|
93 |
-
```
|
94 |
-
@Article{Rebecq19cvpr,
|
95 |
-
author = {Henri Rebecq and Ren{\'{e}} Ranftl and Vladlen Koltun and Davide Scaramuzza},
|
96 |
-
title = {Events-to-Video: Bringing Modern Computer Vision to Event Cameras},
|
97 |
-
journal = {{IEEE} Conf. Comput. Vis. Pattern Recog. (CVPR)},
|
98 |
-
year = 2019
|
99 |
-
}
|
100 |
-
```
|
101 |
-
|
102 |
-
```
|
103 |
-
@article{bardes2024revisiting,
|
104 |
-
title={Revisiting Feature Prediction for Learning Visual Representations from Video},
|
105 |
-
author={Bardes, Adrien and Garrido, Quentin and Ponce, Jean and Rabbat, Michael, and LeCun, Yann and Assran, Mahmoud and Ballas, Nicolas},
|
106 |
-
journal={arXiv preprint},
|
107 |
-
year={2024}
|
108 |
-
}
|
109 |
-
```
|
110 |
|
111 |
---
|
|
|
1 |
VJEPA Encoder
|
2 |
|
3 |
+
The VJEPA Encoder is a Python package that implements the encoder component of the JEPA (Joint-Embedding Predictive Architecture) model proposed by Facebook AI Research. The encoder is designed to extract meaningful representations from visual data. I do not own the rights to, or lay claim to the copyright of, this software. This package is an adaptation of `facebookresearch/jepa` that makes the JEPA architecture, built with Vision Transformers, easier to use.
|
4 |
|
5 |
## Installation
|
6 |
|
|
|
23 |
To load the pre-trained encoder, you can use the `load_model` function:
|
24 |
|
25 |
```python
|
26 |
+
config_file_path = "./params-encoder.yaml"
|
27 |
+
devices = ["cuda:0"]
|
28 |
encoder = JepaEncoder.load_model(config_file_path, devices)
|
29 |
```
|
30 |
|
31 |
- `config_file_path`: Path to the configuration file (YAML) containing the model settings.
|
32 |
- `devices`: List of devices (e.g., `['cuda:0']`) to use for distributed training. If not provided, the model will be loaded on the CPU.
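
As a hedged sketch of how the two arguments might be prepared before calling `load_model`, the helper below checks that the config path looks like a YAML file and falls back to a CPU device list. The name `prepare_load_args` and the explicit `["cpu"]` fallback are assumptions made for illustration; they are not part of the `vjepa_encoder` package.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def prepare_load_args(
    config_file_path: str, devices: Optional[List[str]] = None
) -> Tuple[str, List[str]]:
    """Hypothetical helper: validate arguments before JepaEncoder.load_model."""
    path = Path(config_file_path)
    # The README describes the config as a YAML file, so reject other extensions.
    if path.suffix.lower() not in {".yaml", ".yml"}:
        raise ValueError(f"expected a YAML config file, got {path.name!r}")
    # Assumption: an explicit ["cpu"] list mirrors the documented CPU default.
    resolved = list(devices) if devices else ["cpu"]
    return str(path), resolved


config_file_path, devices = prepare_load_args("./params-encoder.yaml", ["cuda:0"])
# The validated values would then be passed on unchanged:
# encoder = JepaEncoder.load_model(config_file_path, devices)
```

Validating up front keeps a mistyped path or an empty device list from surfacing later as a less obvious error inside the loader.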
|
33 |
|
34 |
+
|
35 |
+
#### Important Notes about the Config File:
|
36 |
+
|
37 |
+
- The config file provided in this repo covers the basics for loading and using the encoder. The two most important settings are `r_checkpoint`, which points to the `.tar` file holding the JEPA checkpoint, and `tabulet_size`, which is used in some temporal calculations. Set `tabulet_size` to `1` if you plan on embedding images; set it to `N` if your data has a temporal dimension, where `N` is the number of temporal inputs.
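
The rule above for `tabulet_size` can be sketched as a small function. This is a minimal illustration, not the package's own code: `apply_config_overrides` is a hypothetical helper, the checkpoint path is a placeholder, and the sketch assumes the settings live in a flat dictionary, whereas the real YAML file may nest these keys differently.

```python
from typing import Any, Dict


def apply_config_overrides(
    config: Dict[str, Any], checkpoint_path: str, num_temporal_inputs: int = 1
) -> Dict[str, Any]:
    """Hypothetical helper: return a copy of `config` with the two key settings."""
    if num_temporal_inputs < 1:
        raise ValueError("num_temporal_inputs must be at least 1")
    if not checkpoint_path.endswith(".tar"):
        raise ValueError("r_checkpoint should point at a .tar checkpoint file")
    updated = dict(config)  # shallow copy so the caller's dict is untouched
    updated["r_checkpoint"] = checkpoint_path
    # 1 for still images, N for inputs with N temporal steps.
    updated["tabulet_size"] = num_temporal_inputs
    return updated


# Placeholder path; substitute the location of your own checkpoint.
image_config = apply_config_overrides({}, "./checkpoint.tar")
video_config = apply_config_overrides({}, "./checkpoint.tar", num_temporal_inputs=16)
```

Here `image_config` carries `tabulet_size` of `1` for still images, while `video_config` uses `16` for a clip with sixteen temporal inputs.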
|
38 |
+
|
39 |
### Preprocessing Data
|
40 |
|
41 |
The VJEPA Encoder provides a `preprocess_data` function to preprocess input data before feeding it to the encoder:
|
|
|
80 |
|
81 |
## Contact
|
82 |
|
83 |
+
If you have any questions or suggestions regarding the VJEPA Encoder, please feel free to contact us at johnnykoch02@gmail.com.
|
84 |
|
85 |
---
|