VJEPA Encoder

The VJEPA Encoder is a fine-tuned JEPA model trained on "High Speed and High Dynamic Range Video with an Event Camera" (IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019). This package adapts facebookresearch/jepa to make the JEPA architecture, which is built on Vision Transformers, easier to use.

Installation

To install the VJEPA Encoder package, you can use pip:

pip install vjepa_encoder

Usage

To use the VJEPA Encoder in your Python code, you can import it as follows:

from vjepa_encoder.vision_encoder import JepaEncoder

Loading the Encoder

To load the pre-trained encoder, you can use the load_model function:

encoder = JepaEncoder.load_model(config_file_path, devices)

config_file_path: Path to the configuration file (YAML) containing the model settings.
devices: List of devices (e.g., ['cuda:0']) to use for distributed training. If not provided, the model will be loaded on the CPU.
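As a minimal sketch of how the devices list might be built, the helper below turns a GPU count into device names in the 'cuda:N' form shown above. The name build_device_list is hypothetical and not part of vjepa_encoder. The GPU count is assumed to come from the caller, for example from torch.cuda.device_count().

```python
def build_device_list(num_gpus):
    """Return ['cuda:0', ..., 'cuda:{num_gpus - 1}'], or [] when num_gpus is 0.

    Hypothetical helper for illustration; it is not part of vjepa_encoder.
    """
    if num_gpus < 0:
        raise ValueError("num_gpus must be non-negative")
    return [f"cuda:{index}" for index in range(num_gpus)]


# Example: request the first two GPUs.
devices = build_device_list(2)
```

When the resulting list is empty, the devices argument can presumably be left out so that the model loads on the CPU, as described above.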

Preprocessing Data

The VJEPA Encoder provides a preprocess_data function to preprocess input data before feeding it to the encoder:

preprocessed_data = encoder.preprocess_data(input_data)

input_data: Input data, which can be an image path, image array, PIL Image, or PyTorch tensor.

Embedding Images

To obtain the embeddings for an image, you can use the embed_image function:

embeddings = encoder.embed_image(input_data)

input_data: Input data, which can be an image path, image array, PIL Image, or PyTorch tensor.

The function returns the embeddings generated by the encoder.
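Since embed_image accepts an image path, a folder of images could be embedded one file at a time. The following is only a sketch: collect_image_paths, embed_directory, and the extension set are hypothetical names chosen for illustration, and encoder is assumed to be an object loaded as described under Loading the Encoder.

```python
from pathlib import Path

# Assumed set of image suffixes; adjust to whatever formats the data uses.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def collect_image_paths(directory, extensions=IMAGE_EXTENSIONS):
    """Return the sorted image files found directly inside `directory`."""
    root = Path(directory)
    return sorted(
        path
        for path in root.iterdir()
        if path.is_file() and path.suffix.lower() in extensions
    )


def embed_directory(encoder, directory):
    """Map each image path (as a string) to the result of encoder.embed_image."""
    return {
        str(path): encoder.embed_image(str(path))
        for path in collect_image_paths(directory)
    }
```

The sketch calls embed_image once per file and makes no assumption about the type or shape of the returned embeddings.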

Configuration

The VJEPA Encoder requires a configuration file in YAML format to specify the model settings. The configuration file should include the following sections:

meta: General settings such as the checkpoint file path, random seed, etc.
mask: Settings related to masking.
model: Model architecture settings.
data: Data-related settings such as crop size, patch size, etc.
logging: Logging settings.

Please refer to the provided configuration file template for more details.
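Before loading the encoder, it may help to check that a parsed configuration contains the five sections listed above. The sketch below assumes the YAML file has already been parsed into a dictionary (for example with yaml.safe_load from PyYAML). The helper name missing_sections is hypothetical, and the keys inside each section are not checked because they are defined by the configuration template.

```python
# Top-level sections named in this README.
REQUIRED_SECTIONS = ("meta", "mask", "model", "data", "logging")


def missing_sections(config):
    """Return the required section names absent from `config`, in listed order."""
    config = config or {}  # an empty YAML file parses to None
    return [name for name in REQUIRED_SECTIONS if name not in config]


# Example: a configuration that only defines two of the five sections.
partial_config = {"meta": {}, "model": {}}
problems = missing_sections(partial_config)
```

An empty result means every required section is present; any names returned can be reported to the user before load_model is called.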

License

The VJEPA Encoder is released under the MIT License.

Acknowledgments

The VJEPA Encoder is based on the research work conducted by Facebook AI Research. We would like to acknowledge their contributions to the field of computer vision and representation learning.

Contact

If you have any questions or suggestions regarding the VJEPA Encoder, please feel free to contact me at johnnykoch02@gmail.com.