VJEPA Encoder
The VJEPA Encoder is a Python package that provides an implementation of the encoder component from JEPA (Joint Embedding Predictive Architecture), proposed by Facebook AI Research. The encoder is designed to extract meaningful representations from visual data. I do not own the rights to this software or lay claim to its copyright. This package is an adaptation of facebookresearch/jepa that makes the JEPA architecture, built with Vision Transformers, easier to use.
Installation
To install the VJEPA Encoder package, you can use pip:
pip install vjepa-encoder
Usage
To use the VJEPA Encoder in your Python code, you can import it as follows:
from vjepa_encoder.vision_encoder import JepaEncoder
Loading the Encoder
To load the pre-trained encoder, use the load_model function:
config_file_path = "./params-encoder.yaml"
devices = ["cuda:0"]
encoder = JepaEncoder.load_model(config_file_path, devices)
- config_file_path: Path to the configuration file (YAML) containing the model settings.
- devices: List of devices (e.g., ['cuda:0']) to use for distributed training. If not provided, the model is loaded on the CPU.
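As a minimal sketch of building the devices list, assuming PyTorch is installed alongside the package (an assumption; the dependency is not stated here), the hypothetical helper pick_devices below returns CUDA device names when a GPU is usable and an empty list otherwise. It is not part of vjepa_encoder.

```python
def pick_devices(max_gpus=1):
    """Return device names such as ["cuda:0"], or [] when no GPU is usable.

    Hypothetical helper for illustration; not provided by vjepa_encoder.
    """
    try:
        import torch  # assumed dependency
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    # Never request more GPUs than the machine reports.
    count = min(max_gpus, torch.cuda.device_count())
    return [f"cuda:{i}" for i in range(count)]


devices = pick_devices()
print(devices)
```

A non-empty result can be passed as the devices argument. How the CPU fallback should be requested (omitting the argument or passing an empty list) is not specified above, so check the package before relying on an empty list.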
Important Notes about the Config File:
- The config file provided in this repo covers the basics for loading and using the encoder. The two most important settings in this file are:
  - r_checkpoint: Points at the .tar file for the JEPA checkpoint.
  - tubelet_size: Used in some temporal calculations. Set this to 1 if you plan on embedding images; set it to N if your data has a temporal dimension, where N is the number of temporal inputs.
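The two settings above can be sanity-checked before loading the model. The following is a rough sketch that works on an already-parsed config dictionary. It assumes the temporal key is spelled tubelet_size, as in the upstream facebookresearch/jepa configs, and it searches nested sections because the exact placement of the keys is not stated here. Both function names are hypothetical.

```python
def find_key(cfg, key):
    """Depth-first search for `key` in nested dicts; returns the value or None."""
    if isinstance(cfg, dict):
        if key in cfg:
            return cfg[key]
        for value in cfg.values():
            found = find_key(value, key)
            if found is not None:
                return found
    return None


def check_encoder_config(cfg, num_temporal_inputs=1):
    """Return a list of human-readable problems (empty when none are found).

    num_temporal_inputs is 1 for still images and N for temporal data.
    """
    problems = []
    checkpoint = find_key(cfg, "r_checkpoint")
    if not checkpoint:
        problems.append("r_checkpoint is missing")
    elif not str(checkpoint).endswith(".tar"):
        problems.append(f"r_checkpoint should point at a .tar file: {checkpoint}")
    tubelet = find_key(cfg, "tubelet_size")  # key spelling is an assumption
    if tubelet != num_temporal_inputs:
        problems.append(
            f"tubelet_size is {tubelet!r}, expected {num_temporal_inputs}"
        )
    return problems


# Hypothetical config layout, for illustration only.
example = {"meta": {"r_checkpoint": "/path/to/jepa.tar"}, "data": {"tubelet_size": 1}}
print(check_encoder_config(example))
```

Returning a list of problems instead of raising on the first one lets both settings be reported in a single pass.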
Preprocessing Data
The VJEPA Encoder provides a preprocess_data function to preprocess input data before feeding it to the encoder:
preprocessed_data = encoder.preprocess_data(input_data)
- input_data: Input data, which can be an image path, image array, PIL Image, or PyTorch tensor.
Embedding Images
To obtain the embeddings for an image, use the embed_image function:
embeddings = encoder.embed_image(input_data)
- input_data: Input data, which can be an image path, image array, PIL Image, or PyTorch tensor.
The function returns the embeddings generated by the encoder.
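To embed several inputs, embed_image can be called once per item. The sketch below shows one way to do that; embed_all is a hypothetical helper, not part of the package, and it makes no assumption about the type or shape of the returned embeddings.

```python
def embed_all(encoder, inputs):
    """Embed each input with encoder.embed_image, preserving input order.

    `encoder` is any object exposing embed_image, such as a loaded JepaEncoder.
    Hypothetical helper for illustration; not provided by vjepa_encoder.
    """
    results = []
    for item in inputs:
        results.append(encoder.embed_image(item))
    return results


# Usage with a loaded encoder (the file names are placeholders):
# embeddings = embed_all(encoder, ["first.jpg", "second.jpg"])
```

Because the helper only relies on the embed_image method, it accepts any mix of the supported input types listed above.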
Configuration
The VJEPA Encoder requires a configuration file in YAML format to specify the model settings. The configuration file should include the following sections:
- meta: General settings such as the checkpoint file path, random seed, etc.
- mask: Settings related to masking.
- model: Model architecture settings.
- data: Data-related settings such as crop size, patch size, etc.
- logging: Logging settings.
Please refer to the provided configuration file template for more details.
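As a quick check that a configuration contains the sections listed above, the following sketch inspects a parsed config dictionary and reports any that are missing. The helper name missing_sections is hypothetical, and parsing the YAML file (for example with PyYAML's yaml.safe_load, assuming PyYAML is installed) is left to the caller.

```python
REQUIRED_SECTIONS = ("meta", "mask", "model", "data", "logging")


def missing_sections(cfg):
    """Return the required top-level sections absent from `cfg`, in listed order."""
    cfg = cfg or {}  # an empty YAML file parses to None
    return [name for name in REQUIRED_SECTIONS if name not in cfg]


# Example with a deliberately incomplete config.
partial = {"meta": {}, "data": {}}
print(missing_sections(partial))
```

An empty result means every section is present; it does not validate the settings inside each section.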
License
The VJEPA Encoder is released under the MIT License.
Acknowledgments
The VJEPA Encoder is based on the research work conducted by Facebook AI Research. We would like to acknowledge their contributions to the field of computer vision and representation learning.
Contact
If you have any questions or suggestions regarding the VJEPA Encoder, please feel free to contact us at johnnykoch02@gmail.com.