Ksenia Se

Kseniase

AI & ML interests

None yet

Organizations

Turing Post, Journalists on Hugging Face, Social Post Explorers, Hugging Face Discord Community, Sandbox

Posts 27

11 Types of JEPA

Since Meta released the newest V-JEPA 2 this week, we thought it was a good time to revisit a few other interesting JEPA variants. JEPA, or Joint Embedding Predictive Architecture, is a self-supervised learning framework that predicts the latent representation of a missing part of the input.
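To make "predicting the latent representation of a missing part" concrete, here is a minimal toy sketch in plain Python. It is not taken from any of the papers below: the linear "encoders", the zero-masking, and the names `jepa_loss` and `ema_update` are illustrative assumptions. The slowly updated target encoder is also an assumption, borrowed from common JEPA-style setups. Real models use deep networks and a predictor that knows where the masked region is.

```python
import random

def linear(weights, x):
    """Apply a weight matrix (a list of rows) to a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

def ema_update(target, online, momentum=0.99):
    """Move target weights a small step toward the online weights (assumed EMA rule)."""
    return [[momentum * t + (1 - momentum) * o for t, o in zip(t_row, o_row)]
            for t_row, o_row in zip(target, online)]

def jepa_loss(x, mask, context_enc, target_enc, predictor):
    """Toy JEPA objective: the loss is computed between latents, not raw inputs."""
    # Context view: the masked positions are hidden (zeroed out here).
    context = [0.0 if m else v for v, m in zip(x, mask)]
    # Target view: only the masked part is kept.
    target = [v if m else 0.0 for v, m in zip(x, mask)]
    z_context = linear(context_enc, context)
    z_target = linear(target_enc, target)  # treated as a fixed target in training
    z_pred = linear(predictor, z_context)  # predict the target latent from context
    return mse(z_pred, z_target)

if __name__ == "__main__":
    random.seed(0)
    dim = 4
    rand_matrix = lambda: [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
    context_enc, predictor = rand_matrix(), rand_matrix()
    target_enc = [row[:] for row in context_enc]  # target starts as a copy
    x = [1.0, 2.0, 3.0, 4.0]
    mask = [False, False, True, True]  # last two positions are "missing"
    print("latent prediction loss:", jepa_loss(x, mask, context_enc, target_enc, predictor))
    target_enc = ema_update(target_enc, context_enc)
```

The point of the sketch is the shape of the objective: the model never reconstructs the hidden values themselves, it only has to match their embedding.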

Here are 11 JEPA types that you should know about:

1. V-JEPA 2 -> V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning (2506.09985)
Trained on 1M+ hours of internet videos and a little bit of robot interaction data, V-JEPA 2 can watch, understand, answer questions, and help robots plan and act in the physical world

2. Time-Series-JEPA (TS-JEPA) -> Time-Series JEPA for Predictive Remote Control under Capacity-Limited Networks (2406.04853)
It's a time-series predictive model that learns compact, meaningful representations. A self-supervised semantic actor then uses them to generate control commands without raw data

3. Denoising JEPA (D-JEPA) -> Denoising with a Joint-Embedding Predictive Architecture (2410.03755)
Combines JEPA with diffusion techniques. By treating JEPA as masked image modeling and next-token prediction, D-JEPA generates data auto-regressively, incorporating diffusion and flow-matching losses

4. CNN-JEPA -> CNN-JEPA: Self-Supervised Pretraining Convolutional Neural Networks Using Joint Embedding Predictive Architecture (2408.07514)
This SSL approach applies the JEPA idea to CNNs using a sparse encoder, depthwise separable convolutions, and improved masking. On ImageNet-100, CNN-JEPA reaches 73.3% accuracy, outperforming I-JEPA

5. Stem-JEPA -> Stem-JEPA: A Joint-Embedding Predictive Architecture for Musical Stem Compatibility Estimation (2408.02514)
Identifies instrument stems by mapping mixes and stems into a shared space using an encoder and predictor. It captures timbre, harmony, and rhythm for tasks like stem retrieval, alignment, and genre or key estimation

6. DMT-JEPA (Discriminative Masked Targets JEPA) -> DMT-JEPA: Discriminative Masked Targets for Joint-Embedding Predictive Architecture (2405.17995)
Improves discriminative power by generating masked targets from semantically similar neighboring patches, using lightweight cross-attention for aggregation
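As a side note on item 4, the appeal of depthwise separable convolutions is easiest to see by counting weights. The sketch below is plain arithmetic, not code from the CNN-JEPA paper; the kernel size and channel counts are hypothetical values chosen for illustration.

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise

# Hypothetical layer: 3 x 3 kernel, 256 input and 256 output channels.
standard = conv_params(3, 256, 256)
separable = depthwise_separable_params(3, 256, 256)
print(standard, separable, round(standard / separable, 2))  # 589824 67840 8.69
```

For this hypothetical layer the separable version needs roughly 8.7 times fewer weights, which is why it is a common choice for lightweight components.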

Read further below👇

Also, subscribe to the Turing Post -> https://www.turingpost.com/subscribe

Articles 41

🦸🏻#17: What is A2A and why is it – still! – underappreciated?

models 0

None public yet

datasets 0

None public yet