Update README.md
README.md
CHANGED
@@ -15,12 +15,12 @@ A Family of Versatile and State-Of-The-Art Video Tokenizers
 
 <img src="./assets/radar.png" width="95%" alt="radar" align="center">
 
-VidTok is a family of
-* ⚡️ **
-* 🔥 **Advanced
-* 💥 **
+VidTok is a cutting-edge family of video tokenizers that delivers state-of-the-art performance in both continuous and discrete tokenizations with various compression rates. VidTok incorporates several key advancements over existing approaches:
+* ⚡️ **Efficient Architecture**. Separate spatial and temporal sampling reduces computational complexity without sacrificing quality.
+* 🔥 **Advanced Quantization**. Finite Scalar Quantization (FSQ) addresses training instability and codebook collapse in discrete tokenization.
+* 💥 **Enhanced Training**. A two-stage strategy—pre-training on low-res videos and fine-tuning on high-res—boosts efficiency. Reduced frame rates improve motion dynamics representation.
 
-
+VidTok, trained on a large-scale video dataset, outperforms previous models across all metrics, including PSNR, SSIM, LPIPS, and FVD.
 
 <video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/619b7b1cab4c7b7f16a7d59e/4v2I2YAZJeWSnd7iqntGX.mp4"></video>
 
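Reviewer note on the "Advanced Quantization" bullet: the added text names Finite Scalar Quantization without showing what it does. The following is a minimal sketch of the general FSQ idea, not VidTok's actual implementation. It assumes odd level counts only, and the function names `fsq_quantize` and `fsq_index` and the example levels `[5, 5, 3]` are hypothetical. Each latent channel is squashed with `tanh` and rounded to a small fixed set of integers, so the codebook is implicit (the product of the level counts) and there are no learned code vectors that could collapse. Training details such as the straight-through gradient are left out.

```python
import math
from typing import List, Sequence


def fsq_quantize(z: Sequence[float], levels: Sequence[int]) -> List[int]:
    """Round each latent channel to one of levels[i] integer codes.

    Sketch only: handles odd level counts, so codes are symmetric
    around zero, e.g. 5 levels -> {-2, -1, 0, 1, 2}.
    """
    if len(z) != len(levels):
        raise ValueError("z and levels must have the same length")
    codes = []
    for value, num_levels in zip(z, levels):
        if num_levels % 2 == 0:
            raise ValueError("this sketch supports odd level counts only")
        half = (num_levels - 1) // 2
        bounded = half * math.tanh(value)  # squash into (-half, half)
        codes.append(int(round(bounded)))  # snap to the nearest integer code
    return codes


def fsq_index(codes: Sequence[int], levels: Sequence[int]) -> int:
    """Map per-channel codes to a single token id (mixed-radix encoding)."""
    index = 0
    base = 1
    for code, num_levels in zip(codes, levels):
        half = (num_levels - 1) // 2
        index += (code + half) * base  # shift code into 0..num_levels-1
        base *= num_levels
    return index


if __name__ == "__main__":
    levels = [5, 5, 3]  # hypothetical; implicit codebook size is 5 * 5 * 3 = 75
    codes = fsq_quantize([0.0, 10.0, -10.0], levels)
    print(codes, fsq_index(codes, levels))
```

Because every combination of per-channel codes maps to a distinct token id, the whole implicit codebook is reachable by construction, which is the property the bullet refers to when it mentions codebook collapse.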