deeptimhe committed on
Commit
209edae
1 Parent(s): dec87b4

Update README.md

Files changed (1)
  1. README.md +5 -5
README.md CHANGED
@@ -15,12 +15,12 @@ A Family of Versatile and State-Of-The-Art Video Tokenizers
 
  <img src="./assets/radar.png" width="95%" alt="radar" align="center">
 
- VidTok is a family of versatile video tokenizers that delivers state-of-the-art performance in both continuous and discrete tokenizations with various compression rates. VidTok incorporates several key advancements over existing approaches:
- * ⚡️ **Model architecture**. We handle spatial and temporal sampling separately, reducing computational complexity without sacrificing reconstruction quality.
- * 🔥 **Advanced quantization techniques**. To address the training instability and codebook collapse commonly associated with conventional Vector Quantization (VQ), we use Finite Scalar Quantization (FSQ) in discrete video tokenization.
- * 💥 **Improved training strategies**. To improve training efficiency, we employ a two-stage training strategy: initially pre-training the full model on low-resolution videos, followed by fine-tuning only the decoder on high-resolution videos. Furthermore, we observe that utilizing training data with reduced frame rates effectively improves the model's ability to represent motion dynamics.
+ VidTok is a cutting-edge family of video tokenizers that delivers state-of-the-art performance in both continuous and discrete tokenizations with various compression rates. VidTok incorporates several key advancements over existing approaches:
+ * ⚡️ **Efficient Architecture**. Separate spatial and temporal sampling reduces computational complexity without sacrificing quality.
+ * 🔥 **Advanced Quantization**. Finite Scalar Quantization (FSQ) addresses training instability and codebook collapse in discrete tokenization.
+ * 💥 **Enhanced Training**. A two-stage strategy, pre-training on low-res videos and fine-tuning on high-res, boosts efficiency. Reduced frame rates improve motion dynamics representation.
 
- We train VidTok on a large-scale video dataset, and evaluations reveal that VidTok outperforms previous models in both discrete and continuous tokenization, achieving superior results across all evaluated metrics, including PSNR, SSIM, LPIPS, and FVD.
+ VidTok, trained on a large-scale video dataset, outperforms previous models across all metrics, including PSNR, SSIM, LPIPS, and FVD.
 
  <video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/619b7b1cab4c7b7f16a7d59e/4v2I2YAZJeWSnd7iqntGX.mp4"></video>
 
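
Note on the FSQ bullet: the README text names Finite Scalar Quantization without showing what it does. The sketch below is a minimal, generic illustration of the idea, not VidTok's implementation. The function names `fsq_quantize` and `fsq_index` and the level counts `(7, 5, 5)` are assumptions chosen for illustration, and only odd level counts are handled to keep the arithmetic simple.

```python
import math


def fsq_quantize(z, levels):
    """Quantize one latent vector with Finite Scalar Quantization.

    z      -- list of floats, one per latent channel
    levels -- list of odd ints, number of quantization levels per channel
              (illustrative values; not taken from VidTok)
    Returns integer codes, each in [-(L - 1) / 2, (L - 1) / 2].
    """
    if len(z) != len(levels):
        raise ValueError("z and levels must have the same length")
    codes = []
    for value, num_levels in zip(z, levels):
        if num_levels % 2 == 0:
            raise ValueError("this sketch only handles odd level counts")
        half = (num_levels - 1) // 2
        # Bound the channel to (-half, half), then snap to the nearest integer.
        bounded = math.tanh(value) * half
        codes.append(round(bounded))
    return codes


def fsq_index(codes, levels):
    """Map per-channel codes to a single token id (mixed-radix encoding)."""
    index, stride = 0, 1
    for code, num_levels in zip(codes, levels):
        half = (num_levels - 1) // 2
        index += (code + half) * stride  # shift code into [0, num_levels)
        stride *= num_levels
    return index


levels = [7, 5, 5]  # implicit codebook of 7 * 5 * 5 = 175 entries
codes = fsq_quantize([10.0, -10.0, 0.0], levels)
token_id = fsq_index(codes, levels)
```

Because the codebook is the fixed grid implied by `levels`, there are no learned code vectors that can go unused, which is the usual argument for why this style of quantizer avoids codebook collapse. A trainable version would also pass gradients through the rounding step (for example with a straight-through estimator), which this sketch omits.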