LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding Paper • 2410.17434 • Published Oct 22, 2024 • 25
Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution Paper • 2307.06304 • Published Jul 12, 2023 • 28
VampNet: Music Generation via Masked Acoustic Token Modeling Paper • 2307.04686 • Published Jul 10, 2023 • 20