[FEEDBACK] Daily Papers
Note that this is not a post for adding new papers; it's for feedback on the Daily Papers community update feature.
How to submit a paper to the Daily Papers, like @akhaliq (AK)?
- Submitting is available to paper authors
- Only recent papers (less than 7 days old) can be featured on the Daily Papers
- Drop the arXiv ID in the form at https://huggingface.co/papers/submit
- Add media (images, videos) to the paper when relevant
- You can start a discussion to engage with the community
Please check out the documentation
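For authors scripting their submissions, the steps above boil down to pulling the bare arXiv ID out of a paper URL and checking the 7-day window. Here is a minimal sketch; the helper names are hypothetical and the submission form itself performs the real validation:

```python
import re
from datetime import datetime, timedelta, timezone

# Matches modern arXiv IDs like 2405.20797, with or without a URL prefix
# or a version suffix (e.g. v2).
ARXIV_ID = re.compile(r"(?:arxiv\.org/(?:abs|pdf)/)?(\d{4}\.\d{4,5})(?:v\d+)?")

def extract_arxiv_id(text):
    """Pull the bare arXiv identifier out of a URL or raw string."""
    m = ARXIV_ID.search(text)
    return m.group(1) if m else None

def is_recent(published, now=None, window_days=7):
    """True if the paper falls inside the Daily Papers submission window."""
    now = now or datetime.now(timezone.utc)
    return now - published <= timedelta(days=window_days)
```

For example, `extract_arxiv_id("https://arxiv.org/abs/2405.20797")` yields `"2405.20797"`, which is the value to paste into the form.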
We are excited to share our recent work on MLLM architecture design titled "Ovis: Structural Embedding Alignment for Multimodal Large Language Model".
Paper: https://arxiv.org/abs/2405.20797
Github: https://github.com/AIDC-AI/Ovis
Model: https://huggingface.co/AIDC-AI/Ovis-Clip-Llama3-8B
Data: https://huggingface.co/datasets/AIDC-AI/Ovis-dataset
@Yiwen-ntu For now, we only support videos as paper covers in the Daily Papers.
We are excited to share our work titled "Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models": https://arxiv.org/abs/2406.12644
M3GPT: An Advanced Multimodal, Multitask Framework for Motion Comprehension and Generation (accepted by NeurIPS 2024)
Paper: https://arxiv.org/pdf/2405.16273
I don't have a paper, but I built a small framework researchers could use for sampling experiments.
Text4Seg: Reimagining Image Segmentation as Text Generation
Paper: https://arxiv.org/abs/2410.09855
Github: https://github.com/mc-lan/Text4Seg
Depth Any Video with Scalable Synthetic Data
Depth Any Video introduces a scalable synthetic data pipeline, capturing 40,000 video clips from diverse games, and leverages powerful priors of generative video diffusion models to advance video depth estimation. By incorporating rotary position encoding, flow matching, and a mixed-duration training strategy, it robustly handles varying video lengths and frame rates. Additionally, a novel depth interpolation method enables high-resolution depth inference, achieving superior spatial accuracy and temporal consistency over previous models.
Arxiv link: https://arxiv.org/abs/2410.10815
Project page: https://depthanyvideo.github.io
Code: https://github.com/Nightmare-n/DepthAnyVideo
Hugging Face Gradio demo: https://huggingface.co/spaces/hhyangcs/depth-any-video
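The announcement above lists rotary position encoding among the model's ingredients. As background only (this is not the authors' code; the channel-pairing convention and base 10000 are the common defaults from the original RoPE formulation), here is a minimal NumPy sketch of how a rotary embedding rotates feature pairs by position-dependent angles:

```python
import numpy as np

def rope(x, positions, base=10000.0):
    """Apply rotary position embedding to x of shape (seq_len, dim).

    Channel pairs (i, i + dim/2) are rotated by angle theta_i * position,
    where theta_i = base ** (-i / (dim/2)). The rotation preserves the
    norm of each vector, so it encodes position without changing scale.
    """
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)        # (half,) per-pair frequencies
    angles = positions[:, None] * freqs[None, :]     # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # 2D rotation applied independently to each channel pair
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)
```

Position 0 leaves a vector unchanged, and every position yields a pure rotation, which is why relative offsets between positions translate into relative phase differences in attention scores.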