Ayaan Sharif

Ayaan-Sharif

AI & ML interests

NLP, LLM, TEXT, Languages

Recent Activity

liked a model 21 days ago
MiniMaxAI/MiniMax-VL-01
liked a dataset 24 days ago
DAMO-NLP-SG/multimodal_textbook

Organizations

Hugging Face Discord Community

Ayaan-Sharif's activity

replied to sanchit-gandhi's post about 1 month ago

What if we segment the audio first and then transcribe? It's some extra compute to throw in, but imo it would give a better result!

reacted to vladbogo's post with 👍 about 2 months ago
Panda-70M is a new large-scale video dataset comprising 70 million high-quality video clips, each paired with textual captions, designed to be used as pre-training data for video understanding tasks.

Key Points:
* Automatic Caption Generation: Utilizes an automatic pipeline with multiple cross-modality teacher models to generate captions for video clips.
* Fine-tuned Caption Selection: Employs a fine-tuned retrieval model to select the most appropriate caption from multiple candidates for each video clip.
* Improved Performance: Pre-training on Panda-70M shows significant performance gains in video captioning, text-video retrieval, and text-driven video generation.
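
The caption selection step in the key points above can be sketched roughly as follows. This is only an illustration, not the authors' code: `select_caption` and `toy_score` are hypothetical names, and the word-overlap scorer is a toy stand-in for the fine-tuned retrieval model, which in the real pipeline would score each candidate against the video itself.

```python
from typing import Callable, Sequence


def select_caption(candidates: Sequence[str], score: Callable[[str], float]) -> str:
    """Return the candidate caption with the highest score."""
    if not candidates:
        raise ValueError("need at least one candidate caption")
    return max(candidates, key=score)


# Assumption: a toy scorer that counts words shared with a fixed set of
# "video tags". A real retrieval model would compare caption and video features.
VIDEO_TAGS = frozenset({"dog", "beach", "running"})


def toy_score(caption: str) -> float:
    words = set(caption.lower().split())
    return float(len(words & VIDEO_TAGS))


# Candidates as they might come from several teacher captioning models.
candidates = [
    "a person walking outside",
    "a dog running on the beach",
    "an animal near water",
]
best = select_caption(candidates, toy_score)
```

Passing the scorer in as a function keeps the selection logic independent of whichever model produces the scores.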

Paper: Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers (2402.19479)
Project page: https://snap-research.github.io/Panda-70M/
Code: https://github.com/snap-research/Panda-70M

Congrats to the authors @tschen, @aliaksandr-siarohin et al. for their work!
New activity in tencent/HunyuanVideo 2 months ago

Multi-GPU setup when?

#5 opened 2 months ago by Ayaan-Sharif