fulong ye

Alon77777

https://scholar.google.com.hk/citations?hl=zh-CN&user=-BbQ5VgAAAAJ

superhero-7

AI & ML interests

vision and language, diffusion models, text-to-image generation, image-to-text generation, referring expression generation and comprehension

Recent Activity

authored a paper 17 days ago

AnyDressing: Customizable Multi-Garment Virtual Dressing via Latent Diffusion Models

Alon77777's activity

upvoted a paper 4 months ago

Aquila2 Technical Report

Paper • 2408.07410 • Published Aug 14 • 13

upvoted a paper 5 months ago

IMAGDressing-v1: Customizable Virtual Dressing

Paper • 2407.12705 • Published Jul 17 • 12

upvoted 2 papers 6 months ago

LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models

Paper • 2407.07895 • Published Jul 10 • 40

MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?

Paper • 2407.04842 • Published Jul 5 • 52

upvoted 3 papers 7 months ago

ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Paper • 2406.04325 • Published Jun 6 • 72

Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection

Paper • 2405.10300 • Published May 16 • 26

Compositional Text-to-Image Generation with Dense Blob Representations

Paper • 2405.08246 • Published May 14 • 12

upvoted 2 papers 9 months ago

CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion

Paper • 2403.05121 • Published Mar 8 • 22

Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts

Paper • 2403.08268 • Published Mar 13 • 15

upvoted 4 papers 11 months ago

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6 • 25

Taiyi-Diffusion-XL: Advancing Bilingual Text-to-Image Generation with Large Vision-Language Model Support

Paper • 2401.14688 • Published Jan 26 • 13

Repositioning the Subject within Image

Paper • 2401.16861 • Published Jan 30 • 13

MM-LLMs: Recent Advances in MultiModal Large Language Models

Paper • 2401.13601 • Published Jan 24 • 45

upvoted 2 papers about 1 year ago

UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs

Paper • 2311.09257 • Published Nov 14, 2023 • 45

De-Diffusion Makes Text a Strong Cross-Modal Interface

Paper • 2311.00618 • Published Nov 1, 2023 • 21

upvoted 4 papers over 1 year ago

DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory

Paper • 2308.08089 • Published Aug 16, 2023 • 21

AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning

Paper • 2307.04725 • Published Jul 10, 2023 • 64

Generative Pretraining in Multimodality

Paper • 2307.05222 • Published Jul 11, 2023 • 21

JourneyDB: A Benchmark for Generative Image Understanding

Paper • 2307.00716 • Published Jul 3, 2023 • 19