SirRa1zel

AI & ML interests

Text-to-Speech, Translation, Object Detection

Recent Activity

liked a Space 18 days ago

data-agents/jupyter-agent

Organizations

None yet

SirRa1zel's activity

upvoted a collection about 8 hours ago

Cosmos

Collection

The collection of Cosmos models • 30 items • Updated about 14 hours ago • 101

upvoted a paper 26 days ago

DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation

Paper • 2412.07589 • Published 28 days ago • 46

upvoted a collection 28 days ago

[MASK] is All You Need

Collection

Code, dataset, and pretrained model • 5 items • Updated Nov 29, 2024 • 8

upvoted a paper about 1 month ago

Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis

Paper • 2412.01819 • Published Dec 2, 2024 • 34

upvoted a paper about 2 months ago

High Fidelity Text-Guided Music Generation and Editing via Single-Stage Flow Matching

Paper • 2407.03648 • Published Jul 4, 2024 • 17

upvoted a collection about 2 months ago

MelodyFlow

Collection

MelodyFlow: High Fidelity Text-Guided Music Generation and Editing via Single-Stage Flow Matching • 7 items • Updated Oct 23, 2024 • 16

upvoted a collection 2 months ago

LayerSkip

Collection

Models continually pretrained using LayerSkip - https://arxiv.org/abs/2404.16710 • 8 items • Updated Nov 21, 2024 • 46

upvoted 2 papers 3 months ago

Tutor CoPilot: A Human-AI Approach for Scaling Real-Time Expertise

Paper • 2410.03017 • Published Oct 3, 2024 • 27

Prithvi WxC: Foundation Model for Weather and Climate

Paper • 2409.13598 • Published Sep 20, 2024 • 40

upvoted 2 papers 4 months ago

Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources

Paper • 2409.08239 • Published Sep 12, 2024 • 17

Platypus: A Generalized Specialist Model for Reading Text in Various Forms

Paper • 2408.14805 • Published Aug 27, 2024 • 13

upvoted 5 papers 5 months ago

MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine

Paper • 2408.02900 • Published Aug 6, 2024 • 26

upvoted 4 papers 6 months ago

Visual Text Generation in the Wild

Paper • 2407.14138 • Published Jul 19, 2024 • 9

Tx-LLM: A Large Language Model for Therapeutics

Paper • 2406.06316 • Published Jun 10, 2024 • 16

Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language

Paper • 2406.05629 • Published Jun 9, 2024 • 7

FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation

Paper • 2406.08392 • Published Jun 12, 2024 • 18