Sukesh Perla

hitchhiker3010

AI & ML interests

None yet

Recent Activity

liked a model 11 days ago

all-hands/openhands-lm-32b-v0.1

updated a collection 19 days ago

Reasoning MLLM

upvoted a paper 27 days ago

R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization

hitchhiker3010's activity

upvoted 4 papers 27 days ago

R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization

Paper • 2503.12937 • Published 28 days ago • 27

Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey

Paper • 2503.12605 • Published 29 days ago • 33

BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing

Paper • 2503.13434 • Published 28 days ago • 25

Personalize Anything for Free with Diffusion Transformer

Paper • 2503.12590 • Published 29 days ago • 43

upvoted a paper about 1 month ago

GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control

Paper • 2503.03751 • Published Mar 5 • 20

upvoted a paper 2 months ago

OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models

Paper • 2502.01061 • Published Feb 3 • 212

upvoted a collection 3 months ago

Gradio WebRTC Cookbook ⚡️

Collection

Collection of real-time voice and video demos built with gradio-webrtc custom component • 8 items • Updated Dec 10, 2024 • 17

upvoted 2 papers 4 months ago

Shiksha: A Technical Domain focused Translation Dataset and Model for Indian Languages

Paper • 2412.09025 • Published Dec 12, 2024 • 4

VisionArena: 230K Real World User-VLM Conversations with Preference Labels

Paper • 2412.08687 • Published Dec 11, 2024 • 13

upvoted a collection 5 months ago

🪐 SmolLM

Collection

A series of smol LLMs: 135M, 360M and 1.7B. We release base and Instruct models as well as the training corpus and some WebGPU demos • 12 items • Updated 14 days ago • 221

upvoted 4 papers 5 months ago

OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models

Paper • 2411.04905 • Published Nov 7, 2024 • 124

MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases

Paper • 2402.14905 • Published Feb 22, 2024 • 129

OS-ATLAS: A Foundation Action Model for Generalist GUI Agents

Paper • 2410.23218 • Published Oct 30, 2024 • 51

Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss

Paper • 2410.17243 • Published Oct 22, 2024 • 94

upvoted 4 papers 6 months ago

upvoted a paper 7 months ago

OmniGen: Unified Image Generation

Paper • 2409.11340 • Published Sep 17, 2024 • 115