Stoney Kang

sikang99

AI & ML interests

Remote Control based on Vision

sikang99's activity

upvoted 2 papers about 5 hours ago

LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale

Paper • 2504.16030 • Published 1 day ago • 16

Describe Anything: Detailed Localized Image and Video Captioning

Paper • 2504.16072 • Published 1 day ago • 42

upvoted a paper 3 days ago

AerialMegaDepth: Learning Aerial-Ground Reconstruction and View Synthesis

Paper • 2504.13157 • Published 6 days ago • 19

upvoted a paper 4 days ago

Learning Occlusion-Robust Vision Transformers for Real-Time UAV Tracking

Paper • 2504.09228 • Published 12 days ago • 4

upvoted 4 papers 6 days ago

SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models

Paper • 2504.11468 • Published 14 days ago • 26

BlockGaussian: Efficient Large-Scale Scene Novel View Synthesis via Adaptive Block-Based Gaussian Splatting

Paper • 2504.09048 • Published 12 days ago • 7

BitNet b1.58 2B4T Technical Report

Paper • 2504.12285 • Published 7 days ago • 65

Towards Learning to Complete Anything in Lidar

Paper • 2504.12264 • Published 8 days ago • 10

upvoted a paper 7 days ago

Efficient Reasoning Models: A Survey

Paper • 2504.10903 • Published 9 days ago • 18

upvoted 2 papers 8 days ago

The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer

Paper • 2504.10462 • Published 9 days ago • 15

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Paper • 2504.10479 • Published 9 days ago • 239

upvoted 2 papers 9 days ago

PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters

Paper • 2504.08791 • Published 17 days ago • 123

VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model

Paper • 2504.07615 • Published 14 days ago • 30

upvoted a paper 10 days ago

Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model

Paper • 2504.08685 • Published 13 days ago • 121

upvoted 5 papers 12 days ago

upvoted a paper 16 days ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published 16 days ago • 170