Stoney Kang

sikang99

AI & ML interests

Remote Control based on Vision

sikang99's activity

upvoted a paper about 23 hours ago

AerialMegaDepth: Learning Aerial-Ground Reconstruction and View Synthesis

Paper • 2504.13157 • Published 5 days ago • 16

upvoted a paper 2 days ago

Learning Occlusion-Robust Vision Transformers for Real-Time UAV Tracking

Paper • 2504.09228 • Published 10 days ago • 4

upvoted 4 papers 4 days ago

SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models

Paper • 2504.11468 • Published 12 days ago • 26

BlockGaussian: Efficient Large-Scale Scene Novel View Synthesis via Adaptive Block-Based Gaussian Splatting

Paper • 2504.09048 • Published 10 days ago • 7

BitNet b1.58 2B4T Technical Report

Paper • 2504.12285 • Published 6 days ago • 61

Towards Learning to Complete Anything in Lidar

Paper • 2504.12264 • Published 6 days ago • 10

upvoted a paper 5 days ago

Efficient Reasoning Models: A Survey

Paper • 2504.10903 • Published 7 days ago • 18

upvoted 2 papers 6 days ago

The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer

Paper • 2504.10462 • Published 8 days ago • 14

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Paper • 2504.10479 • Published 8 days ago • 237

upvoted 2 papers 7 days ago

PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters

Paper • 2504.08791 • Published 15 days ago • 118

VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model

Paper • 2504.07615 • Published 12 days ago • 30

upvoted a paper 8 days ago

Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model

Paper • 2504.08685 • Published 11 days ago • 120

upvoted 5 papers 10 days ago

upvoted 2 papers 14 days ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published 15 days ago • 167

Scene-Centric Unsupervised Panoptic Segmentation

Paper • 2504.01955 • Published 20 days ago • 5

upvoted a paper 19 days ago

Efficient LLaMA-3.2-Vision by Trimming Cross-attended Visual Features

Paper • 2504.00557 • Published 21 days ago • 15