Submitted by Bai LiChen
MaineCoon: Pursuing A Real-Time Audio-Visual Social World Model
catnip