Detects what's happening in the video stream and produces a textual description of what the camera sees
Scalable and Versatile 3D Generation from images