UI-TARS
🌖
Generate click coordinates from image and instruction
Generate text based on user prompts and settings
Generate talking face animation from still images and audio
Generate animated faces from still images and videos
Generate a talking face video from text