Evaluate the agent by answering questions and submitting the results
Search the web, get the current time, and generate images