posted an update 19 days ago
๐Ÿ๐ŸŽ๐Ÿ๐Ÿ’, ๐ญ๐ก๐ž ๐ฒ๐ž๐š๐ซ ๐จ๐Ÿ ๐š๐ ๐ž๐ง๐ญ ๐ฐ๐จ๐ซ๐ค๐Ÿ๐ฅ๐จ๐ฐ๐ฌ ๐Ÿ”ง๐Ÿฆพ๐Ÿค–

I've just watched Andrew Ng's talk at Sequoia last week.
If you're interested in Agents, you should really watch it!

๐—ช๐—ต๐˜† ๐˜‚๐˜€๐—ฒ ๐—ฎ๐—ด๐—ฒ๐—ป๐˜ ๐˜„๐—ผ๐—ฟ๐—ธ๐—ณ๐—น๐—ผ๐˜„๐˜€?
The current LLM task solving workflow is not very intuitive:
We ask it: "Write an essay all in one shot, without ever using backspace."

Why not let the LLM follow a process closer to what we would do ourselves?
- "Write an essay outline"
- "Do you need web research?"
- "Write a first draft"
- "Consider improvements"

This is called an agentic workflow. Existing ones already bring a huge performance boost. On HumanEval, GPT-4 zero-shot scores 67%, an agentic workflow using either tool use or reflection goes over 90%, and combining the two scores even higher!
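To make the step-by-step process above concrete, here is a minimal sketch in Python. It is only an illustration, not code from the talk: `llm` and `search` are hypothetical callables standing in for whatever model client and search tool you use, and the prompt wording is an assumption.

```python
from typing import Callable

# Hypothetical interfaces: any function that maps a prompt (or query) to text.
LLM = Callable[[str], str]
Search = Callable[[str], str]


def write_essay(topic: str, llm: LLM, search: Search) -> str:
    """Outline -> optional web research -> first draft -> improvements -> revision."""
    outline = llm(f"Write an essay outline about: {topic}")

    # Let the model decide whether it needs outside information.
    needs_research = llm(
        f"Do you need web research for this outline? Answer yes or no.\n{outline}"
    )
    notes = search(topic) if needs_research.strip().lower().startswith("yes") else ""

    draft = llm(f"Write a first draft.\nOutline:\n{outline}\nNotes:\n{notes}")
    feedback = llm(f"Consider improvements to this draft:\n{draft}")

    # Final pass: apply the suggested improvements.
    return llm(
        f"Revise the draft using the feedback.\nDraft:\n{draft}\nFeedback:\n{feedback}"
    )
```

Each step is an ordinary model call, so the same skeleton works with any provider; the only "agentic" part is that the model's intermediate outputs feed the next prompt.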

Agentic reasoning design patterns
On the following two points, the tech is robust:

⚙️ Reflection: for instance, add a critic step after the writing step.
🛠️ Tool use: extends the capabilities of the LLM by letting it call tools, such as a search engine or a calculator.
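Here is a rough sketch of the two patterns above, under stated assumptions: `generate` and `critique` are hypothetical callables wrapping an LLM, and the `"tool_name: argument"` request format is made up for this example rather than taken from any real framework.

```python
import ast
import operator
from typing import Callable, Optional

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def calculator(expr: str) -> float:
    """A tiny arithmetic tool: evaluates + - * / on numbers without using eval()."""
    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval").body)


TOOLS = {"calculator": calculator}  # a search tool could be registered the same way


def run_tool_call(request: str) -> str:
    """Tool use: dispatch a model-issued request such as 'calculator: 2 + 3 * 4'."""
    name, _, arg = request.partition(":")
    tool = TOOLS.get(name.strip())
    if tool is None:
        return f"unknown tool: {name.strip()}"
    return str(tool(arg.strip()))


def reflect(task: str,
            generate: Callable[[str, Optional[str]], str],
            critique: Callable[[str, str], Optional[str]],
            max_rounds: int = 3) -> str:
    """Reflection: write, let a critic review, rewrite until the critic is satisfied."""
    draft = generate(task, None)
    for _ in range(max_rounds):
        feedback = critique(task, draft)
        if feedback is None:  # the critic has no more objections
            break
        draft = generate(task, feedback)
    return draft
```

The `max_rounds` cap is a design choice for this sketch: without it, a critic that never approves would loop forever.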

The next two will be needed to go further, but the tech for them is still emerging and not yet reliable:
🗺️ Planning: think ahead to decompose a task into subtasks. This enables great behaviours, like an AI agent re-routing after a failure.
🐝 Multi-agent collaboration: program a flock of agents and assign them tasks.
Improving on these two points will unlock huge performance boosts!
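The "re-routing after a failure" behaviour mentioned under planning can be sketched roughly as follows. This is a simplified illustration, not a production design: `plan` and `execute` are hypothetical callables (in practice `plan` would be an LLM call), and multi-agent collaboration is not covered here.

```python
from typing import Callable, List, Optional


def run_plan(goal: str,
             plan: Callable[[str, Optional[str]], List[str]],
             execute: Callable[[str], str],
             max_replans: int = 2) -> List[str]:
    """Decompose a goal into subtasks and re-plan the remaining work when a step fails.

    plan(goal, failure) returns the steps still to be done; `failure` is None on
    the first call and a short error description on later calls.
    """
    steps = plan(goal, None)
    results: List[str] = []
    replans = 0
    i = 0
    while i < len(steps):
        try:
            results.append(execute(steps[i]))
            i += 1
        except Exception as err:
            if replans >= max_replans:
                raise  # give up instead of re-planning forever
            replans += 1
            # Keep the completed steps, ask the planner for a new route for the rest.
            steps = steps[:i] + plan(goal, f"step {steps[i]!r} failed: {err}")
    return results
```

For example, if a "search web" step times out, the planner can swap in a different first step and the run continues from there instead of aborting.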

Andrew Ng says research agents are already part of his workflow!

Closing thoughts
Andrew speculates that through agentic workflows, maybe generating many tokens fast from a small LLM will give better results than slower throughput from a powerful LLM like GPT-5.

🎬 Watch the talk here 👉
📚 I've added his recommended reads to m-ric/agents-65ba776fbd9e29f771c07d4e

Has anyone researched the frameworks or tools currently being used to build agents for production? I've been doing some research, but most of the ones I found are not suitable for production.