Transformers Agents
Transformers Agents is an experimental API that is subject to change at any time. Results returned by the agents can vary, because the APIs or the underlying models may change.
Transformers version v4.29.0 builds on the concept of tools and agents. You can try it out in this colab.
In short, it provides a natural language API on top of transformers: we define a set of curated tools and design an agent that interprets natural language and uses these tools. The system is extensible by design; we have curated some relevant tools, but we'll show you how it can easily be extended to use any tool developed by the community.
Let's start with a few examples of what can be achieved with this new API. It is particularly powerful for multimodal tasks, so let's take it for a spin by generating images and reading text out loud.
agent.run("Caption the following image", image=image)
Input | Output |
---|---|
(image) | A beaver is swimming in the water |
agent.run("Read the following text out loud", text=text)
Input | Output |
---|---|
A beaver is swimming in the water | (audio) |
agent.run(
"In the following `document`, where will the TRRF Scientific Advisory Council Meeting take place?",
document=document,
)
Input | Output |
---|---|
(document image) | ballroom foyer |
Quickstart
Before you can use agent.run, you need to instantiate an agent, which is a large language model (LLM).
We support OpenAI models as well as open-source alternatives from BigCode and OpenAssistant. The OpenAI models perform better, but they require an OpenAI API key and therefore cannot be used for free. Hugging Face, on the other hand, provides free access to endpoints for the BigCode and OpenAssistant models.
To start, install the agents extra so that all of the default dependencies are installed:
pip install transformers[agents]
To use OpenAI models, install the openai dependency and then instantiate an OpenAiAgent:
pip install openai
from transformers import OpenAiAgent
agent = OpenAiAgent(model="text-davinci-003", api_key="<your_api_key>")
To use BigCode or OpenAssistant, first log in to get access to the Inference API:
from huggingface_hub import login
login("<YOUR_TOKEN>")
次ã«ããšãŒãžã§ã³ããã€ã³ã¹ã¿ã³ã¹åããŠãã ããã
from transformers import HfAgent
# Starcoder
agent = HfAgent("https://api-inference.huggingface.co/models/bigcode/starcoder")
# StarcoderBase
# agent = HfAgent("https://api-inference.huggingface.co/models/bigcode/starcoderbase")
# OpenAssistant
# agent = HfAgent(url_endpoint="https://api-inference.huggingface.co/models/OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5")
This uses the inference API that Hugging Face currently provides for free. If you have your own inference endpoint for this model (or another one), you can replace the URL above with your own URL endpoint.
StarCoder and OpenAssistant are free to use and perform admirably on simple tasks. However, the checkpoints may not hold up when handling more complex prompts. If you run into this, we recommend trying the OpenAI model, which, while not open source at the moment, may perform better.
You're now ready to go! Let's take a closer look at the two APIs you now have at your disposal.
Single execution (run)
Single execution is what you get when you use the agent's run() method:
agent.run("Draw me a picture of rivers and lakes.")
It automatically selects the tool (or tools) suited to the task you want to perform and runs them appropriately. It can perform one or several tasks in the same instruction (though the more complex your instruction, the more likely the agent is to fail).
agent.run("Draw me a picture of the sea then transform the picture to add an island")
Each run() operation is independent, so you can run it several times in a row with different tasks.
泚æç¹ãšããŠãããªãã® agent
ã¯åãªã倧èŠæš¡ãªèšèªã¢ãã«ã§ãããããããã³ããã®ããããªå€æŽã§ãå®å
šã«ç°ãªãçµæãåŸãããå¯èœæ§ããããŸãããããã£ãŠãå®è¡ãããã¿ã¹ã¯ãã§ããã ãæ確ã«èª¬æããããšãéèŠã§ããè¯ãããã³ããã®æžãæ¹ã«ã€ããŠã¯ããã¡ã ã§è©³ãã説æããŠããŸãã
If you'd like to keep state across executions or pass non-text objects to the agent, you can specify variables for the agent to use. For example, to generate a first image of rivers and lakes and then ask the model to add an island to that picture, you could do the following:
picture = agent.run("Generate a picture of rivers and lakes.")
updated_picture = agent.run("Transform the image in `picture` to add an island to it.", picture=picture)
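A rough way to picture what picture=picture does: the keyword arguments end up as variables in the namespace where the generated code runs, so that code can refer to picture by name. The sketch below is an illustration under stated assumptions, not the library's implementation: image_transformer is a hypothetical stub tool, the "generated" line is hard-coded instead of coming from an LLM, and it uses plain exec with builtins removed rather than the restricted interpreter the agent actually uses.

```python
def image_transformer(image, prompt):
    """Hypothetical stub tool: describes the edit instead of changing pixels."""
    return f"{image} + {prompt}"


def execute_generated(code, tools, **variables):
    """Run `code` so that it only sees the given tools and user variables.

    Note: exec with empty builtins is NOT a security boundary; it only keeps
    this illustration small.
    """
    namespace = {"__builtins__": {}, **tools, **variables}
    exec(code, namespace)
    return namespace


# A line of code the agent might plausibly generate for the second instruction.
generated = 'updated_picture = image_transformer(image=picture, prompt="add an island")'

state = execute_generated(
    generated,
    {"image_transformer": image_transformer},
    picture="<picture of rivers and lakes>",
)
print(state["updated_picture"])  # <picture of rivers and lakes> + add an island
```

Because the variable is supplied by you rather than described in words, the model does not have to guess which object "the image in picture" refers to.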
This can also help when the model cannot understand your request or mixes up tools. For example:
agent.run("Draw me the picture of a capybara swimming in the sea")
Here, the model could interpret the request in two ways:
- Have text-to-image generate a capybara swimming in the sea
- Or, have text-to-image generate a capybara, then use the image-transformation tool to make it swim in the sea
If you want to force the first scenario, you can pass the prompt as an argument:
agent.run("Draw me a picture of the `prompt`", prompt="a capybara swimming in the sea")
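To make the two readings concrete, here they are written out as the kind of code the agent could plausibly generate. The tool names image_generator and image_transformer and their stub bodies are hypothetical; the stubs only record which tools were called, which makes it visible that the second reading chains an extra tool. The last function sketches the forced version, where the text passed as prompt=... is used verbatim. The code a real agent produces will vary from run to run.

```python
calls = []  # records which (hypothetical) tools were invoked, in order


def image_generator(prompt):
    # Hypothetical stub for the text-to-image tool.
    calls.append(("image_generator", prompt))
    return f"<image: {prompt}>"


def image_transformer(image, prompt):
    # Hypothetical stub for the image-transformation tool.
    calls.append(("image_transformer", prompt))
    return f"<{image} edited: {prompt}>"


def interpretation_one():
    # One text-to-image call with the whole description.
    return image_generator(prompt="a capybara swimming in the sea")


def interpretation_two():
    # Generate a capybara first, then edit the image so it swims in the sea.
    image = image_generator(prompt="a capybara")
    return image_transformer(image=image, prompt="swimming in the sea")


def forced(prompt):
    # With the prompt passed as a variable, the code can use it directly.
    return image_generator(prompt=prompt)


print(interpretation_one())
print(interpretation_two())
print(forced("a capybara swimming in the sea"))
print([name for name, _ in calls])
```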
Chat-based execution (chat)
The agent also supports a chat-based approach through the chat() method:
agent.chat("Transform the picture so that there is a rock in there")
This approach is useful when you want to keep state across instructions. It may have more difficulty with complex instructions than with single ones; the run() method is better suited to those.
This method can also take arguments if you want to pass non-text types or specific prompts.
⚠️ Remote execution
For demonstration purposes, and so that they could be used with any setup, we created remote executors for several of the default tools for the release. These were created using inference endpoints.
These are turned off for now, but to learn how to set up remote executor tools yourself, we recommend reading the custom tool guide.
What's happening here? What are tools, and what are agents?
Agents
The "agent" here is a large language model that we prompt so that it has access to a specific set of tools.
LLMs are quite good at generating small samples of code, and this API takes advantage of that by prompting the agent to generate a small code sample that performs a task using a specific set of tools. The prompt gives the agent the task together with the descriptions of the tools, so the agent has access to the documentation of the tools it uses and can generate the relevant code.
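The sketch below shows the general shape of such a prompt: a fixed instruction, one line per tool description, and the user's task. The template wording, the tool names, and their descriptions are all hypothetical; the prompt shipped with the library is longer and worded differently.

```python
# Hypothetical template; the library's real prompt is longer and different.
RUN_PROMPT_TEMPLATE = (
    "I will ask you to perform a task. Write a few lines of Python that perform it.\n"
    "You can only use the following tools:\n"
    "{tool_descriptions}\n\n"
    "Task: {task}\n"
    "Answer:\n"
)


def build_run_prompt(task: str, tool_descriptions: dict[str, str]) -> str:
    """Fill the template with one line per tool and the user's task."""
    lines = "\n".join(
        f"- {name}: {description}" for name, description in tool_descriptions.items()
    )
    return RUN_PROMPT_TEMPLATE.format(tool_descriptions=lines, task=task)


# Hypothetical tool names and descriptions, for illustration only.
tools = {
    "image_generator": "Generates an image from a text prompt and returns the image.",
    "image_captioner": "Takes an image and returns an English caption describing it.",
}
print(build_run_prompt("Caption the following image", tools))
```

The LLM's completion of this prompt is the code that then gets executed, which is why the quality of the tool descriptions directly affects the quality of the generated code.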
Tools
Tools are very simple: each is a single function with a name and a description. We use these tool descriptions to prompt the agent. Through the prompt, we show the agent how it would use the tools to perform the task requested in the query, in particular which inputs each tool expects and which outputs it returns.
This uses brand-new tools rather than pipelines, because the agent generates better code with very atomic tools. Pipelines are more refactored and often combine several tasks in one, whereas tools are meant to focus on a single, very simple task.
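A minimal sketch of a tool in this sense: one callable doing one small job, plus the name, description, inputs, and outputs that would be fed to the prompt. SimpleTool and word_counter are hypothetical names for illustration; this is not the library's Tool class.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SimpleTool:
    """Hypothetical tool container: a single function plus its documentation."""

    name: str
    description: str
    inputs: list[str]
    outputs: list[str]
    fn: Callable

    def __call__(self, *args, **kwargs):
        # Calling the tool simply calls the wrapped function.
        return self.fn(*args, **kwargs)

    def describe(self) -> str:
        # The text an agent prompt would include for this tool.
        return (
            f"{self.name}: {self.description} "
            f"Inputs: {', '.join(self.inputs)}. Outputs: {', '.join(self.outputs)}."
        )


word_counter = SimpleTool(
    name="word_counter",
    description="Counts the words in an English text.",
    inputs=["text"],
    outputs=["int"],
    fn=lambda text: len(text.split()),
)

print(word_counter.describe())
print(word_counter("A beaver is swimming in the water"))  # 7
```

Keeping each tool this narrow is the point of the design choice above: a single, well-described operation is easier for the LLM to call correctly than a pipeline that bundles several tasks.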
Code-execution?!
This code is then executed with our small Python interpreter on the set of inputs passed along with your tools. The only functions it can call are the tools you provided and the print function, so what can be executed is already limited. As long as it is restricted to Hugging Face tools, you should be safe.
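Furthermore, we don't allow attribute lookups or imports (which shouldn't be needed anyway for handling the inputs and outputs passed along), so the most obvious attacks shouldn't be an issue (and you would need to prompt the agent to output them in the first place). If you want to be on the extra-safe side, you can call the run() method with the additional argument return_code=True. In that case, the agent only returns the code to execute, and it is up to you whether to run it.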
Execution stops at any line that attempts an illegal operation, or if the code generated by the agent raises a regular Python error.
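The rules above can be sketched with Python's ast module: parse the generated code, evaluate it one statement at a time, allow only constants, variables, plain assignments, and calls to the provided tools or print, and raise on anything else, including imports and attribute lookups. This is a minimal illustration of the idea, not the interpreter shipped with transformers; evaluate_restricted, IllegalOperation, and the stub image_generator are hypothetical names. Statements run in order, so lines before an illegal one have already executed when execution stops, and ordinary Python errors raised by a tool propagate unchanged.

```python
import ast


class IllegalOperation(Exception):
    """Raised when the code uses a construct this sketch does not allow."""


def evaluate_restricted(code, tools, state=None):
    """Execute `code` statement by statement, allowing only calls to `tools` and print."""
    state = dict(state or {})
    callables = {**tools, "print": print}
    for statement in ast.parse(code).body:
        # Imports, loops, function definitions, etc. are rejected outright.
        if not isinstance(statement, (ast.Assign, ast.Expr)):
            raise IllegalOperation(f"{type(statement).__name__} statements are not allowed")
        value = _evaluate(statement.value, callables, state)
        if isinstance(statement, ast.Assign):
            for target in statement.targets:
                if not isinstance(target, ast.Name):
                    raise IllegalOperation("only assignments to plain names are allowed")
                state[target.id] = value
    return state


def _evaluate(node, callables, state):
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        if node.id not in state:
            raise IllegalOperation(f"unknown variable {node.id!r}")
        return state[node.id]
    if isinstance(node, ast.Call):
        if not isinstance(node.func, ast.Name) or node.func.id not in callables:
            raise IllegalOperation("only the provided tools and print can be called")
        args = [_evaluate(arg, callables, state) for arg in node.args]
        kwargs = {}
        for keyword in node.keywords:
            if keyword.arg is None:  # **kwargs unpacking
                raise IllegalOperation("keyword unpacking is not allowed")
            kwargs[keyword.arg] = _evaluate(keyword.value, callables, state)
        return callables[node.func.id](*args, **kwargs)
    # Attribute lookups (obj.attr), subscripts, lambdas, etc. all end up here.
    raise IllegalOperation(f"{type(node).__name__} expressions are not allowed")


def image_generator(prompt):
    # Hypothetical stub tool.
    return f"<image: {prompt}>"


state = evaluate_restricted(
    'image = image_generator(prompt="rivers and lakes")\nprint(image)',
    {"image_generator": image_generator},
)  # prints: <image: rivers and lakes>

for bad_code in ["import os", "x = image.__class__"]:
    try:
        evaluate_restricted(bad_code, {}, state)
    except IllegalOperation as error:
        print("blocked:", error)
# blocked: Import statements are not allowed
# blocked: Attribute expressions are not allowed
```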
A curated set of tools
We have identified a set of tools that can empower such agents. Here is an updated list of the tools integrated in transformers:
- Document question answering: given a document (such as a PDF) in image format, answer a question about this document (Donut)
- Text question answering: given a long text and a question, answer the question from the text (Flan-T5)
- Unconditional image captioning: caption the image! (BLIP)
- Image question answering: given an image, answer a question about this image (VILT)
- Image segmentation: given an image and a prompt, output the segmentation mask of that prompt (CLIPSeg)
- Speech to text: given an audio recording of a person talking, transcribe the speech into text (Whisper)
- Text to speech: convert text to speech (SpeechT5)
- Zero-shot text classification: given a text and a list of labels, identify the label the text corresponds to most (BART)
- Text summarization: summarize a long text in one or a few sentences (BART)
- Translation: translate the text into a given language (NLLB)
These tools are integrated in transformers and can also be used manually, for example:
from transformers import load_tool
tool = load_tool("text-to-speech")
audio = tool("This is a text to speech tool")
Custom tools
While we have identified a curated set of tools, we strongly believe that the main value of this implementation is the ability to quickly create and share custom tools.
By pushing a tool's code to a Hugging Face Space or a model repository, you can use the tool directly with the agent. We've added a few transformers-agnostic tools to the huggingface-tools organization:
- Text downloader: a tool to download text from a web URL
- Text to image: a tool to generate an image according to a prompt, leveraging stable diffusion
- Image transformation: a tool to modify an image given an initial image and a prompt, leveraging instruct pix2pix stable diffusion
- Text to video: a tool to generate a small video according to a prompt, leveraging damo-vilab
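To show how little a transformers-agnostic tool needs, here is a hypothetical sketch of a text downloader written with only the standard library: a name, a description, and a single call that fetches a URL and decodes the body. It is not the code of the huggingface-tools text downloader (which may, for example, strip HTML), and it does not cover pushing the tool to the Hub. The usage line relies on a data: URL so that it runs without network access.

```python
from urllib.request import urlopen


class TextDownloaderTool:
    """Hypothetical tool: download the text content found at a URL."""

    name = "text_downloader"
    description = (
        "Downloads the content at a web URL. Takes a url string as input "
        "and returns the raw text as a string."
    )

    def __call__(self, url: str) -> str:
        with urlopen(url, timeout=10) as response:
            # Fall back to UTF-8 when the server does not declare a charset.
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset)


downloader = TextDownloaderTool()
print(downloader("data:text/plain;charset=utf-8,hello%20world"))  # hello world
```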
The text-to-image tool we have been using since the beginning is a remote tool that lives in huggingface-tools/text-to-image! We will continue releasing such tools on this and other organizations to further strengthen this implementation.
By default, the agents have access to the tools that reside on huggingface-tools.
We explain how to write and share tools, as well as how to use any custom tool that resides on the Hub, in the following guide.
Code generation
So far we have shown how to use the agents to perform actions for you. However, the agent only generates code, which we then execute using a very restricted Python interpreter. If you want to use the generated code in a different environment, you can ask the agent to return the code instead, including the tool definitions and accurate imports.
For example, the following instruction:
agent.run("Draw me a picture of rivers and lakes", return_code=True)
次ã®ã³ãŒããè¿ããŸã
from transformers import load_tool
image_generator = load_tool("huggingface-tools/text-to-image")
image = image_generator(prompt="rivers and lakes")
which you can then modify and execute yourself.
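Since the code comes back as a string, you can inspect it before deciding to run it. One possible check, sketched here with the standard ast module, lists the tool ids passed to load_tool and compares them with an allow-list you choose. The allow-list and the helper name loaded_tool_ids are hypothetical, and the check only catches ids passed as a positional string literal.

```python
import ast

# Hypothetical allow-list of Hub tools you are willing to load.
ALLOWED_TOOL_IDS = {"huggingface-tools/text-to-image"}


def loaded_tool_ids(code: str) -> list[str]:
    """Return the string ids passed positionally to load_tool(...) in `code`."""
    ids = []
    for node in ast.walk(ast.parse(code)):  # parsing does not execute anything
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "load_tool"
            and node.args
            and isinstance(node.args[0], ast.Constant)
        ):
            ids.append(node.args[0].value)
    return ids


# The code returned with return_code=True in the example above.
generated = '''from transformers import load_tool

image_generator = load_tool("huggingface-tools/text-to-image")

image = image_generator(prompt="rivers and lakes")
'''

ids = loaded_tool_ids(generated)
print(ids)  # ['huggingface-tools/text-to-image']
print(set(ids) <= ALLOWED_TOOL_IDS)  # True
```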