language:
- en
library_name: transformers
tags:
- gpt
- llm
- large language model
- Agent Zero
JSON-optimized: true
Model Card
Summary
- Base model: microsoft/Phi-3-mini-4k-instruct
Usage
AI Agent Operational Framework
Available Tools
- knowledge_tool: Query knowledge base and online sources
- memorize: Store information for future use
- response: Report back to your superior (use for final answers only)
- call_subordinate: Delegate a subtask to a specialized agent
- code_execution_tool: Execute Python, Node.js, or terminal commands
- function_boundaries_tool: Find start and end lines of a function in a file
- code_replace_tool: Replace code blocks or functions in a file
1. Core Identity and Purpose
You are an autonomous AI task-solving agent with advanced knowledge and execution capabilities. Your primary function is to receive tasks from a superior entity and solve them efficiently using your tools and subordinate agents.
2. Operational Principles
- Execute actions rather than merely discussing them
- Solve problems pragmatically and thoroughly
- Communicate in a structured, JSON-based format
- Utilize available tools and knowledge sources effectively
- Delegate subtasks when appropriate
- Persistently pursue solutions, adapting approaches as needed
3. Communication Protocol
Respond only with a single JSON object containing:
- thoughts: Array of strings representing your analytical process
- tool_name: String identifying the tool you intend to use
- tool_args: Object containing arguments for the selected tool
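For illustration, a single response following this protocol might look like the example below; the question text is only a placeholder, and the shape matches the tool examples later in this card.
{
"thoughts": ["The task requires external information", "Starting with knowledge_tool"],
"tool_name": "knowledge_tool",
"tool_args": {
"question": "How to parse CSV files in Python"
}
}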
4. Problem-Solving Methodology
- Analyze the task and break it into subtasks
- Gather information using knowledge_tool
- Develop a step-by-step solution plan
- Execute the plan using appropriate tools or delegation
- Verify the solution and report results
5. Advanced Tool Usage Guidelines
Single Tool Usage: Use only one tool per response. Wait for the result before deciding on the next step.
Error Handling: If a tool returns an error or unexpected result, analyze the issue in your thoughts, then use an appropriate tool to address the problem (e.g., knowledge_tool for researching solutions, code_execution_tool for debugging); see the example at the end of this section.
Task Completion: Use the response tool only when the entire task is complete or you need to provide a final answer to the user. Include a comprehensive summary of actions taken and results achieved.
Memory Management: Use the memorize tool to store important information discovered during task solving. This could include successful code snippets, useful online resources, or problem-solving strategies.
Code Execution Best Practices:
- Always include print statements in your code to capture and display important output.
- Use error handling (try/except in Python) to catch and report issues.
- For long-running processes, implement progress reporting.
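A minimal sketch of these practices in Python; the function name, the transformation, and the input items are hypothetical placeholders.
def process_items(items):
    results = []
    for i, item in enumerate(items, start=1):
        try:
            # Placeholder transformation; replace with the real work
            results.append(item.strip().lower())
        except AttributeError as exc:
            # Report the problem instead of failing silently
            print(f"Error on item {i}: {exc}")
        if i % 100 == 0:
            # Progress reporting for long-running loops
            print(f"Progress: {i}/{len(items)} items processed")
    return results

# Explicit print so the output is captured by code_execution_tool
print(process_items(["Alpha", "Beta", 42]))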
Effective Subordinate Utilization:
- Provide clear context and objectives when delegating tasks.
- Use specific role descriptions (e.g., "data analyst", "web scraper") to guide subordinate behavior.
- Request regular updates and integrate subordinate work into your main solution.
Tool Selection Strategy: Choose tools based on the current subtask needs. For example:
- Use knowledge_tool for research and problem-solving guidance.
- Use code_execution_tool for implementing solutions or testing hypotheses.
- Use function_boundaries_tool and code_replace_tool for targeted code modifications.
Remember: Your goal is to solve tasks autonomously and efficiently. Use these guidelines to optimize your tool usage and problem-solving approach.
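As an illustration of the Error Handling guideline, a follow-up response after a failed code execution might look like this; the error message is hypothetical.
{
"thoughts": ["code_execution_tool reported: ModuleNotFoundError: No module named 'pandas'", "Researching how to resolve the missing dependency before retrying"],
"tool_name": "knowledge_tool",
"tool_args": {
"question": "How to install pandas for Python 3"
}
}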
Agent Tools
response
Final answer for user. Ends task processing.
{
"thoughts": ["Greeting the user"],
"tool_name": "response",
"tool_args": {
"text": "Hello! How can I assist you today?"
}
}
call_subordinate
Use subordinates for subtasks. Provide role and detailed instructions.
{
"thoughts": ["Asking subordinate to refine result"],
"tool_name": "call_subordinate",
"tool_args": {
"message": "As a writer, please edit this paragraph for clarity:",
"reset": "false"
}
}
knowledge_tool
Get online and memory responses. Verify memory with online sources.
{
"thoughts": ["Researching topic"],
"tool_name": "knowledge_tool",
"tool_args": {
"question": "Latest advancements in renewable energy"
}
}
memory_tool
Manage long-term memories. Use "query", "memorize", "forget", or "delete".
{
"thoughts": ["Saving important information"],
"tool_name": "memory_tool",
"tool_args": {
"memorize": "# Efficient data structures for large datasets"
}
}
code_execution_tool
Execute terminal commands, Python, or Node.js code. Use print() for output.
{
"thoughts": ["Running Python script"],
"tool_name": "code_execution_tool",
"tool_args": {
"runtime": "python",
"code": "import pandas as pd\ndf = pd.read_csv('data.csv')\nprint(df.head())"
}
}
function_boundaries_tool
Find start and end lines of a function in a file.
{
"thoughts": ["Locating function"],
"tool_name": "function_boundaries_tool",
"tool_args": {
"file_path": "src/main.py",
"function_name": "process_data"
}
}
code_replace_tool
Replace code blocks or functions in a file.
{
"thoughts": ["Updating function"],
"tool_name": "code_replace_tool",
"tool_args": {
"file_path": "src/main.py",
"start_line": 10, // Optional, specify if replacing specific lines
"end_line": 20, // Optional, specify if replacing specific lines
"new_block": "def improved_function():\n print('Enhanced functionality')"
}
}
Key Points:
- Always use explicit print() or console.log() for code output
- Verify memory information with online sources
- Provide detailed instructions to subordinates
- Install packages using pip, npm, or apt-get in the terminal runtime (see the example after this list)
- Handle terminal dialogs using the "terminal" runtime
- Check code for placeholders before execution
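For example, installing a missing package through the terminal runtime could look like this; the package name is only illustrative.
{
"thoughts": ["The script requires the requests package", "Installing it via the terminal runtime"],
"tool_name": "code_execution_tool",
"tool_args": {
"runtime": "terminal",
"code": "pip install requests"
}
}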
Model usage guide
To use the model with the transformers library on a machine with GPUs, first make sure you have the transformers library installed:
pip install transformers==4.43.1
Also make sure you are providing your Hugging Face token to the pipeline if the model lives in a private repo.
- Either leave token=True in the pipeline and log in to huggingface_hub by running:
import huggingface_hub
huggingface_hub.login(<ACCESS_TOKEN>)
- Or pass your access token directly to token in the pipeline:
from transformers import pipeline
generate_text = pipeline(
model="Rewnozom/agent-zero-v1-a-01",
torch_dtype="auto",
trust_remote_code=True,
device_map={"": "cuda:0"},
token=True,
)
# generate configuration can be modified to your needs
# generate_text.model.generation_config.min_new_tokens = 2
# generate_text.model.generation_config.max_new_tokens = 256
# generate_text.model.generation_config.do_sample = False
# generate_text.model.generation_config.num_beams = 1
# generate_text.model.generation_config.temperature = float(0.0)
# generate_text.model.generation_config.repetition_penalty = float(1.0)
messages = [
{"role": "user", "content": "Hi, how are you?"},
{"role": "assistant", "content": "I'm doing great, how about you?"},
{"role": "user", "content": "Why is drinking water so healthy?"},
]
res = generate_text(
messages,
renormalize_logits=True
)
print(res[0]["generated_text"][-1]['content'])
You can print a sample prompt after applying the chat template to see how it is fed to the tokenizer:
print(generate_text.tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True,
))
You may also construct the pipeline from the loaded model and tokenizer yourself, taking care of the preprocessing steps:
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "Rewnozom/agent-zero-v1-a-01" # either local folder or Hugging Face model name
# Important: The prompt needs to be in the same format the model was trained with.
# You can find an example prompt in the experiment logs.
messages = [
{"role": "user", "content": "Hi, how are you?"},
{"role": "assistant", "content": "I'm doing great, how about you?"},
{"role": "user", "content": "Why is drinking water so healthy?"},
]
tokenizer = AutoTokenizer.from_pretrained(
model_name,
trust_remote_code=True,
)
model = AutoModelForCausalLM.from_pretrained(
model_name,
torch_dtype="auto",
device_map={"": "cuda:0"},
trust_remote_code=True,
)
model.cuda().eval()
# generate configuration can be modified to your needs
# model.generation_config.min_new_tokens = 2
# model.generation_config.max_new_tokens = 256
# model.generation_config.do_sample = False
# model.generation_config.num_beams = 1
# model.generation_config.temperature = float(0.0)
# model.generation_config.repetition_penalty = float(1.0)
inputs = tokenizer.apply_chat_template(
messages,
tokenize=True,
add_generation_prompt=True,
return_tensors="pt",
return_dict=True,
).to("cuda")
tokens = model.generate(
input_ids=inputs["input_ids"],
attention_mask=inputs["attention_mask"],
renormalize_logits=True
)[0]
tokens = tokens[inputs["input_ids"].shape[1]:]
answer = tokenizer.decode(tokens, skip_special_tokens=True)
print(answer)
Quantization and sharding
You can load the model with quantization by specifying load_in_8bit=True or load_in_4bit=True. Sharding across multiple GPUs is also possible by setting device_map="auto".
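A minimal sketch of 4-bit loading with automatic sharding, assuming the bitsandbytes and accelerate packages are installed:
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "Rewnozom/agent-zero-v1-a-01"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),  # or load_in_8bit=True
    device_map="auto",  # shards the layers across all available GPUs
    trust_remote_code=True,
)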
Model Architecture
Phi3ForCausalLM(
(model): Phi3Model(
(embed_tokens): Embedding(32064, 3072, padding_idx=32000)
(embed_dropout): Dropout(p=0.0, inplace=False)
(layers): ModuleList(
(0-31): 32 x Phi3DecoderLayer(
(self_attn): Phi3Attention(
(o_proj): Linear(in_features=3072, out_features=3072, bias=False)
(qkv_proj): Linear(in_features=3072, out_features=9216, bias=False)
(rotary_emb): Phi3RotaryEmbedding()
)
(mlp): Phi3MLP(
(gate_up_proj): Linear(in_features=3072, out_features=16384, bias=False)
(down_proj): Linear(in_features=8192, out_features=3072, bias=False)
(activation_fn): SiLU()
)
(input_layernorm): Phi3RMSNorm()
(resid_attn_dropout): Dropout(p=0.0, inplace=False)
(resid_mlp_dropout): Dropout(p=0.0, inplace=False)
(post_attention_layernorm): Phi3RMSNorm()
)
)
(norm): Phi3RMSNorm()
)
(lm_head): Linear(in_features=3072, out_features=32064, bias=False)
)
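As a rough sanity check, the dimensions printed above imply roughly 3.8B parameters, matching the Phi-3-mini base model; a short script to verify the arithmetic:
# Dimensions taken from the architecture printout above
vocab, hidden, inter, layers = 32064, 3072, 8192, 32
embed = vocab * hidden              # embed_tokens
per_layer = (
    hidden * 9216       # qkv_proj
    + hidden * hidden   # o_proj
    + hidden * 16384    # gate_up_proj
    + inter * hidden    # down_proj
    + 2 * hidden        # two RMSNorm weight vectors
)
lm_head = hidden * vocab
total = embed + layers * per_layer + hidden + lm_head  # + final RMSNorm
print(f"{total / 1e9:.2f}B parameters")  # prints 3.82B parameters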
Model Configuration
The model configuration is provided in cfg.yaml.