Differences Between LlamaIndex Agents

Published 2024-01-08 · Updated 2025-02-02 · Categories: 2-Deep Learning, LLM Development Engineer Guide, LlamaIndex

LlamaIndex defines three kinds of Agent: FunctionCallingAgent, ReActAgent, and StructuredPlannerAgent. How do they differ?

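Feature	FunctionCallingAgent	ReActAgent	StructuredPlannerAgent
Design goal	Calls functions directly to complete the task, with no elaborate planning or reasoning.	Uses structured thinking (the ReAct loop) to break a complex problem down and solve it step by step.	Plans the whole task first, then executes the individual steps.
LLM interaction	Passes tools to the LLM directly and does not parse intermediate reasoning steps.	Gives the LLM an explicit prompt format and a structured output format (Thought, Action, and so on).	Creates a plan first, then executes sub-tasks one at a time and adjusts the plan.
Prompt usage	No custom prompt by default; driven by the tools.	A prompt that explicitly defines the rules for thinking and acting.	Dedicated prompts for creating and refining the overall plan.
Reasoning process	Simple and direct; each step is one concrete tool execution.	Structured and stepwise (decompose, reason).	Global plan first, then stepwise execution with adjustments.
Flexibility and complexity handling	Lower flexibility and limited ability to handle complex tasks.	Highly flexible and well suited to complex tasks.	Suited to tasks that benefit from up-front planning.
Typical use	Simple, direct problems or plain function execution.	Complex queries involving multiple steps, reasoning, and decomposition.	Tasks that need a detailed plan up front followed by step-by-step execution.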

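This table summarizes the main differences among the three agent types: design goal, how they interact with the LLM, how they use prompts, their reasoning process, and how they handle flexibility and complexity.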

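FunctionCallingAgent suits simple task execution; its strengths are directness and efficiency.
ReActAgent is a better fit when structured thinking and multi-step reasoning are needed, and it handles complex queries and problem decomposition well.
StructuredPlannerAgent fits tasks that need a detailed plan from the start, and it can adjust that plan dynamically as requirements change.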

Each agent type has its own sweet spot, and picking the right one helps solve the problem more efficiently.

LlamaIndex ships three types of Agent. The rest of this post covers how each one works and how they differ.

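import nest_asyncio
nest_asyncio.apply()  # allow nested event loops, e.g. inside a notebook
from llama_index.core import agent
# collect every name exported by the agent module that ends with "Agent"
agent_names = [name for name in dir(agent) if name.endswith("Agent")]
print(agent_names)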

['FunctionCallingAgent', 'ReActAgent', 'StructuredPlannerAgent']

From the source code, their relationships are as follows:

classDiagram
	class BaseAgent
	class BaseAgentRunner
	BaseAgent <|-- BaseAgentRunner
	class AgentRunner
	class BaseAgentWorker
	class ReActAgent
	class ReActAgentWorker
	BaseAgentRunner<|-- AgentRunner
	AgentRunner<|-- ReActAgent
	ReActAgent<-- ReActAgentWorker
	BaseAgentWorker <|-- ReActAgentWorker
	class StructuredPlannerAgent
	class BasePlanningAgentRunner
	BasePlanningAgentRunner <|-- StructuredPlannerAgent
	StructuredPlannerAgent<-- BaseAgentWorker
	AgentRunner<|-- BasePlanningAgentRunner
	class FunctionCallingAgent
	AgentRunner <|-- FunctionCallingAgent
	class FunctionCallingAgentWorker
	FunctionCallingAgent <-- FunctionCallingAgentWorker
	BaseAgentWorker <|-- FunctionCallingAgentWorker

There are three families of classes here: Worker, Runner, and Agent. An Agent inherits from a Runner and uses a Worker to execute each step.
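The relationship can be sketched in plain Python without LlamaIndex. This is only an illustration under stated assumptions: `ToyStepOutput`, `ToyWorker`, `CountdownWorker`, `ToyRunner`, and `ToyAgent` are hypothetical names invented for this sketch, not LlamaIndex classes. It shows a runner owning the loop, a worker executing one step at a time, and an agent being a runner that comes preconfigured with its worker.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class ToyStepOutput:
    """Result of one step; is_last tells the runner to stop."""
    text: str
    is_last: bool


class ToyWorker(ABC):
    """Plays the Worker role: knows how to execute a single step."""

    @abstractmethod
    def run_step(self, step_input: str, step_index: int) -> ToyStepOutput:
        ...


class CountdownWorker(ToyWorker):
    """Hypothetical worker that finishes after a fixed number of steps."""

    def __init__(self, total_steps: int) -> None:
        self.total_steps = total_steps

    def run_step(self, step_input: str, step_index: int) -> ToyStepOutput:
        done = step_index + 1 >= self.total_steps
        return ToyStepOutput(text=f"step {step_index} for {step_input!r}", is_last=done)


class ToyRunner:
    """Plays the Runner role: owns the loop and delegates each step."""

    def __init__(self, worker: ToyWorker) -> None:
        self.worker = worker

    def chat(self, message: str) -> ToyStepOutput:
        step_index = 0
        while True:
            output = self.worker.run_step(message, step_index)
            if output.is_last:
                return output
            step_index += 1


class ToyAgent(ToyRunner):
    """Plays the Agent role: a Runner preconfigured with its Worker."""

    @classmethod
    def from_steps(cls, total_steps: int) -> "ToyAgent":
        return cls(CountdownWorker(total_steps))


toy_agent = ToyAgent.from_steps(3)
result = toy_agent.chat("1000+157*2")
print(result.text)
```

Swapping in a different worker changes how each step behaves while the loop in the runner stays the same, which is the design idea the class diagram expresses.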

from llama_index.core import Settings
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.ollama import OllamaEmbedding
base_url='http://localhost:11434'
llm = Ollama(model="qwen2.5:latest", request_timeout=360.0,base_url=base_url)
Settings.llm = llm
Settings.embed_model = OllamaEmbedding(model_name="quentinz/bge-large-zh-v1.5:latest",base_url=base_url)

from llama_index.core.tools import FunctionTool
def multiply(a: float, b: float) -> float:
    """Multiply two numbers and returns the product"""
    return a * b
multiply_tool = FunctionTool.from_defaults(fn=multiply)
def add(a: float, b: float) -> float:
    """Add two numbers and returns the sum"""
    return a + b
add_tool = FunctionTool.from_defaults(fn=add)

FunctionCallingAgent vs. ReActAgent

from llama_index.core.agent import FunctionCallingAgent
function_calling_agent=FunctionCallingAgent.from_tools(tools=[multiply_tool, add_tool],verbose=True)
response = function_calling_agent.chat("计算结果，1000+157*2？")  # prompt means: "Compute the result: 1000+157*2?"
print(response)

> Running step f1466f93-b140-4f58-b800-b7221e1fe5cd. Step input: 计算结果，1000+157*2？
Added user message to memory: 计算结果，1000+157*2？
=== Calling Function ===
Calling function: multiply with args: {"a": 157, "b": 2}
=== Function Output ===
314
> Running step 4a7cdcf8-97b5-4e1b-9387-62734c68de34. Step input: None
=== Calling Function ===
Calling function: add with args: {"a": 1000, "b": 314}
=== Function Output ===
1314
> Running step 600c57bf-c783-49fc-be90-f402cd35e78a. Step input: None
=== LLM Response ===
计算结果是 \( 1000 + 157 \times 2 = 1314 \)。
计算结果是 \( 1000 + 157 \times 2 = 1314 \)。

from llama_index.core.agent import ReActAgent
# Create the agent
reAct_agent = ReActAgent.from_tools([multiply_tool, add_tool], verbose=True)
response = reAct_agent.chat("计算结果，1000+157*2？")
print(response)

> Running step 13973598-2e38-4cc3-ba70-0897d09bb55c. Step input: 计算结果，1000+157*2？
Thought: The current language of the user is: Chinese. I need to use a tool to help me answer the question.
Action: multiply
Action Input: {'a': 157, 'b': 2}
Observation: 314
> Running step eb577013-9af9-4864-8516-ca97b24d288f. Step input: None
Thought: I can now perform the addition using the result from the multiplication.
Action: add
Action Input: {'a': 1000, 'b': 314}
Observation: 1314
> Running step e75c52b9-109a-4fdc-b555-6159e9c3fb5c. Step input: None
Thought: I can answer without using any more tools. I'll use the user's language to answer
Answer: 计算结果是 1314。
计算结果是 1314。

task = self.create_task(message)
result_output = None
dispatcher.event(AgentChatWithStepStartEvent(user_msg=message))
while True:
    # pass step queue in as argument, assume step executor is stateless
    cur_step_output = self._run_step(
        task.task_id, mode=mode, tool_choice=tool_choice
    )
    if cur_step_output.is_last:
        result_output = cur_step_output
        break
    # ensure tool_choice does not cause endless loops
    tool_choice = "auto"

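Both FunctionCallingAgent and ReActAgent run chat through the code above. It is a loop that keeps producing step outputs until one carries the end flag (cur_step_output.is_last is True).
The two start to differ inside _run_step: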

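Both split a step into three stages: (1) run one LLM inference; (2) parse the LLM output; (3) generate the next task.
For the LLM inference, FunctionCallingAgent calls the chat_with_tools interface while ReActAgent calls chat. In other words, FunctionCallingAgent hands the tools to the LLM through the API, whereas ReActAgent only describes them in the prompt.
For parsing, the ReActAgent prompt demands structured output, so parsing means extracting the "Thought", "Action", "Action Input", and "Observation" fields.
The parsed output then determines the next step.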
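The difference in how tools reach the model can be illustrated with a small standalone sketch. Everything here is an assumption made for illustration: `tool_schema` and `react_tool_prompt` are hypothetical helpers, and the dictionary and text layouts only approximate what a function-calling API or a ReAct prompt might contain. They are not LlamaIndex's actual formats.

```python
import inspect
from typing import Callable, List


def multiply(a: float, b: float) -> float:
    """Multiply two numbers and returns the product"""
    return a * b


def tool_schema(fn: Callable) -> dict:
    """Function-calling style: describe the tool as structured data for the API."""
    params = inspect.signature(fn).parameters
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {name: p.annotation.__name__ for name, p in params.items()},
    }


def react_tool_prompt(fns: List[Callable]) -> str:
    """ReAct style: describe the same tools as plain text inside the prompt."""
    lines = ["You have access to the following tools:"]
    for fn in fns:
        lines.append(f"> Tool Name: {fn.__name__}")
        lines.append(f"  Tool Description: {inspect.signature(fn)} - {inspect.getdoc(fn)}")
    return "\n".join(lines)


print(tool_schema(multiply))
print(react_tool_prompt([multiply]))
```

In the first style the model is expected to reply with a structured tool call; in the second it replies with free text that the agent has to parse.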

[Figure: FunctionCallingAgent vs. ReActAgent]

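FunctionCallingAgent has no custom prompt; it passes the tools to the LLM and lets the model choose among them internally. ReActAgent prescribes how the LLM should think: it answers by selecting tools and must follow a fixed output format.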

The ReActAgent prompt has two parts:
Tools: states which tools can be used
Output Format: states the output format, which has two branches

For an intermediate step, output in the following format:

Thought: The current language of the user is: (user's language). I need to use a tool to help me answer the question.
Action: tool name (one of {tool_names}) if using a tool.
Action Input: the input to the tool, in a JSON format representing the kwargs (e.g. {{"input": "hello world", "num_beams": 5}})

For the final result, output in one of the following formats:

Thought: I can answer without using any more tools. I'll use the user's language to answer
Answer: [your answer here (In the same language as the user's question)]
Thought: I cannot answer the question with the provided tools.
Answer: [your answer here (In the same language as the user's question)]

So ReActAgent standardizes the reasoning pattern and is more customizable than FunctionCallingAgent.
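To make the parsing stage concrete, here is a minimal sketch of a parser for the two output branches described above. It is an assumption-laden illustration: `parse_react_output` is a hypothetical helper written for this post, not LlamaIndex's own output parser, and it only handles the fields shown in the format.

```python
import ast
import json
import re


def parse_react_output(text: str) -> dict:
    """Split one ReAct-style completion into its labelled fields."""
    thought_match = re.search(r"Thought:\s*(.*?)(?=\n(?:Action|Answer):|\Z)", text, re.S)
    thought = thought_match.group(1).strip() if thought_match else ""

    # Final branch: the model answers directly and calls no tool.
    answer_match = re.search(r"Answer:\s*(.*)", text, re.S)
    if answer_match:
        return {"thought": thought, "answer": answer_match.group(1).strip()}

    # Intermediate branch: the model names a tool and its keyword arguments.
    action_match = re.search(r"Action:\s*(\S+)", text)
    input_match = re.search(r"Action Input:\s*(\{.*\})", text, re.S)
    if not (action_match and input_match):
        raise ValueError("output matches neither the tool branch nor the answer branch")

    raw_kwargs = input_match.group(1)
    try:
        kwargs = json.loads(raw_kwargs)
    except json.JSONDecodeError:
        # Tolerate Python-style single-quoted dicts, as seen in the verbose log.
        kwargs = ast.literal_eval(raw_kwargs)
    return {"thought": thought, "action": action_match.group(1), "action_input": kwargs}


step = parse_react_output(
    "Thought: I need to use a tool to help me answer the question.\n"
    "Action: multiply\n"
    "Action Input: {'a': 157, 'b': 2}"
)
final = parse_react_output(
    "Thought: I can answer without using any more tools.\n"
    "Answer: 1314"
)
print(step)
print(final)
```

An agent loop would call the named tool with `action_input`, append the result as an Observation, and ask the model again until the answer branch appears.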

StructuredPlannerAgent

from llama_index.core.agent import StructuredPlannerAgent,FunctionCallingAgentWorker
# Create the agent
# create the function calling worker for reasoning
worker = FunctionCallingAgentWorker.from_tools(
    [multiply_tool, add_tool], verbose=True
)
# wrap the worker in the top-level planner
agent = StructuredPlannerAgent(
    worker, tools=[multiply_tool, add_tool],
    verbose=True,
    memory=None
)
response = agent.chat("计算结果，1000+157*2？")

=== Initial plan ===
计算乘法部分:
157 * 2 -> 314.0
Deps: []
计算加法部分:
1000 + 314.0 -> 1314.0
Deps: ['计算乘法部分']
> Running step 3cd2cc1c-0f45-4fb1-aad7-d28b4c821f2c. Step input: 157 * 2
Added user message to memory: 157 * 2
=== Calling Function ===
Calling function: multiply with args: {"a": 157, "b": 2}
=== Function Output ===
314
> Running step 95f3f0a6-8ba7-4dbc-9e86-f33118f69ad5. Step input: None
=== LLM Response ===
The product of 157 and 2 is 314.
=== Refined plan ===
计算加法部分:
1000 + 314.0 -> 1314.0
Deps: ['计算乘法部分']
完成任务:
 -> 1314.0
Deps: ['计算加法部分']
> Running step abad3f05-a750-4f62-85bd-28db75fc4fb6. Step input: 1000 + 314.0
Added user message to memory: 1000 + 314.0
=== Calling Function ===
Calling function: add with args: {"a": 1000, "b": 314}
=== Function Output ===
1314
> Running step 203c38a5-29fd-4bce-acac-5c3530a25ed3. Step input: None
=== LLM Response ===
The sum of 1000 and 314.0 is 1314.0.
=== Refined plan ===
验证最终结果:
 -> 1314.0
Deps: ['计算加法部分']
> Running step bee84f9a-8a9a-4555-94e9-e6ccf2deb1dc. Step input: 
Added user message to memory: 
=== LLM Response ===
Great! If you have any other calculations or questions, feel free to ask!
=== Refined plan ===
计算乘法部分:
Math.Multiply (157, 2) -> succeeded
Deps: []
计算加法部分:
Math.Add (1000, 314.0) -> succeeded
Deps: ['计算乘法部分']
验证最终结果:
 -> Great! If you have any other calculations or questions, feel free to ask!
Deps: ['计算加法部分']

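Unlike the previous two agents, which plan as they execute, there is another way of working: plan up front, execute the plan, and arrive at the final answer. StructuredPlannerAgent works this way. Its overall run logic is as follows: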

[Figure: PlanningAgentRunner]

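create_plan: propose the overall plan
Execute: run the first sub-task
Refine the plan: adjust the plan based on the previous sub-task's result and the next sub-task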
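The plan-then-execute flow above can be sketched as a small dependency-driven loop. This is a simplified illustration under stated assumptions: `ToySubTask`, `next_sub_tasks`, and `run_plan` are hypothetical names, each sub-task is a plain Python callable instead of an LLM call, and the refinement step is only marked by a comment.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ToySubTask:
    name: str
    run: Callable[[Dict[str, float]], float]  # receives the results completed so far
    dependencies: List[str] = field(default_factory=list)


def next_sub_tasks(plan: List[ToySubTask], completed: Dict[str, float]) -> List[ToySubTask]:
    """Return unfinished sub-tasks whose dependencies have all completed."""
    return [
        task for task in plan
        if task.name not in completed and all(dep in completed for dep in task.dependencies)
    ]


def run_plan(plan: List[ToySubTask]) -> Dict[str, float]:
    """Execute ready sub-tasks round by round until the whole plan is done."""
    completed: Dict[str, float] = {}
    while len(completed) < len(plan):
        ready = next_sub_tasks(plan, completed)
        if not ready:
            raise RuntimeError("plan is stuck: unmet or circular dependencies")
        for task in ready:
            completed[task.name] = task.run(completed)
        # A real planner would ask the LLM to refine the remaining sub-tasks here.
    return completed


plan = [
    ToySubTask("multiply", lambda done: 157 * 2),
    ToySubTask("add", lambda done: 1000 + done["multiply"], ["multiply"]),
]
results = run_plan(plan)
print(results)
```

Only "multiply" is ready in the first round because "add" depends on it, which mirrors the Deps lines in the verbose plan output.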

For the details, the prompts again show how it works.

DEFAULT_INITIAL_PLAN_PROMPT = """\
Think step-by-step. Given a task and a set of tools, create a comprehesive, end-to-end plan to accomplish the task.
Keep in mind not every task needs to be decomposed into multiple sub-tasks if it is simple enough.
The plan should end with a sub-task that satisfies the overall task.
The tools available are:
{tools_str}
Overall Task: {task}
"""

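This is the prompt for StructuredPlannerAgent's create_plan method; its job is to produce the overall plan.

There is also a plan-refinement prompt, which updates the remaining sub-tasks based on the results of the ones already executed.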

DEFAULT_PLAN_REFINE_PROMPT = """\
Think step-by-step. Given an overall task, a set of tools, and completed sub-tasks, update (if needed) the remaining sub-tasks so that the overall task can still be completed.
The plan should end with a sub-task that satisfies the overall task.
If the remaining sub-tasks are sufficient, you can skip this step.
The tools available are:
{tools_str}
Overall Task:
{task}
Completed Sub-Tasks + Outputs:
{completed_outputs}
Remaining Sub-Tasks:
{remaining_sub_tasks}
"""
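As a rough illustration of how such a template might be filled in, the sketch below renders completed results and pending sub-tasks into an abridged copy of the prompt. The names `TOY_REFINE_TEMPLATE` and `build_refine_prompt` are hypothetical, the template is shortened (it drops the instructions and the tools section), and the way LlamaIndex actually formats these fields may differ.

```python
from typing import Dict, List

# Abridged, hypothetical template that reuses three placeholder names from the refine prompt.
TOY_REFINE_TEMPLATE = (
    "Overall Task:\n{task}\n"
    "Completed Sub-Tasks + Outputs:\n{completed_outputs}\n"
    "Remaining Sub-Tasks:\n{remaining_sub_tasks}\n"
)


def build_refine_prompt(task: str, completed: Dict[str, str], remaining: List[str]) -> str:
    """Render finished results and pending sub-tasks into the template."""
    completed_outputs = "\n".join(f"{name}: {output}" for name, output in completed.items())
    remaining_sub_tasks = "\n".join(remaining)
    return TOY_REFINE_TEMPLATE.format(
        task=task,
        completed_outputs=completed_outputs,
        remaining_sub_tasks=remaining_sub_tasks,
    )


prompt = build_refine_prompt(
    task="1000+157*2",
    completed={"Step 1: Multiply 157 by 2": "314"},
    remaining=["Step 2: Add the result of Step 1 to 1000"],
)
print(prompt)
```

Because the completed outputs are part of the prompt, the model can rewrite a pending sub-task such as "add the result of Step 1" into one with the concrete value.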

Creating the initial task and plan

Following the way StructuredPlannerAgent works, the steps below walk through it with the low-level API.

plan_id = agent.create_plan("计算结果，1000+157*2？")
print('------------'*3)
plan = agent.state.plan_dict[plan_id]
for sub_task in plan.sub_tasks:
    print(f"===== Sub Task {sub_task.name} =====")
    print("Expected output: ", sub_task.expected_output)
    print("Dependencies: ", sub_task.dependencies)

=== Initial plan ===
Step 1: Multiply 157 by 2:
Multiply (157, 2) -> PENDING
Deps: []
Step 2: Add the result of Step 1 to 1000:
Add (1000, {result from Step 1}) -> PENDING
Deps: ['Step 1']
------------------------------------
===== Sub Task Step 1: Multiply 157 by 2 =====
Expected output:  PENDING
Dependencies:  []
===== Sub Task Step 2: Add the result of Step 1 to 1000 =====
Expected output:  PENDING
Dependencies:  ['Step 1']

Executing the first group of sub-tasks

# Get the sub-tasks that are ready to run next
next_tasks = agent.state.get_next_sub_tasks(plan_id)
for sub_task in next_tasks:
    print(f"===== Sub Task {sub_task.name} =====")
    print("Expected output: ", sub_task.expected_output)
    print("Dependencies: ", sub_task.dependencies)
# Run each sub-task and mark it complete
for sub_task in next_tasks:
    response = agent.run_task(sub_task.name)
    agent.mark_task_complete(plan_id, sub_task.name)

===== Sub Task Step 1: Multiply 157 by 2 =====
Expected output:  PENDING
Dependencies:  []
> Running step 6008e40b-5e57-471f-9a0d-94c422c5c9be. Step input: multiply (157, 2)
Added user message to memory: multiply (157, 2)
=== Calling Function ===
Calling function: multiply with args: {"a": 157, "b": 2}
=== Function Output ===
314
> Running step dcf22d0b-c100-4a88-83ce-a83bbd82f485. Step input: None
=== LLM Response ===
The product of 157 and 2 is 314.

Checking whether the plan is finished

next_tasks = agent.get_next_tasks(plan_id)
print(len(next_tasks))

0

Refining the plan

# refine the plan
agent.refine_plan(
    "计算结果，1000+157*2？",
    plan_id,
)

=== Refined plan ===
Step 2: Add the result of Step 1 to 1000:
Add (1000, 314) -> PENDING
Deps: ['Step 1']