Differences Between LlamaIndex's Agent Types

LlamaIndex defines three kinds of Agent: FunctionCallingAgent, ReActAgent, and StructuredPlannerAgent. How do they differ?

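| Feature | FunctionCallingAgent | ReActAgent | StructuredPlannerAgent |
| --- | --- | --- | --- |
| Design goal | Calls functions directly to complete the task, with no elaborate planning or reasoning. | Uses structured thinking (the ReAct loop) to break a complex problem down and solve it step by step. | Plans the whole task first, then executes the concrete steps. |
| LLM interaction | Interacts with the LLM directly through tools; does not parse intermediate reasoning steps. | Gives the LLM an explicit prompt format and a structured output format (Thought, Action, and so on). | Creates a plan first, then executes tasks step by step and adjusts the plan. |
| Prompt usage | No custom prompt by default; driven by the tools. | A prompt that explicitly defines the rules for thinking and acting. | Dedicated prompts for creating and refining the overall task plan. |
| Thinking process | Simple and direct; each step is one explicit task execution. | Structured and stepwise (decomposition, reasoning). | Global plan first, then stepwise execution with adjustments. |
| Flexibility and complexity handling | Lower flexibility and weaker at complex tasks. | Highly flexible and well suited to complex tasks. | Suited to tasks that call for up-front planning. |
| Typical use | Simple, direct problem solving or function execution. | Complex queries involving multiple steps, reasoning, and problem decomposition. | Tasks that are planned in detail first and then carried out step by step. |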

This table summarizes the main differences among the three Agent types in design goal, LLM interaction, prompt usage, thinking process, and flexibility and complexity handling.

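  • FunctionCallingAgent suits simple task execution; its strength is being direct and efficient.
  • ReActAgent is a better fit for scenarios that need structured thinking and multi-step reasoning, and it handles complex queries and problem decomposition effectively.
  • StructuredPlannerAgent is a better fit for tasks that need detailed planning at the outset, and it can adjust the plan dynamically as requirements change.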

Each Agent type has situations it fits best; choosing the right one helps solve problems more efficiently.


LlamaIndex provides three different types of Agent. This post looks at how each one works and how they differ.

import nest_asyncio
nest_asyncio.apply()
from llama_index.core import agent

# List every name in the agent module that ends with "Agent"
agent_names = [name for name in dir(agent) if name.endswith("Agent")]
print(agent_names)

['FunctionCallingAgent', 'ReActAgent', 'StructuredPlannerAgent']

Reading the source code shows that they are related as follows:

classDiagram
	class BaseAgent
	class BaseAgentRunner
	BaseAgent <|-- BaseAgentRunner
	class AgentRunner
	class BaseAgentWorker
	class ReActAgent
	class ReActAgentWorker
	BaseAgentRunner<|-- AgentRunner
	AgentRunner<|-- ReActAgent
	ReActAgent<-- ReActAgentWorker
	BaseAgentWorker <|-- ReActAgentWorker
	class StructuredPlannerAgent
	class BasePlanningAgentRunner
	BasePlanningAgentRunner <|-- StructuredPlannerAgent
	StructuredPlannerAgent<-- BaseAgentWorker
	AgentRunner<|-- BasePlanningAgentRunner
	class FunctionCallingAgent
	AgentRunner <|-- FunctionCallingAgent
	class FunctionCallingAgentWorker
	FunctionCallingAgent <-- FunctionCallingAgentWorker
	BaseAgentWorker <|-- FunctionCallingAgentWorker

There are three families of classes here: Worker, Runner, and Agent. An Agent inherits from a Runner and uses a Worker to execute each step.
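
The relationship can be mirrored in a few lines of plain Python. This is a toy sketch, not LlamaIndex code: `ToyWorker`, `ToyRunner`, and `ToyAgent` are hypothetical names standing in for the Worker, Runner, and Agent families, and the step logic is a placeholder.

```python
class ToyWorker:
    """Stands in for a BaseAgentWorker subclass: knows how to run ONE step."""

    def run_step(self, step_input: str) -> str:
        # Placeholder step logic; a real worker would call an LLM here.
        return f"handled: {step_input}"


class ToyRunner:
    """Stands in for AgentRunner: owns a worker and drives it."""

    def __init__(self, worker: ToyWorker) -> None:
        self.worker = worker  # composition: the runner *uses* a worker

    def chat(self, message: str) -> str:
        return self.worker.run_step(message)


class ToyAgent(ToyRunner):
    """Stands in for a concrete Agent: inherits the runner, picks a worker."""

    def __init__(self) -> None:
        super().__init__(ToyWorker())


agent = ToyAgent()
reply = agent.chat("hello")
print(reply)
```

The point of the split is that the runner logic can stay the same while the worker is swapped, which is roughly how the diagram pairs each Agent with its own Worker.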

from llama_index.core import Settings
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.ollama import OllamaEmbedding

# Both the chat model and the embedding model are served from this Ollama endpoint
base_url = 'http://localhost:11434'
llm = Ollama(model="qwen2.5:latest", request_timeout=360.0, base_url=base_url)
Settings.llm = llm
Settings.embed_model = OllamaEmbedding(model_name="quentinz/bge-large-zh-v1.5:latest", base_url=base_url)

from llama_index.core.tools import FunctionTool

def multiply(a: float, b: float) -> float:
    """Multiply two numbers and returns the product"""
    return a * b

multiply_tool = FunctionTool.from_defaults(fn=multiply)

def add(a: float, b: float) -> float:
    """Add two numbers and returns the sum"""
    return a + b

add_tool = FunctionTool.from_defaults(fn=add)

FunctionCallingAgent and ReActAgent

from llama_index.core.agent import FunctionCallingAgent
function_calling_agent=FunctionCallingAgent.from_tools(tools=[multiply_tool, add_tool],verbose=True)
response = function_calling_agent.chat("计算结果,1000+157*2?")
print(response)

> Running step f1466f93-b140-4f58-b800-b7221e1fe5cd. Step input: 计算结果,1000+157*2?
Added user message to memory: 计算结果,1000+157*2?
=== Calling Function ===
Calling function: multiply with args: {"a": 157, "b": 2}
=== Function Output ===
314
> Running step 4a7cdcf8-97b5-4e1b-9387-62734c68de34. Step input: None
=== Calling Function ===
Calling function: add with args: {"a": 1000, "b": 314}
=== Function Output ===
1314
> Running step 600c57bf-c783-49fc-be90-f402cd35e78a. Step input: None
=== LLM Response ===
计算结果是 \( 1000 + 157 \times 2 = 1314 \)。
计算结果是 \( 1000 + 157 \times 2 = 1314 \)。

from llama_index.core.agent import ReActAgent
# Create the agent
reAct_agent = ReActAgent.from_tools([multiply_tool, add_tool], verbose=True)
response = reAct_agent.chat("计算结果,1000+157*2?")
print(response)

> Running step 13973598-2e38-4cc3-ba70-0897d09bb55c. Step input: 计算结果,1000+157*2?
Thought: The current language of the user is: Chinese. I need to use a tool to help me answer the question.
Action: multiply
Action Input: {'a': 157, 'b': 2}
Observation: 314
> Running step eb577013-9af9-4864-8516-ca97b24d288f. Step input: None
Thought: I can now perform the addition using the result from the multiplication.
Action: add
Action Input: {'a': 1000, 'b': 314}
Observation: 1314
> Running step e75c52b9-109a-4fdc-b555-6159e9c3fb5c. Step input: None
Thought: I can answer without using any more tools. I'll use the user's language to answer
Answer: 计算结果是 1314。
计算结果是 1314。

The chat loop that both agents share looks like this:

task = self.create_task(message)
result_output = None
dispatcher.event(AgentChatWithStepStartEvent(user_msg=message))
while True:
    # pass step queue in as argument, assume step executor is stateless
    cur_step_output = self._run_step(
        task.task_id, mode=mode, tool_choice=tool_choice
    )
    if cur_step_output.is_last:
        result_output = cur_step_output
        break
    # ensure tool_choice does not cause endless loops
    tool_choice = "auto"

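  1. The chat method of both FunctionCallingAgent and ReActAgent runs the code above. It keeps producing step outputs in a loop until one of them carries the end flag (cur_step_output.is_last is True).
  2. The two start to differ inside _run_step:
  • Both split a step into three stages: (1) run one LLM inference; (2) parse the LLM output; (3) generate the next Task.
  • For the LLM inference, FunctionCallingAgent calls the chat_with_tools interface directly, while ReActAgent calls chat. In other words, FunctionCallingAgent hands the tools to the LLM, whereas ReActAgent only describes them in the prompt.
  • For parsing, ReActAgent's prompt shows that it requires structured output; parsing extracts the "Thought", "Action", "Action Input", and "Observation" parts.
  • The next step is then planned from the parsed output.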
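
The loop and the three stages inside a step can be sketched offline. The following is a minimal illustration under stated assumptions: `StepOutput` and `run_chat` are hypothetical names, and `llm_turns` is a scripted stand-in for real LLM replies so the example runs without a model. It is not how LlamaIndex implements `_run_step`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class StepOutput:
    output: str
    is_last: bool


def run_chat(llm_turns: List[dict], tools: Dict[str, Callable[..., float]]) -> List[StepOutput]:
    """Run steps until one is marked is_last, mirroring the while-loop above."""
    turns = iter(llm_turns)
    history: List[StepOutput] = []
    while True:
        reply = next(turns)                # (1) one (scripted) LLM inference
        tool_name = reply.get("tool")      # (2) parse the LLM output
        if tool_name is None:              # no tool requested: this is the final answer
            step = StepOutput(output=reply["answer"], is_last=True)
        else:                              # (3) run the tool, which leads to another step
            result = tools[tool_name](**reply["args"])
            step = StepOutput(output=str(result), is_last=False)
        history.append(step)
        if step.is_last:
            break
    return history


tools = {"multiply": lambda a, b: a * b, "add": lambda a, b: a + b}
script = [
    {"tool": "multiply", "args": {"a": 157, "b": 2}},
    {"tool": "add", "args": {"a": 1000, "b": 314}},
    {"answer": "1314"},
]
steps = run_chat(script, tools)
print([s.output for s in steps])  # ['314', '1314', '1314']
```

The three scripted turns correspond to the three "Running step" entries in the logs above: two tool calls followed by a final answer.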

[Figure: FunctionCallingAgent vs. ReActAgent]

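FunctionCallingAgent has no custom prompt; it hands the tools directly to the LLM, which chooses among them internally. ReActAgent, by contrast, prescribes the LLM's thinking rules: the model answers the question by selecting tools, and its output format is constrained.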
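
Handing a tool to the LLM generally means sending the function's name, description, and parameters as structured metadata alongside the chat messages. A rough sketch of deriving such metadata with the standard library follows. `describe_tool` is a hypothetical helper, and the dictionary layout is an assumption made for illustration; it is not the schema that LlamaIndex or any particular model API uses.

```python
import inspect
from typing import Any, Callable, Dict


def describe_tool(fn: Callable[..., Any]) -> Dict[str, Any]:
    """Build a plain-dict description of a function (hypothetical layout)."""
    signature = inspect.signature(fn)
    # Map each parameter name to the name of its annotated type.
    params = {
        name: getattr(p.annotation, "__name__", str(p.annotation))
        for name, p in signature.parameters.items()
    }
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": params,
    }


def multiply(a: float, b: float) -> float:
    """Multiply two numbers and returns the product"""
    return a * b


spec = describe_tool(multiply)
print(spec)
```

This is why the docstrings and type hints on `multiply` and `add` matter: they are the only description of the tool that the model gets to see.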

ReActAgent's prompt has two parts:
Tools: states which tools may be used
Output Format: states the output format, which has two branches

  1. For an intermediate step, output in the following format:

Thought: The current language of the user is: (user's language). I need to use a tool to help me answer the question.
Action: tool name (one of {tool_names}) if using a tool.
Action Input: the input to the tool, in a JSON format representing the kwargs (e.g. {{"input": "hello world", "num_beams": 5}})

  2. For the final result, output in one of the following formats:

Thought: I can answer without using any more tools. I'll use the user's language to answer
Answer: [your answer here (In the same language as the user's question)]

Thought: I cannot answer the question with the provided tools.
Answer: [your answer here (In the same language as the user's question)]

So ReActAgent is an Agent that prescribes how the model should think, and it is more customizable than FunctionCallingAgent.
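
Because the reply follows a fixed text format, it can be parsed mechanically. Below is a simplified regex-based sketch of that idea; `parse_react_output` is a hypothetical function and is not LlamaIndex's own output parser. It assumes the Action Input is valid JSON, as the prompt requests (the verbose log earlier prints it as a Python dict with single quotes, which `json.loads` would reject).

```python
import json
import re
from typing import Any, Dict


def parse_react_output(text: str) -> Dict[str, Any]:
    """Parse one ReAct-style reply into either a tool call or a final answer."""
    thought = re.search(r"Thought:\s*(.*)", text)
    action = re.search(r"Action:\s*(.*)", text)
    action_input = re.search(r"Action Input:\s*(\{.*\})", text, re.DOTALL)
    answer = re.search(r"Answer:\s*(.*)", text, re.DOTALL)

    parsed: Dict[str, Any] = {"thought": thought.group(1).strip() if thought else ""}
    if action and action_input:
        # Intermediate step: the model wants a tool to be called.
        parsed["action"] = action.group(1).strip()
        parsed["action_input"] = json.loads(action_input.group(1))
        parsed["is_final"] = False
    elif answer:
        # Final step: the model has produced its answer.
        parsed["answer"] = answer.group(1).strip()
        parsed["is_final"] = True
    else:
        raise ValueError("reply does not follow the expected format")
    return parsed


mid_step = (
    "Thought: I need to use a tool to help me answer the question.\n"
    "Action: multiply\n"
    'Action Input: {"a": 157, "b": 2}'
)
final_step = (
    "Thought: I can answer without using any more tools.\n"
    "Answer: 1314"
)
first = parse_react_output(mid_step)
last = parse_react_output(final_step)
print(first["action"], first["action_input"])
print(last["answer"])
```

The `is_final` flag here plays the same role as `is_last` in the chat loop: it tells the caller whether to run a tool and continue, or stop and return the answer.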

StructuredPlannerAgent

from llama_index.core.agent import StructuredPlannerAgent, FunctionCallingAgentWorker

# Create the agent
# create the function calling worker for reasoning
worker = FunctionCallingAgentWorker.from_tools(
    [multiply_tool, add_tool], verbose=True
)
# wrap the worker in the top-level planner
agent = StructuredPlannerAgent(
    worker, tools=[multiply_tool, add_tool],
    verbose=True,
    memory=None
)
response = agent.chat("计算结果,1000+157*2?")

=== Initial plan ===
计算乘法部分:
157 * 2 -> 314.0
Deps: []
计算加法部分:
1000 + 314.0 -> 1314.0
Deps: ['计算乘法部分']
> Running step 3cd2cc1c-0f45-4fb1-aad7-d28b4c821f2c. Step input: 157 * 2
Added user message to memory: 157 * 2
=== Calling Function ===
Calling function: multiply with args: {"a": 157, "b": 2}
=== Function Output ===
314
> Running step 95f3f0a6-8ba7-4dbc-9e86-f33118f69ad5. Step input: None
=== LLM Response ===
The product of 157 and 2 is 314.
=== Refined plan ===
计算加法部分:
1000 + 314.0 -> 1314.0
Deps: ['计算乘法部分']
完成任务:
-> 1314.0
Deps: ['计算加法部分']
> Running step abad3f05-a750-4f62-85bd-28db75fc4fb6. Step input: 1000 + 314.0
Added user message to memory: 1000 + 314.0
=== Calling Function ===
Calling function: add with args: {"a": 1000, "b": 314}
=== Function Output ===
1314
> Running step 203c38a5-29fd-4bce-acac-5c3530a25ed3. Step input: None
=== LLM Response ===
The sum of 1000 and 314.0 is 1314.0.
=== Refined plan ===
验证最终结果:
-> 1314.0
Deps: ['计算加法部分']
> Running step bee84f9a-8a9a-4555-94e9-e6ccf2deb1dc. Step input:
Added user message to memory:
=== LLM Response ===
Great! If you have any other calculations or questions, feel free to ask!
=== Refined plan ===
计算乘法部分:
Math.Multiply (157, 2) -> succeeded
Deps: []
计算加法部分:
Math.Add (1000, 314.0) -> succeeded
Deps: ['计算乘法部分']
验证最终结果:
-> Great! If you have any other calculations or questions, feel free to ask!
Deps: ['计算加法部分']

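The previous two agents plan as they execute. There is another way of thinking: plan everything up front, then execute the plan to reach the final answer. StructuredPlannerAgent works this way, and its overall run logic is as follows: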

[Figure: PlanningAgentRunner]

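  1. create_plan: propose the overall plan
  2. Execute: run the first task
  3. Refine the plan: adjust the plan based on the previous task's result and the upcoming task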
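
The plan-then-execute cycle can be illustrated with a small dependency-driven executor. This is a toy sketch: `ToySubTask`, `next_sub_tasks`, and `execute_plan` are hypothetical names, the plan is hard-coded instead of being produced by an LLM, and the refine stage is only marked with a comment.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class ToySubTask:
    name: str
    run: Callable[[Dict[str, float]], float]  # receives results of earlier sub-tasks
    dependencies: List[str] = field(default_factory=list)


def next_sub_tasks(plan: List[ToySubTask], done: Set[str]) -> List[ToySubTask]:
    """Return sub-tasks that are not done and whose dependencies are all done."""
    return [
        t for t in plan
        if t.name not in done and all(dep in done for dep in t.dependencies)
    ]


def execute_plan(plan: List[ToySubTask]) -> Dict[str, float]:
    results: Dict[str, float] = {}
    while True:
        ready = next_sub_tasks(plan, set(results))
        if not ready:        # nothing left that can run: the plan is finished
            break
        for task in ready:   # execute the sub-tasks that are ready
            results[task.name] = task.run(results)
        # A real planner would ask the LLM to refine the remaining
        # sub-tasks here; this sketch leaves the plan unchanged.
    return results


plan = [
    ToySubTask("multiply", lambda r: 157 * 2),
    ToySubTask("add", lambda r: 1000 + r["multiply"], dependencies=["multiply"]),
]
results = execute_plan(plan)
print(results)  # {'multiply': 314, 'add': 1314}
```

Note that a sub-task only becomes ready when every name in its dependency list exactly matches a completed sub-task, so the names used in the plan need to be consistent.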

For the details, we again turn to the prompts to understand how it works.

DEFAULT_INITIAL_PLAN_PROMPT = """\
Think step-by-step. Given a task and a set of tools, create a comprehesive, end-to-end plan to accomplish the task.
Keep in mind not every task needs to be decomposed into multiple sub-tasks if it is simple enough.
The plan should end with a sub-task that satisfies the overall task.
The tools available are:
{tools_str}
Overall Task: {task}
"""

This is the prompt used by StructuredPlannerAgent's create_plan method; as it shows, its job is to propose the overall plan.

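There is also a prompt for refining the plan, whose job is to update the remaining tasks based on the results of the tasks already executed.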

DEFAULT_PLAN_REFINE_PROMPT = """\
Think step-by-step. Given an overall task, a set of tools, and completed sub-tasks, update (if needed) the remaining sub-tasks so that the overall task can still be completed.
The plan should end with a sub-task that satisfies the overall task.
If the remaining sub-tasks are sufficient, you can skip this step.
The tools available are:
{tools_str}
Overall Task:
{task}
Completed Sub-Tasks + Outputs:
{completed_outputs}
Remaining Sub-Tasks:
{remaining_sub_tasks}
"""

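To make the placeholders concrete, here is a small sketch of filling such a template with `str.format`. `REFINE_TEMPLATE` is a shortened stand-in that reuses only the placeholder names from the prompt above, and the tool descriptions, sub-task names, and the way they are joined into text are made-up assumptions; how LlamaIndex serializes these values is not shown in this post.

```python
# Shortened stand-in for the refine prompt; only the placeholder names match.
REFINE_TEMPLATE = (
    "The tools available are:\n{tools_str}\n"
    "Overall Task:\n{task}\n"
    "Completed Sub-Tasks + Outputs:\n{completed_outputs}\n"
    "Remaining Sub-Tasks:\n{remaining_sub_tasks}\n"
)

# Hypothetical state after the first sub-task has finished.
completed = {"multiply 157 by 2": "314"}
remaining = ["add the product to 1000"]

prompt = REFINE_TEMPLATE.format(
    tools_str="multiply: Multiply two numbers\nadd: Add two numbers",
    task="1000+157*2",
    completed_outputs="\n".join(f"{name}: {out}" for name, out in completed.items()),
    remaining_sub_tasks="\n".join(remaining),
)
print(prompt)
```

Each refine call therefore sees the whole picture again: the overall task, what has been finished and with which result, and what is still pending.
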
Creating the initial task and plan

The following walks through how StructuredPlannerAgent works, using its low-level API.

plan_id = agent.create_plan("计算结果,1000+157*2?")
print('------------' * 3)
# Look up the plan that was just created and print each sub-task
plan = agent.state.plan_dict[plan_id]
for sub_task in plan.sub_tasks:
    print(f"===== Sub Task {sub_task.name} =====")
    print("Expected output: ", sub_task.expected_output)
    print("Dependencies: ", sub_task.dependencies)

=== Initial plan ===
Step 1: Multiply 157 by 2:
Multiply (157, 2) -> PENDING
Deps: []
Step 2: Add the result of Step 1 to 1000:
Add (1000, {result from Step 1}) -> PENDING
Deps: ['Step 1']
------------------------------------
===== Sub Task Step 1: Multiply 157 by 2 =====
Expected output: PENDING
Dependencies: []
===== Sub Task Step 2: Add the result of Step 1 to 1000 =====
Expected output: PENDING
Dependencies: ['Step 1']

Executing the first group of tasks

# Get the sub-tasks that are ready to run next
next_tasks = agent.state.get_next_sub_tasks(plan_id)
for sub_task in next_tasks:
    print(f"===== Sub Task {sub_task.name} =====")
    print("Expected output: ", sub_task.expected_output)
    print("Dependencies: ", sub_task.dependencies)
# Run each sub-task, then mark it complete
for sub_task in next_tasks:
    response = agent.run_task(sub_task.name)
    agent.mark_task_complete(plan_id, sub_task.name)

===== Sub Task Step 1: Multiply 157 by 2 =====
Expected output: PENDING
Dependencies: []
> Running step 6008e40b-5e57-471f-9a0d-94c422c5c9be. Step input: multiply (157, 2)
Added user message to memory: multiply (157, 2)
=== Calling Function ===
Calling function: multiply with args: {"a": 157, "b": 2}
=== Function Output ===
314
> Running step dcf22d0b-c100-4a88-83ce-a83bbd82f485. Step input: None
=== LLM Response ===
The product of 157 and 2 is 314.

Checking whether the plan is finished

next_tasks = agent.get_next_tasks(plan_id)
print(len(next_tasks))

0

Refining the plan

# refine the plan
agent.refine_plan(
    "计算结果,1000+157*2?",
    plan_id,
)

=== Refined plan ===
Step 2: Add the result of Step 1 to 1000:
Add (1000, 314) -> PENDING
Deps: ['Step 1']