Evaluating Retrieval Performance with LlamaIndex

There are many ways to build an index (for example, a vector index or a keyword index) and many ways to retrieve from it (for example, sparse retrieval or text retrieval). Different methods return different documents, so following standard engineering practice, we evaluate the retrieval quality after retrieval.
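The steps below assume that the source documents have already been loaded and split into `nodes`. A minimal sketch of how those nodes might be produced (the data directory and chunking parameters here are placeholder assumptions):

```python
from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter

# Load documents from a local folder (path is a placeholder).
documents = SimpleDirectoryReader("./data").load_data()

# Split the documents into chunks ("nodes"); chunk sizes are illustrative.
node_parser = SentenceSplitter(chunk_size=512, chunk_overlap=50)
nodes = node_parser.get_nodes_from_documents(documents)
```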

Step 1: Build a test set from the nodes

```python
from llama_index.core.evaluation import generate_qa_embedding_pairs

DEFAULT_QA_GENERATE_PROMPT_TMPL = """\
Context information is below.

---------------------
{context_str}
---------------------

Given the context information and not prior knowledge, \
generate only questions based on the below query.

You are a Teacher/ Professor. Your task is to setup \
{num_questions_per_chunk} questions for an upcoming \
quiz/examination. The questions should be diverse in nature \
across the document. Restrict the questions to the context information provided. \
The final questions must be written in Chinese.
"""

# Generate (question, relevant node) pairs from the nodes.
# NOTE: depending on the LlamaIndex version, an LLM may need to be
# passed explicitly via the llm= argument.
qa_dataset = generate_qa_embedding_pairs(
    nodes=nodes,
    qa_generate_prompt_tmpl=DEFAULT_QA_GENERATE_PROMPT_TMPL,
    num_questions_per_chunk=2,
)

# save_json_path: where to persist the generated dataset
qa_dataset.save_json(save_json_path)
```

The generated test set looks like this:

[Screenshot: sample of the generated test set]
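Beyond the screenshot, the generated dataset can be inspected directly. A small sketch, assuming the JSON was saved to `save_json_path` as in Step 1:

```python
from llama_index.core.evaluation import EmbeddingQAFinetuneDataset

# Reload the saved dataset.
qa_dataset = EmbeddingQAFinetuneDataset.from_json(save_json_path)

# queries:       {query_id: generated question}
# corpus:        {node_id: node text}
# relevant_docs: {query_id: [node_id, ...]}  -- ground-truth nodes per question
for query_id, question in list(qa_dataset.queries.items())[:3]:
    print(question, "->", qa_dataset.relevant_docs[query_id])
```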

Step 2: Run the evaluation and report the results

First build an index from the nodes, then evaluate retrieval quality on the test set; a sketch of the index-building step is shown below, followed by the evaluation helper.
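A minimal sketch of the index-building step, assuming a vector index over the same nodes used to generate the test set (an embedding model needs to be configured, e.g. via `Settings.embed_model`):

```python
from llama_index.core import VectorStoreIndex

# Building the index over the same nodes keeps node IDs consistent with
# the relevant_docs recorded in qa_dataset.
index = VectorStoreIndex(nodes)
```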

```python
from llama_index.core.evaluation import RetrieverEvaluator


async def MyRetrieverEvaluator(index, qa_dataset):
    # Retrieve the top-3 nodes for each query.
    retriever = index.as_retriever(similarity_top_k=3)

    metrics = ["hit_rate", "mrr", "precision", "recall", "ap", "ndcg"]
    retriever_evaluator = RetrieverEvaluator.from_metric_names(
        metrics, retriever=retriever
    )

    # Run the evaluator over the entire test set.
    eval_results = await retriever_evaluator.aevaluate_dataset(
        qa_dataset, show_progress=True
    )

    return eval_results, metrics
```
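The helper returns one evaluation result per query. To report a single score per metric, the per-query results can be averaged, for example with pandas (a sketch; `metric_vals_dict` holds each query's metric values, and in a notebook the coroutine would simply be awaited instead of using `asyncio.run`):

```python
import asyncio

import pandas as pd

eval_results, metrics = asyncio.run(MyRetrieverEvaluator(index, qa_dataset))

# Collect per-query metric values and average them per metric.
metric_df = pd.DataFrame([r.metric_vals_dict for r in eval_results])
print(metric_df.mean())
```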