Evaluating Retrieval Performance with LlamaIndex

There are many ways to build an index (for example, a vector index or a keyword index) and many ways to retrieve from it (for example, sparse retrieval or text retrieval). Different methods return different documents, so following standard engineering practice, we evaluate the retrieval quality after retrieval.
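The steps below assume that the source documents have already been loaded and split into `nodes`. A minimal sketch of how those nodes might be produced (the data directory and chunking parameters here are placeholder assumptions):

```python
from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter

# Load documents from a local folder (path is a placeholder).
documents = SimpleDirectoryReader("./data").load_data()

# Split the documents into chunks ("nodes"); chunk sizes are illustrative.
node_parser = SentenceSplitter(chunk_size=512, chunk_overlap=50)
nodes = node_parser.get_nodes_from_documents(documents)
```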

Step 1: Build a test set from the nodes

```python
from llama_index.core.evaluation import generate_qa_embedding_pairs

DEFAULT_QA_GENERATE_PROMPT_TMPL = """\
Context information is below.

---------------------
{context_str}
---------------------

Given the context information and not prior knowledge, \
generate only questions based on the below query.

You are a Teacher/ Professor. Your task is to setup \
{num_questions_per_chunk} questions for an upcoming \
quiz/examination. The questions should be diverse in nature \
across the document. Restrict the questions to the context information provided. \
The final questions must be written in Chinese.
"""

# Generate (question, relevant node) pairs from the nodes.
# NOTE: depending on the LlamaIndex version, an LLM may need to be
# passed explicitly via the llm= argument.
qa_dataset = generate_qa_embedding_pairs(
    nodes=nodes,
    qa_generate_prompt_tmpl=DEFAULT_QA_GENERATE_PROMPT_TMPL,
    num_questions_per_chunk=2,
)

# save_json_path: where to persist the generated dataset
qa_dataset.save_json(save_json_path)
```

The generated test set looks like this:

[Screenshot: sample of the generated test set]
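Beyond the screenshot, the generated dataset can be inspected directly. A small sketch, assuming the JSON was saved to `save_json_path` as in Step 1:

```python
from llama_index.core.evaluation import EmbeddingQAFinetuneDataset

# Reload the saved dataset.
qa_dataset = EmbeddingQAFinetuneDataset.from_json(save_json_path)

# queries:       {query_id: generated question}
# corpus:        {node_id: node text}
# relevant_docs: {query_id: [node_id, ...]}  -- ground-truth nodes per question
for query_id, question in list(qa_dataset.queries.items())[:3]:
    print(question, "->", qa_dataset.relevant_docs[query_id])
```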

Step 2: Run the evaluation and report the results

First build an index from the nodes, then evaluate retrieval quality on the test set; a sketch of the index-building step is shown below, followed by the evaluation helper.
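A minimal sketch of the index-building step, assuming a vector index over the same nodes used to generate the test set (an embedding model needs to be configured, e.g. via `Settings.embed_model`):

```python
from llama_index.core import VectorStoreIndex

# Building the index over the same nodes keeps node IDs consistent with
# the relevant_docs recorded in qa_dataset.
index = VectorStoreIndex(nodes)
```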

```python
from llama_index.core.evaluation import RetrieverEvaluator


async def MyRetrieverEvaluator(index, qa_dataset):
    # Retrieve the top-3 nodes for each query.
    retriever = index.as_retriever(similarity_top_k=3)

    metrics = ["hit_rate", "mrr", "precision", "recall", "ap", "ndcg"]
    retriever_evaluator = RetrieverEvaluator.from_metric_names(
        metrics, retriever=retriever
    )

    # Run the evaluator over the entire test set.
    eval_results = await retriever_evaluator.aevaluate_dataset(
        qa_dataset, show_progress=True
    )

    return eval_results, metrics
```
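The helper returns one evaluation result per query. To report a single score per metric, the per-query results can be averaged, for example with pandas (a sketch; `metric_vals_dict` holds each query's metric values, and in a notebook the coroutine would simply be awaited instead of using `asyncio.run`):

```python
import asyncio

import pandas as pd

eval_results, metrics = asyncio.run(MyRetrieverEvaluator(index, qa_dataset))

# Collect per-query metric values and average them per metric.
metric_df = pd.DataFrame([r.metric_vals_dict for r in eval_results])
print(metric_df.mean())
```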