RAGConfusionQA: A Benchmark for Evaluating LLMs on Confusing Questions
title: RAGConfusionQA: A Benchmark for Evaluating LLMs on Confusing Questions
publish date: 2024-10-18
authors: Zhiyuan Peng et al.
paper id: 2410.14567v1
download: https://arxiv.org/abs/2410.14567v1
abstract:
Conversational AI agents use Retrieval Augmented Generation (RAG) to provide verifiable document-grounded responses to user inquiries. However, many natural questions do not have good answers: about 25% contain false assumptions (Yu et al., 2023, CREPE), and over 50% are ambiguous (Min et al., 2020, AmbigQA). RAG agents need high-quality data to improve their responses to confusing questions. This paper presents a novel synthetic data generation method to efficiently create a diverse set of context-grounded confusing questions from a given document corpus. We conduct an empirical comparative evaluation of several large language models as RAG agents to measure the accuracy of confusion detection and appropriate response generation. We contribute a benchmark dataset to the public domain.
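The digest does not detail the paper's actual generation pipeline, but the core idea the abstract describes, prompting an LLM to synthesize confusing questions grounded in passages from a document corpus, can be sketched roughly as below. Everything in this sketch is an illustrative assumption, not the paper's method: `llm_complete` is a hypothetical stand-in for a real model API, and the prompt wording and the two confusion categories (false-premise and ambiguous, taken from the statistics the abstract cites) are placeholders.

```python
import json

# Hypothetical stand-in for an LLM API call; swap in an actual client
# (OpenAI, Anthropic, a local model, etc.) that returns a string.
def llm_complete(prompt: str) -> str:
    raise NotImplementedError("plug in a real LLM client here")

# Assumed confusion categories, suggested by the abstract's cited statistics:
# questions with false assumptions and ambiguous questions.
CONFUSION_TYPES = ["false-premise", "ambiguous"]

# Assumed prompt; the paper's real prompt is not shown in this digest.
PROMPT_TEMPLATE = """Given the passage below, write one {ctype} question \
that a user might plausibly ask. The question must be grounded in the \
passage, but it should be confusing: it must not have one good answer.

Passage:
{passage}

Return JSON: {{"question": "...", "why_confusing": "..."}}"""

def generate_confusing_questions(corpus: list[str]) -> list[dict]:
    """Generate one confusing question per (passage, confusion type) pair."""
    examples = []
    for passage in corpus:
        for ctype in CONFUSION_TYPES:
            raw = llm_complete(PROMPT_TEMPLATE.format(ctype=ctype,
                                                      passage=passage))
            record = json.loads(raw)  # assumes the model returns valid JSON
            record["type"] = ctype
            record["passage"] = passage
            examples.append(record)
    return examples
```

A pipeline of this shape yields (passage, question, type) records that could then be used the way the abstract describes: checking whether a RAG agent detects the confusion and responds appropriately instead of answering the question at face value.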
QA:
coming soon
Compiled and edited by: wanghaisheng. Last updated: October 21, 2024.