title: ASR Benchmarking: Need for a More Representative Conversational Dataset

publish date:

2024-09-18

authors:

Gaurav Maheshwari et al.

paper id:

2409.12042v1

abstract:

Automatic Speech Recognition (ASR) systems have achieved remarkable performance on widely used benchmarks such as LibriSpeech and Fleurs. However, these benchmarks do not adequately reflect the complexities of real-world conversational environments, where speech is often unstructured and contains disfluencies such as pauses, interruptions, and diverse accents. In this study, we introduce a multilingual conversational dataset, derived from TalkBank, consisting of unstructured phone conversations between adults. Our results show a significant performance drop across various state-of-the-art ASR models when tested in conversational settings. Furthermore, we observe a correlation between Word Error Rate and the presence of speech disfluencies, highlighting the critical need for more realistic, conversational ASR benchmarks.
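The abstract's central measurement is Word Error Rate (WER), the standard ASR metric. As a minimal, self-contained sketch of how WER is computed via word-level edit distance (the example sentences below are hypothetical and not drawn from the paper's TalkBank-derived dataset), consider:

```python
# Minimal sketch: computing Word Error Rate (WER), the metric the paper
# correlates with disfluency counts. Example strings are hypothetical.

def wer(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length,
    computed via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution/match
    return dp[len(ref)][len(hyp)] / len(ref)

# Fillers like "um"/"uh" in the reference transcript inflate the edit
# distance when an ASR model drops or normalizes them, which is one
# plausible mechanism behind the WER/disfluency correlation reported.
ref = "so um i was thinking we could uh maybe go tomorrow"
hyp = "so i was thinking we could maybe go tomorrow"
print(f"WER: {wer(ref, hyp):.3f}")  # 2 deletions / 11 words ≈ 0.182
```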

QA:

coming soon

Edited and compiled by: wanghaisheng. Last updated: September 23, 2024.