Using Language Models to Disambiguate Lexical Choices in Translation

title: Using Language Models to Disambiguate Lexical Choices in Translation

publish date:

2024-11-08

authors:

Josh Barua et al.

paper id:

2411.05781v1

abstract:

In translation, a concept represented by a single word in a source language can have multiple variations in a target language. The task of lexical selection requires using context to identify which variation is most appropriate for a source text. We work with native speakers of nine languages to create DTAiLS, a dataset of 1,377 sentence pairs that exhibit cross-lingual concept variation when translating from English. We evaluate recent LLMs and neural machine translation systems on DTAiLS, with the best-performing model, GPT-4, achieving from 67 to 85% accuracy across languages. Finally, we use language models to generate English rules describing target-language concept variations. Providing weaker models with high-quality lexical rules improves accuracy substantially, in some cases reaching or outperforming GPT-4.

Edited and compiled by: wanghaisheng. Last updated: 2024-11-11