Paraphrase-focused learning to rank for domain-specific frequently asked questions retrieval

Karan, Mladen; Šnajder, Jan

Data source: CROSBI

Paraphrase-focused learning to rank for domain-specific frequently asked questions retrieval (CROSBI ID 244022)

Journal article | original scientific paper | international peer review

Karan, Mladen ; Šnajder, Jan Paraphrase-focused learning to rank for domain-specific frequently asked questions retrieval // Expert systems with applications, 91 (2018), 418-433. doi: 10.1016/j.eswa.2017.09.031

Responsibility

Authors

Karan, Mladen ; Šnajder, Jan

Basic data in the original language
Basic data in other languages

Language

English

Title

Paraphrase-focused learning to rank for domain-specific frequently asked questions retrieval

Abstract

A frequently asked questions (FAQ) retrieval system improves the access to information by allowing users to pose natural language queries over an FAQ collection. From an information retrieval perspective, FAQ retrieval is a challenging task, mainly because of the lexical gap that exists between a query and an FAQ pair, both of which are typically very short. In this work, we explore the use of supervised learning to rank to improve the performance of domain-specific FAQ retrieval. While supervised learning-to-rank models have been shown to yield effective retrieval performance, they require costly human-labeled training data in the form of document relevance judgments or question paraphrases. We investigate how this labeling effort can be reduced using a labeling strategy geared toward the manual creation of query paraphrases rather than the more time-consuming relevance judgments. In particular, we investigate two such strategies, and test them by applying supervised ranking models to two domain-specific FAQ retrieval data sets, showcasing typical FAQ retrieval scenarios. Our experiments show that supervised ranking models can yield significant improvements in the precision-at-rank-5 measure compared to unsupervised baselines. Furthermore, we show that a supervised model trained using data labeled via a low-effort paraphrase-focused strategy has the same performance as that of the same model trained using fully labeled data, indicating that the strategy is effective at reducing the labeling effort while retaining the performance gains of the supervised approach. To encourage further research on FAQ retrieval, we make our FAQ retrieval data set publicly available.

Keywords

question answering ; FAQ retrieval ; learning to rank ; ListNET ; LambdaMART ; convolutional neural network

Note

not recorded

Language

not recorded

Title

not recorded

Abstract

not recorded

Keywords

not recorded

Note

not recorded

Publication data

Journal

Expert systems with applications

Volume (issue)

91

Year

2018

Pages

418-433

Publication status

published

ISSN

0957-4174

e-ISSN

1873-6793

DOI

10.1016/j.eswa.2017.09.031

Related records

Related persons

Mladen Karan (author)

Jan Šnajder (author)

Related institutions

Fakultet elektrotehnike i računarstva (036) (author's institution)

Field

Computer science

Links

doi.org

sciencedirect.com

Indexed in

Scopus

Current Contents Connect (CCC)

Web of Science Core Collection, Science Citation Index Expanded (WoSCC-SCI-Exp)

Web of Science Core Collection, SCI-Exp, SSCI & A&HCI (WoSCC-SCI-Exp, SSCI, A&HCI)