Bibliographic record no. 633588
Frequently Asked Questions Retrieval for Croatian Based on Semantic Textual Similarity
Frequently Asked Questions Retrieval for Croatian Based on Semantic Textual Similarity // Proceedings of the 4th Biennial International Workshop on Balto-Slavic Natural Language Processing
Sofia: Association for Computational Linguistics (ACL), 2013, pp. 24-33 (lecture, international peer review, full paper (in extenso), scientific)
CROSBI ID: 633588
Title
Frequently Asked Questions Retrieval for Croatian Based on Semantic Textual Similarity
Authors
Karan, Mladen ; Žmak, Lovro ; Šnajder, Jan
Type, subtype, and category of work
Papers in conference proceedings, full paper (in extenso), scientific
Source
Proceedings of the 4th Biennial International Workshop on Balto-Slavic Natural Language Processing
Sofia: Association for Computational Linguistics (ACL), 2013, pp. 24-33
Conference
4th Biennial International Workshop on Balto-Slavic Natural Language Processing
Place and date
Sofia, Bulgaria, 8-9 August 2013
Type of participation
Lecture
Type of peer review
International peer review
Keywords
FAQ retrieval ; information retrieval ; semantic textual similarity ; Croatian language
Abstract
Frequently asked questions (FAQs) are an efficient way of communicating domain-specific information to users. Unlike general-purpose retrieval engines, FAQ retrieval engines must address the lexical gap between the query and the usually short answer. In this paper we describe the design and evaluation of a FAQ retrieval engine for Croatian. We frame the task as a binary classification problem and train a model to classify each FAQ as either relevant or not relevant for a given query. We use a variety of semantic textual similarity features, including term overlap and vector space features. We train and evaluate on a FAQ test collection built specifically for this purpose. Our best-performing model reaches a mean reciprocal rank of 0.47, i.e., on average it ranks the relevant answer among the top two returned answers.
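To make the reported metric concrete, the following is a minimal sketch of how FAQs could be ranked for a query and how mean reciprocal rank (MRR) is computed over several queries. It is not the authors' implementation: the paper trains a classifier over many similarity features, whereas this sketch ranks by a single Jaccard term-overlap score as a stand-in. The function names, the whitespace tokenization, and the example FAQs are all assumptions made for illustration. Note that the reciprocal of MRR is the harmonic mean of the ranks, so an MRR of 0.47 corresponds to a harmonic-mean rank of about 2.1.

```python
from typing import Sequence, Set


def term_overlap(query: str, text: str) -> float:
    """Jaccard overlap between lowercased whitespace tokens (illustrative feature)."""
    q, t = set(query.lower().split()), set(text.lower().split())
    if not q or not t:
        return 0.0
    return len(q & t) / len(q | t)


def rank_faqs(query: str, faqs: Sequence[str]) -> list[int]:
    """Return FAQ indices ordered by descending overlap with the query.

    A trained relevance classifier would normally supply the score;
    the overlap score is used here only as a hypothetical stand-in.
    """
    return sorted(
        range(len(faqs)),
        key=lambda i: term_overlap(query, faqs[i]),
        reverse=True,
    )


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[int]], relevant: Sequence[Set[int]]
) -> float:
    """Average over queries of 1/rank of the first relevant item (0 if none)."""
    total = 0.0
    for ranking, rel in zip(rankings, relevant):
        for position, idx in enumerate(ranking, start=1):
            if idx in rel:
                total += 1.0 / position
                break
    return total / len(rankings)


# Hypothetical toy collection, not taken from the paper's test collection.
faqs = [
    "how do i reset my password",
    "what are the opening hours",
    "how can i cancel my subscription",
]
ranking = rank_faqs("reset password", faqs)

# Two queries: relevant FAQ found at rank 1 and at rank 2 -> (1 + 0.5) / 2.
mrr = mean_reciprocal_rank([[0, 1, 2], [1, 0, 2]], [{0}, {0}])
print(ranking[0], mrr)  # prints "0 0.75"
```

In this toy run the first query places its relevant FAQ at rank 1 and the second at rank 2, which gives an MRR of 0.75.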
Original language
English
Scientific fields
Computer science
LINKED TO THIS WORK
Projects:
036-1300646-1986 - Otkrivanje znanja u tekstnim podacima (Knowledge Discovery in Textual Data) (Dalbelo-Bašić, Bojana, MZO) (CroRIS)
Institutions:
Faculty of Electrical Engineering and Computing, Zagreb