Data source: CROSBI

Corpus-Based Paraphrase Detection Experiments and Review (CROSBI ID 278542)

Journal article | Review article (scientific) | International peer review

Vrbanec, Tedo ; Meštrović, Ana Corpus-Based Paraphrase Detection Experiments and Review // Information, 11 (2020), 5; 241, 24. doi: 10.3390/info11050241

Authorship information

Vrbanec, Tedo ; Meštrović, Ana

English


Paraphrase detection is important for a number of applications, including plagiarism detection, authorship attribution, question answering, text summarization, and text mining in general. In this paper, we give a performance overview of various types of corpus-based models, especially deep learning (DL) models, on the task of paraphrase detection. We report the results of eight models (LSI, TF-IDF, Word2Vec, Doc2Vec, GloVe, FastText, ELMO, and USE) evaluated on three publicly available corpora: the Microsoft Research Paraphrase Corpus, the Clough and Stevenson corpus, and the Webis Crowd Paraphrase Corpus 2011. Through a large number of experiments, we determined the most appropriate choices for text pre-processing, hyper-parameters, sub-model selection where applicable (e.g., Skip-gram vs. CBOW), distance measures, and the semantic similarity/paraphrase detection threshold. Our findings, and those of other researchers who have used deep learning models, show that DL models are very competitive with traditional state-of-the-art approaches and have potential that should be developed further.
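The abstract describes a common pipeline: represent each text as a vector with a corpus-based model, compare the two vectors with a distance or similarity measure, and label the pair a paraphrase when the similarity reaches a tuned threshold. The sketch below illustrates that pipeline for TF-IDF, one of the eight models listed, using cosine similarity. It is not the authors' code. The tokenizer, the smoothed IDF formula, and the default threshold of 0.5 are illustrative assumptions, and the function names are hypothetical; the paper selects the threshold experimentally per model and corpus.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Minimal pre-processing (assumption): lowercase and keep alphanumeric tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Build sparse TF-IDF vectors with smoothed IDF: log((1 + N) / (1 + df)) + 1."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: cnt * idf[t] for t, cnt in tf.items()})
    return vectors


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def is_paraphrase(a: str, b: str, corpus: list[str], threshold: float = 0.5) -> bool:
    # Hypothetical default threshold; in practice it is tuned on labelled data.
    # IDF statistics are computed over the background corpus plus the pair itself.
    vecs = tfidf_vectors(corpus + [a, b])
    return cosine(vecs[-2], vecs[-1]) >= threshold


if __name__ == "__main__":
    corpus = ["the cat sat on the mat", "dogs chase cats", "stock markets fell today"]
    print(is_paraphrase("The cat sat on the mat.", "the cat sat on the mat", corpus))
    print(is_paraphrase("the cat sat on the mat", "stock markets fell today", corpus))
```

Swapping TF-IDF for a dense embedding model (e.g., averaged Word2Vec vectors) would change only how the vectors are built; the similarity measure and thresholding step stay the same.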

semantic similarity ; deep learning ; paraphrasing corpora ; experiments ; natural language processing


Publication details

Volume (issue): 11 (5)

Year: 2020

Article number: 241

Number of pages: 24

Status: published

ISSN: 2078-2489

DOI: 10.3390/info11050241

Open-access publication fee

Research fields

Information and Communication Sciences, Computing
