Bibliographic record no. 1061402
Corpus-Based Paraphrase Detection Experiments and Review
Corpus-Based Paraphrase Detection Experiments and Review // Information, 11 (2020), 5; 241, 24 doi:10.3390/info11050241 (international peer review, review article, scientific)
CROSBI ID: 1061402
Title
Corpus-Based Paraphrase Detection Experiments and Review
Authors
Vrbanec, Tedo ; Meštrović, Ana
Source
Information (2078-2489) 11 (2020), 5; 241, 24
Type, subtype and category of work
Journal papers, review article, scientific
Keywords
semantic similarity ; deep learning ; paraphrasing corpora ; experiments ; natural language processing
Abstract
Paraphrase detection is important for a number of applications, including plagiarism detection, authorship attribution, question answering, text summarization, and text mining in general. In this paper, we give a performance overview of various types of corpus-based models, especially deep learning (DL) models, on the task of paraphrase detection. We report the results of eight models (LSI, TF-IDF, Word2Vec, Doc2Vec, GloVe, FastText, ELMo, and USE) evaluated on three publicly available corpora: the Microsoft Research Paraphrase Corpus, the Clough and Stevenson corpus, and the Webis Crowd Paraphrase Corpus 2011. Through a large number of experiments, we determined the most appropriate choices for text pre-processing, hyper-parameters, sub-model selection where sub-models exist (e.g., Skip-gram vs. CBOW), distance measures, and the semantic similarity/paraphrase detection threshold. Our findings, and those of other researchers who have used deep learning models, show that DL models are very competitive with traditional state-of-the-art approaches and have potential that should be developed further.
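The abstract describes paraphrase detection as scoring a sentence pair with a corpus-based representation, comparing the vectors with a distance measure, and applying a similarity threshold. The following is a minimal, hypothetical sketch of that pipeline for the simplest of the eight models, TF-IDF, with cosine similarity. It is not the authors' implementation: the tokenizer, the smoothed IDF formula, the accuracy-maximizing threshold search, and the toy sentence pairs (invented here, not taken from any of the three corpora) are all illustrative assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed minimal pre-processing: lowercase and keep alphanumeric tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(documents):
    # Assumed smoothed IDF: idf(t) = ln((1 + N) / (1 + df(t))) + 1.
    n = len(documents)
    df = Counter()
    for doc in documents:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf_vector(text, idf):
    # Raw term counts weighted by IDF; terms not seen during fitting are dropped.
    counts = Counter(tokenize(text))
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def pair_similarities(pairs, idf):
    return [cosine(tfidf_vector(a, idf), tfidf_vector(b, idf)) for a, b in pairs]


def best_threshold(similarities, labels):
    # Try each observed similarity as the cut-off (pair is a paraphrase if
    # similarity >= threshold) and keep the first one with the highest accuracy.
    best_t, best_acc = 0.5, -1.0
    for t in sorted(set(similarities)):
        predictions = [s >= t for s in similarities]
        acc = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


# Hypothetical toy data: (sentence_a, sentence_b) pairs with paraphrase labels.
train_pairs = [
    ("The cat sat on the mat.", "A cat was sitting on the mat."),
    ("Stocks fell sharply on Monday.", "The cat sat on the mat."),
    ("He bought a new car yesterday.", "Yesterday he purchased a new car."),
    ("The weather is sunny today.", "Interest rates rose again."),
]
train_labels = [True, False, True, False]

idf = fit_idf([sentence for pair in train_pairs for sentence in pair])
sims = pair_similarities(train_pairs, idf)
threshold, accuracy = best_threshold(sims, train_labels)
print(f"threshold={threshold:.3f} accuracy={accuracy:.2f}")
```

Tuning the threshold on labelled pairs, as sketched here, mirrors the abstract's point that the detection threshold is itself a parameter to be chosen experimentally; with embedding models such as Word2Vec or USE, only `tfidf_vector` would change, while the cosine scoring and threshold search stay the same.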
Original language
English
Scientific fields
Computing, Information and Communication Sciences
LINKS
Projects:
NadSve-Sveučilište u Rijeci-uniri-drustv-18-38 - Methods for Measuring Semantic Similarity of Texts (SemText) (Meštrović, Ana, NadSve - Call for funding support of scientific research at the University of Rijeka for 2018 - projects of experienced scientists and artists)
Institutions:
Faculty of Teacher Education, Zagreb
Faculty of Informatics and Digital Technologies, Rijeka
Journal indexed in:
- Web of Science Core Collection (WoSCC)
- Emerging Sources Citation Index (ESCI)
- Scopus
Included in other bibliographic databases:
- EI Compendex