Bibliographic record no.: 524149

N-gram overlap in automatic detection of document derivation


Bosanac, Siniša; Štefanec, Vanja
N-gram overlap in automatic detection of document derivation // INFuture2011: The Future of Information Sciences - Information Sciences and e-Society / Billenness, Clive ; Hemera, Annette ; Mateljan, Vladimir ; Banek Zorica, Mihaela ; Stančić, Hrvoje ; Seljan, Sanja (eds.).
Zagreb: Odsjek za informacijske i komunikacijske znanosti Filozofskog fakulteta Sveučilišta u Zagrebu, 2011. pp. 373-382


CROSBI ID: 524149

Title
N-gram overlap in automatic detection of document derivation

Authors
Bosanac, Siniša ; Štefanec, Vanja

Type, subtype and category of work
Book chapters, review

Book
INFuture2011: The Future of Information Sciences - Information Sciences and e-Society

Editor(s)
Billenness, Clive ; Hemera, Annette ; Mateljan, Vladimir ; Banek Zorica, Mihaela ; Stančić, Hrvoje ; Seljan, Sanja

Publisher
Odsjek za informacijske i komunikacijske znanosti Filozofskog fakulteta Sveučilišta u Zagrebu

City
Zagreb

Year
2011

Page range
373-382

ISBN
978-953-175-408-8

Keywords
document derivation, text reuse, n-gram overlap, automatic plagiarism detection, string metrics

Abstract
Establishing the authenticity and independence of documents in relation to others is not a new problem, but in the era of hyperproduction of e-text it has certainly gained even more importance. There is an increased need for automatic methods for determining the originality of documents in a digital environment. The method of n-gram overlap is only one of several methods proposed in the literature and is used in a variety of systems for automatic identification of text reuse. Although the aforementioned method is quite trivial, determining the length of n-grams that would be a good indicator of text reuse is a somewhat complex issue. We assume that the optimal length of n-grams is not the same for all languages but that it depends on particular language properties such as morphological typology, syntactic features, etc. The aim of this study is to find the optimal length of n-grams to be used for determining document derivation in the Croatian language. Among the potential areas of implementation of the results of this study, we could point out automatic detection of plagiarism in academic and student papers, citation analysis, information flow tracking and event detection in on-line texts.
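
The n-gram overlap idea mentioned in the abstract can be illustrated with a minimal Python sketch. This record does not state which overlap measure or tokenization the paper uses, so word-level n-grams, the containment ratio, the function names `word_ngrams` and `containment`, and the two sample sentences are all assumptions made purely for illustration.

```python
import re


def word_ngrams(text, n):
    """Return the set of word n-grams (as tuples) in a lowercased text."""
    # \w+ matches Unicode word characters, so Croatian diacritics are kept.
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def containment(candidate, source, n):
    """Share of the candidate's distinct n-grams that also occur in the source.

    Assumed measure for this sketch, not necessarily the one used in the paper.
    """
    cand = word_ngrams(candidate, n)
    if not cand:
        return 0.0
    return len(cand & word_ngrams(source, n)) / len(cand)


# Made-up sample texts, used only to show how the score reacts to n.
source = "the quick brown fox jumps over the lazy dog"
derived = "a quick brown fox jumps over a sleeping dog"

for n in (1, 2, 3, 4):
    print(n, round(containment(derived, source, n), 2))
```

On this toy pair the score drops as n grows, which hints at why the choice of n-gram length matters: short n-grams match on common vocabulary alone, while long ones only match on verbatim runs.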

Original language
English

Scientific fields
Information and communication sciences, Philology



CONNECTIONS OF THE WORK


Institutions:
Filozofski fakultet, Zagreb

Profiles:

Vanja Štefanec (author)

Cite this publication:

Bosanac, S. & Štefanec, V. (2011) N-gram overlap in automatic detection of document derivation. In: Billenness, C., Hemera, A., Mateljan, V., Banek Zorica, M., Stančić, H. & Seljan, S. (eds.) INFuture2011: The Future of Information Sciences - Information Sciences and e-Society. Zagreb, Odsjek za informacijske i komunikacijske znanosti Filozofskog fakulteta Sveučilišta u Zagrebu, pp. 373-382.
@incollection{bosanac2011ngram,
  author    = {Bosanac, Sini\v{s}a and \v{S}tefanec, Vanja},
  title     = {N-gram overlap in automatic detection of document derivation},
  booktitle = {INFuture2011: The Future of Information Sciences - Information Sciences and e-Society},
  editor    = {Billenness, Clive and Hemera, Annette and Mateljan, Vladimir and Banek Zorica, Mihaela and Stan\v{c}i\'{c}, Hrvoje and Seljan, Sanja},
  publisher = {Odsjek za informacijske i komunikacijske znanosti Filozofskog fakulteta Sveu\v{c}ili\v{s}ta u Zagrebu},
  address   = {Zagreb},
  year      = {2011},
  pages     = {373--382},
  isbn      = {978-953-175-408-8},
  keywords  = {document derivation, text reuse, n-gram overlap, automatic plagiarism detection, string metrics}
}