Data source: CROSBI

Effect of source-code preprocessing techniques on plagiarism detection accuracy in student programming assignments (CROSBI ID 438531)

Thesis | doctoral dissertation

Novak, Matija. Effect of source-code preprocessing techniques on plagiarism detection accuracy in student programming assignments / Kermek, Dragutin ; Joy, Mike (mentor); Varaždin, Fakultet organizacije i informatike, 2020

Responsibility

Novak, Matija

Kermek, Dragutin ; Joy, Mike

English

Effect of source-code preprocessing techniques on plagiarism detection accuracy in student programming assignments

Plagiarism is a serious problem in academia. Students cheat for various reasons, but whatever the reason, such behaviour should not be accepted. While it is easy to control plagiarism in classrooms with few students, it can be a challenge in a classroom with one hundred students or more. To help teachers detect plagiarism, similarity detection tools have been built, usually called plagiarism detection tools. While plagiarism in academia can occur in many areas, the two most common are textual and programming assignments. In this thesis, the focus is on detecting plagiarism in student programming assignments. Since the tools are not perfect, there is always room for improvement, and one way to improve plagiarism detection quality is to use preprocessing techniques. Preprocessing techniques have been used in many plagiarism detection tools, but there is not much research focusing on the effects of such techniques. To investigate the effect of preprocessing techniques on plagiarism detection tools, an experiment was conducted on six tools using five techniques on two different datasets, one of which is publicly available. To be more precise, the six tools were actually three tools, each with two modes of operation: a specialized mode developed specifically for source-code comparison and a textual mode developed for normal text comparison. In this experiment two hypotheses were stated, one focusing on the differences between using a preprocessing technique and using none, and the other focusing on differences between two different techniques. In addition to the hypotheses, one research question was stated to give more insight into the effects of the preprocessing techniques. The results of the experiment were analysed quantitatively using multifactor analysis of variance and qualitatively by analysing the most interesting cases.
The whole process of detection and statistical analysis was automated using a newly developed system called Multiple Plagiarism Checker and the R system. The experimental results confirmed both hypotheses, showing that preprocessing has a positive effect on the quality of plagiarism detection and that some techniques gave better results than others. The most interesting result of this research is that, with preprocessing techniques, the textual versions of the tools in some cases outperformed the specialized versions developed specifically for source-code similarity detection.

plagiarism detection ; higher education ; source-code ; programming assignments ; preprocessing techniques ; comparison ; program similarity

Publication data

232

03.02.2020.

defended

Degree-granting institution

Fakultet organizacije i informatike

Varaždin

Research area

Information and communication sciences
