Effect of source-code preprocessing techniques on plagiarism detection accuracy in student programming assignments (CROSBI ID 438531)
Thesis | doctoral dissertation
Responsibility details
Novak, Matija
Kermek, Dragutin ; Joy, Mike
English
Effect of source-code preprocessing techniques on plagiarism detection accuracy in student programming assignments
Plagiarism is a serious problem in academia. Students cheat for various reasons, but whatever the reason, such behaviour should not be accepted. While it is easy to control plagiarism in classrooms with few students, it can be a challenge in a classroom with one hundred students or more. To help teachers detect plagiarism, similarity detection tools are built, usually called plagiarism detection tools. Plagiarism in academia can occur in many areas, but the two most common are textual and programming assignments. This thesis focuses on detecting plagiarism in student programming assignments. Since the tools are not perfect, there is always room for improvement, and one way to improve plagiarism detection quality is to use preprocessing techniques. Preprocessing techniques have been used in many plagiarism detection tools, but little research has focused on the effects of such techniques. To investigate the effect of preprocessing techniques on plagiarism detection tools, an experiment was conducted on six tools using five techniques on two different datasets, one of which is publicly available. More precisely, the six tools were three tools, each operating in two modes: a specialized mode developed specifically for source-code comparison and a textual mode developed for ordinary text comparison. Two hypotheses were stated for the experiment, one focusing on the differences between using preprocessing techniques and using no preprocessing technique, and the other focusing on the differences between two different techniques. In addition to the hypotheses, one research question was stated to give more insight into the effects of the preprocessing techniques. The results of the experiment were analysed quantitatively using multifactor analysis of variance and qualitatively by analysing the most interesting cases.
The whole process of detection and statistical analysis was automated using a newly developed system called Multiple Plagiarism Checker together with the R statistical environment. The experimental results confirmed both hypotheses, showing that preprocessing has a positive effect on the quality of plagiarism detection and that some techniques gave better results than others. The most interesting result of this research is that, with preprocessing techniques applied, the textual versions of the tools in some cases outperformed the specialized versions developed specifically for source-code similarity detection.
plagiarism detection ; higher education ; source code ; programming assignments ; preprocessing techniques ; comparison ; program similarity
Publication details
232
03.02.2020.
defended
Degree-granting institution
Fakultet organizacije i informatike
Varaždin