An Iterative Semi-supervised Approach to Software Fault Prediction

Lu, Huihua; Cukic, Bojan; Culp, Mark

Data source: CROSBI

An Iterative Semi-supervised Approach to Software Fault Prediction (CROSBI ID 582481)

Conference paper in proceedings | original scientific paper | international peer review

Lu, Huihua; Cukic, Bojan; Culp, Mark. An Iterative Semi-supervised Approach to Software Fault Prediction // Proceedings of the 7th International Conference on Predictive Models in Software Engineering / Menzies, Tim (ed.). New York (NY): The Association for Computing Machinery (ACM), 2011. pp. 1-15

Responsibility data

Authors

Lu, Huihua; Cukic, Bojan; Culp, Mark

Basic data in the original language

Language

English

Title

An Iterative Semi-supervised Approach to Software Fault Prediction

Abstract

Background: Many statistical and machine learning techniques have been applied to build predictive fault models. Traditional methods are based on supervised learning: software metrics for a module and the corresponding fault information, available from previous projects, are used to train a fault prediction model. This approach requires a large training data set, which enables the development of effective fault prediction models. In practice, data collection costs or the lack of data from earlier projects or product versions may make a large fault prediction training data set unattainable. The small training set that may be available from the current project is known to degrade the performance of the fault prediction model. In semi-supervised learning approaches, software modules with both known and unknown fault content can be used for training. Aims: To implement and evaluate a semi-supervised learning approach to software fault prediction. Methods: We investigate an iterative semi-supervised approach to software quality prediction in which a base supervised learner is used within a semi-supervised application. Results: We varied the proportion of labeled software modules from 2% to 50% of all the modules in the project. After tracking the performance of each iteration of the semi-supervised algorithm, we observe that semi-supervised learning improves fault prediction if the proportion of initially labeled software modules exceeds 5%. Conclusion: The semi-supervised approach outperforms the corresponding supervised learning approach when both use random forest as the base classification algorithm.
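
The iterative scheme described in the Methods part of the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes one common form of iterative self-training, in which a base learner is first fit on the labeled modules and then repeatedly refit on all modules, with the unlabeled ones carrying their current predicted labels, until those predictions stop changing. To stay dependency-free, the base learner here is a hypothetical nearest-centroid classifier standing in for the random forest named in the abstract. The function names, the two metrics per module, and the toy data are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Model = Callable[[Vector], int]


def fit_nearest_centroid(X: List[Vector], y: List[int]) -> Model:
    """Hypothetical stand-in base learner: predict the class with the closest mean."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x: Vector) -> int:
        return min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
        )

    return predict


def iterative_self_training(
    X_lab: List[Vector],
    y_lab: List[int],
    X_unl: List[Vector],
    fit: Callable[[List[Vector], List[int]], Model] = fit_nearest_centroid,
    max_iter: int = 10,
) -> Tuple[Model, List[int]]:
    """Refit the base learner on labeled plus pseudo-labeled modules until stable."""
    # Step 1: purely supervised fit on the small labeled set.
    model = fit(list(X_lab), list(y_lab))
    pseudo = [model(x) for x in X_unl]
    for _ in range(max_iter):
        # Step 2: refit on everything; the true labels are never overwritten.
        model = fit(list(X_lab) + list(X_unl), list(y_lab) + pseudo)
        updated = [model(x) for x in X_unl]
        # Step 3: stop once the pseudo-labels no longer change.
        if updated == pseudo:
            break
        pseudo = updated
    return model, pseudo


# Toy data (assumed): each module is (lines of code, cyclomatic complexity);
# label 1 means fault-prone, label 0 means not fault-prone.
X_lab = [(10.0, 1.0), (200.0, 15.0)]
y_lab = [0, 1]
X_unl = [(12.0, 2.0), (15.0, 1.0), (180.0, 12.0), (220.0, 18.0)]

model, pseudo = iterative_self_training(X_lab, y_lab, X_unl)
print(pseudo)
```

Because `fit` is passed in as a parameter, a random forest or any other supervised learner exposing the same fit-then-predict shape could be substituted without changing the loop.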

Keywords

Metrics-complexity measures; performance

Note

not recorded

Basic data in other languages

not recorded

Contribution data

Pages

1-15.

Year of publication

2011.

Publication status

published

Host publication data

Title

Proceedings of the 7th International Conference on Predictive Models in Software Engineering

Editors

Menzies, Tim

Publisher

New York (NY): The Association for Computing Machinery (ACM)

ISBN

978-1-4503-0709-3

Conference data

Conference

7th ACM International Conference on Predictive Models in Software Engineering

Type of participation

lecture

Conference dates

20.09.2011-21.09.2011

Conference venue

Banff, Canada

Related records

Related persons

Bojan Čukić (author)

Related institutions

Fakultet elektrotehnike, računarstva i informacijskih tehnologija Osijek (165) (author's institution)

Related projects

Postupci raspoređivanja u samoodrživim raspodijeljenim računalnim sustavima (result of work on the project)

Field

Computer science