Empirical Study: How Issue Classification Influences Software Defect Prediction

Afrić, Petar; Vukadin, Davor; Šilić, Marin; Delač, Goran

Data source: CROSBI

Empirical Study: How Issue Classification Influences Software Defect Prediction (CROSBI ID 321009)

Journal article | original scientific paper | international peer review

Afrić, Petar; Vukadin, Davor; Šilić, Marin; Delač, Goran. Empirical Study: How Issue Classification Influences Software Defect Prediction // IEEE Access, 11 (2023), 11732-11748. doi: 10.1109/ACCESS.2023.3242045

Responsibility data

Authors

Afrić, Petar; Vukadin, Davor; Šilić, Marin; Delač, Goran

Basic data in the original language
Basic data in other languages

Language

English

Title

Empirical Study: How Issue Classification Influences Software Defect Prediction

Abstract

Software defect prediction (SDP) aims to identify potentially defective software modules so that limited quality assurance resources can be allocated better. Practitioners often do this using supervised models trained on historical data, which is gathered by mining version control and issue tracking systems. Version control commits are linked to the issues they address; if the linked issue is classified as a bug report, the change is considered bug fixing. The problem is that issues are often incorrectly classified within issue tracking systems, which introduces noise into the gathered datasets. In this paper, we investigate the influence issue classification has on software defect prediction dataset quality and the resulting model performance. To do this, we mine data from 7 popular open-source repositories and create issue classification and software defect prediction datasets for each of them. We investigate issue classification using four different methods: a simple keyword heuristic, an improved keyword heuristic, the FastText model, and the RoBERTa model. Our results show that using the RoBERTa model for issue classification produces the best software defect prediction datasets, containing on average 14.3641% mislabeled instances. SDP models trained on such datasets outperform those trained on SDP datasets created using other issue classification methods in 65 out of 84 experiments, with 55 of them being statistically relevant. Furthermore, in 17 out of 28 experiments we could not show a statistically relevant performance difference between SDP models trained on RoBERTa-derived software defect prediction datasets and those created using manually labeled issues.
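The labeling pipeline described in the abstract (classify each issue, then mark a commit as bug fixing when it links to an issue classified as a bug report) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the keyword list, the `#123` issue-reference convention, and the names `Issue`, `is_bug_report`, `label_commits`, and `mislabel_rate` are hypothetical and are not taken from the paper, whose actual heuristics are not given in this record.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword list; the paper's real heuristics are not stated here.
BUG_KEYWORDS = ("bug", "fix", "error", "crash", "fail", "defect")
BUG_PATTERN = re.compile(r"\b(?:" + "|".join(BUG_KEYWORDS) + r")\w*", re.IGNORECASE)

# Assumed convention: commits reference issues as "#<number>".
ISSUE_REF = re.compile(r"#(\d+)")


@dataclass
class Issue:
    number: int
    title: str


def is_bug_report(issue: Issue) -> bool:
    """Simple keyword heuristic: any bug-like word in the issue title."""
    return bool(BUG_PATTERN.search(issue.title))


def label_commits(commit_messages, issues):
    """Return one boolean per commit: True if it links to a bug-classified issue."""
    by_number = {issue.number: issue for issue in issues}
    labels = []
    for message in commit_messages:
        refs = [int(n) for n in ISSUE_REF.findall(message)]
        labels.append(
            any(n in by_number and is_bug_report(by_number[n]) for n in refs)
        )
    return labels


def mislabel_rate(predicted, truth):
    """Fraction of instances whose derived label differs from the reference label."""
    if len(predicted) != len(truth) or not truth:
        raise ValueError("label lists must be non-empty and of equal length")
    wrong = sum(p != t for p, t in zip(predicted, truth))
    return wrong / len(truth)


issues = [Issue(1, "Crash when parsing empty file"), Issue(2, "Add dark mode")]
commits = [
    "Handle empty input, closes #1",
    "Implement theme switch (#2)",
    "Update README",
]
labels = label_commits(commits, issues)
```

Swapping `is_bug_report` for a learned classifier would leave the rest of the pipeline unchanged, which is why the choice of issue classifier directly controls the share of mislabeled instances in the resulting dataset.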

Keywords

Issue tracking; Version control systems; Natural language processing; Issue classification; Software defect prediction; RoBERTa

Note

not recorded

Language

not recorded

Title

not recorded

Abstract

not recorded

Keywords

not recorded

Note

not recorded

Publication data

Journal

IEEE Access

Volume (issue)

11

Year

2023

Pages

11732-11748

Publication status

published

e-ISSN

2169-3536

DOI

10.1109/ACCESS.2023.3242045

Related records

Related persons

Petar Afrić (author)

Davor Vukadin (author)

Marin Šilić (author)

Goran Delač (author)

Related institutions

Fakultet elektrotehnike i računarstva (036) (author's institution)

Related projects

Pouzdani kompozitni primjenski sustavi zasnovani na web uslugama (result of work on the project)

Field

Computer science

Links

doi.org

ieeexplore.ieee.org

Indexing

Scopus

Current Contents Connect (CCC)

Web of Science Core Collection, Science Citation Index Expanded (WoSCC-SCI-Exp)

Web of Science Core Collection, SCI-Exp, SSCI & A&HCI (WoSCC-SCI-Exp, SSCI, A&HCI)