A fusion approach for classifying duplicate problem reports

Banerjee, Sean; Syed, Zahid; Helmick, Jordan; Čukić, Bojan

Bibliographic record number: 677931

A fusion approach for classifying duplicate problem reports // Proc. of the 24th International Symposium on Software Reliability Engineering (ISSRE 2013) / unknown (ed.).
New York (NY): Institute of Electrical and Electronics Engineers (IEEE), 2013, pp. 208-217 (lecture, international peer review, full paper (in extenso), scientific)

CROSBI ID: 677931

Title
A fusion approach for classifying duplicate problem reports

Authors
Banerjee, Sean; Syed, Zahid; Helmick, Jordan; Čukić, Bojan

Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific

Source
Proc. of the 24th International Symposium on Software Reliability Engineering (ISSRE 2013) / Unknown - New York (NY): Institute of Electrical and Electronics Engineers (IEEE), 2013, 208-217

ISBN
0-8186-7131-9

Conference
IEEE 24th International Symposium on Software Reliability Engineering (ISSRE 2013)

Place and date
Pasadena (CA), United States of America, 4 November 2013 - 7 November 2013

Type of participation
Lecture

Type of review
International peer review

Keywords
duplicate problem report classification

Abstract
Issue tracking systems play a critical role in software maintenance by allowing users and developers to submit problem reports for observed failures. A major problem in these systems is that two or more users can, and do, submit reports describing the same issue. Automated classification of such duplicate problem reports is an area of active research. The corpus of existing research shows a slow improvement in classification accuracy using relatively small subsets of problem report data. When applied to an entire project's problem repository, these techniques exhibit a reduction in performance. In this paper we propose a novel duplicate report detection approach using multi-label classification. We use a suite of 24 duplicate classification techniques and the MULAN software package to train a multi-label classifier. This multi-label classifier selects a set of similarity measures (from a pool of measures) that are most likely to find the true primary report. To demonstrate its effectiveness, the method was tested on the entire Firefox repository. This data set encompasses 12+ years of problem reports and contains over 30,000 duplicate reports. Our results indicate that multi-label classification boosts the performance of the individual measures by up to 40% while returning overall results that match or outperform existing methods. The proposed method uses less than 1% of the dataset for training.
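
The selection-and-fusion idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the two toy measures (`jaccard`, `trigram_dice`) stand in for the 24 techniques mentioned above, the helper names `label_vector` and `fuse` are hypothetical, and the multi-label classifier itself (trained with MULAN in the paper) is omitted, so the `selected` argument simply represents whatever set of measures such a classifier might return.

```python
from collections import Counter

def jaccard(a, b):
    """Word-set Jaccard similarity between two report texts (toy measure)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def trigram_dice(a, b):
    """Dice coefficient over character trigrams (toy measure)."""
    a, b = a.lower(), b.lower()
    ta = Counter(a[i:i + 3] for i in range(len(a) - 2))
    tb = Counter(b[i:i + 3] for i in range(len(b) - 2))
    overlap = sum((ta & tb).values())  # multiset intersection
    total = sum(ta.values()) + sum(tb.values())
    return 2 * overlap / total if total else 0.0

# Hypothetical pool of measures; the paper uses 24 techniques.
MEASURES = {"jaccard": jaccard, "trigram_dice": trigram_dice}

def label_vector(report, primary_id, candidates):
    """Build a multi-label target for one known duplicate.

    Assumption: a measure is labelled 1 when it ranks the true primary
    report first among the candidates, and 0 otherwise.
    """
    labels = {}
    for name, sim in MEASURES.items():
        best = max(candidates, key=lambda cid: sim(report, candidates[cid]))
        labels[name] = int(best == primary_id)
    return labels

def fuse(report, candidates, selected):
    """Rank candidate ids by the mean score of the selected measures."""
    names = list(selected) or list(MEASURES)  # fall back to the whole pool
    def score(cid):
        return sum(MEASURES[m](report, candidates[cid]) for m in names) / len(names)
    return sorted(candidates, key=score, reverse=True)

# Made-up example data, for illustration only.
candidates = {
    "p1": "Firefox crashes when opening a new tab",
    "p2": "Bookmarks toolbar disappears after update",
}
report = "Crash on opening new tab in Firefox"
labels = label_vector(report, "p1", candidates)
ranking = fuse(report, candidates, [m for m, ok in labels.items() if ok])
```

In a full pipeline, label vectors like these would be the training targets for a multi-label learner, and `fuse` would be called with the measures that learner predicts for each new report.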

Original language
English

Scientific fields
Computing

RELATED RECORDS

Projects:
165-0362980-2002 - Scheduling procedures in self-sustaining distributed computer systems (Martinović, Goran, MZO) (CroRIS)

Institutions:
Faculty of Electrical Engineering, Computer Science and Information Technology Osijek

Profiles:

Bojan Čukić (author)
