Bibliographic record number: 537344
Exploring Classification Concept Drift on a Large News Text Corpus
Exploring Classification Concept Drift on a Large News Text Corpus // Springer Lecture Notes in Computer Science, 7181 (2012), 1; 428-437 doi:10.1007/978-3-642-28604-9 (international peer review, article, scientific)
CROSBI ID: 537344
Title
Exploring Classification Concept Drift on a Large News Text Corpus
Authors
Šilić, Artur ; Dalbelo Bašić, Bojana
Source
Springer Lecture Notes in Computer Science (0302-9743) 7181 (2012), 1; 428-437
Type, subtype and category of work
Journal papers, article, scientific
Keywords
text classification; concept drift; logistic regression
Abstract
Concept drift has regained research interest in recent years, as many applications use data sources that change over time. We study a classification task using logistic regression on a large news collection of 248K texts spanning seven years. We present extrinsic methods for detecting and quantifying concept drift that form training sets with different windowing techniques. On our corpus, we characterize concept drift and show that classifier performance is overestimated when drift is neglected. For future work, we plan to refine the extrinsic characterization methods and to investigate the drift of learning parameters when few examples are available.
Original language
English
Scientific fields
Computer science
PAPER AFFILIATIONS
Projects:
036-1300646-1986 - Otkrivanje znanja u tekstnim podacima [Knowledge Discovery in Textual Data] (Dalbelo-Bašić, Bojana, MZO) (CroRIS)
Institutions:
Fakultet elektrotehnike i računarstva, Zagreb
Journal indexed in:
- Scopus
Included in other bibliographic databases:
- Science Citation Index Expanded