Data Mining of Official Data Bases

Mirjana Pejic-Bach, Ksenija Dumicic
University of Zagreb, Faculty of Economics
Trg J.F.Kennedya 6
10000 Zagreb, Croatia
mpejic@efzg.hr, kdumicic@efzg.hr

Introduction
Data mining is the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner (Hand et al., 2001). Data mining techniques have existed for a number of years, and their roots can be traced back along three family lines: classical statistics, artificial intelligence, and machine learning. In the last ten years, data mining has become one of the most popular trends in the business world. However, public organizations have only recently begun to use data mining (Cahlink, 2000; Carbone, 1998). Just like business organizations, public decision makers face the problem of making use of stored data, given the widespread use of information systems that include databases (Dumicic, 1999) whose sizes have recently grown explosively. The goal of this paper is to investigate the possibility of using data mining to explore official data bases as a tool for improving the efficiency of public organizations.

Data mining applications using official data bases
Applications of data mining using official data bases for public organizations were found through an Internet search (Google) using the words: data mining, government, and public. The following databases were also searched: Emerald, EBSCOhost, Proquest, Science Direct, Springer Verlag, Kluwer, Engineering Village 2 & Compendex, Wiley Interscience, and ProQuest Digital Dissertations.
One should be aware that it is not possible to find every single data mining application in public organizations by searching scientific data bases or the Internet. However, the presented survey can give substantial insight into the current practice of data mining in public organizations.
We found 34 applications of data mining in public organizations. The oldest application was described in 1996. However, most of the applications (64.5%) are described in articles published in 2003, which suggests that the application of data mining in public organizations is growing rapidly. Finance and economy (29%) had the largest number of applications, followed by healthcare (24%) and criminal justice and defence (24%). The other areas examined are labour and social welfare (6%), e-government (6%), education (9%) and transport (3%), each with a rather small number of applications.
Applications are described on business web sites, on news web sites, in scientific journals, and in working papers. Most of the applications (62%) are described on business web sites, with SPSS as the leader, followed by IBM. It should be emphasized that only descriptions of particular applications on these web sites, and not advertisements, are taken into account. Only 21% of applications are described in scientific journals, and 9% are found on news web sites or described in working papers.
The method used is described in only 18 sources. Classification and prediction is used most often (44%), followed by evolution analysis (22%), concept/class description (17%) and outlier analysis (6%). Other methods, such as association analysis, are not reported as being used.

Examples of applications
US government tax agencies use Clementine and Intelligent Miner to build predictive models that could improve collections management and audit selection by answering questions such as "Who is likely to become delinquent and by how much?" and "Which tax returns are likely to be non-compliant?".
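To make the idea of a predictive model for a question such as "Who is likely to become delinquent?" more concrete, the following is a minimal sketch of a classification model in Python. It is not the model built with Clementine or Intelligent Miner, whose details the sources do not give. The one-level decision tree (a "decision stump"), the two features, and the eight toy records are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stump:
    """One-level decision tree: predicts 1 when a single feature exceeds a threshold."""
    feature: int      # index of the feature that is tested
    threshold: float  # predict 1 (delinquent) when the value is above this

    def predict(self, row):
        return 1 if row[self.feature] > self.threshold else 0


def fit_stump(rows, labels):
    """Try every feature and every midpoint between consecutive distinct values;
    keep the first split with the fewest training errors.
    Only the 'above threshold means 1' direction is considered, to keep the sketch short."""
    best, best_errors = None, len(rows) + 1
    for feature in range(len(rows[0])):
        values = sorted({row[feature] for row in rows})
        for low, high in zip(values, values[1:]):
            candidate = Stump(feature, (low + high) / 2)
            errors = sum(candidate.predict(r) != y for r, y in zip(rows, labels))
            if errors < best_errors:
                best, best_errors = candidate, errors
    return best, best_errors


# Hypothetical toy records (not real tax data):
# (number of late filings in the past three years, deductions as a share of income)
rows = [(0, 0.10), (0, 0.35), (1, 0.20), (1, 0.15),
        (2, 0.30), (3, 0.25), (4, 0.40), (3, 0.12)]
# Hypothetical outcome: 1 = taxpayer later became delinquent, 0 = did not
labels = [0, 0, 0, 0, 1, 1, 1, 1]

stump, errors = fit_stump(rows, labels)
print(stump, "training errors:", errors)
print("new taxpayer with 5 late filings ->", stump.predict((5, 0.20)))
```

On these toy records the fitted rule splits on the number of late filings. A production model would use many more attributes, a held-out test set, and a method such as a full decision tree or a neural network.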
Neural networks have provided valuable insights for analysts forecasting tax revenues. The accuracy of these forecasts is critically important, since agency budgets, support for education, and improvements to infrastructure all depend on it.
Data mining is often used to detect health care fraud. The IBM Fraud and Abuse Management System is used to detect health care fraud and abuse, which ranks as one of the nation's leading law enforcement frustrations in the USA.
The Defense Advanced Research Projects Agency (DARPA) is developing a database called the Total Information Awareness System as part of its effort to track terrorists and their activities. However, many critics feel that the project is a threat to security, because such a massive database would be a very attractive target for hackers.

Census data mining
Applications often use census data, which is one of the most comprehensive databases. Spatial data analysis is often employed, with common applications in the social sciences that range from the discovery of crime clusters and hot spots and the detection of disease clusters, to spatial autocorrelation of demographic variables and regression models for real estate analysis. Other possible uses of census data are: (1) business statistics, with special mention of innovation policy and financial health, (2) household equipment and savings, (3) health statistics (mortality and morbidity) in order to detect unexpected risk factors, and (4) analysis of metadata information by means of text mining (Saporta, 2000).

Conclusions
This paper has reviewed data mining of official data bases. Readers should be cautious in interpreting the results of the survey, since the findings are based on data collected from business web sites, journal articles, news web sites and working papers. This approach is employed because data mining applications in public organizations are still rarely described in journal articles.
However, we feel that even such a survey can describe the current state of the issue.

References
Hand, D., Mannila, H., Smyth, P. (2001). Principles of Data Mining. Cambridge, MA: The MIT Press.
Cahlink, G. (2000). Data Mining Taps the Trends. Government Executive Magazine. http://207.27.3.29/tech/articles/1000managetech.htm
Carbone, P.L. (1998). Data Mining and the Government: Is There a Unique Challenge? The On-Line Executive Journal for Data-Intensive Decision Support. http://www.tgc.com/dsstar/98/0519/980519.html
Saporta, G. (2000). Data Mining and Official Statistics. Quinta Conferenza Nationale di Statistica, ISTAT, Roma, 35-39.
Dumicic, S. and Dumicic, K. (1999). Experience on Automated Coding of Occupation in Population Census in Croatia. Bulletin of the 52nd Session of the International Statistical Institute, Proceedings of the Contributed Papers, Tome LVIII, CD, Helsinki, Finland. http://www.stat.fi/isi99/proceedings/arkisto/contributed.html

Summary
The objective of this study is to present the different applications of data mining of official data bases for public organizations. A search of scientific data bases and the Internet found that the majority of applications were described in the current year on business web sites. Finance and economy, public health, criminal justice and defence are the most popular areas. The methods most often used are classification and prediction, concept and class description, and evolution analysis.