Bibliographic record number: 507924
Building a gold standard for event detection in Croatian
Building a gold standard for event detection in Croatian // Language Resources and Evaluation Conference / Calzolari, Nicoletta ; Choukri, Khalid ; Maegaard, Bente ; Mariani, Joseph ; Odijk, Jan ; Piperidis, Stelios ; Rosner, Mike ; Tapias, Daniel (eds.).
Valletta: European Language Resources Association (ELRA), 2010. pp. 3101-3104 (poster, international peer review, full paper (in extenso), scientific)
CROSBI ID: 507924
Title
Building a gold standard for event detection in Croatian
Authors
Ljubešić, Nikola ; Boras, Damir ; Lauc, Tomislava
Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific
ISBN
2-9517408-6-7
Conference
Language Resources and Evaluation Conference
Place and date
Valletta, Malta, 17 May 2010 - 23 May 2010
Type of participation
Poster
Type of review
International peer review
Keywords
event detection; gold standard; newspaper text; Croatian language
Abstract
This paper describes the process of building a newspaper corpus annotated with events described in specific documents. The main difference to the corpora built as part of the TDT initiative is that documents are not annotated by topics, but by specific events they describe. Additionally, documents are gathered from sixteen sources and all documents in the corpus are annotated with the corresponding event. The annotation process consists of a browsing and a searching step. Experiments are performed with a threshold that could be used in the browsing step yielding the result of having to browse through only 1% of document pairs for a 2% loss of relevant document pairs. A statistical analysis of the annotated corpus is undertaken showing that most events are described by few documents while just some events are reported by many documents. The inter-annotator agreement measures show high agreement concerning grouping documents into event clusters, but show a much lower agreement concerning the number of events the documents are organized into. An initial experiment is described giving a baseline for further research on this corpus.
Original language
English
Scientific fields
Information and communication sciences
WORK ASSOCIATIONS
Projects:
130-1301679-1380 - Hrvatska rječnička baština i hrvatski europski identitet (Boras, Damir, MZOS) (CroRIS)
130-1301799-1999 - Oblikovanje i upravljanje javnim znanjem u informacijskom prostoru (Tuđman, Miroslav, MZOS) (CroRIS)
Institutions:
Filozofski fakultet, Zagreb