Bibliographic record number: 460728
Compressing Gazetteers Revisited
Compressing Gazetteers Revisited // Pre-proceedings of the Eighth International Workshop on Finite-State Methods and Natural Language Processing 2009 workshop / Watson, Bruce ; Kourie, Derrick ; Cleophas, Loek ; Rautenbach, Pierre (eds.).
Pretoria: University of Pretoria, 2009. (lecture, international peer review, full paper (in extenso), scientific)
CROSBI ID: 460728
Title
Compressing Gazetteers Revisited
Authors
Budišćak, Ivan ; Piskorski, Jakub ; Ristov, Strahil
Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific
Source
Pre-proceedings of the Eighth International Workshop on Finite-State Methods and Natural Language Processing 2009 workshop
/ Watson, Bruce ; Kourie, Derrick ; Cleophas, Loek ; Rautenbach, Pierre - Pretoria : University of Pretoria, 2009
ISBN
978-1-86854-743-2
Conference
Eighth International Workshop on Finite-State Methods and Natural Language Processing
Place and date
Pretoria, South Africa, 21-24 July 2009
Type of participation
Lecture
Type of review
International peer review
Keywords
Recursive Finite State Automata; Automata Compression; Gazetteer Compression
Abstract
Finite-state automata are the state-of-the-art representation of gazetteers in NLP. This paper compares different methods for gazetteer compression based on two independently published algorithms for automata substructure recognition. The more recent algorithm, which we denote REC-FSA (Recursive Finite State Automaton), was invented specifically for gazetteer compression and was reported as the most space-efficient approach at the time of its publication. In this paper we apply the older method, denoted here REC-FSA-2, and obtain an improvement of about 30% in the compression rate compared to the more recent algorithm. However, the more recent algorithm is much faster. We employ a previously published modification of REC-FSA-2, which we denote REC-FSA-2-DICT, to achieve a viable compromise between compression efficiency and time complexity. The results reported here represent the state of the art in gazetteer compression.
Original language
English
Scientific fields
Mathematics, Computing, Information and Communication Sciences
WORK AFFILIATIONS
Projects:
098-0982560-2566 - Measurement and characterization of real-world data (Medved-Rogina, Branka, MZOS) (CroRIS)
Institutions:
Faculty of Electrical Engineering and Computing, Zagreb,
Ruđer Bošković Institute, Zagreb