Pregled bibliografske jedinice broj: 76125
Ziv-Lepel compression of huge natural language data tries using suffix arrays
Ziv-Lepel compression of huge natural language data tries using suffix arrays // Journal of Discrete Algorithms, 1 (2000), 1; 241-256 (podatak o recenziji nije dostupan, članak, znanstveni)
CROSBI ID: 76125 Za ispravke kontaktirajte CROSBI podršku putem web obrasca
Naslov
Ziv-Lepel compression of huge natural language data tries using suffix arrays
Autori
Ristov, Strahil ; Laporte, Eric
Izvornik
Journal of Discrete Algorithms (1468-0904) 1
(2000), 1;
241-256
Vrsta, podvrsta i kategorija rada
Radovi u časopisima, članak, znanstveni
Sažetak
We present a very efficient, in terms of space and access speed, data structure for storing huge natural language data sets. The structure is described as LZ (Ziv Lempel) compressed linked list trie and is a step further beyond directed acyclic word graph in automata compression. We are using the structure to store DELAF, a huge French lexicon with syntactical, grammatical and lexical information associated with each word. The compressed structure can be produced in O(N) time using suffix trees for finding repetitions in trie, but for large data sets space requirements are more prohibitive than time so suffix arrays are used instead, with compression time complexity O(N log N) for all but for the largest data sets.
Izvorni jezik
Engleski
Znanstvena područja
Elektrotehnika
POVEZANOST RADA