Bibliographic record number: 76128
A method for compressing lexicons, DCC02, Data Compression Conference
A method for compressing lexicons, DCC02, Data Compression Conference // DCC 2002 / Storer, James; Cohn, Martin (eds.).
Snowbird (UT), United States: IEEE Computer Society, 2002. (poster, international peer review, abstract, scientific)
CROSBI ID: 76128
Title
A method for compressing lexicons, DCC02, Data Compression Conference
Authors
Ristov, Strahil ; Laporte, Eric
Type, subtype and category of work
Conference abstracts, abstract, scientific
Source
DCC 2002
/ Storer, James; Cohn, Martin (eds.). IEEE Computer Society, 2002
Conference
Data Compression Conference
Place and date
Snowbird (UT), United States, 2-4 April 2002
Type of participation
Poster
Type of review
International peer review
Keywords
natural language lexicon; spelling-to-phonetic conversion; compressed trie; index compression
Abstract
A natural language lexicon is a set of strings in which each string consists of a word and its associated linguistic data. Its computer representation is a structure that returns the appropriate linguistic data for a given input word. It should be small and fast. We propose a method for lexicon compression based on an existing efficient method for compressing tries. Straightforward trie compression becomes ineffective when strings are long, so the words and the associated data sets are compressed separately, additionally processed, and linked with an auxiliary index structure. The index file is compressed with canonical Huffman codes and, for the example of a French phonetic lexicon with 660,000 entries and a size of 18 Mbytes, the overall size of the searchable compressed string set is 7% of the original size.
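The general idea in the abstract (words and associated data stored separately and linked through an index coded with canonical Huffman codes) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the trie here is a plain uncompressed dictionary trie, the index is decoded sequentially rather than with random access, and all function names and the toy phonetic strings are assumptions made up for illustration.

```python
import heapq
from collections import Counter

def build_trie(words):
    """Plain (uncompressed) trie; the key None marks end of word and holds its ordinal."""
    root = {}
    for ordinal, word in enumerate(words):
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[None] = ordinal
    return root

def lookup_ordinal(trie, word):
    node = trie
    for ch in word:
        node = node.get(ch)
        if node is None:
            return None
    return node.get(None)

def huffman_code_lengths(freqs):
    """Return {symbol: code length} of a Huffman code for the given frequencies."""
    if len(freqs) == 1:
        return {next(iter(freqs)): 1}
    # The unique integer in each tuple breaks ties so symbol lists are never compared.
    heap = [(f, i, [s]) for i, (s, f) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, s1 = heapq.heappop(heap)
        f2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:          # every merged symbol moves one level deeper
            lengths[s] += 1
        heapq.heappush(heap, (f1 + f2, counter, s1 + s2))
        counter += 1
    return lengths

def canonical_codes(lengths):
    """Assign canonical codewords: order by (length, symbol) and count upward."""
    codes, code, prev_len = {}, 0, 0
    for sym, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= (length - prev_len)
        codes[sym] = format(code, "0{}b".format(length))
        code += 1
        prev_len = length
    return codes

def build_lexicon(entries):
    """entries: list of (word, data) pairs with unique words."""
    by_word = dict(entries)
    words = sorted(by_word)
    data_table = sorted(set(by_word.values()))      # each distinct data string stored once
    data_id = {d: i for i, d in enumerate(data_table)}
    index = [data_id[by_word[w]] for w in words]    # word ordinal -> data id
    codes = canonical_codes(huffman_code_lengths(Counter(index)))
    bits = "".join(codes[i] for i in index)         # Huffman-coded index as a bit string
    return build_trie(words), data_table, codes, bits

def lookup(lexicon, word):
    """Return the data for word, or None if the word is not in the lexicon."""
    trie, data_table, codes, bits = lexicon
    ordinal = lookup_ordinal(trie, word)
    if ordinal is None:
        return None
    decode = {c: s for s, c in codes.items()}
    current, seen = "", 0
    for bit in bits:                                # sequential decode, for simplicity only
        current += bit
        if current in decode:                       # prefix-free, so first match is a codeword
            if seen == ordinal:
                return data_table[decode[current]]
            seen += 1
            current = ""
    return None

# Toy entries with invented phonetic strings, for illustration only.
toy = [("chat", "Sa"), ("chats", "Sa"), ("eau", "o"), ("eaux", "o"), ("haut", "o")]
lexicon = build_lexicon(toy)
print(lookup(lexicon, "eaux"))
```

In this sketch frequent data entries receive short codewords in the index, which is the reason for applying Huffman coding to it; a practical structure would also need a way to reach a given index position without decoding everything before it.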
Original language
English
Scientific fields
Electrical engineering