Bibliographic record no. 1039206
25 godina Hašeka
25 godina Hašeka // Jezik : časopis za kulturu hrvatskoga književnog jezika, 66 (2019), 4-5; 138-150 (peer-reviewed, article, scientific)
CROSBI ID: 1039206
Title
25 godina Hašeka
(25 years of Hašek)
Authors
Dembitz, Šandor
Source
Jezik : časopis za kulturu hrvatskoga književnog jezika (0021-6925) 66 (2019), 4-5; 138-150
Type, subtype and category of work
Journal papers, article, scientific
Keywords
Hašek, strojna provjera teksta, učenje, Google, n-gramski sustavi
(Hašek, spellchecking, learning, Google, n-gram systems)
Abstract
Hašek is a Croatian online spellchecker that has been operating continuously since March 21, 1994, nowadays at the address https://ispravi.me/. In 25 years of operation, Hašek has processed nearly 30 million texts, which form a corpus of more than 7 billion tokens. By comparison, all books ever published in Croatian form a corpus of fewer than 20 billion tokens. As a WWW-embedded tool, Hašek took advantage of many web-based services, including learning. Thanks to Hašek’s learning capability, its dictionary grew from an initial 100 thousand to more than 2 million word types. Another aspect of learning was the creation and regular updating of the Croatian n-gram system. Unlike Google, whose n-gram systems are based on the WaC (Web as Corpus) approach and cut-off criteria, Croatian n-grams were extracted from processed texts by a lexical criterion: each n-gram constituent must be confirmed by the spellchecker as valid in Croatian spelling. This difference in approach made the Croatian n-gram system comparable in size to the largest Google n-gram systems. Unfortunately, the advantages of online spellchecking for rapid breakthroughs into much more sophisticated language technology areas were not recognized by Croatian decision makers, with some consequences mentioned in the paper.
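The contrast the abstract draws between a lexical criterion and a cut-off criterion can be illustrated with a minimal sketch. It is not Hašek's or Google's actual implementation: the function names, the toy `DICTIONARY`, and the `min_count` threshold are all hypothetical, and a simple set lookup stands in for the spellchecker.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return zip(*(tokens[i:] for i in range(n)))


def extract_lexical(tokens, n, is_valid):
    """Lexical criterion: keep an n-gram only if every constituent is valid."""
    return Counter(g for g in ngrams(tokens, n) if all(is_valid(w) for w in g))


def extract_cutoff(tokens, n, min_count):
    """Cut-off criterion: keep an n-gram only if it occurs at least min_count times."""
    counts = Counter(ngrams(tokens, n))
    return Counter({g: c for g, c in counts.items() if c >= min_count})


# Hypothetical toy dictionary standing in for the spellchecker's lexicon.
DICTIONARY = {"dobar", "dan", "svima", "i", "laku", "noć"}


def is_valid(word):
    """Assumed stand-in for a spellchecker lookup."""
    return word.lower() in DICTIONARY


# "svma" is a deliberate misspelling of "svima".
tokens = "dobar dan svima i dobar dan svma i laku noć".split()

lexical = extract_lexical(tokens, 2, is_valid)
cutoff = extract_cutoff(tokens, 2, min_count=2)

print(len(lexical), "bigrams kept by the lexical criterion")
print(len(cutoff), "bigrams kept by the cut-off criterion")
```

In this toy input the lexical filter keeps rare but correctly spelled bigrams and drops only those containing the misspelling, whereas the frequency cut-off discards everything seen once. This is one plausible reading of why a lexically filtered collection could grow large relative to its source corpus.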
Original language
Croatian
Scientific fields
Computing
WORK ASSOCIATIONS
Institutions:
Fakultet elektrotehnike i računarstva, Zagreb
Profiles:
Šandor Dembitz
(author)