Bibliographic record no. 1039206

25 godina Hašeka


Dembitz, Šandor
25 godina Hašeka // Jezik : časopis za kulturu hrvatskoga književnog jezika, 66 (2019), 4-5; 138-150 (peer-reviewed, article, scientific)


CROSBI ID: 1039206

Title
25 godina Hašeka
(25 years of Hašek)

Authors
Dembitz, Šandor

Source
Jezik : časopis za kulturu hrvatskoga književnog jezika (0021-6925) 66 (2019), 4-5; 138-150

Type, subtype and category of work
Journal papers, article, scientific

Keywords
Hašek, spellchecking, learning, Google, n-gram systems
(Croatian: Hašek, strojna provjera teksta, učenje, Google, n-gramski sustavi)

Abstract
Hašek is a Croatian on-line spellchecker that has operated continuously since March 21, 1994, nowadays at the address https://ispravi.me/. In 25 years of operation Hašek has processed nearly 30 million texts, which form a corpus of more than 7 billion tokens. By comparison, all books ever published in Croatian form a corpus of fewer than 20 billion tokens. As a WWW-embedded tool, Hašek took advantage of many web-based services, including learning. Thanks to Hašek's learning capability, its dictionary grew from an initial 100 thousand to more than 2 million word types. Another aspect of learning was the creation and regular updating of the Croatian n-gram system. Unlike Google, whose n-gram systems are based on the WaC (Web as Corpus) approach and cut-off criteria, Croatian n-grams were extracted from processed texts by a lexical criterion: each n-gram constituent must be confirmed by the spellchecker as valid in Croatian spelling. The difference in approaches made the Croatian n-gram system comparable in size to the largest Google n-gram systems. Unfortunately, the advantages of on-line spellchecking for rapid breakthroughs into much more sophisticated language technology areas were not recognized by Croatian decision makers, with some consequences mentioned in the paper.
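
The contrast the abstract draws between a lexical criterion and a cut-off criterion can be illustrated with a minimal sketch. This is not Hašek's implementation and not Google's pipeline: the function names, the toy dictionary, the sample tokens, and the threshold of 2 are all hypothetical, chosen only to show how the two filters differ.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def lexical_ngrams(tokens, dictionary, n):
    """Lexical criterion (illustrative): keep an n-gram only if every
    constituent is accepted by the spellchecker's dictionary."""
    return [g for g in ngrams(tokens, n)
            if all(tok.lower() in dictionary for tok in g)]

def cutoff_ngrams(tokens, n, min_count):
    """Cut-off criterion (illustrative): keep an n-gram only if it occurs
    at least min_count times, regardless of spelling."""
    counts = Counter(ngrams(tokens, n))
    return {g: c for g, c in counts.items() if c >= min_count}

# Hypothetical toy data: "dobr" stands in for a misspelling of "dobar".
dictionary = {"ovo", "je", "dobar", "tekst"}
tokens = "ovo je dobar tekst ovo je dobr tekst".split()

lexical = Counter(lexical_ngrams(tokens, dictionary, 2))
cutoff = cutoff_ngrams(tokens, 2, min_count=2)
```

In this toy case the lexical filter keeps every correctly spelled bigram, including those seen only once, and drops only the two bigrams containing the misspelling, whereas the cut-off filter keeps only the single bigram that occurs twice. That is one plausible reading of why a lexically filtered collection can grow large relative to the size of its source corpus.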

Original language
Croatian

Scientific fields
Computing



AFFILIATIONS OF THE WORK


Institutions:
Faculty of Electrical Engineering and Computing, Zagreb

Profiles:

Šandor Dembitz (author)


Cite this publication:

Dembitz, Š. (2019) 25 godina Hašeka. Jezik : časopis za kulturu hrvatskoga književnog jezika, 66 (4-5), 138-150.
@article{article, author = {Dembitz, \v{S}andor}, year = {2019}, pages = {138-150}, keywords = {Ha\v{s}ek, strojna provjera teksta, u\v{c}enje, Google, n-gramski sustavi}, journal = {Jezik : \v{c}asopis za kulturu hrvatskoga knji\v{z}evnog jezika}, volume = {66}, number = {4-5}, issn = {0021-6925}, title = {25 godina Ha\v{s}eka}, keyword = {Ha\v{s}ek, strojna provjera teksta, u\v{c}enje, Google, n-gramski sustavi} }
@article{article, author = {Dembitz, \v{S}andor}, year = {2019}, pages = {138-150}, keywords = {Ha\v{s}ek, spellchecking, learning, Google, n-gram systems}, journal = {Jezik : \v{c}asopis za kulturu hrvatskoga knji\v{z}evnog jezika}, volume = {66}, number = {4-5}, issn = {0021-6925}, title = {25 years of Ha\v{s}ek}, keyword = {Ha\v{s}ek, spellchecking, learning, Google, n-gram systems} }



