
Bibliographic record number: 635469

Croatian Language N-Gram System


Šandor Dembitz, Bruno Blašković, Gordan Gledec
Croatian Language N-Gram System // Advances in Knowledge-Based and Intelligent Information and Engineering Systems / Graña, Manuel ; Toro, Carlos ; Posada, Jorge ; Howlett, Robert J. ; Jain, Lakhmi C. (eds.).
Amsterdam: IOS Press, 2012. pp. 696-705


CROSBI ID: 635469

Title
Croatian Language N-Gram System

Authors
Šandor Dembitz, Bruno Blašković, Gordan Gledec

Type, subtype and category of work
Book chapter, scientific

Book
Advances in Knowledge-Based and Intelligent Information and Engineering Systems

Editor(s)
Graña, Manuel ; Toro, Carlos ; Posada, Jorge ; Howlett, Robert J. ; Jain, Lakhmi C.

Publisher
IOS Press

City
Amsterdam

Year
2012

Page range
696-705

ISBN
978-1-61499-104-5

Keywords
Croatian, lexical n-gram, language modeling, Heaps’ law

Abstract
Large-scale n-gram models are available for only a small number of languages, and until now Croatian was not one of them. This paper describes the development of an n-gram database system suitable for large-scale language modeling in Croatian. N-gram collection relies on Hascheck, the Croatian academic online spellchecker, which has been publicly available since 1993 and is today a popular language service with average daily traffic exceeding one million tokens. The approach eliminates the need for n-gram data cleaning in the post-processing phase, which is a serious issue in other languages. The spellchecker dynamics allowed Heaps’ law modeling to be applied to Croatian n-grams, which enabled prediction of n-gram count growth.
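
As a rough illustration of the Heaps' law modeling mentioned in the abstract, the sketch below counts distinct n-grams as a token stream grows and fits the standard form V(N) = K * N^beta by least squares in log-log space. It is not the authors' implementation: the function names (`ngrams`, `growth_curve`, `fit_heaps`, `predict_distinct`) are hypothetical, and it assumes the textbook form of Heaps' law rather than any specific formulation used in the paper.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int) -> Iterable[Tuple[str, ...]]:
    """Yield consecutive n-grams from a token sequence."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


def growth_curve(tokens: Sequence[str], n: int) -> List[Tuple[int, int]]:
    """Return (n-grams processed, distinct n-grams seen) after each n-gram."""
    seen = set()
    curve = []
    for total, gram in enumerate(ngrams(tokens, n), start=1):
        seen.add(gram)
        curve.append((total, len(seen)))
    return curve


def fit_heaps(curve: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Fit log V = log K + beta * log N by ordinary least squares.

    Returns (K, beta). Assumes the textbook Heaps' law V(N) = K * N**beta.
    """
    if len(curve) < 2:
        raise ValueError("need at least two points to fit Heaps' law")
    xs = [math.log(total) for total, _ in curve]
    ys = [math.log(distinct) for _, distinct in curve]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all points share the same N; cannot fit a slope")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    k = math.exp(mean_y - beta * mean_x)
    return k, beta


def predict_distinct(k: float, beta: float, total: float) -> float:
    """Predict the number of distinct n-grams after `total` n-grams."""
    return k * total ** beta
```

With the fitted `K` and `beta`, `predict_distinct` extrapolates how many distinct n-grams to expect once a larger number of n-grams has been processed, which is the kind of growth prediction the abstract refers to.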

Original language
English

Scientific fields
Computer science



AFFILIATIONS


Institutions:
Fakultet elektrotehnike i računarstva, Zagreb


Cite this publication:

Šandor Dembitz, Bruno Blašković, Gordan Gledec (2012) Croatian Language N-Gram System. In: Graña, M., Toro, C., Posada, J., Howlett, R. & Jain, L. (eds.) Advances in Knowledge-Based and Intelligent Information and Engineering Systems. Amsterdam, IOS Press, pp. 696-705.
@inbook{inbook,
  author    = {Dembitz, Šandor and Blašković, Bruno and Gledec, Gordan},
  title     = {Croatian Language N-Gram System},
  booktitle = {Advances in Knowledge-Based and Intelligent Information and Engineering Systems},
  editor    = {Graña, Manuel and Toro, Carlos and Posada, Jorge and Howlett, Robert J. and Jain, Lakhmi C.},
  publisher = {IOS Press},
  address   = {Amsterdam},
  year      = {2012},
  pages     = {696-705},
  isbn      = {978-1-61499-104-5},
  keywords  = {Croatian, lexical n-gram, language modeling, Heaps’ law}
}



