Bibliographic record no. 583750

Croatian Language N-Gram System


Dembitz, Šandor; Blašković, Bruno; Gledec, Gordan
Croatian Language N-Gram System // Frontiers in artificial intelligence and applications, 243 (2012), 696-705 doi:10.3233/978-1-61499-105-2-696 (international peer review, article, scientific)


CROSBI ID: 583750

Title
Croatian Language N-Gram System

Authors
Dembitz, Šandor ; Blašković, Bruno ; Gledec, Gordan

Source
Frontiers in artificial intelligence and applications (0922-6389) 243 (2012); 696-705

Type, subtype, and category of work
Journal papers, article, scientific

Keywords
Croatian; lexical n-gram; language modeling; Heaps’ law

Abstract
Large-scale n-gram models are available for only a small number of languages, and until now Croatian was not among them. This paper describes the development of an n-gram database system suitable for large-scale language modeling in Croatian. N-gram collection relies on Hascheck, the Croatian academic online spellchecker, which has been publicly available since 1993 and is today a popular language service with average daily traffic exceeding one million tokens. The approach eliminated the need for n-gram data cleaning in the post-processing phase, which is a serious issue for other languages. The spellchecker dynamics allowed Heaps’ law modeling to be applied to Croatian n-grams, which enabled the prediction of n-gram count growth.
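
For readers unfamiliar with the technique named in the abstract: Heaps' law states that the number of distinct items V grows with corpus size N roughly as V(N) = K * N^beta. The record does not describe the paper's actual fitting procedure, so the following is only a minimal sketch, assuming an ordinary least-squares fit in log-log space. The function names (ngrams, fit_heaps, predict) are hypothetical, and the observations are synthetic numbers generated from K = 10, beta = 0.5, not data from the paper.

```python
import math

def ngrams(tokens, n):
    """Yield consecutive n-token tuples (lexical n-grams) from a token list."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])

def fit_heaps(sizes, distinct):
    """Least-squares fit of log V = log K + beta * log N; returns (K, beta)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(d) for d in distinct]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    beta = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))
    k = math.exp(my - beta * mx)
    return k, beta

def predict(k, beta, n_tokens):
    """Predicted number of distinct n-grams after n_tokens processed tokens."""
    return k * n_tokens ** beta

# Synthetic (made-up) observations: tokens processed vs. distinct n-grams seen.
sizes = [10_000, 40_000, 160_000, 640_000]
distinct = [1_000, 2_000, 4_000, 8_000]

k, beta = fit_heaps(sizes, distinct)
forecast = predict(k, beta, 2_560_000)  # extrapolate growth to a larger corpus
```

In practice the (sizes, distinct) pairs would come from counting the distinct output of ngrams() at successive checkpoints of a growing corpus; the fitted K and beta then give a growth forecast of the kind the abstract mentions.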

Original language
English

Scientific fields
Electrical Engineering, Computing



ASSOCIATIONS OF THE WORK


Projects:
036-0361983-2019 - Computer support for education (Mornar, Vedran, MZO) (CroRIS)
036-0362027-1638 - Networked economy (Skočir, Zoran, MZO) (CroRIS)
036-0362027-1639 - Content delivery and mobility of users and services in next-generation networks (Matijašević, Maja, MZO) (CroRIS)

Institutions:
Faculty of Electrical Engineering and Computing, Zagreb

Profiles:

Gordan Gledec (author)

Šandor Dembitz (author)

Bruno Blašković (author)

Links to the full text:

doi booksonline.iospress.nl

Cite this publication:

Dembitz, Šandor; Blašković, Bruno; Gledec, Gordan
Croatian Language N-Gram System // Frontiers in artificial intelligence and applications, 243 (2012), 696-705 doi:10.3233/978-1-61499-105-2-696 (international peer review, article, scientific)
Dembitz, Š., Blašković, B. & Gledec, G. (2012) Croatian Language N-Gram System. Frontiers in artificial intelligence and applications, 243, 696-705 doi:10.3233/978-1-61499-105-2-696.
@article{article,
  author   = {Dembitz, \v{S}andor and Bla\v{s}kovi\'{c}, Bruno and Gledec, Gordan},
  title    = {Croatian Language N-Gram System},
  journal  = {Frontiers in artificial intelligence and applications},
  volume   = {243},
  year     = {2012},
  pages    = {696-705},
  issn     = {0922-6389},
  doi      = {10.3233/978-1-61499-105-2-696},
  keywords = {Croatian, lexical n-gram, language modeling, Heaps’ law}
}

The journal is indexed in:


  • Scopus


Included in other bibliographic databases:


  • ACM Digital Library
  • DBLP
  • Google Scholar
  • SciVerse Scopus
  • Zentralblatt MATH

