Bibliographic record no.: 1030343

Dynamic N-Gram System Based on an Online Croatian Spellchecking Service


Gledec, Gordan; Šoić, Renato; Dembitz, Šandor
Dynamic N-Gram System Based on an Online Croatian Spellchecking Service // IEEE Access, 7 (2019), 149988-149995 doi:10.1109/access.2019.2947898 (international peer review, article, scientific)


CROSBI ID: 1030343

Title
Dynamic N-Gram System Based on an Online Croatian Spellchecking Service

Authors
Gledec, Gordan ; Šoić, Renato ; Dembitz, Šandor

Source
IEEE Access (2169-3536) 7 (2019); 149988-149995

Type, subtype and category of work
Journal papers, article, scientific

Keywords
Croatian language, Heaps’ law, language modeling, lexical n-gram, n-gram system comparison

Abstract
As an infrastructure able to accelerate the development of natural language processing applications, large-scale lexical n-gram databases are at present important data systems. However, deriving such systems for world minority languages as it was done in the Google n-gram project leads to many obstacles. This paper presents an innovative approach to large-scale n-gram system creation applied to the Croatian language. Instead of using the Web as the world’s largest text repository, our process of n-gram collection relies on the Croatian online academic spellchecker Hascheck, a language service publicly available since 1993 and popular worldwide. Our n-gram filtering is based on dictionary criteria, contrary to the publicly available Google n-gram systems in which cutoff criteria were applied. After 12 years of collecting, the size of the Croatian n-gram system reached the size of the largest Google Version 1 n-gram systems. Due to reliance on a service in constant use, the Croatian n-gram system is a dynamic one. System dynamics allowed modeling of n-gram count behavior through Heaps’ law, which led to interesting results. Like many minority languages, the Croatian language suffers from a lack of sophisticated language processing systems in many application areas. The importance of a rich lexical n-gram infrastructure for rapid breakthroughs in new application areas is also exemplified in the paper.
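
For readers unfamiliar with Heaps’ law, which the abstract mentions: it is usually written as V(N) = K · N^β, where N is the total number of items observed, V(N) is the number of distinct items among them, and K and β are fitted constants. The sketch below is only an illustration of how such a fit can be done by least squares on log-transformed counts. It is not the authors' method, the function name `fit_heaps` is hypothetical, and the data are synthetic values generated from assumed constants, not figures from the paper.

```python
import math


def fit_heaps(totals, distincts):
    """Fit V = K * N**beta by ordinary least squares on log V = log K + beta * log N.

    totals    -- total item counts N at successive snapshots (assumed input)
    distincts -- distinct item counts V(N) at the same snapshots (assumed input)
    Returns the pair (K, beta).
    """
    xs = [math.log(n) for n in totals]
    ys = [math.log(v) for v in distincts]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    # Slope of the regression line in log-log space is the exponent beta.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    # Intercept in log-log space is log K.
    k = math.exp(mean_y - beta * mean_x)
    return k, beta


# Synthetic snapshots generated from assumed constants (NOT data from the paper).
assumed_k, assumed_beta = 10.0, 0.6
totals = [10**3, 10**4, 10**5, 10**6]
distincts = [assumed_k * n ** assumed_beta for n in totals]

k, beta = fit_heaps(totals, distincts)
print(f"K = {k:.3f}, beta = {beta:.3f}")
```

On real snapshots of a growing n-gram collection, the inputs would be the observed total and distinct counts at each collection date, and the fit would only be approximate.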

Original language
English

Scientific fields
Electrical engineering, Computing, Information and communication sciences



WORK AFFILIATIONS


Institutions:
Fakultet elektrotehnike i računarstva, Zagreb,
Sveučilište u Zagrebu

Profiles:

Renato Šoić (author)

Šandor Dembitz (author)

Gordan Gledec (author)

Links to the full text:

doi ieeexplore.ieee.org www.researchgate.net

Cite this publication:

Gledec, Gordan; Šoić, Renato; Dembitz, Šandor
Dynamic N-Gram System Based on an Online Croatian Spellchecking Service // IEEE Access, 7 (2019), 149988-149995 doi:10.1109/access.2019.2947898 (international peer review, article, scientific)
Gledec, G., Šoić, R. & Dembitz, Š. (2019) Dynamic N-Gram System Based on an Online Croatian Spellchecking Service. IEEE Access, 7, 149988-149995 doi:10.1109/access.2019.2947898.
@article{article,
  author   = {Gledec, Gordan and \v{S}oi\'{c}, Renato and Dembitz, \v{S}andor},
  title    = {Dynamic N-Gram System Based on an Online Croatian Spellchecking Service},
  journal  = {IEEE Access},
  year     = {2019},
  volume   = {7},
  pages    = {149988-149995},
  issn     = {2169-3536},
  doi      = {10.1109/access.2019.2947898},
  keywords = {Croatian language, Heaps’ law, language modeling, lexical n-gram, n-gram system comparison}
}

Journal indexed in:


  • Current Contents Connect (CCC)
  • Web of Science Core Collection (WoSCC)
    • Science Citation Index Expanded (SCI-EXP)
    • SCI-EXP, SSCI and/or A&HCI
  • Scopus


Included in other bibliographic databases:


  • IET Inspec
  • Ei Compendex
  • EBSCOhost
  • Google Scholar
  • Directory of Open Access Journals (DOAJ)

