Formalizing the Recognition of Medical Domain Multiword Units

Kocijan, Kristina; Šojat, Krešimir

Bibliographic record number: 1209197

Formalizing the Recognition of Medical Domain Multiword Units // Natural Language Processing in Healthcare: A Special Focus on Low Resource Languages / Dash, Satya Ranjan ; Parida, Shantipriya ; Tello, Esaú Villatoro ; Acharya, Biswaranjan ; Bojar, Ondřej (eds.).
Boca Raton (FL): CRC Press, 2022. pp. 89-120, doi:10.1201/9781003138013-5

CROSBI ID: 1209197

Title
Formalizing the Recognition of Medical Domain Multiword Units

Authors
Kocijan, Kristina ; Šojat, Krešimir

Type, subtype and category of work
Book chapters, scientific

Book
Natural Language Processing in Healthcare: A Special Focus on Low Resource Languages

Editor(s)
Dash, Satya Ranjan ; Parida, Shantipriya ; Tello, Esaú Villatoro ; Acharya, Biswaranjan ; Bojar, Ondřej

Publisher
CRC Press

City
Boca Raton (FL)

Year
2022

Page range
89-120

ISBN
9781003138013

Keywords
multiword units, medical domain, low resource settings, digital medical lexicon, Croatian language, finite-state grammars, NooJ

Abstract
The chapter deals with the recognition of medical domain multiword units (MWUs) in texts written in Croatian. The focus is on the automatic recognition of complex MWUs in low-resource settings. These units are complex in that they consist of two or more noun or prepositional phrases, and include three different models, such as ‘symptomatic treatment of patients’ [simptomatsko(A) liječenje(N NOM) bolesnika(N GEN)] or ‘herbal anti-asthmatic syrup’ [biljni(A) sirup(N NOM) protiv(PREP) kašlja(N GEN)], as well as more complex ones, such as ‘continuous evaluation of the risk-benefit balance of the drug’ [kontinuirano(A) praćenje(NOUN) omjera(N GEN) koristi(N GEN) i(C) rizika(N GEN) lijeka(N GEN)]. Our method for the detection of MWUs is based on morpho-syntactic rules in the form of finite-state transducers that are applied at the syntactic level of analysis. The algorithms we propose are designed within the NooJ platform, with the main objective of automatically building a medical lexicon extracted directly from a medical domain corpus. Such a digital lexicon will be valuable for further processing of medical texts and for various NLP tasks in this domain, including enhanced clinical analytics, text mining, and machine translation, at later stages of the project.
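The three tag sequences quoted in the abstract can be illustrated with a minimal sketch. It is not the authors' NooJ grammar and does not use NooJ syntax. It is a plain-Python stand-in that assumes the text has already been tokenized and tagged. The tag names (`A`, `N_NOM`, `N_GEN`, `PREP`, `C`), the function `find_mwus`, and the reading of `NOUN` in the third example as a nominative noun are all assumptions made for illustration.

```python
# Hypothetical tag patterns copied from the three examples in the abstract.
# Ordered longest first so that the most specific pattern wins.
PATTERNS = [
    ("A", "N_NOM", "N_GEN", "N_GEN", "C", "N_GEN", "N_GEN"),
    ("A", "N_NOM", "PREP", "N_GEN"),
    ("A", "N_NOM", "N_GEN"),
]


def find_mwus(tagged):
    """Return candidate multiword units from a list of (word, tag) pairs.

    Scans left to right; at each position it tries every pattern and,
    on a match, emits the matched words and skips past them.
    """
    tags = [tag for _, tag in tagged]
    found = []
    i = 0
    while i < len(tags):
        for pattern in PATTERNS:
            end = i + len(pattern)
            if tuple(tags[i:end]) == pattern:
                found.append(" ".join(word for word, _ in tagged[i:end]))
                i = end
                break
        else:
            # No pattern starts at this token; move on by one.
            i += 1
    return found


# Input tags are assumed to come from an upstream tagger (hypothetical).
sample = [
    ("biljni", "A"),
    ("sirup", "N_NOM"),
    ("protiv", "PREP"),
    ("kašlja", "N_GEN"),
]
print(find_mwus(sample))
```

A fixed list of tag tuples is the simplest possible recognizer of this kind. A real finite-state grammar would also handle agreement, optional elements, and lemma output, none of which is attempted here.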

Original language
English

Scientific fields
Information and Communication Sciences, Interdisciplinary Social Sciences, Philology, Interdisciplinary Humanities

RELATED RECORDS

Projects:
FFZG--11-931-1047 - Obrada prirodnog jezika u domeni zdravstva [Natural Language Processing in the Healthcare Domain] (Kocijan, Kristina, FFZG) (CroRIS)

Institutions:
Filozofski fakultet (Faculty of Humanities and Social Sciences), Zagreb

Profiles:

Krešimir Šojat (author)