Formalizing the Recognition of Medical Domain Multiword Units (CROSBI ID 73731)
Book chapter | original scientific paper | international peer review
Responsibility data
Kocijan, Kristina ; Šojat, Krešimir
English
Formalizing the Recognition of Medical Domain Multiword Units
The chapter deals with the recognition of medical domain multiword units (MWUs) in texts written in the Croatian language. The focus is on the automatic recognition of complex MWUs in low-resource settings. These units are complex in that they consist of two or more noun or prepositional phrases, and include three different models, such as ‘symptomatic treatment of patients’ [simptomatsko(A) liječenje(N NOM) bolesnika(N GEN)] or ‘herbal anti-asthmatic syrup’ [biljni(A) sirup(N NOM) protiv(PREP) kašlja(N GEN)], as well as more complex ones, such as ‘continuous evaluation of the risk-benefit balance of the drug’ [kontinuirano(A) praćenje(NOUN) omjera(N GEN) koristi(N GEN) i(C) rizika(N GEN) lijeka(N GEN)]. Our method for the detection of MWUs is based on morpho-syntactic rules in the form of finite-state transducers that are applied at the syntactic level of analysis. The algorithms we propose are designed within the NooJ platform, with the main objective of automatically building a medical lexicon extracted directly from a medical domain corpus. Such a digital lexicon will be valuable at later stages of the project for further processing of medical texts and for various NLP tasks in this domain, including enhanced clinical analytics, text mining, and machine translation.
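The idea of matching morpho-syntactic patterns over tagged text can be illustrated with a minimal Python sketch. This is not NooJ grammar syntax and not the chapter's actual transducers: the tag names (`A`, `N_NOM`, `N_GEN`, `PREP`, `V`), the two patterns, the function `find_mwus`, and the connecting verb in the sample input are all assumptions made for illustration, with only the two bracketed example phrases taken from the abstract.

```python
# Hypothetical pre-tagged tokens: (word form, tag). The two phrases come from
# the abstract; the verb joining them is invented for this example.
TAGGED = [
    ("simptomatsko", "A"), ("liječenje", "N_NOM"), ("bolesnika", "N_GEN"),
    ("uključuje", "V"),
    ("biljni", "A"), ("sirup", "N_NOM"), ("protiv", "PREP"), ("kašlja", "N_GEN"),
]

# Illustrative tag-sequence patterns (assumed, not the chapter's grammars).
PATTERNS = [
    ("A", "N_NOM", "PREP", "N_GEN"),
    ("A", "N_NOM", "N_GEN"),
]


def find_mwus(tagged, patterns=PATTERNS):
    """Scan left to right and return word sequences whose tags match a pattern.

    At each position the longest pattern is tried first; after a match the
    scan resumes after the matched span, so matches never overlap.
    """
    ordered = sorted(patterns, key=len, reverse=True)
    found = []
    i = 0
    while i < len(tagged):
        for pattern in ordered:
            window = tagged[i:i + len(pattern)]
            tags = tuple(tag for _, tag in window)
            if tags == pattern:
                found.append(" ".join(word for word, _ in window))
                i += len(pattern)
                break
        else:
            # No pattern starts at this token; move on to the next one.
            i += 1
    return found


if __name__ == "__main__":
    for unit in find_mwus(TAGGED):
        print(unit)
```

A real finite-state grammar would also handle agreement, optional elements, and recursion (for example, chains of genitive nouns), which fixed tag tuples like these cannot express.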
multiword units, medical domain, low resource settings, digital medical lexicon, Croatian language, finite-state grammars, NooJ
Chapter data
89-120.
published
10.1201/9781003138013-5
Book data
Natural Language Processing in Healthcare: A Special Focus on Low Resource Languages
Dash, Satya Ranjan ; Parida, Shantipriya ; Tello, Esaú Villatoro ; Acharya, Biswaranjan ; Bojar, Ondřej
Boca Raton (FL): CRC Press
2022.
9781003138013
Related fields
Philology, Information and Communication Sciences, Interdisciplinary Social Sciences, Interdisciplinary Humanities