Automatic Annotation of Narrative Radiology Reports

Krsnik, Ivan; Glavaš, Goran; Krsnik, Marina; Miletić, Damir; Štajduhar, Ivan

Data source: CROSBI

Automatic Annotation of Narrative Radiology Reports (CROSBI ID 276669)

Journal article | original scientific paper | international peer review

Krsnik, Ivan ; Glavaš, Goran ; Krsnik, Marina ; Miletić, Damir ; Štajduhar, Ivan Automatic Annotation of Narrative Radiology Reports // Diagnostics, 10 (2020), 4; 196, 15. doi: 10.3390/diagnostics10040196

Responsibility details

Authors

Krsnik, Ivan ; Glavaš, Goran ; Krsnik, Marina ; Miletić, Damir ; Štajduhar, Ivan

Basic data in the original language
Basic data in other languages

Language

English

Title

Automatic Annotation of Narrative Radiology Reports

Abstract

Narrative texts in electronic health records can be efficiently utilized for building decision support systems in the clinic only if they are correctly interpreted automatically in accordance with a specified standard. This paper tackles the problem of developing an automated method of labeling free-form radiology reports, as a precursor for building query-capable report databases in hospitals. The analyzed dataset consists of 1295 radiology reports concerning the condition of a knee, retrospectively gathered at the Clinical Hospital Centre Rijeka, Croatia. Reports were manually labeled with one or more labels from a set of 10 most commonly occurring clinical conditions. After primary preprocessing of the texts, two sets of text classification methods were compared: (1) traditional classification models, namely Naive Bayes (NB), Logistic Regression (LR), Support Vector Machine (SVM), and Random Forests (RF), coupled with Bag-of-Words (BoW) features (i.e., symbolic text representation) and (2) Convolutional Neural Network (CNN) coupled with dense word vectors (i.e., word embeddings as a semantic text representation) as input features. We resorted to nested 10-fold cross-validation to evaluate the performance of competing methods using accuracy, precision, recall, and F1 score. The CNN with semantic word representations as input yielded the overall best performance, having a micro-averaged F1 score of 86.7%. The CNN classifier yielded particularly encouraging results for the most represented conditions: degenerative disease (95.9%), arthrosis (93.3%), and injury (89.2%). As a data-hungry deep learning model, the CNN, however, performed notably worse than the competing models on underrepresented classes with fewer training instances such as multicausal disease or metabolic disease. LR, RF, and SVM performed comparably well, with the obtained micro-averaged F1 scores of 84.6%, 82.2%, and 82.1%, respectively.
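The abstract reports micro-averaged F1 for reports that each carry one or more labels. As a rough illustration of that metric only, the sketch below pools true positives, false positives, and false negatives over all reports and labels before computing F1. The function name `micro_f1` and the toy label sets are assumptions made for this example; they are not taken from the paper's code or data, and the paper's exact evaluation procedure may differ.

```python
from typing import Sequence, Set


def micro_f1(gold: Sequence[Set[str]], pred: Sequence[Set[str]]) -> float:
    """Micro-averaged F1 for multi-label predictions (illustrative sketch).

    Counts are pooled over every report and every label, so frequent
    labels weigh more than rare ones.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of reports")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # labels correctly predicted
        fp += len(p - g)  # labels predicted but not in the gold set
        fn += len(g - p)  # gold labels that were missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy data: three reports, label names borrowed from the abstract.
gold = [{"injury"}, {"arthrosis", "degenerative disease"}, {"injury", "arthrosis"}]
pred = [{"injury"}, {"arthrosis"}, {"injury", "degenerative disease"}]
score = micro_f1(gold, pred)
print(f"micro-F1 = {score:.3f}")
```

Because the counts are pooled, a classifier can score well on this metric while still doing poorly on rare labels, which is consistent with the abstract's remark about underrepresented classes.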

Keywords

free-form radiology report ; automatic labelling ; decision support system ; natural language processing ; machine learning ; word embedding ; knee

Note

not recorded

Language

not recorded

Title

not recorded

Abstract

not recorded

Keywords

not recorded

Note

not recorded

Publication details

Journal

Diagnostics

Volume (issue)

10 (4)

Year

2020

Article number

196

Number of pages

Publication status

published

e-ISSN

2075-4418

DOI

10.3390/diagnostics10040196

Open access publication cost

APC

1100.00 CHF

Related records

Related persons

Goran Glavaš (author)

Damir Miletić (author)

Ivan Štajduhar (author)

Related institutions

Medicinski fakultet u Rijeci (062) (author's institution)

Tehnički fakultet, Rijeka (069) (author's institution)

Veterinarski fakultet, Zagreb (053) (author's institution)

Field

Computer science, Clinical medical sciences

Links

doi.org

mdpi.com

Indexing

Scopus

Current Contents Connect (CCC)

Web of Science Core Collection, Science Citation Index Expanded (WoSCC-SCI-Exp)

Web of Science Core Collection, Emerging Sources Citation Index (WoSCC-ESCI)

Web of Science Core Collection, SCI-Exp, SSCI & A&HCI (WoSCC-SCI-Exp, SSCI, A&HCI)