Data source: crosbi

A Corpus-Based Sentence Classifier for Entity–Relationship Modelling (CROSBI ID 308922)

Journal article | original scientific paper | international peer review

Šuman, Sabrina ; Čandrlić, Sanja ; Jakupović, Alen. A Corpus-Based Sentence Classifier for Entity–Relationship Modelling // Electronics (Basel), 11 (2022), 6; 1-22. doi: 10.3390/electronics11060889

Responsibility details

Šuman, Sabrina ; Čandrlić, Sanja ; Jakupović, Alen

Language: English

A Corpus-Based Sentence Classifier for Entity–Relationship Modelling

Automated creation of a conceptual data model from user requirements expressed as natural-language text is a challenging research area. The complexity of natural language requires deep insight into the semantics buried in words, expressions, and string patterns. For the purpose of natural language processing, we created a corpus of business descriptions and an accompanying lexicon containing all the words in the corpus. This made it possible to define rules for the automatic translation of business descriptions into the entity–relationship (ER) data model. However, since the translation rules did not always produce accurate translations, we added a classification process layer: a classifier that assigns each input sentence to one of the defined ER method classes. The classifier represents the formalized knowledge of four data modelling experts. This rule-based classification process is based on extracting ER information from a given sentence. After being described in detail, the classification process was evaluated and tested using standard multiclass performance measures: recall, precision, and accuracy. Accuracy was 96.77% in the learning phase and 95.79% in the testing phase.
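The abstract names recall, precision and accuracy as the multiclass evaluation measures but does not spell out how they are computed. The Python sketch below shows the standard definitions over paired gold-standard and predicted sentence classes. It is not the authors' evaluation code: the function name multiclass_metrics, the class labels ENTITY, ATTRIBUTE and RELATIONSHIP, and the choice of macro averaging are all illustrative assumptions, since the record does not list the actual ER method classes or the averaging scheme used.

```python
from collections import Counter


def multiclass_metrics(gold, predicted):
    """Return accuracy, per-class precision/recall and their macro averages.

    gold, predicted: equal-length sequences of class labels, one per sentence.
    Macro averaging (unweighted mean over classes) is an assumption here.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("at least one labelled sentence is required")

    labels = sorted(set(gold) | set(predicted))
    true_pos = Counter()    # sentences correctly assigned to each class
    pred_count = Counter()  # how often the classifier chose each class
    gold_count = Counter()  # how often each class occurs in the gold labels
    for g, p in zip(gold, predicted):
        gold_count[g] += 1
        pred_count[p] += 1
        if g == p:
            true_pos[g] += 1

    accuracy = sum(true_pos.values()) / len(gold)
    # Precision: of the sentences predicted as class c, the share that truly are c.
    precision = {c: true_pos[c] / pred_count[c] if pred_count[c] else 0.0 for c in labels}
    # Recall: of the sentences that truly are class c, the share predicted as c.
    recall = {c: true_pos[c] / gold_count[c] if gold_count[c] else 0.0 for c in labels}
    macro_precision = sum(precision.values()) / len(labels)
    macro_recall = sum(recall.values()) / len(labels)
    return accuracy, precision, recall, macro_precision, macro_recall


# Hypothetical labels for five sentences (not data from the paper).
gold = ["ENTITY", "ENTITY", "RELATIONSHIP", "ATTRIBUTE", "RELATIONSHIP"]
pred = ["ENTITY", "RELATIONSHIP", "RELATIONSHIP", "ATTRIBUTE", "RELATIONSHIP"]
acc, prec, rec, macro_p, macro_r = multiclass_metrics(gold, pred)
print(f"accuracy = {acc:.2%}")  # prints "accuracy = 80.00%"
```

Macro averaging weights every class equally regardless of how many sentences it contains. The alternative, micro averaging, pools the counts across classes; when every sentence receives exactly one predicted class, micro-averaged precision and recall both reduce to accuracy.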

text mining ; data processing ; ER data modelling ; pattern recognition ; classification ; formal language ; controlled natural language ; linguistic corpus

not recorded

not recorded

not recorded

not recorded

not recorded

not recorded

Publication details

Volume (issue): 11 (6)

Year: 2022

Pages: 1-22

Status: published

ISSN: 2079-9292

DOI: 10.3390/electronics11060889

Open access publication cost

Related fields

Information and Communication Sciences, Interdisciplinary Technical Sciences
