Bibliographic record no. 1191779
A Corpus-Based Sentence Classifier for Entity–Relationship Modelling
A Corpus-Based Sentence Classifier for Entity–Relationship Modelling // Electronics, 11 (2022), 6; 1-22 doi:10.3390/electronics11060889 (international peer review, article, scientific)
CROSBI ID: 1191779
Title
A Corpus-Based Sentence Classifier for Entity–Relationship Modelling
Authors
Šuman, Sabrina; Čandrlić, Sanja; Jakupović, Alen
Source
Electronics (2079-9292) 11 (2022), 6; 1-22
Type, subtype and category of work
Journal papers, article, scientific
Keywords
text mining; data processing; ER data modelling; pattern recognition; classification; formal language; controlled natural language; linguistic corpus
Abstract
Automated creation of a conceptual data model from user requirements expressed as natural-language text is a challenging research area. The complexity of natural language requires deep insight into the semantics buried in words, expressions, and string patterns. For the purpose of natural language processing, we created a corpus of business descriptions and an accompanying lexicon containing all the words in the corpus. This made it possible to define rules for the automatic translation of business descriptions into the entity–relationship (ER) data model. However, since the translation rules did not always produce accurate translations, we created an additional classification layer: a classifier that assigns each input sentence one of the defined ER method classes. The classifier formalizes the knowledge of four data modelling experts. This rule-based classification process is based on extracting ER information from a given sentence. After a detailed description, the classification process was evaluated and tested using the standard multiclass performance measures: recall, precision, and accuracy. Accuracy was 96.77% in the learning phase and 95.79% in the testing phase.
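The abstract reports recall, precision and accuracy for a multiclass sentence classifier. As a minimal sketch of how these standard measures are usually computed from paired gold and predicted labels (not the authors' own evaluation code), the Python below derives overall accuracy, per-class precision and recall, and their unweighted (macro) averages. The class names ENTITY, RELATIONSHIP and ATTRIBUTE and the helper names `multiclass_metrics` and `macro_average` are hypothetical; the paper's actual ER method classes are not listed in this record.

```python
from collections import Counter


def multiclass_metrics(y_true, y_pred):
    """Return (accuracy, per-class precision, per-class recall).

    y_true and y_pred are equal-length sequences of class labels
    (hypothetical labels here, not the paper's ER method classes).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    # True positives per class: sentences whose predicted label matches the gold label.
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    predicted = Counter(y_pred)  # how often each class was predicted
    actual = Counter(y_true)     # how often each class occurs in the gold labels
    precision = {c: tp[c] / predicted[c] if predicted[c] else 0.0 for c in labels}
    recall = {c: tp[c] / actual[c] if actual[c] else 0.0 for c in labels}
    # Accuracy: share of all sentences classified correctly.
    accuracy = sum(tp.values()) / len(y_true) if y_true else 0.0
    return accuracy, precision, recall


def macro_average(per_class):
    """Unweighted mean of a per-class metric dictionary."""
    return sum(per_class.values()) / len(per_class) if per_class else 0.0


if __name__ == "__main__":
    gold = ["ENTITY", "ENTITY", "RELATIONSHIP", "ATTRIBUTE"]
    pred = ["ENTITY", "RELATIONSHIP", "RELATIONSHIP", "ATTRIBUTE"]
    acc, prec, rec = multiclass_metrics(gold, pred)
    print(acc)  # 0.75
    print(prec["RELATIONSHIP"], rec["ENTITY"])  # 0.5 0.5
    print(macro_average(prec), macro_average(rec))
```

Macro averaging weights every class equally regardless of how many sentences it contains; a weighted average would instead favour frequent classes, so which variant a study uses affects how its per-class figures combine.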
Original language
English
Scientific fields
Interdisciplinary technical sciences, Information and communication sciences
PAPER LINKS
Projects:
NadSve-Sveučilište u Rijeci-uniri-drustv-18-140 - A knowledge-based system to support learning for students with dyslexia (Čandrlić, Sanja, NadSve - Call for funding in support of scientific research at the University of Rijeka for 2018 - projects of experienced scientists and artists) (CroRIS)
Journal indexed in:
- Current Contents Connect (CCC)
- Web of Science Core Collection (WoSCC)
- Science Citation Index Expanded (SCI-EXP)
- Social Science Citation Index (SSCI)
- SCI-EXP, SSCI and/or A&HCI
- Scopus