Bibliographic record no. 1191779

A Corpus-Based Sentence Classifier for Entity–Relationship Modelling


Šuman, Sabrina; Čandrlić, Sanja; Jakupović, Alen
A Corpus-Based Sentence Classifier for Entity–Relationship Modelling // Electronics, 11 (2022), 6; 1-22 doi:10.3390/electronics11060889 (international peer review, article, scientific)


CROSBI ID: 1191779

Title
A Corpus-Based Sentence Classifier for Entity–Relationship Modelling

Authors
Šuman, Sabrina; Čandrlić, Sanja; Jakupović, Alen

Source
Electronics (ISSN 2079-9292) 11 (2022), 6; 1-22

Type, subtype and category of work
Journal papers, article, scientific

Keywords
text mining; data processing; ER data modelling; pattern recognition; classification; formal language; controlled natural language; linguistic corpus

Abstract
Automated creation of a conceptual data model from user requirements expressed as natural-language text is a challenging research area. The complexity of natural language requires deep insight into the semantics buried in words, expressions, and string patterns. For natural language processing, we created a corpus of business descriptions and an accompanying lexicon containing all the words in the corpus. This made it possible to define rules for automatically translating business descriptions into the entity–relationship (ER) data model. However, since the translation rules did not always produce accurate translations, we added a classification process layer: a classifier that assigns each input sentence to one of the defined ER method classes. The classifier represents the formalized knowledge of four data modelling experts. This rule-based classification process is based on extracting ER information from a given sentence. After being described in detail, the classification process was evaluated and tested using standard multiclass performance measures: recall, precision, and accuracy. Accuracy was 96.77% in the learning phase and 95.79% in the testing phase.
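The abstract names recall, precision, and accuracy as the multiclass measures used to evaluate the sentence classifier. The following minimal Python sketch shows how such measures can be computed from gold and predicted sentence labels. It is not the authors' evaluation code: the class names (ENTITY, ATTRIBUTE, RELATIONSHIP) are hypothetical placeholders rather than the paper's ER method classes, and macro-averaging over classes is an assumption, since the abstract does not say how per-class scores were aggregated.

```python
from collections import Counter


def multiclass_metrics(y_true, y_pred):
    """Per-class (precision, recall), macro-averaged precision/recall, and accuracy.

    Macro-averaging (unweighted mean over classes) is an assumption made for
    this sketch; the source does not specify the averaging scheme.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    labels = sorted(set(y_true) | set(y_pred))
    actual = Counter(y_true)     # how many sentences truly belong to each class
    predicted = Counter(y_pred)  # how many sentences were assigned to each class
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct assignments per class

    per_class = {}
    for c in labels:
        precision = tp[c] / predicted[c] if predicted[c] else 0.0
        recall = tp[c] / actual[c] if actual[c] else 0.0
        per_class[c] = (precision, recall)

    macro_precision = sum(p for p, _ in per_class.values()) / len(labels)
    macro_recall = sum(r for _, r in per_class.values()) / len(labels)
    accuracy = sum(tp.values()) / len(y_true)  # share of sentences classified correctly
    return per_class, macro_precision, macro_recall, accuracy


# Hypothetical gold and predicted labels for five sentences (illustrative only).
gold = ["ENTITY", "ENTITY", "RELATIONSHIP", "ATTRIBUTE", "ATTRIBUTE"]
pred = ["ENTITY", "RELATIONSHIP", "RELATIONSHIP", "ATTRIBUTE", "ENTITY"]
per_class, macro_p, macro_r, acc = multiclass_metrics(gold, pred)
print(f"accuracy = {acc:.2%}")  # prints "accuracy = 60.00%"
```

Accuracy is reported as a single figure, as in the abstract, while precision and recall are first computed per class; a weighted average (by class frequency) would be an equally plausible aggregation choice.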

Original language
English

Scientific fields
Interdisciplinary Technical Sciences, Information and Communication Sciences



RELATED RECORDS


Projects:
NadSve-Sveučilište u Rijeci-uniri-drustv-18-140 - A knowledge-based system to support learning for students with dyslexia (Čandrlić, Sanja, NadSve - Call for funding to support scientific research at the University of Rijeka for 2018 - projects of experienced scientists and artists) (CroRIS)

Profiles:

Sabrina Šuman (author)

Alen Jakupović (author)

Sanja Čandrlić (author)

Links to the full text:

Full-text access: doi, www.mdpi.com

Cite this publication:

Šuman, Sabrina; Čandrlić, Sanja; Jakupović, Alen
A Corpus-Based Sentence Classifier for Entity–Relationship Modelling // Electronics, 11 (2022), 6; 1-22 doi:10.3390/electronics11060889 (international peer review, article, scientific)
Šuman, S., Čandrlić, S. & Jakupović, A. (2022) A Corpus-Based Sentence Classifier for Entity–Relationship Modelling. Electronics, 11 (6), 1-22 doi:10.3390/electronics11060889.
@article{article, author = {\v{S}uman, Sabrina and \v{C}andrli\'{c}, Sanja and Jakupovi\'{c}, Alen}, year = {2022}, pages = {1-22}, doi = {10.3390/electronics11060889}, keywords = {text mining, data processing, ER data modelling, pattern recognition, classification, formal language, controlled natural language, linguistic corpus}, journal = {Electronics}, volume = {11}, number = {6}, issn = {2079-9292}, title = {A Corpus-Based Sentence Classifier for Entity–Relationship Modelling} }

Journal indexed in:


  • Current Contents Connect (CCC)
  • Web of Science Core Collection (WoSCC)
    • Science Citation Index Expanded (SCI-EXP)
    • Social Science Citation Index (SSCI)
    • SCI-EXP, SSCI and/or A&HCI
  • Scopus

