An Online Syntactic and Semantic Framework for Lexical Relations Extraction Using Natural Language Deterministic Model

Orešković, Marko

Bibliographic record number: 997715

An Online Syntactic and Semantic Framework for Lexical Relations Extraction Using Natural Language Deterministic Model, 2019, doctoral dissertation, Fakultet organizacije i informatike, Zagreb. doi:10.13140/RG.2.2.31092.19849

CROSBI ID: 997715

Title
An Online Syntactic and Semantic Framework for Lexical Relations Extraction Using Natural Language Deterministic Model

Authors
Orešković, Marko

Type, subtype, and category of work
Theses, doctoral dissertation

Faculty
Fakultet organizacije i informatike

Place
Zagreb

Date
15.03

Year
2019

Pages
237

Mentor
Čubrilo, Mirko ; Essert, Mario

Keywords
syntax analysis, semantic analysis, lexical relations extraction, new lexicon types, hierarchical tagset structure, linked open data

Abstract
Given the extraordinary growth in online documents, methods for automated extraction of semantic relations became popular, and shortly after, became necessary. This thesis proposes a new deterministic language model, with the associated artifact, which acts as an online Syntactic and Semantic Framework (SSF) for the extraction of morphosyntactic and semantic relations. The model covers all fundamental linguistic fields: Morphology (formation, composition, and word paradigms), Lexicography (storing words and their features in network lexicons), Syntax (the composition of words in meaningful parts: phrases, sentences, and pragmatics), and Semantics (determining the meaning of phrases). To achieve this, a new tagging system with more complex structures was developed. Instead of the commonly used vectored systems, this new tagging system uses tree-like T-structures with hierarchical, grammatical Word of Speech (WOS), and Semantic of Word (SOW) tags. For relations extraction, it was necessary to develop a syntactic (sub)model of language, which ultimately is the foundation for performing semantic analysis. This was achieved by introducing a new 'O-structure', which represents the union of WOS/SOW features from T-structures of words and enables the creation of syntagmatic patterns. Such patterns are a powerful mechanism for the extraction of conceptual structures (e.g., metonymies, similes, or metaphors), breaking sentences into main and subordinate clauses, or detection of a sentence's main construction parts (subject, predicate, and object). Since all program modules are developed as general and generative entities, SSF can be used for any of the Indo-European languages, although validation and network lexicons have been developed for the Croatian language only.
The SSF has three types of lexicons (morphs/syllables, words, and multi-word expressions), and the main words lexicon is included in the Global Linguistic Linked Open Data (LLOD) Cloud, allowing interoperability with all other world languages. The SSF model and its artifact represent a complete natural language model which can be used to extract the lexical relations from single sentences, paragraphs, and also from large collections of documents.

Original language
English

Scientific fields
Information and communication sciences

RELATED ENTITIES

Institutions:
Fakultet organizacije i informatike, Varaždin

Profiles:

Marko Orešković (author)