
Bibliographic record number: 932224

The language networks

Martinčić-Ipšić, Sanda; Meštrović, Ana
The language networks, Rijeka: University of Rijeka, 2018 (monograph)

Martinčić-Ipšić, Sanda ; Meštrović, Ana

Type, subtype and category of book
Authored books, monograph, scientific

University of Rijeka





Keywords
Language networks, NLP, complex networks

The language networks book provides insights into the principles of modeling and analyzing the structural properties of language, mainly in its written form, that is, text. The book outlines the basic principles of text preprocessing, covering the initial steps needed for any natural language processing task. It then examines the possibilities of representing text in a complex networks framework. The second part gives an overview of the application of language networks as one of the data science disciplines. It covers important data science topics for processing big textual data, from extracting the most salient structural parts of documents, through differentiating between text genres, to predicting the spreading of information through social media. Finally, the last part of the book deals with the formal modeling of linguistic subsystems in a multilayer complex networks formalism, which allows the systematic study of language across all of its subsystems.

The first part of the book studies the general principles of language network construction and analysis. It covers language network construction types. Specifically, it analyzes the effects of constructing directed vs. undirected and weighted vs. unweighted networks from lemmatized (stemmed) or non-lemmatized texts, with stopwords included or excluded. The effects of text randomization are studied, enabling better insight into the characteristics of language networks compared to their shuffled counterparts. Some preliminary experiments reveal the possibility of differentiating the structural properties of networks constructed from different text types and in different languages, such as Croatian, English, and Italian. Next, some initial insights into the characterization of syllabic networks are presented. The analysis of motifs in linguistic networks reveals the typical building blocks of the structure of networks of literature in the Croatian language. Finally, the first part of the book concludes with LaNCoA, a Python toolkit for the construction and analysis of language networks, which implements the majority of the findings presented in this part of the book.

The second part of the book is dedicated to the applications of language networks. Language networks enable the extraction of the most salient words in texts (keywords) and the extraction of domain knowledge context, studied on the content of Wikipedia entries. The applied part also includes the differentiation between text types and the polarization of tweets. Finally, the possibilities of predicting the future content of tweets solely from the structural properties of complex language networks are presented.

The third part of the book presents the formal model of language networks. The multilayer language network is a comprehensive framework based on multilayer graphs that can model various aspects of language, such as subsystems at different levels of the hierarchy, construction principles, language types, and others. The multilayer language model serves as a unified formal model for the representation of language within complex networks theory.
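The construction options named for the first part of the book (directed vs. undirected, weighted vs. unweighted, stopwords included or excluded, original vs. shuffled text) can be illustrated with a minimal sketch. This is not the LaNCoA API and not necessarily the authors' method: the function names, the tiny stopword list, and the choice to link only immediately adjacent words are all assumptions made for illustration.

```python
import random
from collections import Counter

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "a", "of", "and", "in", "is"}


def build_cooccurrence_network(tokens, directed=True, weighted=True,
                               remove_stopwords=False):
    """Return a Counter mapping (word, word) edges to weights.

    Assumption: two words are linked when they are adjacent in the text.
    """
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    edges = Counter()
    for left, right in zip(tokens, tokens[1:]):
        if left == right:
            continue  # skip self-loops
        # Undirected edges are stored with their endpoints sorted.
        edge = (left, right) if directed else tuple(sorted((left, right)))
        edges[edge] += 1
    if not weighted:
        # Keep only the presence of each edge, not how often it occurred.
        edges = Counter(dict.fromkeys(edges, 1))
    return edges


def shuffled(tokens, seed=0):
    """Return a randomly reordered copy of the tokens (the shuffled counterpart)."""
    rng = random.Random(seed)
    out = list(tokens)
    rng.shuffle(out)
    return out


text = "the structure of the language is the structure of the network"
tokens = text.lower().split()

network = build_cooccurrence_network(tokens)
baseline = build_cooccurrence_network(shuffled(tokens))
```

Comparing measures computed on `network` with those computed on `baseline` mirrors, in spirit, the comparison between language networks and their shuffled counterparts described above; lemmatization or stemming would be applied to `tokens` before the call.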

Original language


University of Rijeka - Department of Informatics