A Survey of Word Embedding Algorithms for Textual Data Information Extraction

Vušak, Eugen; Kužina, Vjeko; Jović, Alan

Bibliographic record number: 1148394

A Survey of Word Embedding Algorithms for Textual Data Information Extraction // MIPRO 2021 Proceedings / Skala, Karolj (ed.).
Rijeka: Hrvatska udruga za informacijsku i komunikacijsku tehnologiju, elektroniku i mikroelektroniku - MIPRO, 2021, pp. 207-212 (lecture, international peer review, full paper (in extenso), scientific)

CROSBI ID: 1148394

Title
A Survey of Word Embedding Algorithms for Textual Data Information Extraction

Authors
Vušak, Eugen ; Kužina, Vjeko ; Jović, Alan

Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific

Source
MIPRO 2021 Proceedings / Skala, Karolj - Rijeka : Hrvatska udruga za informacijsku i komunikacijsku tehnologiju, elektroniku i mikroelektroniku - MIPRO, 2021, 207-212

Conference
44th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO 2021)

Place and date
Opatija, Croatia, 27 September 2021 - 1 October 2021

Type of participation
Lecture

Type of review
International peer review

Keywords
word embedding ; textual data ; natural language processing ; word space ; text mining

Abstract
Unlike other popular data types, such as images, textual data cannot be easily converted into a numerical form that machine learning algorithms can process. Therefore, text must be embedded into a vector space using embedding algorithms. These algorithms attempt to encapsulate as much information as possible from the text into a resulting vector space. Natural language is complex and contains numerous layers of information. Information can be obtained from a sequence of characters or subword units that make up the word. It can also be derived from the context in which a word occurs. For this reason, a variety of word embedding algorithms have been developed over time, which use different pieces of information in different ways. In this paper, the currently available word embedding algorithms are described and it is shown what kind of information these algorithms use. After analyzing these algorithms, we discuss how it can be advantageous to use combinations of different types of information in different research and application areas.
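As a rough illustration of the two information sources named in the abstract (subword units and word context), the following minimal Python sketch extracts character n-grams from a word and (target, context) pairs from a token sequence. It is not taken from the paper: the function names `char_ngrams` and `context_pairs`, the `<` and `>` boundary markers, and the default n-gram and window sizes are assumptions chosen for illustration only.

```python
from typing import List, Tuple


def char_ngrams(word: str, n: int = 3) -> List[str]:
    """Return the character n-grams of a word (hypothetical helper).

    The word is wrapped in '<' and '>' so that prefixes and suffixes
    yield n-grams distinct from word-internal ones (an assumed convention).
    """
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def context_pairs(tokens: List[str], window: int = 1) -> List[Tuple[str, str]]:
    """Return (target, context) pairs for tokens within a symmetric window."""
    pairs = []
    for i, target in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a token is not its own context
                pairs.append((target, tokens[j]))
    return pairs


if __name__ == "__main__":
    print(char_ngrams("word"))
    print(context_pairs(["text", "must", "be", "embedded"], window=1))
```

An embedding algorithm could learn vectors from either kind of unit, or from both together; the sketch shows only how the raw units might be extracted, not how vectors are trained.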

Original language
English

Scientific fields
Computing

WORK AFFILIATIONS

Institutions:
Fakultet elektrotehnike i računarstva, Zagreb

Profiles:

Vjeko Kužina (author)