A Survey of Word Embedding Algorithms for Textual Data Information Extraction (CROSBI ID 707891)
Conference contribution in proceedings | original scientific paper | international peer review
Responsibility details
Vušak, Eugen ; Kužina, Vjeko ; Jović, Alan
English
A Survey of Word Embedding Algorithms for Textual Data Information Extraction
Unlike other popular data types, such as images, textual data cannot be easily converted into a numerical form that machine learning algorithms can process. Therefore, text must be embedded into a vector space using embedding algorithms. These algorithms attempt to encapsulate as much information as possible from the text into a resulting vector space. Natural language is complex and contains numerous layers of information. Information can be obtained from a sequence of characters or subword units that make up the word. It can also be derived from the context in which a word occurs. For this reason, a variety of word embedding algorithms have been developed over time, which use different pieces of information in different ways. In this paper, the currently available word embedding algorithms are described and it is shown what kind of information these algorithms use. After analyzing these algorithms, we discuss how it can be advantageous to use combinations of different types of information in different research and application areas.
word embedding ; textual data ; natural language processing ; word space ; text mining
Contribution details
207-212.
2021.
published
Host publication details
MIPRO 2021 Proceedings
Skala, Karolj
Rijeka: Hrvatska udruga za informacijsku i komunikacijsku tehnologiju, elektroniku i mikroelektroniku - MIPRO
1847-3938
1847-3946
Conference details
MIPRO 2021
lecture
27.09.2021-01.10.2021
Opatija, Croatia