Bibliographic record number: 1148394
A Survey of Word Embedding Algorithms for Textual Data Information Extraction
A Survey of Word Embedding Algorithms for Textual Data Information Extraction // MIPRO 2021 Proceedings / Skala, Karolj (ed.).
Rijeka: Hrvatska udruga za informacijsku i komunikacijsku tehnologiju, elektroniku i mikroelektroniku - MIPRO, 2021. pp. 207-212 (lecture, international peer review, full paper (in extenso), scientific)
CROSBI ID: 1148394
Title
A Survey of Word Embedding Algorithms for Textual Data Information Extraction
Authors
Vušak, Eugen ; Kužina, Vjeko ; Jović, Alan
Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific
Source
MIPRO 2021 Proceedings
/ Skala, Karolj - Rijeka : Hrvatska udruga za informacijsku i komunikacijsku tehnologiju, elektroniku i mikroelektroniku - MIPRO, 2021, 207-212
Conference
44th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO 2021)
Place and date
Opatija, Croatia, 27.09.2021 - 01.10.2021
Type of participation
Lecture
Type of review
International peer review
Keywords
word embedding ; textual data ; natural language processing ; word space ; text mining
Abstract
Unlike other popular data types, such as images, textual data cannot be easily converted into a numerical form that machine learning algorithms can process. Therefore, text must be embedded into a vector space using embedding algorithms. These algorithms attempt to encapsulate as much information as possible from the text into a resulting vector space. Natural language is complex and contains numerous layers of information. Information can be obtained from a sequence of characters or subword units that make up the word. It can also be derived from the context in which a word occurs. For this reason, a variety of word embedding algorithms have been developed over time, which use different pieces of information in different ways. In this paper, the currently available word embedding algorithms are described and it is shown what kind of information these algorithms use. After analyzing these algorithms, we discuss how it can be advantageous to use combinations of different types of information in different research and application areas.
Original language
English
Scientific fields
Computing
AFFILIATIONS OF THE WORK
Institutions:
Fakultet elektrotehnike i računarstva, Zagreb