Bibliographic record no.: 1002799
An overview and comparison of free Python libraries for data mining and big data analysis
An overview and comparison of free Python libraries for data mining and big data analysis // MIPRO 2019 Proceedings / Skala, Karolj (ed.).
Rijeka: Hrvatska udruga za informacijsku i komunikacijsku tehnologiju, elektroniku i mikroelektroniku - MIPRO, 2019. pp. 1161-1166 doi:10.23919/MIPRO.2019.8757088 (lecture, international peer review, full paper (in extenso), scientific)
CROSBI ID: 1002799
Title
An overview and comparison of free Python libraries for data mining and big data analysis
Authors
Stančin, Igor ; Jović, Alan
Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific
Source
MIPRO 2019 Proceedings
/ Skala, Karolj - Rijeka : Hrvatska udruga za informacijsku i komunikacijsku tehnologiju, elektroniku i mikroelektroniku - MIPRO, 2019, 1161-1166
Conference
42nd International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO 2019)
Place and date
Opatija, Croatia, 20.05.2019 - 24.05.2019
Type of participation
Lecture
Type of review
International peer review
Keywords
data science ; python ; data mining ; machine learning library ; big data analysis ; framework
Abstract
The popularity of Python is growing, especially in the field of data science. Consequently, there is an increasing number of free libraries available for use. The aim of this review paper is to describe and compare the characteristics of different data mining and big data analysis libraries in Python. There is currently no paper dealing with the subject and describing the pros and cons of all these libraries. Here we consider more than 20 libraries and separate them into six groups: core libraries, data preparation, data visualization, machine learning, deep learning and big data. Besides the functionalities of a given library, important factors for comparison are the number of contributors developing and maintaining the library and the size of the community. A bigger community means a better chance of easily finding a solution to a given problem. We currently recommend: pandas for data preparation; Matplotlib, seaborn or Plotly for data visualization; scikit-learn for machine learning; TensorFlow, Keras and PyTorch for deep learning; and Hadoop Streaming and PySpark for big data.
Original language
English
Scientific fields
Computing
PAPER AFFILIATIONS
Institutions:
Fakultet elektrotehnike i računarstva, Zagreb