Bibliographic record no. 1141486
Cross-column chromatographic retention time prediction in proteomics: a machine learning approach
Cross-column chromatographic retention time prediction in proteomics: a machine learning approach // HPLC2019 Kyoto - 49th International Symposium on High Performance Liquid Phase Separations and Related Techniques
Kyoto, Japan, 2019. AB00002, 1 doi:10.13140/RG.2.2.32898.02248 (poster, not peer-reviewed, extended abstract, other)
CROSBI ID: 1141486
Title
Cross-column chromatographic retention time prediction in proteomics: a machine learning approach
Authors
Žuvela, Petar ; Lovrić, Mario ; Lučić, Bono ; Liu, Jay ; Kern, Roman ; Baczek, Tomasz
Type, subtype and category of work
Conference abstracts, extended abstract, other
Source
HPLC2019 Kyoto - 49th International Symposium on High Performance Liquid Phase Separations and Related Techniques
2019
Conference
HPLC2019 Kyoto - 49th International Symposium on High Performance Liquid Phase Separations and Related Techniques
Place and date
Kyoto, Japan, 1–5 December 2019
Type of participation
Poster
Type of review
Not peer-reviewed
Keywords
chromatography; machine learning; QSRR
Abstract
Quantitative structure-retention relationships (QSRR), although widely used to predict retention times in reversed-phase liquid chromatography (RP-LC), share a common limitation: they are typically built for one specific set of chromatographic conditions (e.g., stationary phase, mobile phase composition, pH, temperature, total gradient time). To overcome this limitation, we aimed to build global QSRR models that predict the retention times of synthetic peptides across six RP-LC columns operated under varied experimental conditions. The QSRR models were based on three a priori selected molecular descriptors related to the RP-LC retention mechanism of peptides: the sum of the gradient retention times of the 20 natural amino acids (logSumAA), van der Waals volume (logvdWvol.), and hydrophobicity (clogP). Three machine learning methods were compared: random forests (RF), adaptive boosting (ADA), and Gaussian process regression (GPR). The models were comprehensively optimized by 3-fold cross-validation (CV) and validated on an external validation set. The chemical domain of applicability was also defined, and the statistical significance of the models was tested using CV-ANOVA. All models were also compared with a conventional linear model built using partial least squares (PLS). All the machine learning methods outperformed PLS, with %RMSEP ranging from 14.99% for RF to 26.35% for ADA, whereas PLS exhibited a %RMSEP of 40.56%. The novel ensemble and mixture models revealed the mechanisms behind black-box global QSRR models and paved the way toward resolving the principal limitation of QSRR modelling. The models assigned the highest feature importance to the sum of gradient retention times (logSumAA), followed by van der Waals volume (logvdWvol.) and hydrophobicity (clogP).
The promising results of this study show the potential of machine learning for improved peptide identification, retention time standardization and integration into state-of-the-art LC-MS/MS proteomics workflows.
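The evaluation protocol described in the abstract can be sketched in plain Python. The protocol has three parts: k-fold cross-validation, %RMSEP, and a chemical applicability domain. This is a minimal illustration under stated assumptions, not the authors' code.

- The helper names (`percent_rmsep`, `fit_ols`, `cv_percent_rmsep`, `leverages`, `in_domain`) are hypothetical.
- %RMSEP is assumed to be the RMSE expressed as a percentage of the mean observed retention time, because the abstract does not state its normalization.
- Ordinary least squares stands in for the linear PLS baseline. With all three latent variables retained, PLS on three descriptors gives the same fit as OLS.
- The applicability domain uses the leverage approach with the common warning threshold h* = 3p/n. The abstract does not say which method the authors used.
- RF, ADA and GPR are not reimplemented here. In practice, a library regressor would replace `fit_ols` inside the cross-validation loop.

```python
import random
from math import sqrt


def percent_rmsep(y_obs, y_pred):
    """%RMSEP, ASSUMED here to be 100 * RMSE / mean(observed retention time)."""
    n = len(y_obs)
    rmse = sqrt(sum((o - p) ** 2 for o, p in zip(y_obs, y_pred)) / n)
    return 100.0 * rmse / (sum(y_obs) / n)


def _solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def _gram(X):
    """Return X1^T X1, where X1 is X with a leading intercept column of ones."""
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]


def fit_ols(X, y):
    """Linear baseline (stand-in for full-component PLS): [intercept, b1, b2, ...]."""
    rows = [[1.0] + list(x) for x in X]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(len(rows[0]))]
    return _solve(_gram(X), xty)


def predict(coef, X):
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], x)) for x in X]


def cv_percent_rmsep(X, y, k=3, seed=0):
    """k-fold CV: each sample is predicted once by a model fitted without it."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    y_pred = [0.0] * len(y)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        coef = fit_ols([X[i] for i in train], [y[i] for i in train])
        for i, yhat in zip(fold, predict(coef, [X[i] for i in fold])):
            y_pred[i] = yhat
    return percent_rmsep(y, y_pred)


def leverages(X_train, X_query):
    """Leverage h = x1^T (X1^T X1)^-1 x1 of each query row (x1 includes the intercept)."""
    g = _gram(X_train)
    out = []
    for x in X_query:
        x1 = [1.0] + list(x)
        z = _solve(g, x1)  # z = (X1^T X1)^-1 x1 without forming the inverse
        out.append(sum(a * b for a, b in zip(x1, z)))
    return out


def in_domain(X_train, X_query):
    """True where leverage < h* = 3p/n (ASSUMED threshold; p = descriptors + 1)."""
    p = len(X_train[0]) + 1
    h_star = 3.0 * p / len(X_train)
    return [h < h_star for h in leverages(X_train, X_query)]
```

In use, each row of `X` would hold the three descriptors (logSumAA, logvdWvol., clogP) of one peptide, and `y` its observed retention time. Rows that `in_domain` flags `False` fall outside the assumed applicability domain, so predictions for them would be treated as extrapolations.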
Original language
English
Scientific fields
Chemistry, Interdisciplinary Natural Sciences, Computer Science, Interdisciplinary Technical Sciences
WORK AFFILIATIONS
Institutions:
Ruđer Bošković Institute, Zagreb