Cross-column chromatographic retention time prediction in proteomics: a machine learning approach

Žuvela, Petar; Lovrić, Mario; Lučić, Bono; Liu, Jay; Kern, Roman; Baczek, Tomasz

Bibliographic record no. 1141486

Cross-column chromatographic retention time prediction in proteomics: a machine learning approach // HPLC2019 Kyoto - 49th International Symposium on High Performance Liquid Phase Separations and Related Techniques
Kyoto, Japan, 2019. AB00002, 1 doi:10.13140/RG.2.2.32898.02248 (poster, not peer-reviewed, extended abstract, other)

CROSBI ID: 1141486

Title
Cross-column chromatographic retention time prediction in proteomics: a machine learning approach

Authors
Žuvela, Petar; Lovrić, Mario; Lučić, Bono; Liu, Jay; Kern, Roman; Baczek, Tomasz

Type, subtype and category of work
Conference abstracts, extended abstract, other

Source
HPLC2019 Kyoto - 49th International Symposium on High Performance Liquid Phase Separations and Related Techniques, 2019

Conference
HPLC2019 Kyoto - 49th International Symposium on High Performance Liquid Phase Separations and Related Techniques

Place and date
Kyoto, Japan, 01.12.2019 - 05.12.2019

Type of participation
Poster

Type of review
Not peer-reviewed

Keywords
chromatography; machine learning; QSRR

Abstract
Quantitative structure-retention relationships (QSRR), although widely used to predict retention time in reversed-phase liquid chromatography (RP-LC), share a common limitation: they are typically built for one specific set of chromatographic conditions (e.g., stationary phase, mobile phase composition, pH, temperature, total gradient time). To overcome this limitation, this work aimed to build global QSRR models that predict the retention time of synthetic peptides across six RP-LC columns with varied experimental conditions. The QSRR models were based on three a priori selected molecular descriptors related to the retention mechanism of RP-LC peptide separation: the sum of gradient retention times of the 20 natural amino acids (logSumAA), van der Waals volume (logvdWvol.), and hydrophobicity (clogP). Three machine learning methods were compared: random forests (RF), adaptive boosting (ADA), and Gaussian process regression (GPR). The models were comprehensively optimized by 3-fold cross-validation (CV) and validated on an external validation set. The chemical domain of applicability was also defined, and the statistical significance of the models was tested using CV-ANOVA. All models were also compared with a conventional linear model built using partial least squares (PLS). All machine learning methods outperformed PLS, with %RMSEP ranging from 14.99% for RF to 26.35% for ADA, compared with 40.56% for PLS. The novel ensemble and mixture models revealed the mechanisms behind black-box global QSRR models and paved the way to resolving the principal limitation of QSRR modelling. The models showed the highest feature importance for the sum of gradient retention times (logSumAA), followed by van der Waals volume (logvdWvol.) and hydrophobicity (clogP).
The promising results of this study show the potential of machine learning for improved peptide identification, retention time standardization and integration into state-of-the-art LC-MS/MS proteomics workflows.
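The abstract reports %RMSEP values and a defined applicability domain but gives neither formula. The sketch below illustrates only the evaluation protocol it describes (3-fold cross-validation, %RMSEP, an applicability-domain check) under stated assumptions: %RMSEP is taken as 100 · RMSEP / mean observed retention time, and the applicability domain is assumed to be leverage-based (Williams-plot style) with the common warning leverage h* = 3(p + 1)/n. The regressor is an ordinary least-squares stand-in, not the RF, ADA, GPR or PLS models of the study; the data are synthetic; and the helper names (`percent_rmsep`, `cv_percent_rmsep`, `leverages`) are hypothetical. It is not the authors' implementation.

```python
import numpy as np


def percent_rmsep(y_true, y_pred):
    # ASSUMPTION: %RMSEP = 100 * RMSEP / mean(observed retention time).
    # The abstract reports %RMSEP but does not state its exact definition.
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmsep = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return 100.0 * rmsep / np.mean(y_true)


def fit_linear(X, y):
    # Ordinary least squares with an intercept: a simple stand-in regressor,
    # NOT the RF, ADA, GPR or PLS models used in the study.
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    return coef[0] + X @ coef[1:]


def cv_percent_rmsep(X, y, k=3, seed=0):
    # k-fold cross-validation: each sample is predicted once, by a model
    # trained on the remaining k - 1 folds.
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    y_cv = np.empty(len(y), dtype=float)
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        coef = fit_linear(X[train_idx], y[train_idx])
        y_cv[test_idx] = predict_linear(coef, X[test_idx])
    return percent_rmsep(y, y_cv), y_cv


def leverages(X_train, X_query):
    # Hat values h_i = q_i^T (A^T A)^-1 q_i, with A and q_i augmented by an
    # intercept column (ASSUMPTION: leverage-based applicability domain).
    A = np.column_stack([np.ones(len(X_train)), X_train])
    Q = np.column_stack([np.ones(len(X_query)), X_query])
    inv = np.linalg.inv(A.T @ A)
    return np.einsum("ij,jk,ik->i", Q, inv, Q)


# Synthetic toy data (NOT the study's peptides): three columns standing in
# for logSumAA, logvdWvol. and clogP, and a noisy linear "retention time".
rng = np.random.default_rng(42)
n = 60
X = rng.normal(size=(n, 3))
y = 20.0 + X @ np.array([3.0, 1.5, 0.8]) + rng.normal(scale=0.5, size=n)

err, y_cv = cv_percent_rmsep(X, y, k=3)
h = leverages(X, X)
h_star = 3 * (X.shape[1] + 1) / n  # common warning leverage 3(p + 1)/n

print(f"3-fold CV %RMSEP (synthetic data): {err:.2f}%")
print(f"training points with h > h* = {h_star:.3f}: {int(np.sum(h > h_star))}")
```

Replacing `fit_linear`/`predict_linear` with the regressors named in the abstract would turn this into a side-by-side comparison. Reporting %RMSEP on an external validation set, as the abstract describes, would additionally require a separate held-out split rather than cross-validated predictions alone.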

Original language
English

Scientific fields
Chemistry, Interdisciplinary Natural Sciences, Computer Science, Interdisciplinary Technical Sciences

ASSOCIATED ENTITIES

Institutions:
Ruđer Bošković Institute, Zagreb

Profiles:

Bono Lučić (author)