Presentation + Paper
18 June 2024
A comprehensive pipeline to integrate preprocessing and machine learning techniques for accurate classification in Raman spectroscopy
Abstract
Raman spectroscopy, a non-invasive analytical method, offers insights into molecular structures and interactions in various liquid and solid samples, with applications ranging from materials science and chemical analysis to medical diagnostics. Preprocessing of Raman spectra is vital to remove interferences such as background signals and calibration errors, ensuring precise data extraction. Artificial intelligence, particularly machine learning (ML), aids in extracting valuable information from complex datasets. However, effective data preprocessing is crucial, as it can influence model robustness. This study addresses the integration of preprocessing and ML algorithms, which are often treated as distinct entities despite their intrinsic interconnection, using Raman spectra of blood samples from patients with ovarian cancer. The optimal preprocessing configuration is not always evident because of the complexity of spectral data: numerous algorithms are available for background correction, normalization, outlier removal, noise filtering, and dimension reduction of Raman spectra, and hyperparameter tuning is required to identify the best choice for each preprocessing step. In this work, we present a pipeline that co-optimizes preprocessing techniques and ML classification methods to promote objective selection and minimize processing time. In our approach, preprocessing methods are not chosen arbitrarily but are systematically evaluated against criteria aimed at model robustness. These criteria focus on ensuring that the model performs well not only on the training data but also on unseen data, thus reducing the risk of overfitting and improving the generalization capability of the model. This systematic approach would reduce the time needed for new studies by identifying the most suitable preprocessing steps and hyperparameters and building a robust model for the task.
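The idea of co-optimizing preprocessing and classification can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not name the algorithms, search strategy, or evaluation criteria used, so everything below is an assumption chosen for illustration. The sketch uses scikit-learn's `Pipeline` and `GridSearchCV`, a hypothetical polynomial baseline subtraction (`remove_polynomial_baseline`), and synthetic spectra from a hypothetical generator (`make_synthetic_spectra`) in place of real blood-sample data. Each preprocessing step is treated as a searchable choice, all steps are refit inside every cross-validation fold, and a held-out split estimates performance on unseen data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, Normalizer, StandardScaler
from sklearn.svm import SVC


def remove_polynomial_baseline(X, degree=3):
    """Hypothetical background correction: subtract a per-spectrum polynomial fit."""
    x = np.linspace(-1.0, 1.0, X.shape[1])
    # polyfit accepts a 2-D y with one column per spectrum.
    coeffs = np.polynomial.polynomial.polyfit(x, X.T, degree)
    # With 2-D coefficients, polyval returns shape (n_spectra, n_points).
    baseline = np.polynomial.polynomial.polyval(x, coeffs)
    return X - baseline


def make_synthetic_spectra(n_per_class=60, n_points=200, seed=0):
    """Synthetic stand-in for real spectra (assumption, not data from the paper)."""
    rng = np.random.default_rng(seed)
    shift = np.linspace(0.0, 1.0, n_points)
    y = np.repeat([0, 1], n_per_class)
    peak_a = np.exp(-0.5 * ((shift - 0.3) / 0.02) ** 2)
    peak_b = np.exp(-0.5 * ((shift - 0.7) / 0.02) ** 2)
    # Class 1 differs from class 0 only by a stronger second peak.
    amp_b = np.where(y == 1, 1.3, 1.0)[:, None]
    # Random offset and slope mimic a sample-dependent background.
    offset = rng.uniform(0.0, 1.0, size=(y.size, 1))
    slope = rng.uniform(0.5, 2.0, size=(y.size, 1))
    noise = rng.normal(0.0, 0.05, size=(y.size, n_points))
    return peak_a + amp_b * peak_b + offset + slope * shift + noise, y


X, y = make_synthetic_spectra()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Preprocessing and classifier live in one estimator, so every step is
# fitted on the training folds only and no information leaks from validation data.
pipeline = Pipeline([
    ("baseline", FunctionTransformer(remove_polynomial_baseline)),
    ("normalize", Normalizer()),
    ("reduce", PCA()),
    ("clf", SVC()),
])

# Each step is a searchable choice, including skipping it ("passthrough").
param_grid = {
    "baseline": [
        "passthrough",
        FunctionTransformer(remove_polynomial_baseline, kw_args={"degree": 1}),
        FunctionTransformer(remove_polynomial_baseline, kw_args={"degree": 3}),
    ],
    "normalize": ["passthrough", Normalizer(norm="l2"), StandardScaler()],
    "reduce__n_components": [5, 10],
    "clf": [SVC(kernel="linear"), LogisticRegression(max_iter=1000)],
    "clf__C": [0.1, 10.0],
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="accuracy")
search.fit(X_train, y_train)

# Held-out accuracy estimates how the selected configuration generalizes.
test_accuracy = search.score(X_test, y_test)
print("Best configuration:", search.best_params_)
print("Cross-validated accuracy:", round(search.best_score_, 3))
print("Held-out accuracy:", round(test_accuracy, 3))
```

An exhaustive grid is used here only because the search space is small. With more preprocessing options, a randomized or model-based search over the same pipeline would be a natural substitute, and outlier removal or noise filtering could be added as further steps in the same way.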
© (2024) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Simone Innocente, Siddra Maryam, Stefan Andersson-Engels, Katarzyna Komolibus, Rekha Gautam, and Andrea Visentin "A comprehensive pipeline to integrate preprocessing and machine learning techniques for accurate classification in Raman spectroscopy", Proc. SPIE 13011, Data Science for Photonics and Biophotonics, 1301107 (18 June 2024); https://doi.org/10.1117/12.3017024
KEYWORDS
Raman spectroscopy

Data modeling

Cross validation

Machine learning

Ovarian cancer
