IMPLEMENTATION OF PATTERN RECOGNITION TECHNIQUES (LEAST SQUARES SUPPORT VECTOR MACHINES) IN CHARACTERISTIC PARAMETER SELECTION PROCESSES APPLIED TO METABOLOMIC SYSTEMS

 

Author: William Villamizar Rozo.

Supervisor: MSc. Luis Enrique Mendoza.

 

MOTIVATION


Metabolomics is the global analysis of all, or a large number of, the metabolites in a cell [1]. It was originally proposed as a functional genomics method [2], and it generates large amounts of data from different origins (microorganisms, plants, animals, and even humans). The level of each metabolite is of great importance because it makes it possible to detect the changes that have taken place in a sample. Processing, handling, and analyzing these data is a clear challenge that requires specialized mathematics, statistics, or bioinformatics tools. The data are produced by common analytical techniques such as GC-MS, LC-MS, CE-MS, FTIR, and nuclear magnetic resonance (NMR) [3]; the last of these is a non-destructive, reproducible analytical technique that provides information about all the metabolites.
Reducing the number of variables in the metabolomic data generated by the NMR spectrometer improves performance, but it carries some risk of losing information. For this reason the variables must be selected carefully; an inadequate selection of variables can lead to unacceptable system performance. The use of features and patterns in the data for classification is known as metabolomic fingerprinting [2]. The most widely used methods are principal component analysis with linear discriminant analysis (PCA-LDA) [4] and partial least squares with linear discriminant analysis (PLS-LDA) [4]. These methods can give good classification results, but the results are often difficult to interpret.
Other classification methods have been applied to metabolomic data, such as artificial neural networks [5], genetic programming [6], and genetic algorithms [7]; the last of these have been used to align peaks in NMR metabolomic data. Genetic programming and genetic algorithms are computational learning techniques based on Darwin's theory of evolution [8], and they are popular for solving optimization problems. Finally, statistical learning theory was incorporated with the introduction of support vector machines (SVM) as a new class of classification algorithm [9].

The aim of this study is to determine the relevant differences in the metabolic composition of olive and hazelnut oil samples, pure and blended, using a total of 189 NMR measurements with adulteration levels of 2%, 5%, 10%, and 30%. The variables, which in this particular case are the metabolites present in the oil samples, are selected by applying a preprocessing methodology. Once this information has been obtained, the metabolomic spectra are classified with least squares support vector machines (LS-SVM), which makes it possible to verify which variables influence the adulteration of olive or hazelnut oil.
In metabolomics in general, the selected variables, or characteristic metabolites, are of great importance and are applied in different fields: comparison of mutants [10,2], studies of the global effects of genetic manipulation [11,2], toxicology [12,2], drug discovery [13,2], nutrition [14,2], diabetes [15], cancer [16], and natural product discovery [17].
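As a rough illustration of the LS-SVM classification step, the sketch below trains a binary LS-SVM by solving the linear system of the standard Suykens and Vandewalle formulation. It is a minimal sketch and not the pipeline used in this study: the RBF kernel, the values of `gamma` and `sigma`, the function names, and the two-dimensional toy data (standing in for preprocessed NMR variables of two oil classes) are all assumptions made for the example.

```python
import numpy as np


def rbf_kernel(A, B, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dist = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma**2))


def lssvm_train(X, y, gamma=10.0, sigma=1.0):
    """Solve the LS-SVM linear system for the bias b and the multipliers alpha.

    The system is
        [ 0   y^T             ] [ b     ]   [ 0 ]
        [ y   Omega + I/gamma ] [ alpha ] = [ 1 ]
    with Omega_ij = y_i * y_j * K(x_i, x_j) and labels y_i in {-1, +1}.
    """
    n = X.shape[0]
    omega = np.outer(y, y) * rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = omega + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    solution = np.linalg.solve(A, rhs)
    return solution[0], solution[1:]


def lssvm_predict(X_train, y_train, b, alpha, X_new, sigma=1.0):
    """Classify new samples with sign(sum_i alpha_i * y_i * K(x, x_i) + b)."""
    K = rbf_kernel(X_new, X_train, sigma)
    return np.sign(K @ (alpha * y_train) + b)


# Hypothetical toy data: two well-separated groups standing in for two oil classes.
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                    [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y_train = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0])

b, alpha = lssvm_train(X_train, y_train)
X_new = np.array([[0.5, 0.5], [5.5, 5.5]])
pred = lssvm_predict(X_train, y_train, b, alpha, X_new)
print(pred)
```

Unlike a standard SVM, which requires a quadratic program, the LS-SVM replaces the inequality constraints with equalities, so training reduces to one call to a linear solver. In practice `gamma` and `sigma` would be tuned, for example by cross-validation, rather than fixed as here.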

 

References

 

[1] JC Lindon, JK Nicholson, E Holmes. Handbook of Metabonomics and Metabolomics. Oxford. Elsevier, 2007

[2] Jennifer L. Spratlin, Natalie J. Serkova, and S. Gail Eckhardt, Clinical Applications of Metabolomics in Oncology: A Review. Clin Cancer Res 2009;15:431–440.

[3] Oliver S.G, Winson MK, Kell D.B, et al. Systematic functional analysis of the yeast genome. Trends in Biotechnology 1998;16:373–8.

[4] Vladimir Shulaev, Metabolomics technology and bioinformatics. Briefings in Bioinformatics 2006; Vol. 7, No. 2, 128–139.

[5] Gregory D. Lewis, MD, Aarti Asnani, BS, Robert E. Gerszten, MD, Application of Metabolomics to Cardiovascular Biomarker and Pathway Discovery, American College of Cardiology Foundation. Elsevier. Vol. 52, No. 2, 2008.

[6] Nicholson JK, Lindon JC, Holmes E. “Metabonomics”: understanding the metabolic responses of living systems to pathophysiological stimuli via multivariate statistical analysis of biological NMR spectroscopic data. Xenobiotica 1999;29:1181–9.

[7] Howard Davies, A role for “omics” technologies in food safety assessment, Elsevier, Food Control 21 (2010) 1601–1610.

[8] Fiehn O. Metabolomics: the link between genotypes and phenotypes. Plant Mol Biol 2002;48:155–71.

[9] Halket JM, Waterman D, Przyborowska AM, et al. Chemical derivatization and mass spectral libraries in metabolic profiling by GC/MS and LC/MS/MS. J Exp Bot 2005;56:219–43.

[10] Sofia Moco, Raoul J. Bino, Ric C.H. De Vos, Jacques Vervoort, Metabolomics technologies and metabolite identification, Trends in Analytical Chemistry, Vol. 26, No. 9, 2007.

[11] Horning EC, Horning MG. Metabolic profiles: gas-phase methods for analysis of metabolites. Clin Chem 1971;17:802–9.

[12] Nicholson JK, Lindon JC, Holmes E. “Metabonomics”: understanding the metabolic responses of living systems to pathophysiological stimuli via multivariate statistical analysis of biological NMR spectroscopic data. Xenobiotica 1999;29: 1181–9.

[13] E.M. Purcell, H Torrey, RV Pound: Phys Rev. 1946, 69, 37

[14] F. Bloch, W Hansen, ME Packard: Phys Rev 1946, 69, 127

[15] RR Ernst, WA Anderson: Rev. Sci. Instrum. 1966, 37, 93

[16] Kurt Wüthrich, NMR Studies of Structure and Function of Biological Macromolecules, Nobel Lecture, December 8, 2002

[17] David G. Sullivan, Preparing the Data, Boston University, 2012.

[18] Richard A. Davis, Adrian J. Charlton, John Godward, Stephen A. Jones, Mark Harrison, Julie C. Wilson, Adaptive binning: An improved binning method for metabolomics data using the undecimated wavelet transform, Chemometrics and Intelligent Laboratory Systems 85 (2007) 144–154

[19] M.M.W. Hendriks, F.A. van Eeuwijk, R.H. Jellema, J.A. Westerhuis, T.H. Reijmers, H.C.J. Hoefsloot, A.K. Smilde, Data-processing strategies for metabolomics studies, Trends in Analytical Chemistry (2011), doi: 10.1016/j.trac.2011.04.019

[20] Kristian Hovde Liland, Multivariate methods in metabolomics from pre-processing to dimension reduction and statistical analysis, Trends in Analytical Chemistry, Vol. 30, No. 6, 2011.

[23] Jenny Forshed, Ina Schuppe-Koistinen and Sven P. Jacobsson, Peak alignment of NMR signals by means of a genetic algorithm, Journal of Pharmaceutical and Biomedical Analysis, Volume 38, Issue 5, 10 August 2005, Pages 824-832.

[24] Jenny Forshed, Ralf J.O. Torgrip, K. Magnus Åberg, Bo Karlberg, Johan Lindberg and Sven P. Jacobsson, A comparison of methods for alignment of NMR peaks in the context of cluster analysis.

[25] Jenny Forshed, Ralf J.O. Torgrip, K. Magnus Åberg, Bo Karlberg, Johan Lindberg and Sven P. Jacobsson, A comparison of methods for alignment of NMR peaks in the context of cluster analysis.

[26] Radka Stoyanova, Andrew W. Nicholls, Jeremy K. Nicholson, John C. Lindon and Truman R. Brown, Automatic alignment of individual peaks in large high-resolution spectral data sets.

[27] F. Savorani, G. Tomasi, S. Engelsen, J. Magn. Reson. 202 (2010) 190.

[28] G.J. Postma, P.W.T. Krooshof and L.M.C. Buydens, Opening the Kernel of Kernel Partial Least Squares and Support Vector Machines, Analytica Chimica Acta, 2011.

[29] Muñoz Acosta Carolina, Análisis multivariado para la identificación de componentes generadores de sabor y aroma en productos alimenticios, Universidad Nacional de Colombia Sede Manizales, 2010, pp. 25-26.

[30] Niels-Peter Vest Nielsen, Jens Michael Carstensen, Jørn Smedsgaard, Aligning of single and multiple wavelength chromatographic profiles for chemometric data analysis using correlation optimized warping. Journal of Chromatography A, 805 (1998) 17–35.

[5] Dan Bylund, Rolf Danielsson, Gunnar Malmquist and Karin E. Markides, Chromatographic alignment by warping and dynamic programming as a pre-processing tool for PARAFAC modelling of liquid chromatography–mass spectrometry data, Journal of Chromatography, Volume 961, Issue 2, 5 July 2002, Pages 237-244.

[29.1] Pavel Senin, Dynamic Time Warping Algorithm Review, 2008.

[31] G.A. Pearson, A general baseline-recognition and baseline-flattening algorithm, J. Magn. Reson. 27 (1977) 265–272.

[32] Z. Zolnai, S. Macura, J.L. Markley, Spline method for correcting baseplane distortions in two-dimensional NMR spectra, J. Magn. Reson. 82 (1989) 496–504.

[33] A. Heuer, U. Haeberlen, A new method for suppressing baseline distortions in FT NMR, J. Magn. Reson. 85 (1989) 79–94.

[34] P. Guntert, K. Wuthrich, FLATT—a new procedure for high-quality baseline correction of multidimensional NMR spectra, J. Magn.Reson. 96 (1992) 403–407.

[35] C. Bartels, P. Guntert, K. Wuthrich, IFLAT a new automatic baseline correction method for multidimensional NMR spectra with strong solvent signals, J. Magn. Reson. 117 (1995) 330–333.

[36] David Chang, Cory D. Banack, Sirish L. Shah, Robust baseline correction algorithm for signal dense NMR spectra, Journal of Magnetic Resonance 187 (2007) 288–292.

[37] Bartels C, Güntert P, Wüthrich K: A new automatic baseline correction method for multidimensional NMR spectra with strong solvent signals. J Magn Reson Ser A 1995, 117:330-333.

[38] Lunga G, Pogni R, Basosi R: A Simple Method for Baseline Correction in EPR spectroscopy. J Magn Reson Ser A 1994, 108:65-70.

[39] Silvia De Sanctis, Wilhelm M. Malloni, Werner Kremer, Ana M. Tomé, Elmar W. Lang, Klaus-Peter Neidig, Hans Robert Kalbitzer, Singular spectrum analysis for an automated solvent artifact removal and baseline correction of 1D NMR spectra, Journal of Magnetic Resonance 210 (2011) 177–183.

[40] Yuanxin Xi, David M Rocke, Baseline Correction for NMR Spectroscopic Metabolomics Data Analysis, BMC Bioinformatics 2008, 9:324.

[41] Donald A. Barkauskas, David M. Rocke, A general-purpose baseline estimation algorithm for spectroscopic data, Analytica Chimica Acta 657 (2010) 191–197.

[42] Golotvin S, Williams A: Improved Baseline Recognition and Modeling of FT NMR Spectra. J Magn Reson 2000, 146:122-125.

[43] Barnes, R. J., M. S. Dhanoa et al. Correction to the description of Standard Normal Variate (SNV) and De-Trend (DT) transformations in Practical Spectroscopy with Applications in Food and Beverage Analysis, 2nd edition. Journal of Near Infrared Spectroscopy (1993) 1:185-186.

[44] Moya Gonzalez Adolfo, Desarrollo de un sistema automatizado para la clasificación de bulbos de cebolla basado en espectrometría NIR, doctoral thesis, Universidad Politécnica de Madrid, 2010, 39-48.

[45] C. Lieber, A. Mahadevan-Jansen, Appl. Spectrosc. 57 (2003), 1363.

[46] P. Eilers, Anal. Chem. 76 (2004) 404.

[47] Luai Al Shalabi, Zyad Shaaban, Basel Kasasbeh, Data Mining: A Preprocessing Engine, Journal of Computer Science 2 (9): 735-739, 2006

[48] Savitzky, A., Golay, M.J.E. (1964), Smoothing and differentiation of data by simplified least squares procedure. Anal. Chem., 36: 1627-1639.

[49] Bouveresse, E., Maintenance and Transfer of Multivariate Calibration Models Based on Near-Infrared Spectroscopy, doctoral thesis, Vrije Universiteit Brussel, 1997.

[50] Sandra Castillo, Peddinti Gopalacharyulu, Laxman Yetukuri, Matej Orešič, Algorithms and tools for the preprocessing of LC–MS metabolomics data, Chemometrics and Intelligent Laboratory Systems 108 (2011) 23–32

[51] C.A. Smith, E.J. Want, G. O'Maille, R. Abagyan, G. Siuzdak, XCMS: processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching, and identification, Anal. Chem. 78 (2006) 779–787.

[52] T. Yu, Y. Park, J.M. Johnson, D.P. Jones, apLCMS—adaptive processing of high-resolution LC/MS data, Bioinformatics 25 (2009) 1930–1936.

[53] M. Sturm, A. Bertsch, C. Gröpl, A. Hildebrandt, R. Hussong, E. Lange, N. Pfeifer, O. Schulz-Trieglaff, A. Zerck, K. Reinert, O. Kohlbacher, OpenMS—an open-source software framework for mass spectrometry, BMC Bioinform. 9 (2008) 163.

[54] A. Lommen, Metalign: interface-driven, versatile metabolomics tool for hyphenated full-scan mass spectrometry data preprocessing, Anal. Chem. 81 (2009) 3079–3086.

[55] Sarah J. Dixon, Nina Heinrich, Maria Holmboe, Michele L. Schaefer, Randall R. Reed, Jose Trevejo, Richard G. Brereton, Application of classification methods when group sizes are unequal by incorporation of prior probabilities to three common approaches: Application to simulations and mouse urinary chemosignals, Chemometrics and Intelligent Laboratory Systems 99 (2009) 111–120.

 

[56] Jeroen J. Jansen, Huub C. J. Hoefsloot, Hans F. M. Boelens, Jan van der Greef, Age K. Smilde, Analysis of longitudinal metabolomics data, Bioinformatics, Vol. 20, no. 15, 2004, pages 2438–2446

[57] Haiwei Gu, Zhengzheng Pan, Bowei Xi, Vincent Asiago, Brian Musselman, Daniel Raftery, Principal component directed partial least squares analysis for combining nuclear magnetic resonance and mass spectrometry data in metabolomics: Application to the detection of breast cancer

[58] Haiwei Gu, Zhengzheng Pan, Chester Duda, Doug Mann, Candice Kissinger, Candace Rohde, Daniel Raftery, 1H NMR study of the effects of sample contamination in the metabolomic analysis of mouse urine, Journal of Pharmaceutical and Biomedical Analysis 45 (2007) 134–140.

[4] McCue, K.F., Allen, P.V., Shepherd, L.V.T, Blake, A., Whitworth, J., Maccree, M.M., Rockhold, D.R., Stewart, D., Davies, H.V., Belknap, W.R. (2006) The Primary In Vivo Steroidal Alkaloid Glucosyltransferase From Potato. Phytochemistry 67(15):1590-7.

[58] Beckonert, O., Bollard, M. E., Ebbels, T. M. D., Keun, H. C., Antti, H., Holmes, E., et al. (2003). NMR-based metabonomic toxicity classification: hierarchical cluster analysis and k-nearest-neighbour approaches. Anal Chim Acta 490(1–2), 3–15.

[59] Hector C. Keun, Metabonomic modeling of drug toxicity, Pharmacology & Therapeutics 109 (2006) 92–106

[60] Jin-mei Xia, Xiao-jian Wu and Ying-jin Yuan, Integration of wavelet transform with PCA and ANN for metabolomics data-mining, Metabolomics Volume 3, Number 4, 531-537.

[61] D. F. Brougham, G. Ivanova, M. Gottschalk, D.M. Collins, A. J. Eustace, R. O’Connor, J. Havel, Artificial Neural Networks for Classification in Metabolomic Studies of Whole Cells Using 1H Nuclear Magnetic Resonance, Journal of Biomedicine and Biotechnology, Volume 2011, Article ID 158094, 8 pages.

[62] Gualdrón Guerrero Oscar Eduardo, Desarrollo de diferentes métodos de selección de variables para sistemas multisensoriales, doctoral thesis, 2006.

[63] Karl-Heinz Ott, Nelly Aranibar, Bijay Singh, Gerald W. Stockton, Metabonomics classifies pathways affected by bioactive compounds. Artificial neural network classification of NMR spectra of plant extracts, Phytochemistry, Volume 62, Issue 6, March 2003, Pages 971-985.

[64] Vapnik, V. (1998b). The support vector method of function estimation. In J. A. K. Suykens, & J. Vandewalle, (Eds.), Nonlinear Modeling: Advanced Black-box Techniques. Boston: Kluwer Academic Publishers.

[65] J.A.K. Suykens, J. Vandewalle, Least squares support vector machine classifiers, Neural Process. Lett. 9 (3) (1999b) 293–300.

[66] Jan Luts, Fabian Ojeda, Raf Van de Plas, Bart De Moor, Sabine Van Huffel, Johan A.K. Suykens, A tutorial on support vector machine-based methods for classification problems in chemometrics, Analytica Chimica Acta 665 (2010) 129–145

[67] C. Lu et al., Rapid detection of melamine in milk powder by near infrared spectroscopy, J. Near Infrared Spectrosc. 17, 59–67 (2009)

[68] Díaz Valdés Gonzalo, Pronóstico del precio del oro mediante least square support vector machine (LS-SVM), 2007, 121-123.