Abstract
Proteins are the molecular machines of life. The multitude of possible conformations that proteins can adopt determines their free energy landscapes. However, the inherently high dimensionality of a protein free energy landscape makes it challenging to rationalize how proteins perform their functions. For this reason, dimensionality reduction (DR) is an active field of research for molecular biologists. Uniform Manifold Approximation and Projection (UMAP) is a dimensionality reduction method based on a fuzzy topological analysis of data. In the present study, the performance of UMAP is compared to that of other popular dimensionality reduction methods, namely t-Distributed Stochastic Neighbor Embedding (t-SNE), Principal Component Analysis (PCA), and time-structure Independent Component Analysis (t-ICA), in the context of analyzing Molecular Dynamics simulations of the circadian clock protein Vivid. A good dimensionality reduction method should accurately represent the structure of the data in the projected components. Comparing the raw high-dimensional data with the projections obtained using the different DR methods showed that UMAP outperforms the linear reduction methods (PCA, t-ICA) and performs comparably to t-SNE, which has so far been the state-of-the-art method.
Francesco Trozzi
Program: PhD in Theoretical and Computational Chemistry
Faculty mentor: Peng Tao