Topological Methods in Scientific Computing, Statistics and Computer Science

| Current Events | Past Events | Computer Programs |

- In Preprints:
- G. Carlsson and A. Zomorodian, "The theory of
multidimensional persistence,"
*23rd ACM Symposium on Computational Geometry*, Gyeongju, South Korea, June 6–7, 2007.

- G. Carlsson and A. Zomorodian, "The theory of
multidimensional persistence,"
- In Links:
- Robert Ghrist, "Barcodes: the persistent topology of data,"
AMS Current Events Bulletin,
*2007 Joint Mathematics Meetings*, New Orleans, LA, January 5–8, 2007 - Alan Wang, "Top mathematicians meet in Berkeley
working to solve science, social issues,"
*ABC News*, December 5, 2006

- Robert Ghrist, "Barcodes: the persistent topology of data,"
AMS Current Events Bulletin,
- In Computer Programs:
- PLEX Public Release Version 2.5, September 7, 2006

Persi Diaconis, Mary V. Sunseri Professor of Mathematics and Statistics, Stanford

Leo Guibas, Professor of Computer Science, Stanford

Susan Holmes, Professor of Statistics, Stanford

Josh Tenenbaum, Paul E. Newton Career Development Professor of Brain and Cognitive Science, MIT

Harlan Sexton (Stanford, Mathematics)

Jonathan Kaplan (Stanford, Mathematics)

Facundo Memoli (Stanford, Mathematics)

Steve Oudot (Stanford, Computer Science)

Yuan Yao (Berkeley, Mathematics)

Jennifer Novak Kloke (Stanford Mathematics)

Queenie Lee (ICME)

Michael Lesnick (ICME)

Lek-Heng Lim (Stanford, Computational and Mathematical Engineering)

Arian Maleki (Stanford Electrical Engineering)

Gurjeet Singh (ICME)

Erik Carlsson (Princeton, Mathematics)

Anne Collins (Centre College, Mathematics)

Vin de Silva (Pomona, Mathematics)

Tom Griffiths (Brown, Cognitive and Linguistic Sciences)

Yichi Gu (Deseam Research Lab, Beijng)

Michael He (Chicago, Statistics)

Peter Lee (MIT, Mathematics)

Debashis Paul (Stanford, Statistics)

Pat Perry (Stanford, Statistics)

Cathy Wu

Afra Zomorodian (Dartmouth, Computer Science)

Margit Zwemer (Stanford, Mathematics)

The overall goal of this project is to develop flexible topological methods which will allow the analysis of data which is difficult to analyze using classical linear methods. Data obtained by sampling from highly curved manifolds or singular algebraic varieties in Euclidean space are typical examples where our methods will be useful. We intend to develop and refine two pieces of software which have been written by members of our research group, ISOMAP (Tenenbaum) and PLEX (de Silva-Carlsson). ISOMAP is a tool for dimension reduction and parameterization of high dimensional data sets, and PLEX is a homology computing tool which we will use in locating and analyzing singular points in data sets, as well as estimating dimension in situations where standard methods do not work well. We plan to extend the range of applicability of both tools, in the case of ISOMAP by studying embeddings into spaces with non-Euclidean metrics, and in the case of PLEX by building in the Mayer-Vietoris spectral sequence as a tool. Both ISOMAP and PLEX will be adapted for parallel computing. We will also begin the theoretical study of statistical questions relating to topology. For instance, we will initiate the study of higher dimensional homology of subsets sampled from Euclidean space under various sampling hypotheses. The key object of study will be the family of Cech complexes constructed using the distance function in Euclidean space together with a randomly chosen finite set of points in Euclidean space.

The goal of this project is to develop tools for understanding data sets which are not easy to understand using standard methods. This kind of data might include singular points, or might be strongly curved. The data is also high dimensional, in the sense that each data point has many coordinates. For instance, we might have a data set whose points each of which is an image, which has one coordinate for each pixel. Many standard tools rely on linear approximations, which do not work well in strongly curved or singular problems. The kind of tools we have in mind are in part topological, in the sense that they measure more qualitative properties of the spaces involved, such as connectedness, or the number of holes in a space, and so on. This group of methods has the capability of recognizing the number of parameters required to describe a space, without actually parameterizing it. These methods also have the capability of recognizing singular points (like points where two non-parallel planes or non-parallel lines intersect), without actually having to construct coordinates on the space. We will also be further developing and refining methods we have already constructed which can actually find good parameterizations for many high dimensional data sets. Both projects will involve the adaptation for the computer of many methods which have heretofore been used in by-hand calculations for solving theoretical problems. We will also initiate the theoretical development of topological tools in a setting which includes errors and sampling.

The wide availability of scanning technologies for 3-d shape acquisition has led to the development of algorithms and software, in both academia and industry, for building surface meshes from the raw data returned by the scanner. For most optical and contact sensors this raw data is in the form of an unorganized point cloud, or PCD set (point cloud data); current scanners are able to generate object surface samplings at submillimeter accuracy. At such high resolutions, these dense PCD sets become adequate representations of object geometry in themselves, without any need to fit a mesh or surface. The work undertaken under this grant will exploit tools from algebraic topology and computational geometry to perform local and global analysis directly and efficiently on PCD data. Specifically, algorithms for PCD feature detection, segmentation, matching, fitting, and compression will be developed and analyzed. It is expected that PCD sets will become a lightweight and convenient representation of object geometry not only in reconstruction or visualization applications, but even more so in the largely unexplored area of directly interrogating object geometry in the PCD format for applications such as robotic exploration or product customization. Over the next five years we can expect to see wide commercial deployment of 3-d scanning and the appearance of compact, portable scanning devices. By exploiting the acquired PCD data it will become possible to customize and tailor manufactured products to their user to a degree that has been impossible up to now. Technology developed under this proposal should facilitate this transition in the apparel, footwear, hearing-aid, orthodontic, prosthetics, and automotive industries.

This project will develop topological tools for understanding qualitative properties of data sets. We will use homology as applied to data sets directly and to derived complexes to define invariants or signatures that distinguish between the underlying geometric objects. Important goals will include the identification, location, and classification of qualitative features of the data set, such as the presence of corners, edges, cone points, etc. and the use of homology applied to canonically defined blowups and tangent complexes to distinguish between two dimensional shapes in three dimensional Euclidean space. We will use the recently developed techniques of persistence and landmarking to make homology a stable and readily computable invariant. We will also develop the theory of multidimensional persistence, in which one studies spaces that are equipped with several parameters, in order to better understand data sets in which there are several different parameters describing different geometric properties of the space. The overall goal is to continue to develop and improve the available tools for studying qualitative information about geometric objects. The goal of this project is to develop tools for understanding data sets that are not easy to understand using standard methods of statistics and analysis. This kind of data might include singular points, or might be strongly curved. The data is also high dimensional, in the sense that each data point has many coordinates. For instance, we might have a data set whose points each of which is an image, which has one coordinate for each pixel. Many standard tools rely on linear approximations, which do not work well in strongly curved or singular problems. The kind of tools we have in mind are in part topological, in the sense that they measure more qualitative properties of the spaces involved, such as connectedness, or the number of holes in a space, and so on. For example, the project takes the point of view that it is better to understand qualitative properties before attempting to do more precise quantitative analysis and better to distinguish shapes by understanding them qualitatively rather than doing data base comparisons. Thus, methods will be developed to compute, in a timely, robust, and trustworthy manner, the fundamental geometric properties that any realistic mathematical model associated to a given data set must contain. Then statistical and analytic techniques may be applied to the geometrically correct models in order to extract the detailed information desired by practitioners.

E-mail webmaster: l e k h e n g @ m a t h . s t a n f o r d . e d u