Topological Data Analysis 15:10 Fri 31 Aug, 2018 :: Napier 208 :: Dr Vanessa Robins :: Australian National University
Topological Data Analysis has grown out of work focussed on deriving qualitative and yet quantifiable information about the shape of data. The underlying assumption is that knowledge of shape - the way the data are distributed - permits high-level reasoning and modelling of the processes that created the data. The 0-th order aspect of shape is the number of pieces: "connected components" to a topologist; "clustering" to a statistician. Higher-order topological aspects of shape are holes, quantified as "non-bounding cycles" in homology theory. These signal the existence of some type of constraint on the data-generating process.
Homology lends itself naturally to computer implementation, but its naive application is not robust to noise. This inspired the development of persistent homology: an algebraic topological tool that measures changes in the topology of a growing sequence of spaces (a filtration). Persistent homology provides invariants called barcodes or persistence diagrams: sets of intervals recording the birth and death parameter values of each homology class in the filtration. These invariants capture information about the shape of data over a range of length scales, and enable the identification of "noisy" topological structure.
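As a rough illustration of birth and death values, the sketch below computes the 0-dimensional barcode (connected components only) of a small point cloud. It assumes a Vietoris-Rips style filtration in which every point is born at scale 0 and two components merge once the scale reaches the length of the shortest edge between them. The function name `zero_dim_persistence` and the union-find approach are illustrative choices, not a method taken from the talk.

```python
from itertools import combinations
from math import dist, inf


def zero_dim_persistence(points):
    """Return (birth, death) intervals for the connected components of a point cloud.

    Assumption: all points are born at scale 0, and a component dies when an
    edge of that length first joins it to another component.
    """
    if not points:
        return []

    parent = list(range(len(points)))

    def find(i):
        # Follow parent pointers to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by length (the filtration parameter).
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )

    intervals = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            intervals.append((0.0, length))  # one component dies at this scale

    intervals.append((0.0, inf))  # the last surviving component never dies
    return intervals
```

For three collinear points such as `[(0, 0), (1, 0), (10, 0)]`, the short interval ending at 1 reflects the two nearby points merging early, while the interval ending at 9 records the more persistent separation of the distant point.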
Statistical analysis of persistent homology has been challenging because the raw information (the persistence diagrams) is provided as sets of intervals rather than functions. Various approaches to converting persistence diagrams to functional forms have been developed recently, and have found application to data ranging from the distribution of galaxies to porous materials and cancer detection.
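One of the simplest functional summaries of a barcode is the Betti curve, which counts the intervals alive at each parameter value. The sketch below is a minimal illustration under that choice; it is not necessarily one of the approaches the talk has in mind, and the name `betti_curve` and the half-open convention (birth <= t < death) are assumptions made here.

```python
from math import inf


def betti_curve(intervals, grid):
    """Count, for each parameter value in `grid`, the intervals alive at that value.

    Assumption: an interval (birth, death) is alive when birth <= t < death.
    """
    return [
        sum(1 for birth, death in intervals if birth <= t < death)
        for t in grid
    ]


# Hypothetical barcode: two finite intervals and one that never dies.
example_intervals = [(0.0, 1.0), (0.0, 9.0), (0.0, inf)]
example_curve = betti_curve(example_intervals, [0.5, 1.0, 5.0, 9.0])
```

Sampling the curve on a fixed grid turns each diagram into a vector of equal length, so quantities such as pointwise means across several data sets become straightforward to compute.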