Analysis of multivariable and high dimensional data
Listed as Statistics Topic D in the Course Planner.
Multivariate analysis of data is performed in order to
1. understand the structure in data and summarise the data in simpler ways;
2. understand the relationship of one part of the data to another part; and
3. make decisions or draw inferences based on data.
The statistical analysis of multivariate data extends that of univariate data and, in doing so, requires
more advanced mathematical theory and computational techniques. The course begins with a
discussion of the three classical methods, Principal Component Analysis, Canonical Correlation
Analysis and Discriminant Analysis, which correspond to the three aims above. We also learn about
Cluster Analysis, Factor Analysis and newer methods including Independent Component Analysis.
For most real data the underlying distribution is not known, but if the data can be assumed to be
multivariate normal, additional properties can be derived. Our treatment combines ideas,
theoretical properties and a strong computational component for each of the different methods we
discuss. For the computational part -- with Matlab -- we make use of real data and learn to use
simulations to assess the performance of different methods in practice.
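To give a flavour of what assessing a method by simulation can look like, here is a minimal sketch. It is not taken from the course material: the course uses Matlab, whereas this sketch is written in Python with only the standard library, and the covariance parameters, sample sizes and function names are all illustrative assumptions. It draws repeated samples from a bivariate normal distribution with known covariance and checks how closely the sample-based proportion of variance explained by the first principal component matches the population value.

```python
import math
import random


def simulate_bivariate_normal(n, sigma1, sigma2, rho, rng):
    """Draw n zero-mean bivariate normal points with std devs sigma1, sigma2
    and correlation rho (all values are illustrative assumptions)."""
    points = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x = sigma1 * z1
        y = sigma2 * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        points.append((x, y))
    return points


def sample_covariance(points):
    """Return (s_xx, s_xy, s_yy), the entries of the 2x2 sample covariance."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return sxx, sxy, syy


def eigenvalues_2x2(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]], largest first."""
    mid = (a + c) / 2.0
    rad = math.sqrt(((a - c) / 2.0) ** 2 + b ** 2)
    return mid + rad, mid - rad


def proportion_first_pc(a, b, c):
    """Proportion of total variance carried by the first principal component."""
    l1, l2 = eigenvalues_2x2(a, b, c)
    return l1 / (l1 + l2)


rng = random.Random(0)  # fixed seed so the run is reproducible
sigma1, sigma2, rho = 2.0, 1.0, 0.6  # assumed population parameters
true_prop = proportion_first_pc(sigma1 ** 2, rho * sigma1 * sigma2, sigma2 ** 2)

# Mean absolute error of the estimated proportion over 200 repetitions.
mean_abs_error = {}
for n in (20, 200, 2000):
    errors = []
    for _ in range(200):
        pts = simulate_bivariate_normal(n, sigma1, sigma2, rho, rng)
        est = proportion_first_pc(*sample_covariance(pts))
        errors.append(abs(est - true_prop))
    mean_abs_error[n] = sum(errors) / len(errors)
    print(n, round(mean_abs_error[n], 4))
```

The printed errors should shrink as the sample size grows. The same pattern (simulate from a known model, apply the method, compare with the truth) carries over to the other methods, although each needs its own measure of performance.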
1. Introduction to multivariate data, the multivariate normal distribution
2. Principal Component Analysis, theory and practice
3. Canonical Correlation Analysis, theory and practice
4. Discriminant Analysis, Fisher's LDA, linear and quadratic DA
5. Cluster Analysis: hierarchical and k-means methods
6. Factor Analysis and latent variables
7. Independent Component Analysis, including an introduction to Information Theory
The course will be based on my forthcoming monograph
Analysis of Multivariate and High-Dimensional Data - Theory and Practice, to be published by
Cambridge University Press.
This course is not recorded as a prerequisite for other courses.