Harvard Conference Center,
Boston, MA: Oct 6-7 (Mon-Tues), 2008.

Statistical Learning and Data Mining (2001-2005)
In this course we emphasize the tools useful for tackling modern-day data analysis problems. These include gradient boosting, SVMs and kernel methods, random forests, lasso and LARS, ridge regression and GAMs, supervised principal components, and cross-validation. We also present some interesting case studies in a variety of application areas.
This course focuses on both tall data (N>p, where N is the number of cases and p the number of features) and wide data (p>N). Typical examples of tall data are credit risk, churn prediction, and email spam filtering. Topics include linear and ridge regression, lasso and LARS, support vector machines, random forests, and boosting. We give an in-depth discussion of validation, cross-validation, and test-set issues.
For wide data, typical examples are gene expression and protein mass spectrometry data, and data from signals and images. Topics include clustering and data visualization, false discovery rates and SAM, regularized logistic regression and discriminant analysis, supervised and unsupervised principal components, support vector machines and the kernel trick, and the careful use of model selection strategies.
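The tall/wide distinction above can be illustrated with a small sketch. It is only an illustration under stated assumptions, not material from the course: the data are simulated Gaussian noise, and the helper name `ridge_fit` and the penalty value `lam=1.0` are hypothetical choices. It shows that with wide data (p>N) the design matrix cannot have full column rank, so ordinary least squares has no unique solution, while ridge regression, one of the listed topics, still gives a well-defined fit.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Hypothetical helper: solve (X^T X + lam * I) beta = X^T y for beta."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(0)  # simulated data, for illustration only

# Tall data: N = 200 cases, p = 5 features (N > p).
X_tall = rng.normal(size=(200, 5))
# Wide data: N = 20 cases, p = 100 features (p > N).
X_wide = rng.normal(size=(20, 100))
y_wide = rng.normal(size=20)

# X^T X has the same rank as X, so it is invertible only when rank(X) == p.
rank_tall = np.linalg.matrix_rank(X_tall)  # equals p here, so least squares is unique
rank_wide = np.linalg.matrix_rank(X_wide)  # at most N < p, so X^T X is singular

# Adding lam * I with lam > 0 makes the system solvable even when p > N.
beta_wide = ridge_fit(X_wide, y_wide, lam=1.0)

print("tall rank:", rank_tall, "of p =", X_tall.shape[1])
print("wide rank:", rank_wide, "of p =", X_wide.shape[1])
print("ridge coefficients on wide data:", beta_wide.shape)
```

In practice the penalty would be chosen by cross-validation rather than fixed in advance.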
The two earlier courses are not prerequisites for this new course.
The material is based on recent papers by the authors and other researchers, as well as the best-selling book:
The lectures will consist of video-projected presentations and discussion.
http://www-stat.stanford.edu/~hastie/sldm.html