-----

STATISTICAL LEARNING AND DATA MINING II:
Tools for Tall and Wide Data

State-of-the-Art Statistical Methods for Data Analysis

Harvard Conference Center, Boston, MA: Oct 31 - Nov 1, 2005.

-----


A short course given by
Trevor Hastie and Robert Tibshirani
both of Stanford University

This two-day course gives a detailed overview of statistical models for data mining, inference and prediction. With rapid developments in internet technology, genomics, financial risk modeling and other high-tech industries, we increasingly rely on data analysis and statistical models to exploit the vast amounts of data at our fingertips.

In this course we emphasize the tools useful for tackling modern-day data analysis problems. These include gradient boosting, SVMs and kernel methods, random forests, lasso and LARS, ridge regression and GAMs, supervised principal components, and cross-validation. We also present some interesting case studies in a variety of application areas.

This course focuses on both tall data (N > p, where N is the number of cases and p is the number of features) and wide data (p > N). Typical examples of tall data are credit risk and churn prediction, and email spam filtering. Topics include linear and ridge regression, lasso and LARS, support vector machines, random forests and boosting. We give an in-depth discussion of validation, cross-validation and test set issues.

For wide data, typical examples are gene expression and protein mass spectrometry data, and data from signals and images. Topics include clustering and data visualization, false discovery rates and SAM (Significance Analysis of Microarrays), regularized logistic regression and discriminant analysis, supervised and unsupervised principal components, support vector machines and the kernel trick, and the careful use of model selection strategies.

This course is the third in a series, and follows our popular past offerings:

Modern Regression and Classification (1996-2000)

Statistical Learning and Data Mining (2001-2005)

Neither of the earlier courses is a prerequisite for this new course.

Software for these techniques will be illustrated, and each attendee will receive a copy of the text "The Elements of Statistical Learning: Data Mining, Inference, and Prediction" (Hastie, Tibshirani and Friedman) and a comprehensive set of class notes. In addition, drafts of new chapters for a second edition of the book will be given to each attendee.

The instructors

Professor Trevor Hastie of the Statistics and Biostatistics Departments at Stanford University was formerly a member of the Statistics and Data Analysis Research group at AT&T Bell Laboratories. He co-authored with Tibshirani the monograph Generalized Additive Models (Chapman and Hall, 1990), and has written many research articles on nonparametric regression and classification. He also co-edited the Wadsworth book Statistical Models in S (1991) with John Chambers. His Ph.D. thesis, Principal Curves, introduced one of the first nonlinear versions of principal components analysis. During his ten years at Bell Laboratories he gained valuable experience with classification and regression problems in industry and manufacturing.

Professor Robert Tibshirani of the Biostatistics and Statistics Departments at Stanford University is a recipient of the COPSS award, given jointly by the leading statistical societies to the most outstanding statistician under the age of 40. He has also written many research articles on nonparametric regression and classification. With Bradley Efron he co-authored the best-selling text An Introduction to the Bootstrap (1993), and he has been an active researcher on bootstrap methodology for the past 11 years. His 1984 Ph.D. thesis spawned the currently lively research area known as local likelihood. He has more than twenty years of experience consulting on biostatistical problems.

Professors Hastie and Tibshirani published "The Elements of Statistical Learning: Data Mining, Inference, and Prediction" with Jerome Friedman (Springer, 2001). The book has been very well received, with over 20,000 copies sold to date. They are actively involved in research in regression, classification and clustering, and are well known not only in the statistics community but also in the machine learning, neural network and bioinformatics fields. In recent years they have become leaders in the statistical analysis of DNA microarrays, working with leading-edge biologists such as Patrick Brown of Stanford University and David Botstein of Princeton. They have given many short courses together on classification and regression procedures to a wide variety of academic, government and industrial audiences, including the American Statistical Association and Interface meetings, the NATO ASI Neural Networks and Statistics workshop, AI and Statistics, and the Canadian, South African and New Zealand Statistical Society meetings. They have a reputation as good instructors who respond well to the needs of their audience.

The previous course "Statistical Learning and Data Mining" by Hastie and Tibshirani took place at

These courses were filled to capacity, and were enthusiastically received by attendees from biotech, financial and other industrial areas.

Their first course - "Modern Regression and Classification" - took place at:


Some quotes from past attendees:


COURSE DETAILS:


HOTELS NEARBY:

SCHEDULE: Days 1 and 2


PRICE: $1250 per attendee. Full-time student price: $975. Attendance is limited to the first 75 applicants, so sign up soon! These courses fill up quickly.


REGISTRATION FORM for Harvard course

Read here for more details on who should attend, and our policy not to sell our course notes.


http://www-stat.stanford.edu/~hastie/mrc.html