Harvard Conference Center,
Boston, MA: Oct 6-7 (Mon-Tues), 2008.

In this course we emphasize the tools useful for tackling modern-day data analysis problems. These include gradient boosting, SVMs and kernel methods, random forests, lasso and LARS, ridge regression and GAMs, supervised principal components, and cross-validation. We also present some interesting case studies in a variety of application areas.
This course focuses on both tall data (N > p, where N is the number of cases and p is the number of features) and wide data (p > N). Typical examples of tall data are credit risk and churn prediction, and email spam filtering. Topics include linear and ridge regression, lasso and LARS, support vector machines, random forests, and boosting. We give an in-depth discussion of validation, cross-validation, and test set issues.
For wide data, typical examples are gene expression and protein mass
spectrometry data, and data from signals and images. Topics include
clustering and data visualization, false discovery rates and SAM,
regularized logistic regression and discriminant analysis, supervised
and unsupervised principal components, support vector machines and the
kernel trick, and the careful use of model selection strategies.
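As a rough illustration of the tall/wide distinction and of the K-fold cross-validation mentioned above, here is a minimal Python sketch. It is not the course software: the function names (`data_shape`, `kfold_indices`, `train_validation_splits`), the "square" label for N = p, and the default of five folds are all assumptions made for this example.

```python
import random


def data_shape(n_cases, n_features):
    """Label a data set as tall (N > p) or wide (p > N)."""
    if n_cases > n_features:
        return "tall"
    if n_features > n_cases:
        return "wide"
    return "square"  # N == p; this label is an assumption of the sketch


def kfold_indices(n_cases, n_folds=5, seed=0):
    """Split case indices 0..n_cases-1 into n_folds disjoint folds."""
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    # Fold i takes every n_folds-th shuffled index, so fold sizes
    # differ by at most one.
    return [indices[i::n_folds] for i in range(n_folds)]


def train_validation_splits(n_cases, n_folds=5, seed=0):
    """Yield (train, validation) index lists, one pair per fold."""
    folds = kfold_indices(n_cases, n_folds, seed)
    for i, validation in enumerate(folds):
        # Training cases are all cases outside the held-out fold.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


if __name__ == "__main__":
    print(data_shape(4000, 50))   # many cases, few features
    print(data_shape(60, 7000))   # few cases, many features
    for train, validation in train_validation_splits(10, n_folds=5):
        print(len(train), "training cases;", "held out:", sorted(validation))
```

Each case is held out exactly once, so a model fitted on the training indices can be scored on cases it has not seen. Any model selection step (for example, choosing features in wide data) would need to be repeated inside each fold rather than done once on the full data.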
Statistical Learning and Data Mining (2001-2005)
The two earlier courses are not prerequisites for this new course.
Software for these techniques will be illustrated, and a copy of the text "Elements of Statistical Learning: data mining, inference and prediction" (Hastie, Tibshirani and Friedman) and a comprehensive set of class notes will be provided. In addition, drafts of new chapters for a second edition of the book will be given to each attendee.
The instructors
Professor Trevor Hastie of the Statistics and Biostatistics
Departments at Stanford University was formerly a member of the
Statistics and Data Analysis Research group, AT&T Bell
Laboratories. He co-authored with Tibshirani the monograph
Generalized Additive Models (1990) published by Chapman and
Hall, and has many research articles in the area of nonparametric
regression and classification. He also co-edited the Wadsworth book
Statistical Models in S (1991) with John Chambers. His
Ph.D. thesis Principal Curves introduced one of the first
nonlinear versions of principal components analysis. During his ten
years at Bell Laboratories he gained valuable experience with
classification and regression problems in industry and
manufacturing.
Professor Robert Tibshirani of the Biostatistics and Statistics departments at Stanford University is a recipient of the COPSS award, an award given jointly by all the leading statistical societies to the most outstanding statistician under the age of 40. He also has many research articles on nonparametric regression and classification. With Bradley Efron he co-authored the best-selling text An Introduction to the Bootstrap in 1993, and has been an active researcher on bootstrap technology for the past 11 years. His 1984 Ph.D. thesis spawned the currently lively research area known as Local Likelihood. He has more than twenty years of experience in consulting on biostatistical problems.
Professors Hastie and Tibshirani published "The Elements of Statistical Learning: Data Mining, Inference and Prediction" with Jerome Friedman (Springer, 2001). This book has received a terrific reception, with over 20,000 copies sold to date. They are actively involved in research in regression, classification and clustering, and are well known not only in the statistics community but in the machine-learning, neural network and bioinformatics fields as well.
In recent years they have become leaders in the statistical analysis of DNA microarrays, working with leading-edge biologists such as Patrick Brown of Stanford University and David Botstein of Princeton. They have given many short courses together on classification and regression procedures to a wide variety of academic, government and industrial audiences. These include the American Statistical Association and Interface meetings, the NATO ASI Neural Networks and Statistics workshop, AI and Statistics, and the Canadian, South African, and New Zealand Statistical Society meetings. They have a reputation for being good instructors who respond well to the needs of the audience.
PRICE: $1250 per attendee. Full-time student price: $975.
Discounts for groups of 4 or more.
Attendance is limited to the first 75 applicants, so sign up soon! These courses fill up quickly.
REGISTRATION FORM for Harvard course
Read here for more details on who should attend, and our policy not to sell our course notes.
http://www-stat.stanford.edu/~hastie/mrc.html
Last modified: Thu Dec 6 15:28:53 2007