Description: The main goal
of this course is to expose students to modern ideas in
statistical theory. Whereas classical theory is
concerned with the behavior of statistical estimates
when the number of variables is fixed and the sample
size increases, our emphasis is on statistical inference
in high-dimensional settings where there may be as many
variables as observations, or more. Our focus is
motivated by ever newer technologies, which now
produce extremely large datasets, often with huge
numbers of measurements on each of a comparatively small
number of experimental units.
Prerequisite: Stats 300A and
300B. Knowledge of probability theory at the level of
Stats 310A and 310B.
Syllabus:
- Testing problems in high dimensions: sparse
alternatives (needle in a haystack) and nonsparse
alternatives, Bonferroni's method, Fisher's test,
ANOVA, higher criticism.
- Multiple testing problems: familywise error
rate (FWER), procedures for controlling FWER, false
discovery rate (FDR), procedures for controlling FDR,
empirical Bayes view of FDR, local FDR.
- James-Stein estimation, Stein's unbiased risk
estimate.
- Model selection in high dimensions:
thresholding rules, Cp/Akaike Information Criterion,
Bayesian Information Criterion, Risk Inflation
Criterion.
- Oracles and oracle inequalities.
- Computationally tractable methods for variable selection: the LASSO, the Dantzig selector.
- Topics in graphical inference and in machine learning.
Textbooks:
We will
not follow a textbook but the students might find the following
references useful for background reading.
- Large-Scale Inference: Empirical Bayes Methods for
Estimation, Testing, and Prediction by B. Efron, IMS Monographs.
- Gaussian estimation: Sequence and wavelet models by I. Johnstone (available online).
The books below provide background for a few probabilistic results
that we shall use.
- Large deviations techniques and applications, second edition by
A. Dembo and O. Zeitouni, Springer, Applications of Mathematics,
vol. 38.
- Empirical Processes With Applications to Statistics by
G. Shorack and J. Wellner, Classics in Applied Mathematics.
- Random Fields and Geometry by R. Adler and J. Taylor, Springer
Monographs in Mathematics, Springer, New York.
Helen has all the published books.
Handouts: I will post some lecture notes
online; see the corresponding
section.
Course assistants and office hours:
- Bhaswar Bhattacharya. Office hours M 1-3, 206 Sequoia Hall.
- Chaojun Wang. Office hours T 2-3 and F 1-2, 240 Sequoia Hall.
Grading (tentative):
- Homework assignments: 50%
- Homework assignments will generally be distributed on
Wednesdays and are due in class the following
Wednesday.
- Late homework will NOT be accepted for grading
(documented medical emergencies excepted).
- There will be about 6 assignments; the lowest score
will be dropped when computing the final grade.
- You are encouraged to discuss the problem sets with others, but
everyone must turn in their own individual write-up.
- Final project: 50%.
- TBD. Most likely a standard final exam. Less likely, a class
presentation on a subject determined by the instructor.
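As a rough illustration of the tentative grading arithmetic above, the sketch below combines homework and final scores. It assumes (this is not stated policy) that all assignments are weighted equally and that every score is on a 0-100 scale; `course_grade` is a hypothetical helper name.

```python
def course_grade(homework_scores, final_score):
    """Hypothetical helper: combine homework and final scores (each 0-100).

    Assumes equal weight per assignment; the lowest homework score is dropped.
    """
    if len(homework_scores) < 2:
        raise ValueError("need at least two homework scores to drop one")
    kept = sorted(homework_scores)[1:]  # drop the single lowest score
    homework_avg = sum(kept) / len(kept)
    # 50% homework, 50% final, per the tentative breakdown above
    return 0.5 * homework_avg + 0.5 * final_score


print(course_grade([80, 90, 100, 70, 90, 100], 90))  # prints 91.0
```

Here the lowest homework score (70) is dropped, the remaining five average to 92, and the overall grade is 0.5 * 92 + 0.5 * 90 = 91.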
Course policies:
- Use of sources (people, books, internet and so on)
without citing them in homework sets will result in a failing
grade for the course.