\documentclass[11pt]{article} \setlength{\oddsidemargin}{0.0truein} \setlength{\evensidemargin}{0.0truein} \setlength{\textwidth}{6.5truein} \setlength{\topmargin}{0.0truein} \setlength{\textheight}{9.0truein} \setlength{\headsep}{0.0truein} \setlength{\headheight}{0.0truein} \setlength{\topskip}{10.0pt} \setlength{\parskip}{5mm} \usepackage{url} \usepackage{amsmath} \usepackage{amssymb} \pagestyle{empty} \begin{document} \begin{center} \textbf{\Large{\textsc{STANFORD UNIVERSITY}}}\\[5pt] \textbf{\Large{\textsc{DEPARTMENT OF STATISTICS}}}\\[5pt] \Large{\textsc{DEPARTMENTAL SEMINAR}} \end{center} % In the following statements, replace "Time of talk", % "Weekday", and "Date of talk". An example is provided. % If you are not sure about this, just skip this part. \begin{center} 4:15 p.m., Tuesday, April 8, 2008\\ %% Example: 4:15 p.m., Tuesday, February 13, 2007\\ Sequoia Hall Room 200\\ (Cookies at 3:45 in 1st Floor Lounge) \end{center} % In the following statements, replace "Name of the speaker" with your % name, "Department Affiliation" with your department affiliation, and %"University Affiliation" with your university affiliation. \begin{center} \textsl{Art B. Owen} \\ Department of Statistics\\ Stanford University \end{center} % In the following statements, replace "Title of the talk" % with your title of the talk. \begin{center} \subsection*{Transposably invariant sample reuse methods} \end{center} % In the following statements, replace "Abstract of the talk" % with your abstract. Sample reuse methods like the bootstrap and cross-validation are widely used in statistics and machine learning. They have some face value validity and they don't depend on strong model assumptions. These methods depend on repeating or omitting cases, while keeping all the variables in those cases. But for many data sets, it is not obvious whether the rows are cases and colunns are variables, or vice versa. For example, with movie ratings organized by movie and customer, both movie and customer IDs can be thought of as variables. This talk looks at bootstrap and cross-validation methods that treat rows and columns of the matrix symmetrically. We get the same answer on X as on X'. McCullagh has proved that no exact bootstrap exists in a certain framework of this type (crossed random effects). We show that a method based on resampling both rows and columns of the data matrix tracks the true error, for some simple statistics applied to large data matrices. Similarly we look at a method of cross-validation that leaves out blocks of the data matrix, generalizing a proposal due to Gabriel that is used in the crop science literature. We find empirically that this approach provides a good way to choose the number of terms in a truncated SVD model or a non-negative matrix factorization. We also apply some recent results in random matrix theory to the truncated SVD case. \par\noindent This work is joint with Patrick Perry. \end{document}