\documentclass[11pt]{article}
\setlength{\oddsidemargin}{0.0truein}
\setlength{\evensidemargin}{0.0truein}
\setlength{\textwidth}{6.5truein}
\setlength{\topmargin}{0.0truein}
\setlength{\textheight}{9.0truein}
\setlength{\headsep}{0.0truein}
\setlength{\headheight}{0.0truein}
\setlength{\topskip}{10.0pt}
\setlength{\parskip}{5mm}
\usepackage{url}
\usepackage{amsmath}
\usepackage{amssymb}
\pagestyle{empty}

\begin{document}

\begin{center}
\textbf{\Large{\textsc{STANFORD UNIVERSITY}}}\\[5pt]
\textbf{\Large{\textsc{DEPARTMENT OF STATISTICS}}}\\[5pt]
\Large{\textsc{DEPARTMENTAL SEMINAR}}
\end{center}

% In the following statements, replace "Time of talk",
% "Weekday", and "Date of talk". An example is provided.
% If you are not sure about this, just skip this part.
\begin{center}
4:15 p.m., Tuesday, November 27, 2007\\
Sequoia Hall Room 200\\
(Cookies at 3:45 in 1st Floor Lounge)
\end{center}

% In the following statements, replace "Name of the speaker" with your
% name, "Department Affiliation" with your department affiliation, and
% "University Affiliation" with your university affiliation.
\begin{center}
\textsl{Regina Liu} \\
Department of Statistics\\
Rutgers University
\end{center}

% In the following statements, replace "Title of the talk"
% with your title of the talk.
\begin{center}
\subsection*{Mining Massive Text Data: Classification, Construction of Tracking Statistics and Inference under Misclassification}
\end{center}

% In the following statements, replace "Abstract of the talk"
% with your abstract.
\noindent We present a systematic data mining procedure for exploring large free-style text datasets to discover useful features and to develop tracking statistics (often referred to as performance measures or risk indicators). The procedure comprises text classification, construction of tracking statistics, inference under measurement error, and risk management.
The main difficulty in deriving this inference scheme is accounting for misclassification errors, for which we propose two types of approaches: ``plug-in'' and ``projection'' methods. We also consider bootstrap calibration for fine-tuning. Finally, as an illustrative example, the proposed data mining procedure is applied to an FAA aviation safety report repository to show its utility in aviation risk management and, more generally, in decision-support systems. Although most illustrations here are drawn from aviation safety data, the proposed procedure applies to many other domains, for example, mining free-style medical reports to track medical errors or possible disease outbreaks.

This is joint work with Daniel Jeske, Department of Statistics, UC Riverside.

\end{document}