STATS 202 - Data Mining & Analysis


Instructor:
Prof. J. Taylor
Sequoia Hall #137
Email
723-9230

Schedule:
MWF 1:15-2:05
Location:
Gates B3
Office Hours:
MW 3:00-4:00 or by appointment
Textbook:
Introduction to Data Mining. Tan, Steinbach & Kumar
Optional reference:
The Elements of Statistical Learning. Hastie, Tibshirani & Friedman. (A more statistically advanced treatment of most of the topics.)
TAs & Office Hours:
Videos:
Computing environment:

Most examples in class will use R, a freely available and powerful application for data analysis.
General Outline:

Data mining is used to discover patterns and relationships in data. The emphasis is on large, complex data sets, such as those found in very large databases or gathered through web mining. Topics:
  • decision trees
  • neural networks
  • association rules
  • clustering
  • case-based methods
  • data visualization
Prerequisites: None.
Evaluation:
  • 6 assignments: 60%
  • midterm exam: 15%
  • final exam: 25%
Homework submission:
Homework is to be submitted electronically by emailing it to stats202-aut0910-staff@lists.stanford.edu. Please include a subject line such as "HW1 submission" in your email. SCPD students should also CC their homework to scpd-distribution@lists.stanford.edu.
Assignments:
Guidelines:
  • Provide copies of your code in the assignment.
  • Assignments should be submitted as a single PDF file.
  • Please include your name in the PDF file.
Notes:
We will be using the lecture slides that accompany the textbook. I will use a slightly edited version of these slides, along with examples that I will post as we go through the material.