Statistics 345:  Statistical Methods in Computational Biology

Spring 2008   ●   T Th   11:00 AM-12:15 PM     Redwood G19

Nancy R. Zhang    nzhang atstanford    Office Hours:  Tues  2:15-4:00 PM Sequoia Hall 141   


C O U R S E    D E S C R I P T I O N

This course will have 3 parts: 

     

1. Methods for analysis of biological sequences.  Topics include sequence scans for predefined signals, pairwise sequence alignments, multiple sequence alignments, and genome-wide alignment.  This is a more traditional subject in computational biology, and the lectures will focus on the basic theoretical concepts.  However, there are also many exciting new developments, which students will have a chance to explore by presenting a critique of a paper of their choosing.  A list of possible papers will be provided.

2. Statistical modeling of transcription regulation.  We will cover models for representing transcription regulatory sites, and methods for detecting them from a combination of sequence, expression, and phylogeny data.  As in part 1, the lectures will focus on the statistical methodology.  Students will present a critique of a paper of their choosing from a provided list.

3. Analysis of whole-genome profiling data.  We will look at statistical methods for the analysis of genome-wide profiling data, focusing on DNA copy number data.  The lectures will explore common themes such as hidden Markov models, signal detection, segmentation, and cross-sample analysis.

T A

Andres Villaquiran, office hours TBA

T E X T B O O K S

A good textbook for the first part of the course is:

Ewens and Grant, Statistical Methods in Bioinformatics, 2001 Springer-Verlag, New York.

The materials for the second and third parts of the course will be posted as needed on this website.

P R E R E Q U I S I T E S

The first four lectures (on sequence alignment and scans) are more theoretical and require a background in probability at the level of Statistics 217 and 218.  If you do not have this background, you may want to prepare by reading chapters 1, 2, 4, and 7 in Ewens and Grant.  The rest of the course requires statistics at the level of Stat 200.  You also need a background in genetics at the undergraduate level.  You are not required to be proficient in any specific programming language; however, I imagine you will need to be proficient in some language in order to complete the final project.

L E C T U R E S

Date | Subject | Reading / Handouts
Tu 4/1 | Scanning a single biological sequence for a signal. | Lecture Slides. The background on random processes is covered in EG Chapter 7.  Course materials are taken from:  Brendel and Karlin (1992), Karlin and Altschul (1990).
Th 4/3 | Pairwise sequence alignment. | EG 6.4-6.5, 10.1-10.3.
Tu 4/8 | Pairwise sequence alignments, phase transitions. | Lecture Slides. Course materials are taken from: Altschul et al. (2001), Waterman and Vingron (1994), Chan (2003).
Th 4/10 | Pairwise alignments, PAM matrices, ClustalW. | Thompson et al. ClustalW paper
Tu 4/15 | Multiple sequence alignments, HMMs. | Lecture Slides. EG Chapter 12, HMM notes. Hughey and Krogh (1995)
Th 4/17 | Whole genome alignments and other miscellaneous topics | Lecture Slides. Siepel et al. (2005) Supplementary information.
Tu 4/22 | Discussion of papers | Reading list for part I.
Th 4/24 | Motif analysis: Introduction to transcription regulation. | Lecture Slides
Tu 4/29 | Motif analysis: Gibbs Sampler based approaches. | Lecture Slides. Liu et al. (1995)
Th 5/1 | Motif analysis: Regression based approaches. | Lecture Slides. Zhang et al. (2008)
Tu 5/6 | |
Th 5/8 | Motif analysis: Phylogenetic footprinting and other topics. | Lecture Slides. Zhou and Wong (2008). Notes for Gibbs Samplers in motif finding.
Tu 5/13 | Paper discussions | Reading list for part II; papers are here.
Th 5/15 | Genome-wide profiling: Change-point methods |
Tu 5/20 | Genome-wide profiling: More on hidden Markov models |
Th 5/22 | Genome-wide profiling: Cross-sample analysis |
Tu 5/27 | Genome-wide profiling: Low-level normalization. |
Th 5/29 | TBA |
Tu 6/3 | Final presentations |
Th 6/5 | Final presentations |

G R A D I N G

Paper reviews (2): 25% each
Project: 50%