Next: References Up: Statistical Methods in genetics Previous: Practicalities


Lectures

Lecture 1: Presentation 09/26/02

Reading for non biologists

US Dept. of Energy Primer; DNA from the Beginning

Reading for biologists

Maximum likelihood.
A chapter on maximum likelihood Class on math. stat

Monte Carlo:
Buffon's Needle

Try the applet

Lecture 2: Introduction to Genetics 10/01/02

Genes, US Dept Energy Primer

Question about satellite DNA:
Tandemly Repeated (Satellite) DNA

Central Dogma: replication (DNA synthesis), transcription (RNA synthesis), translation (protein synthesis).

Building blocks

Each level has its own ``alphabet'', which can be confusing.

Atomic Level

C, N, P, O, H, ...: Carbon, Nitrogen, Phosphorus, Oxygen, Hydrogen, ...

Nucleotide Level

Amino Acids

The genetic code maps the $4^3=64$ possible codons (nucleotide triplets) onto the 20 amino acids, with some codons acting as stop signals.

Proteins

Probabilistic Tools

Binomial Distribution

On/off, hybridized/not hybridized $\longrightarrow$ Bernoulli$(p)$ variable.
Sum of $n$ independent such events $\longrightarrow$ Binomial$(n,p)$ variable.

\begin{displaymath}P(X=k)= { n \choose k} p^k (1-p)^{n-k}\end{displaymath}

$n$ and $p$ are the parameters of the distribution. Often $p$ is unknown and we need to estimate it; the estimator is denoted $\hat{p}$.
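A minimal R sketch, with illustrative values $n=50$ and $p=0.3$ that are not from real data. It simulates $n$ independent Bernoulli$(p)$ on/off events and sums them to get one Binomial$(n,p)$ draw $k$. It then estimates $p$ by the observed proportion $\hat{p}=k/n$ and checks the formula for $P(X=k)$ against R's `dbinom`.

```r
set.seed(1)
n <- 50; p <- 0.3                         # illustrative values only
bern <- rbinom(n, size = 1, prob = p)     # n on/off (Bernoulli) events
k <- sum(bern)                            # their sum is one Binomial(n, p) draw
p_hat <- k / n                            # estimate of p: observed proportion

# P(X = k) from the formula above, and from R's built-in dbinom
manual <- choose(n, k) * p^k * (1 - p)^(n - k)
print(c(k = k, p_hat = p_hat, manual = manual, dbinom = dbinom(k, n, p)))
```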

Multinomial Distribution


\begin{displaymath}
P(x_1,x_2,\ldots,x_m \vert p_1,\ldots,p_m)=\frac{n!}{\prod_{i=1}^m x_i!}\prod_{i=1}^m p_i^{x_i}={{n}\choose{x_1,x_2,\ldots, x_m}} p_1^{x_1}p_2^{x_2}
\cdots p_m^{x_m}
\end{displaymath}

Example:
Different codons can encode the same amino acid.

Mycobacterium tuberculosis H37Rv [gbbct]: 3873 CDS's (1321373 codons) 
------------------------------------------------------------------------
fields: [triplet] [frequency: per thousand] ([number]) 
------------------------------------------------------------------------


UUU  6.2(  8159)  UCU  2.2(  2933)  UAU  6.1(  8024)  UGU  2.2(  2917)
UUC 23.3( 30825)  UCC 11.5( 15256)  UAC 14.6( 19330)  UGC  6.6(  8745)
UUA  1.6(  2140)  UCA  3.5(  4685)  UAA  0.5(   609)  UGA  1.6(  2120)
UUG 17.9( 23684)  UCG 19.4( 25611)  UAG  0.9(  1144)  UGG 14.6( 19342)

CUU  5.4(  7188)  CCU  3.4(  4457)  CAU  6.4(  8488)  CGU  8.4( 11130)
CUC 17.3( 22811)  CCC 17.0( 22503)  CAC 15.8( 20910)  CGC 28.3( 37392)
CUA  4.8(  6278)  CCA  6.1(  8085)  CAA  8.1( 10700)  CGA  7.2(  9539)
CUG 50.5( 66756)  CCG 31.4( 41507)  CAG 22.8( 30081)  CGG 24.6( 32515)

AUU  6.5(  8551)  ACU  3.7(  4837)  AAU  5.3(  7003)  AGU  3.5(  4684)
AUC 33.9( 44767)  ACC 35.2( 46519)  AAC 19.9( 26348)  AGC 14.5( 19147)
AUA  2.2(  2893)  ACA  4.5(  5992)  AAA  5.3(  6944)  AGA  1.3(  1682)
AUG 18.4( 24348)  ACG 15.7( 20683)  AAG 15.1( 19889)  AGG  3.2(  4203)

GUU  8.0( 10578)  GCU 10.9( 14424)  GAU 15.8( 20815)  GGU 18.9( 24945)
GUC 32.7( 43214)  GCC 59.9( 79118)  GAC 42.2( 55775)  GGC 51.0( 67348)
GUA  4.7(  6274)  GCA 12.8( 16947)  GAA 16.2( 21373)  GGA  9.9( 13114)
GUG 40.1( 52998)  GCG 48.7( 64319)  GAG 30.5( 40322)  GGG 19.3( 25455)
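Proline is encoded by $CC*$, where $*$ stands for any of the four nucleotides (as a regular expression, CC[UCAG]). So there are 4 alternative spellings; $p_i$ is each codon's share of the proline count:
  Codon   o/oo    count    $p_i$
  CCU      3.4     4457    0.058
  CCC     17.0    22503    0.294
  CCA      6.1     8085    0.106
  CCG     31.4    41507    0.542
  Total   57.9    76552    1.000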

Lecture 3: What is ML, what is MC? 10/03/02

Maximum likelihood.
A chapter on maximum likelihood Class on math. stat

Maximum Likelihood of Multinomial Cell Probabilities

$X_1,X_2,\ldots,X_m$ are the counts in cells (boxes) $1$ to $m$. Each box has its own probability (think of some boxes as bigger than others), and the total number of balls is fixed at $n$: $X_1+X_2+\cdots +X_m=n$. The probability of box $i$ is $p_i$, subject to the constraint $p_1+p_2+\cdots +p_m=1$. Because the total is fixed, the $X_i$'s are NOT independent. The joint probability of a vector $x_1,x_2,\ldots,x_m$ is called the multinomial, and has the form:

\begin{displaymath}
f(x_1,x_2,\ldots,x_m\vert p_1,\ldots,p_m)=\frac{n!}{\prod_{i=1}^m x_i!}\prod_{i=1}^m p_i^{x_i}={{n}\choose{x_1,x_2,\ldots, x_m}} p_1^{x_1}p_2^{x_2}
\cdots p_m^{x_m}
\end{displaymath}

Each box taken separately against all the other boxes has a binomial count, $X_i\sim$ Binomial$(n,p_i)$. The multinomial is the extension of the binomial to more than two boxes.

We study its log-likelihood:

\begin{displaymath}
l(p_1,p_2,\ldots,p_m)=\log n!-\sum_{i=1}^m \log x_i!+\sum_{i=1}^m x_i\log p_i
\end{displaymath}

However, we cannot simply maximise this. The constraint $\sum_{i=1}^m p_i=1$ must be taken into account, so we use a Lagrange multiplier again and maximise

\begin{displaymath}L( p_1,p_2,\ldots,p_m,\lambda)= l(p_1,p_2,\ldots,p_m)+\lambda\left(1-\sum_{i=1}^m p_i\right)
\end{displaymath}

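Setting all the partial derivatives to zero gives $\frac{x_i}{p_i}-\lambda=0$, i.e. $p_i=x_i/\lambda$. Summing over $i$ and using $\sum_i p_i=1$ gives $\lambda=n$, hence the most natural estimate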

\begin{displaymath}\hat{p_i}=\frac{x_i}{n}\end{displaymath}
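As a numerical sanity check, here is a sketch using the proline codon counts from the Lecture 2 table. It computes $\hat{p}_i=x_i/n$ and compares the multinomial log-likelihood at $\hat{p}$ with its values at 1000 randomly drawn probability vectors. `dmultinom` is base R's multinomial probability function; the random vectors are uniform on the simplex.

```r
x <- c(CCU = 4457, CCC = 22503, CCA = 8085, CCG = 41507)  # proline codon counts
n <- sum(x)                        # 76552
p_hat <- x / n                     # ML estimate x_i / n
print(round(p_hat, 3))

loglik <- function(p) dmultinom(x, prob = p, log = TRUE)

# Random points on the simplex: normalised exponentials (a Dirichlet(1,...,1) draw)
set.seed(2)
others <- replicate(1000, { e <- rexp(length(x)); e / sum(e) })
best_other <- max(apply(others, 2, loglik))
print(c(at_p_hat = loglik(p_hat), best_random = best_other))
```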

Hardy-Weinberg

Recall that this is a trinomial with three boxes, whose probabilities are parametrized by $\theta$:

\begin{displaymath}(1-\theta)^2 \qquad 2\theta (1- \theta) \qquad \theta^2\end{displaymath}


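\begin{displaymath}l'(\theta)=-\frac{2X_1+X_2}{1-\theta}+ \frac{2X_3+X_2}{\theta}\end{displaymath}

Setting $l'(\theta)=0$ gives the ML estimate $\hat{\theta}=\frac{2X_3+X_2}{2n}$, with $n=X_1+X_2+X_3$. The second derivative is

\begin{displaymath}l''(\theta)=-\frac{2X_1+X_2}{(1-\theta)^2}- \frac{2X_3+X_2}{\theta^2}\end{displaymath}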

Each of the counts is binomially distributed with probabilities as described above so that:

\begin{eqnarray*}
E(X_1)&=&n (1-\theta)^2\\
E(X_2)&=&2n \theta (1-\theta)\\
E(X_3)&=&n\theta^2
\end{eqnarray*}
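These expectations give the Fisher information for $\theta$. As a short worked step, take the second derivative $l''(\theta)=-\frac{2X_1+X_2}{(1-\theta)^2}-\frac{2X_3+X_2}{\theta^2}$. The expectations above give $E(2X_1+X_2)=2n(1-\theta)$ and $E(2X_3+X_2)=2n\theta$, so

\begin{displaymath}
I(\theta)=-E\left(l''(\theta)\right)=\frac{2n(1-\theta)}{(1-\theta)^2}+\frac{2n\theta}{\theta^2}=\frac{2n}{1-\theta}+\frac{2n}{\theta}=\frac{2n}{\theta(1-\theta)}
\end{displaymath}

The ML estimate therefore has approximate variance $1/I(\theta)=\theta(1-\theta)/(2n)$. This is the variance of an allele frequency estimated from $2n$ alleles.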



R code for generating multinomials:
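multin <- function(pvec, num)
{
        # Put num balls into length(pvec) boxes, each ball landing in box i
        # with probability pvec[i]; returns the vector of box counts.
        # (Base R's rmultinom(1, num, pvec) draws from the same distribution.)
        run <- runif(num)
        breaks <- cumsum(c(0, pvec))
        breaks[length(breaks)] <- 1               # guard against rounding in cumsum
        box <- cut(run, breaks, labels = FALSE)   # box index of each ball
        return(tabulate(box, nbins = length(pvec)))  # counts, empty boxes included
}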

R code for the ML estimate of $\theta$:

theta <- function(xvec)
{
        # ML estimate (2*X3 + X2)/(2n) from genotype counts xvec = c(X1, X2, X3)
        return((2 * xvec[3] + xvec[2])/(2 * sum(xvec)))
}
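A self-contained simulation sketch. It uses base R's `rmultinom` rather than `multin`, so that it runs on its own, and the values $\theta=0.3$ and $n=200$ are illustrative. It draws Hardy-Weinberg genotype counts and applies the estimator $(2X_3+X_2)/(2n)$. It then compares the spread of the estimates with the standard error $\sqrt{\theta(1-\theta)/(2n)}$ obtained from the expectations of $l''$ above.

```r
set.seed(3)
theta_true <- 0.3; n <- 200              # illustrative values
probs <- c((1 - theta_true)^2, 2 * theta_true * (1 - theta_true), theta_true^2)

# Each column of sims is one sample of genotype counts (X1, X2, X3)
sims <- rmultinom(5000, size = n, prob = probs)

# Same estimator as theta() above: (2*X3 + X2) / (2n)
theta_hat <- (2 * sims[3, ] + sims[2, ]) / (2 * n)

print(c(mean = mean(theta_hat), sd = sd(theta_hat),
        theory_sd = sqrt(theta_true * (1 - theta_true) / (2 * n))))
```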

The Bayesian Paradigm

The Bayesian paradigm can be seen, in some ways, as an extra step in the modelling world, just as parametric modelling is. We have seen how probabilistic models can be used to make inferences about some unknown aspect, either by confidence intervals or by hypothesis testing.

The motivation for any statistical analysis is that some ``target population'' is not well understood: some aspects of it are unknown or uncertain.

The idea of this paradigm is that any uncertainty can be modelled probabilistically.

Situations in which one knows nothing at all are very rare. Asked to measure a table, you would not reach for calipers or for a 100-yard measuring tape.

The probability model that we build can be quite approximate. It reflects one's beliefs and any prior experience, and it is described as personal or subjective.

Why? Because it differs from person to person. Easy examples come from horse betting or the stock exchange.

So when the uncertainty about the model can be boiled down to a parameter $\theta$ the Bayesian statistician treats $\theta$ as if it were a random variable $\Theta$ whose distribution describes that uncertainty.

Eliciting a whole distribution may seem a challenge. In practice it is done through successive statements about events of the type $\Theta \leq \theta$, and it does NOT have to be very precise.

A subjective/personal probability is going to be subject to modification upon acquisition of further information supplied by experimental data.

Suppose a distribution with density $g(\theta)$ describes one's present uncertainty about the parameter $\theta$ of a probability model with density $f(x\vert\theta)$.

Those uncertainties will change with the acquisition of data obtained by doing the experiment modelled by $f$.

Bayes' theorem is the key to this updating:

\begin{displaymath}P(H\vert data)=\frac{P(data\vert H)P(H)}{P(data)}\end{displaymath}

The probability of $H$ given the data is called the posterior probability of $H$: it comes after (is posterior to) the data. The unconditional probability $P(H)$ is the prior probability of $H$.

For given data $P(data\vert H)$ is the likelihood of H.

For given data we often write :

\begin{displaymath}P(H\vert data) \propto P(data\vert H) P(H)\end{displaymath}

The posterior is proportional to the likelihood times the prior.

In terms of odds (which some people find easier to understand):

\begin{displaymath}\frac{P(H\vert data)}{P(H^c\vert data)} \propto \frac{P(data\vert H)}{P(data\vert H^c)}
\frac{P(H)}{P(H^c)}\end{displaymath}

Posterior odds = Prior odds times likelihood ratio.
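A small worked example with made-up numbers (all hypothetical): $P(H)=0.1$, $P(data\vert H)=0.8$ and $P(data\vert H^c)=0.2$. The sketch computes the posterior once through the odds form and once directly from Bayes' theorem; the two should agree.

```r
prior_H <- 0.1                      # hypothetical prior P(H)
lik_H <- 0.8; lik_Hc <- 0.2         # hypothetical P(data | H), P(data | H^c)

prior_odds <- prior_H / (1 - prior_H)
lr <- lik_H / lik_Hc                # likelihood ratio
post_odds <- prior_odds * lr        # posterior odds = prior odds x likelihood ratio
post_from_odds <- post_odds / (1 + post_odds)

# Direct Bayes: P(H | data) = P(data | H) P(H) / P(data)
post_direct <- lik_H * prior_H / (lik_H * prior_H + lik_Hc * (1 - prior_H))
print(c(post_from_odds, post_direct))   # both 4/13, about 0.308
```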
These formulae were written as if the random variables were discrete. For continuous random variables the behaviour is identical, with probabilities replaced by densities.

Represent the data by a random variable Y:

\begin{displaymath}h(\theta) \propto L(\theta) g(\theta)\end{displaymath}

Here $h$ is the posterior density and $L(\theta)$ is proportional to the probability density of $Y$ given $\theta$. In fact, we can consider that we are studying the joint distribution of two random variables, $\Theta$ and $Y$.

The marginal density of $Y$ is not shown explicitly: it is the proportionality (normalising) factor. It can be written:

\begin{displaymath}m(y)=\int f(y\vert\theta) g(\theta)\, d\theta\end{displaymath}

Remark: one does NOT have to worry too much about the prior, because once data come in it is `swamped' in the following sense. Two people with divergent but reasonably open-minded prior opinions will be forced into arbitrarily close agreement about future observations by a sufficient amount of data. We will see an example of this later on.
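A sketch of the swamping effect. It assumes a binomial experiment with Beta priors, the conjugate family discussed below: a Beta$(r,s)$ prior combined with $x$ successes in $n$ trials gives a Beta$(r+x,s+n-x)$ posterior. Two hypothetical people hold very different priors, Beta$(20,2)$ and Beta$(2,20)$, and both see the same simulated trials with an illustrative true $p=0.3$. Their posterior means, which are also their predictive probabilities that the next trial is a success, draw together as $n$ grows. The gap is exactly $18/(22+n)$ whatever the data.

```r
set.seed(4)
p_true <- 0.3                                  # illustrative true value
trials <- rbinom(10000, size = 1, prob = p_true)

post_mean <- function(r, s, x, n) (r + x) / (r + s + n)   # mean of Beta(r+x, s+n-x)

for (n in c(0, 10, 100, 1000, 10000)) {
  x <- sum(trials[seq_len(n)])
  m1 <- post_mean(20, 2, x, n)                 # optimist: prior mean 20/22
  m2 <- post_mean(2, 20, x, n)                 # pessimist: prior mean 2/22
  cat(sprintf("n = %5d  mean1 = %.3f  mean2 = %.3f  gap = %.4f\n",
              n, m1, m2, abs(m1 - m2)))
}
```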

About Priors

``Gentle'' priors reflect some agreed-upon weakness in the available information. For instance, before any instruments went to the moon, no one had any precise idea about the answer to the question ``How deep is the dust?''. The initial belief was overthrown as soon as data came back.

When advance information is available the Bayesian method provides a routine way for updating uncertainty when new information comes in.

There are several steps to building a prior:

Calibrating degrees of belief

Suppose I want to discover ``Your'' probability for the event A: ``the average adult male emperor penguin weighs more than 50 lbs''. We go through comparison experiments:
  1. Would you rather bet on drawing a green chip from a bag holding 1 red and 1 green chip, or bet on A being true?

    Suppose you prefer the latter: then Your probability for A is above 1/2.

  2. Would you rather bet on drawing a green chip from a bag holding 3 green chips and 1 red chip, or bet on A?
... and so on. Each answer bounds Your probability for A from above or below.

Another type of thought experiment could be used to build $P[\Theta \leq \theta]$ for an increasing sequence of $\theta$'s.

This is not usually how priors are built, though, because building up a whole prior density this way is an exhausting process. Instead, we use families of priors whose updating is easy for the specific likelihoods at hand.


Conjugate Priors

Sometimes a prior distribution can be approximated by one that is in a convenient family of distributions, which combines with the likelihood to produce a posterior that is manageable.

We will see that an ``objective'' way of building priors for the binomial parameter is to use the `conjugate family': a family of distributions with the property that the updated (posterior) distribution is in the same family.

Binomial-Beta

Beta priors for the Binomial parameter

A little history, from Bayes (1763): a white billiard ball is rolled along a line and we look at where it stops, scaling the table from 0 to 1. We suppose that it has a uniform probability of stopping anywhere on the line. It stops at a point $p$.

A red billiard ball is then rolled $n$ times under the same uniform assumption. Let $X$ denote the number of times the red ball stops short of where the white ball stopped. Given $X$, what inference can we make about $p$?

Taken another way, we could have rolled the $n$ red balls first and then the white ball, and looked at how many red balls stopped before the white one.

Or we could have rolled all $n+1$ balls together and looked at where the white ball was. By symmetry it is the $a$th ball, for each $a=1,\ldots,n+1$, with probability $1/(n+1)$.

Let's say this again in our terminology: We are looking for the posterior distribution of $p$ given X.

$p$ is a number between $0$ and $1$

The prior distribution of p is Uniform(0,1)=Beta(1,1).

Beta family


\begin{displaymath}f(p\vert r,s) \propto p^{r-1}(1-p)^{s-1}\end{displaymath}


\begin{displaymath}B(r,s)=\int_0^1 p^{r-1}(1-p)^{s-1}\,dp=\frac{\Gamma(r)\Gamma(s)}{\Gamma(r+s)}\end{displaymath}

$X\sim$ Binomial$(n,p)$

\begin{displaymath}P(X=x\vert p)= {n \choose x} p^x(1-p)^{n-x} \end{displaymath}


\begin{displaymath}P(a<p<b \mbox{ and } X=x)=\int_{a}^b {n \choose x} p^x(1-p)^{n-x}dp\end{displaymath}


\begin{displaymath}P(X=x)=\int_0^1 {n \choose x} p^x(1-p)^{n-x}dp={n \choose x}B(x+1,n-x+1)=\frac{1}{n+1}\end{displaymath}


\begin{displaymath}P(a<p<b\vert X=x)=\frac{
\int_{a}^b p^x(1-p)^{n-x}dp}{B(x+1,n-x+1)}\end{displaymath}

So the posterior distribution of $p$ given $X=x$ is Beta$(x+1,n-x+1)$.
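A quick numerical check of the formulas above, using the arbitrary values $n=10$, $x=3$, $a=0.2$, $b=0.5$. With a uniform prior, the posterior of $p$ given $X=x$ is Beta$(x+1,n-x+1)$, so $P(a<p<b\vert X=x)$ can be read from `pbeta`. The code compares this with direct numerical integration. It also checks that the marginal $P(X=x)={n\choose x}B(x+1,n-x+1)$ equals $1/(n+1)$ for every $x$, as in Bayes' billiard argument.

```r
n <- 10; x <- 3; a <- 0.2; b <- 0.5       # arbitrary illustrative values

# Posterior probability from the Beta(x+1, n-x+1) distribution
post_beta <- pbeta(b, x + 1, n - x + 1) - pbeta(a, x + 1, n - x + 1)

# Same quantity by integrating the likelihood and normalising by B(x+1, n-x+1)
num <- integrate(function(p) p^x * (1 - p)^(n - x), lower = a, upper = b)$value
post_int <- num / beta(x + 1, n - x + 1)

# Marginal P(X = k) under the uniform prior, for k = 0, ..., n
marg <- sapply(0:n, function(k) choose(n, k) * beta(k + 1, n - k + 1))
print(c(post_beta = post_beta, post_int = post_int))
print(marg)                                 # all equal to 1/(n+1)
```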

Normal-Normal

Some of you may have seen this in section, but I mention it here because it is very important. The conjugate prior for the mean of a Normal likelihood with known variance is the Normal distribution. Here is the theorem; I will not prove it (it is simple algebra, and it is in the book, page 590).

For practical reasons, we define the precision as the inverse of the variance: we denote by $\xi=\frac{1}{\sigma^2}$ and $\xi_0=\frac{1}{\sigma_0^2}$
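\begin{The}
Suppose that $\mu \sim {\cal N}(\mu_0,\sigma^2_0)$ and that, given $\mu$, $X\sim{\cal N}(\mu,\sigma^2)$ with $\sigma^2$ known.
Then the posterior distribution of $\mu$ given $X=x$ is Normal with mean
\begin{displaymath}\mu_1=\frac{\xi_0\mu_0+\xi x}{\xi_0+\xi}\end{displaymath}and precision
\begin{displaymath}\xi_1=\xi_0+\xi
\end{displaymath}\end{The}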
The posterior mean is a weighted average of the prior mean and the data, weights being proportional to the respective precisions.

With a very gentle prior, the precision $\xi_0$ is very low (the prior is very flat), and the posterior is approximately Normal with mean $x$.

Usually we are interested in the posterior given an iid sample of size $n$. As one might expect, this is equivalent to observing the single value $\bar{x}$ from a distribution with variance $\sigma^2/n$, i.e. with precision $n\xi$.
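A sketch of the update in precision form. The values are hypothetical: a prior ${\cal N}(0,2^2)$, known $\sigma=1$, and five simulated observations. The helper `nn_update` is introduced here for illustration. Updating one observation at a time should give exactly the same posterior as a single update with $\bar{x}$ at precision $n\xi$.

```r
# One Normal-Normal update: prior N(mu0, 1/xi0), one observation x of precision xi
nn_update <- function(mu0, xi0, x, xi) {
  xi1 <- xi0 + xi
  c(mu = (xi0 * mu0 + xi * x) / xi1, xi = xi1)
}

set.seed(5)
mu0 <- 0; xi0 <- 1 / 2^2          # hypothetical prior: mean 0, sd 2
sigma <- 1; xi <- 1 / sigma^2     # known data standard deviation
x <- rnorm(5, mean = 1.5, sd = sigma)

# Sequential: feed the observations in one at a time
post <- c(mu = mu0, xi = xi0)
for (obs in x) post <- nn_update(post[["mu"]], post[["xi"]], obs, xi)

# Batch: a single observation xbar with precision n * xi
batch <- nn_update(mu0, xi0, mean(x), length(x) * xi)
print(rbind(sequential = post, batch = batch))
```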

Multinomial-Dirichlet

You are given a set ${\cal X}$ (here taken as finite) and a probability distribution $p(x)$ on it ($p(x)\geq 0$, $\sum p(x)=1$). Also given is a set $A$ in ${\cal X}$. The problem is to compute or approximate $p(A)$.

In order to go further, we need to extend what we did for the binomial and its conjugate prior to the multinomial and the Dirichlet prior. The Dirichlet is a probability distribution on the $n$-simplex

\begin{displaymath}\Delta_n=\{{\tilde{p} }=(p_1,\cdots,p_n),\; p_1+\cdots
+p_n=1,\; p_i\geq 0\;\} \end{displaymath}

It is an $n$-component version of the Beta density (the Beta is the case $n=2$). The Dirichlet has a parameter vector $\tilde{a} =(a_1,\ldots,a_n)$. Throughout we write $A=a_1+\cdots+a_n$.

Normalised to have total mass 1 on $\Delta_n$, the Dirichlet has density:

\begin{displaymath}
D_{\tilde{a}}(\tilde{x})=\frac{\Gamma(A)}{\prod \Gamma (a_i)} x_1^{a_1-1}
x_2^{a_2-1} \cdots
x_n^{a_n-1}
\end{displaymath}

The uniform distribution on $\Delta_n$ results from choosing all $a_i=1$. The multinomial distribution corresponding to $k$ balls dropped into $n$ boxes with fixed probability $(p_1,\cdots,p_n)$ (with the ith box containing $k_i$ balls) is

\begin{displaymath}{k \choose {k_1 \ldots k_n}} p_1^{k_1} \cdots p_n^{k_n}
\end{displaymath}

If this is averaged with respect to $D_{\tilde{a}}$ one gets the marginal (or Dirichlet/ Multinomial):

\begin{displaymath}
P (k_1,\ldots,k_n) = {k \choose {k_1 \ldots k_n}}
\frac{(a_1)_{(k_1)} (a_2)_{(k_2)} \cdots (a_n)_{(k_n)}} {A_{(k)}}\end{displaymath}


\begin{displaymath}
\mbox{ where }\;\; m_{(j)} \stackrel{\mbox{def}}{=} m(m+1)\cdots (m+(j-1))\end{displaymath}

From a more practical point of view there are two simple procedures worth recalling here:

The Dirichlet is a convenient prior because, having observed $(k_1,\cdots,k_n)$, the posterior for $\tilde{p}$ is Dirichlet with parameter $(a_1+k_1,\cdots,a_n+k_n)$. An important characterization of the Dirichlet is that it is the only prior whose predictions are linear in the past counts. The probability that the next ball falls in box $i$ is $(a_i+k_i)/(A+k)$. One frequently used special case is the symmetric Dirichlet, with all $a_i=c >0$. We denote this prior $D_c$.
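A small R sketch of these Dirichlet facts. The helper names `rising`, `dirmult` and `predictive` are introduced here for illustration, and the parameter vector is arbitrary. It uses the fact that the marginal probability of the counts $(k_1,\ldots,k_n)$ is the multinomial coefficient times $\prod_i (a_i)_{(k_i)}/A_{(k)}$; the ratio alone is the probability of one particular ordered sequence of draws. The code checks three things:

* the marginal probabilities sum to 1 over all ways of dropping $k=4$ balls into 3 boxes;
* the uniform-prior case with two boxes reproduces Bayes' $1/(k+1)$;
* the predictive probabilities equal $(a_i+k_i)/(A+k)$.

```r
# Rising factorial m_(j) = m (m+1) ... (m+j-1), with m_(0) = 1
rising <- function(m, j) if (j == 0) 1 else prod(m + 0:(j - 1))

# Dirichlet-multinomial probability of the box counts kvec under prior D_avec
dirmult <- function(kvec, avec) {
  k <- sum(kvec); A <- sum(avec)
  coef <- factorial(k) / prod(factorial(kvec))      # multinomial coefficient
  coef * prod(mapply(rising, avec, kvec)) / rising(A, k)
}

# Posterior is Dirichlet(avec + kvec); predictive probability of each box next
predictive <- function(kvec, avec) (avec + kvec) / (sum(avec) + sum(kvec))

avec <- c(1, 2, 0.5)                                 # arbitrary prior parameters
grid <- expand.grid(k1 = 0:4, k2 = 0:4, k3 = 0:4)
grid <- grid[rowSums(grid) == 4, ]                   # all ways to put 4 balls in 3 boxes
total <- sum(apply(grid, 1, dirmult, avec = avec))
print(total)                                         # should be 1 up to rounding
print(dirmult(c(2, 3), c(1, 1)))                     # uniform prior, k = 5: 1/6
print(predictive(c(3, 0, 1), avec))
```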


What is a Monte Carlo Method?

Buffon's Needle

Example applet

A course on the Bootstrap

There is not necessarily a random component in the original problem one wants to solve, which is usually a problem with no analytical solution. An unknown (deterministic) parameter is expressed as a parameter of some random distribution, which is then simulated. The oldest well-known example is Buffon's estimation of $\pi$ with his needle-on-the-floorboards experiment. For a needle whose length equals the width between cracks, we have:

\begin{displaymath}p(\mbox{needle crosses crack})=\frac{2}{\pi} \mbox{ implying }
\hat{\pi}=2\,\frac{\#\mbox{tries}}{\#\mbox{hits}}
\end{displaymath}
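A minimal simulation sketch in R, assuming needle length = crack spacing = 1. The distance from the needle's centre to the nearest crack is uniform on $(0,1/2)$, and the needle's angle to the cracks is uniform on $(0,\pi/2)$. The needle crosses a crack when that distance is at most $\frac12\sin(\mathrm{angle})$. Note the circularity: drawing the angle already uses $\pi$, so this illustrates the method rather than computing $\pi$ from scratch.

```r
set.seed(6)
B <- 1e5                                 # number of needle throws
y <- runif(B, 0, 1/2)                    # distance from needle centre to nearest crack
angle <- runif(B, 0, pi / 2)             # acute angle between needle and cracks
hits <- sum(y <= sin(angle) / 2)         # throws where the needle crosses a crack
pi_hat <- 2 * B / hits                   # 2 * #tries / #hits
print(pi_hat)
```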

In physics and statistics, many of the problems Monte Carlo is used for take the form of estimating an integral that is unknown in closed form:

\begin{displaymath}\theta=\int_0^1 f(u)du
\mbox{ which can be seen as the evaluation of }
Ef(u),\mbox{ where }u \sim U(0,1)\end{displaymath}

  1. The crude, or mean-value, Monte Carlo method generates $B$ numbers $u_1,\ldots,u_B$ uniformly on $(0,1)$ and estimates $\theta$ by the average of the $f(u_b)$:

    \begin{displaymath}\hat{\theta}=\frac{1}{B}\sum_{b=1}^B
f(u_b)\end{displaymath}

  2. The hit-or-miss Monte Carlo method generates random points uniformly in a bounded rectangle (the unit square when $0\leq f\leq 1$) and counts the number of 'hits'. These are the points falling in the region whose area we want to evaluate, i.e. under the graph of $f$.

    \begin{displaymath}\hat{\theta}=\frac{\char93  hits}{\char93  total}\end{displaymath}

Which estimate is better?
This is similar to comparing statistical estimators in general.

There are certain desirable properties that we want estimators to have. Consistency ensures that the estimate converges to the right answer as the sample size increases; here it is guaranteed by the law of large numbers. Other properties of interest are:

Both of the above methods are unbiased: when they are repeated many times, their average values are centred on the actual value $\theta$.

\begin{displaymath}E(\hat{\theta})=\theta\end{displaymath}

So the choice between them comes down to finding the one with the smaller variance. The heuristic I developed in class explains why hit-or-miss has the higher variance. Its extra variance comes from the added randomness of generating both coordinates at random, instead of just the abscissae as in crude Monte Carlo.

More precisely, the variance of crude Monte Carlo is

\begin{displaymath}\sigma_M^2=\frac{1}{n}\int_0^1(f(u)-\theta)^2du=\frac{E(f(u)-\theta)^2}{n}=\frac{1}{n}E(f(u)^2)
-\frac{\theta^2}{n}
\end{displaymath}

and that of hit-or-miss Monte Carlo, whose number of hits is Binomial$(n,\theta)$, is:

\begin{displaymath}\sigma_H^2=\frac{\theta(1-\theta)}{n}
\end{displaymath}

The difference between these two variances is never negative (for $0\leq f\leq 1$), so hit-or-miss is never better:

\begin{displaymath}\sigma_H^2-\sigma_M^2=
\frac{1}{n}\int_0^1 f(u)(1-f(u))du \geq 0
\end{displaymath}
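A sketch comparing the two estimators empirically, using the illustrative integrand $f(u)=u^2$, for which $\theta=1/3$ and $E(f(U)^2)=1/5$. Each estimate uses $n=100$ points and is repeated 2000 times. The empirical variances should be close to $\sigma_M^2=(1/5-1/9)/n$ and $\sigma_H^2=\theta(1-\theta)/n$, with hit-or-miss the larger.

```r
set.seed(7)
f <- function(u) u^2          # example integrand on (0,1) with 0 <= f <= 1
theta_true <- 1/3             # its integral
n <- 100                      # points per estimate
R <- 2000                     # number of repeated estimates

U <- matrix(runif(n * R), n, R)     # abscissae, one column per estimate
V <- matrix(runif(n * R), n, R)     # ordinates, used only by hit-or-miss
crude <- colMeans(f(U))             # mean-value estimates
hitmiss <- colMeans(V <= f(U))      # proportion of points under the curve

print(c(var_crude = var(crude), theory = (1/5 - theta_true^2) / n))
print(c(var_hit = var(hitmiss), theory = theta_true * (1 - theta_true) / n))
```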

Most improvements to Monte Carlo methods are variance-reduction techniques.

Lecture 4: Markov Chains and Hidden Markov Models

Simple description of a Markov chain

Chapter on Stochastic Simulation

Simple lesson on Markov chains

CpG islands, chapter 3 in DEKM.
Course notes: Shamir's Course notes

CpG Island Applet Finder

Application: imprinting.

Lecture 5: Hidden Markov Models, Viterbi, Forward and Backward Algorithms

I showed an animation in class from: Good Applets and Didactic Material on all 3 algorithms

Lecture 6: HMM for Protein Families

Homology

Underlying model is a graph with `Match', `delete' and `insert' states.

Regions of higher conservation are called functional domains, their resistance to change indicates they serve some critical function.

For those who already know about substitution matrices (which we will study next): the HMM model has more flexibility, because it allows the transition/transversion probabilities to change from position to position.

Downside: there are more parameters, so much more data are needed. Training is the first step, using training sequences in which the domains are already known.
Complete Tutorial by Rachel Karchin

Extensions:
Krogh's tutorial

HMM profiles:
Hmmer
Pfam

Lecture 7: More Bayesian Computations

More about Bayesian statistics:
Harvey Thornberg class Bayesian resources

Dirichlet mixtures

Prediction of transmembrane helices

The hydrophobic and hydrophilic amino acids are two groups whose differing frequencies give information on the protein's location.
Tutorial on Amino Acids

Lecture 7b: Semi hidden Markov Models: Genscan

Slides by the inventor: Chris Burge, MIT

Lecture 8: Gibbs sampling for motif recognition

Jun Liu's overview of sequence analysis

Jun Liu's Gibbs sampling talk

The product multinomial model is

\begin{displaymath}
p(R\vert\theta_0,\Theta,A)=\theta_0^{h(R_{\{A\}^c})} \prod_{j=1}^w \theta_j^{h(R_{A+j-1})}
=\theta_0^{h(R)} \prod_{j=1}^w \left(\frac{\theta_j}{\theta_0}\right)^{h(R_{A+j-1})}
\end{displaymath}

and the conditional probability that the motif in the $k$th sequence starts at position $l$ is

\begin{displaymath}
\pi(a_k=l \vert \Theta,\theta_0, R_k) \propto \prod_{j=1}^w (\frac{\theta_j}{\theta_0})^{h(r_{k,l+j-1})}
\end{displaymath}

where $h(\cdot)$ gives the vector of counts of each amino acid (or nucleotide) type in the given segment, and a vector raised to the vector power $h$ means $\theta^{h}=\prod_i\theta_i^{h_i}$.
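A minimal sketch of this sampling step for DNA. Everything here is hypothetical: the sequence, the motif width $w=3$, the motif probability matrix `Theta` and the background `theta0`. Each position of a window holds a single letter, so $(\theta_j/\theta_0)^{h(r)}$ reduces to the ratio $\theta_j(r)/\theta_0(r)$ for the letter $r$ at motif position $j$. The code normalises these products over all start positions $l$ to get $\pi(a_k=l\vert\cdot)$, then draws a new start, as one Gibbs step would.

```r
set.seed(8)
letters4 <- c("A", "C", "G", "T")
seq_k <- strsplit("TTGACAGTCATGACT", "")[[1]]   # hypothetical sequence R_k
w <- 3                                          # motif width

# Hypothetical motif model: row j = letter probabilities at motif position j
Theta <- matrix(c(0.85, 0.05, 0.05, 0.05,       # position 1 mostly A
                  0.05, 0.85, 0.05, 0.05,       # position 2 mostly C
                  0.05, 0.05, 0.05, 0.85),      # position 3 mostly T
                nrow = w, byrow = TRUE, dimnames = list(NULL, letters4))
theta0 <- c(A = 0.25, C = 0.25, G = 0.25, T = 0.25)   # background frequencies

starts <- seq_len(length(seq_k) - w + 1)
# Unnormalised weight of start l: product over j of Theta_j(r) / theta0(r)
weight <- sapply(starts, function(l) {
  r <- seq_k[l:(l + w - 1)]
  prod(Theta[cbind(1:w, match(r, letters4))] / theta0[r])
})
post <- weight / sum(weight)            # pi(a_k = l | Theta, theta0, R_k)
print(round(post, 3))
a_k <- sample(starts, 1, prob = post)   # Gibbs step: draw a new motif start
```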

Lecture 9: Molecular Evolution and Continuous Time Markov chains

Continuous time Markov chains

Memoryless Property
$P(Y(u+t)=j\vert Y(t)=i, Y(s), s<t)=P(Y(u+t)=j\vert Y(t)=i)$: the future does not depend on the history before time $t$.
Time homogeneity
$P(Y(h+t)=j\vert Y(t)=i)$ does not depend on $t$, only on $h$, the time elapsed between the two observations.
Instantaneous transition rate

\begin{displaymath}P_{ij}(h)=q_{ij}h+o(h), j\neq i.\end{displaymath}


\begin{displaymath}P_{ii}(h)=1-q_i h+o(h), \qquad q_i=\sum_{j\neq i}q_{ij}\end{displaymath}

$q_{ij}$ is known as the instantaneous transition rate.
Times between changes are exponential. Let $T$ be the time the chain stays in state $i$:

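\begin{eqnarray*}
P(T\geq t+h)&=&P(T\geq t)\, P(T\geq t+h \vert T\geq t)=P(T\geq t)\,(1-q_i h+o(h))\\
\frac{d}{dt}P(T\geq t)&=&-q_i P(T\geq t), \qquad P(T\geq 0)=1\\
P(T\geq t)&=&e^{-q_i t}\\
P(T\leq t)&=&1-e^{-q_i t}\\
f(t)&=& q_i e^{-q_i t} \sim Exp(q_i)
\end{eqnarray*}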



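Derivative of $P$: conditioning on the state at time $t$ (Chapman-Kolmogorov), $P_{ij}(t+h)=\sum_k P_{ik}(t)P_{kj}(h)$, so

\begin{displaymath}\frac{P_{ij}(t+h)-P_{ij}(t)}{h}=-q_jP_{ij}(t)+\sum_{k\neq
j} q_{kj}P_{ik}(t)+\frac{o(h)}{h}\end{displaymath}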

as $h \longrightarrow 0$,

\begin{displaymath}\frac{dP_{ij}(t)}{dt}=-q_jP_{ij}(t)+\sum_{k\neq
j} q_{kj}P_{ik}(t)\end{displaymath}

Particular case of Jukes-Cantor: $q_j=3\alpha$ and $q_{ij}=\alpha, i\neq j$.

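\begin{eqnarray*}
\frac{dP_{ij}(t)}{dt}&=&-3\alpha P_{ij}(t)+\alpha\sum_{k\neq j}P_{ik}(t)=-3\alpha P_{ij}(t)+\alpha\left(1-P_{ij}(t)\right)\\
&=&\alpha-4\alpha P_{ij}(t)\\
P_{ii}(t)&=&\frac{1}{4}+\frac{3}{4}e^{-4\alpha t} \qquad (P_{ii}(0)=1)\\
P_{ij}(t)&=&\frac{1}{4}-\frac{1}{4}e^{-4\alpha t} \qquad (P_{ij}(0)=0,\; i\neq j)
\end{eqnarray*}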
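A sketch that checks the closed form numerically, with arbitrary values $\alpha=0.1$ and $t=2$. The transition matrix $P(t)=e^{Qt}$ solves the forward equation above with $P(0)=I$. Here $Q$ is the Jukes-Cantor rate matrix, with off-diagonal entries $\alpha$ and diagonal $-3\alpha$. Because $Q$ is symmetric, base R's `eigen` is enough to exponentiate it. The diagonal entries should match $\frac14+\frac34e^{-4\alpha t}$ and the off-diagonal entries $\frac14-\frac14e^{-4\alpha t}$.

```r
alpha <- 0.1; tm <- 2                    # illustrative rate and elapsed time
Q <- matrix(alpha, 4, 4)
diag(Q) <- -3 * alpha                    # Jukes-Cantor rate matrix (rows sum to 0)

# Q is symmetric, so exp(Q t) = V diag(exp(lambda t)) V^T
e <- eigen(Q, symmetric = TRUE)
P <- e$vectors %*% diag(exp(e$values * tm)) %*% t(e$vectors)

closed_same <- 1/4 + 3/4 * exp(-4 * alpha * tm)   # P_ii(t)
closed_diff <- 1/4 - 1/4 * exp(-4 * alpha * tm)   # P_ij(t), i != j
print(round(P, 4))
print(c(P_ii = closed_same, P_ij = closed_diff))
```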



A pdf version of the above maths

Continuous-time Markov Chains

More Continuous-time Markov Chains

Markovian models in Phylogeny

The HIV/SIV jump problem, when did it occur?

Quest for the Origin of AIDS

Did Modern Medicine Spread an Epidemic?

The river without a paddle

HIV-1 and Polio?

Lecture 10: Evolutionary trees: nucleotide level

Likelihood Estimation for Trees

M. Singh - On Phylogeny(psfile)

Systematics and Phylogenetic Inference

An algorithmic approach to the ML tree

Lecture 11: Evolutionary trees: software and examples

Amino Acid Evolution: Codon substitution models, tests and consequences:
Ziheng Yang -psb 00- physicochemical properties aas

Yang's Woods Hole tutorial

Modelling evolution at the Amino Acid Level- R. Goldstein

Software for Phylogeny:

Phylip

PAML software

Lecture 12a: Phylogenetic Trees: software and examples

MrBayes: output from run.

Lecture 12b: Extreme Value Theory

A special example: the distribution of the maximum of $N$ independent Poisson$(\lambda)$ random variables, with an application to antigenic determinants.
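For independent $X_1,\ldots,X_N\sim$ Poisson$(\lambda)$, $P(\max_i X_i\leq k)=F(k)^N$, where $F$ is the Poisson cdf. Hence $P(\max=k)=F(k)^N-F(k-1)^N$. Below is a short R sketch with illustrative values $\lambda=2$ and $N=50$, checked against simulation.

```r
lambda <- 2; N <- 50                       # illustrative values
k <- 0:15
cdf_max <- ppois(k, lambda)^N              # P(max <= k)
pmf_max <- diff(c(0, cdf_max))             # P(max = k), using P(max <= -1) = 0

set.seed(9)
sim_max <- replicate(20000, max(rpois(N, lambda)))
emp <- as.numeric(table(factor(sim_max, levels = k))) / length(sim_max)
print(round(rbind(exact = pmf_max, simulated = emp), 3))
```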

Lecture 12c: Multivariate Statistical Methods: a taxonomy

Table of Methods for Studying Links between Variables:
  Technique                                 Variables to explain   Explanatory variables
  Multiple Regression                       1 continuous           p continuous
  Analysis of Variance                      1 continuous           p categorical
  Analysis of Covariance                    1 continuous           p1 continuous, p2 categorical
  Correspondence Analysis                   1 categorical          1 categorical
  Canonical Correlation Analysis            q continuous           p continuous
  Principal Components with respect
    to Instrumental Variables               q continuous           p continuous
  Discriminant Analysis                     1 categorical          p continuous
  Multidimensional Analysis of Variance     p continuous           p categorical
  Multidimensional Analysis of Covariance   p continuous           p1 categorical, p2 continuous
  Regression Tree                           1 continuous           p1 continuous, p2 categorical
  Classification Tree                       1 categorical          p1 continuous, p2 categorical

Table of Methods for Representing Data:
  Technique                                 Variables
  Principal Components                      p continuous
  Multiple Correspondence Analysis          p categorical, or p categorical and q continuous
  Multidimensional Scaling                  categorical and continuous
  Clustering (hierarchical or not)          categorical

Reading for next week, about micro-arrays: Anatomy of a Comparative Gene Expression Study

About gene expression, read Terry Speed's lecture: http://www.stat.Berkeley.EDU/users/terry/Classes/s246.2002/Week3/week3ab.pdf

Flash animation of DNA Microarray Methodology

Gene Expression Analysis and Genetic Network Modeling

Templates for looking at Gene expression

Lecture 14:

Normalization Problems.

http://www.stat.Berkeley.EDU/users/terry/Classes/s246.2002/Week7/week7b.pdf

Analysis of Variance for Microarrays: MAANOVA, A Software Package for the Analysis of Spotted cDNA Microarray Experiments.
Discriminant Analysis


Susan Holmes 2002-11-05