Next: EM for exponential families Up: EM algorithm Previous: Matlab Implementation   Index

Estimating Mixture Proportions

Suppose the density is of the form:

\begin{displaymath}f(w;\psi)=\sum_{i=1}^{g} \pi_if_i(w)\end{displaymath}

where only the mixing proportions $\pi_i$ are unknown; the component densities $f_i$ are assumed to be completely specified. Since the proportions sum to one, $\pi_g=1-\sum_{i=1}^{g-1}\pi_i$, so the unknown parameter is $(g-1)$-dimensional: $\psi=(\pi_1,\pi_2,\ldots,\pi_{g-1})$. The sample $y=(w_1,\ldots,w_n)$ is observed from the mixture. The loglikelihood of the observed data is:

\begin{displaymath}
\log L(\psi)=\sum_{j=1}^{n}\log f(w_j;\psi)=\sum_{j=1}^{n}\log
\left(\sum_{i=1}^{g} \pi_if_i(w_j)\right)
\end{displaymath}

Differentiating $\log L(\psi)$ with respect to $\pi_i$, for $i=1,\ldots,g-1$, and setting the derivatives to zero gives the likelihood equations:

\begin{displaymath}
\sum_{j=1}^{n} \left\{\frac{f_i(w_j)}{f(w_j;\psi)}
-\frac{f_g(w_j)}{f(w_j;\psi)}\right\}=0, \qquad i=1,\ldots,g-1
\end{displaymath}
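
The step from the loglikelihood to these equations can be spelled out as follows. This is a short sketch that uses only the constraint $\pi_g=1-\sum_{i=1}^{g-1}\pi_i$, so that each free $\pi_i$ also enters $f(w_j;\psi)$ through $\pi_g$:

\begin{displaymath}
\frac{\partial f(w_j;\psi)}{\partial \pi_i}=f_i(w_j)-f_g(w_j)
\quad\Longrightarrow\quad
\frac{\partial \log L(\psi)}{\partial \pi_i}
=\sum_{j=1}^{n}\frac{f_i(w_j)-f_g(w_j)}{f(w_j;\psi)}, \qquad i=1,\ldots,g-1
\end{displaymath}

Setting each of these $g-1$ derivatives to zero gives the likelihood equations. The unknown $\psi$ appears in every denominator, which is why the system cannot be solved directly.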

These equations have no closed-form solution in general. We therefore introduce the indicator variables $z=(z_1,\ldots,z_n)$, where each $z_j$ is a binary vector of length $g$ whose $i$th coordinate $z_{ij}$ equals 1 if $w_j$ comes from group $i$ and 0 otherwise.

If we observed the $z$'s, the MLE of $\pi_i$ would be $\hat{\pi}_i=\frac{z_{i.}}{n}$, where $z_{i.}=\sum_{j=1}^{n}z_{ij}$ is the number of observations from group $i$.

Take the new, complete data vector to be $x=(y,z)$.


The complete-data loglikelihood is:

\begin{displaymath}
\log L_c(\psi)=\sum_{i=1}^{g} \sum_{j=1}^{n} z_{ij}\log\pi_i+ \sum_{i=1}^{g} \sum_{j=1}^{n} z_{ij}\log f_i(w_j)
\end{displaymath}

The second term on the right does not contain $\pi_i$, so we can ignore it when maximizing over $\psi$.
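
To see why the first term alone leads to the estimate $z_{i.}/n$, one can maximize it subject to $\sum_{i=1}^{g}\pi_i=1$. The following is a sketch using a Lagrange multiplier $\lambda$ (a symbol introduced here for illustration, not used elsewhere in these notes), with $z_{i.}=\sum_{j=1}^{n}z_{ij}$:

\begin{displaymath}
\frac{\partial}{\partial \pi_i}\left\{\sum_{i=1}^{g} z_{i.}\log\pi_i-\lambda\left(\sum_{i=1}^{g}\pi_i-1\right)\right\}
=\frac{z_{i.}}{\pi_i}-\lambda=0
\quad\Longrightarrow\quad
\pi_i=\frac{z_{i.}}{\lambda}
\end{displaymath}

Summing $\pi_i=z_{i.}/\lambda$ over $i$ and using $\sum_{i=1}^{g}z_{i.}=n$ gives $\lambda=n$, hence $\hat{\pi}_i=z_{i.}/n$.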

The E-step:

\begin{eqnarray*}
E_{\psi^{(k)}}(Z_{ij}\vert y) &=&\mathrm{prob}_{\psi^{(k)}}(Z_{ij}=1\vert y)\\
&=&\frac{\pi_i^{(k)}f_i(w_j)}{f(w_j;\psi^{(k)})}=z_{ij}^{(k)}
\end{eqnarray*}
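
As a quick numerical check of the E-step formula, take hypothetical values that are not from these notes: $g=2$, $\pi^{(k)}=(0.5,0.5)$, $f_1(w_j)=0.2$ and $f_2(w_j)=0.6$. Then:

\begin{displaymath}
z_{1j}^{(k)}=\frac{0.5\times 0.2}{0.5\times 0.2+0.5\times 0.6}=\frac{0.1}{0.4}=0.25,
\qquad
z_{2j}^{(k)}=\frac{0.5\times 0.6}{0.4}=0.75
\end{displaymath}

so $w_j$ is shared between the two groups with weights that sum to one, rather than being assigned to a single group.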



The M-step:
We act as if the unobserved $z_{ij}$ were equal to $z_{ij}^{(k)}$; the complete-data loglikelihood is then maximized at:

\begin{displaymath}\pi_i^{(k+1)}=\frac{z_{i.}^{(k)}}{n}=\frac{1}{n}\sum_{j=1}^{n}z_{ij}^{(k)}\end{displaymath}
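
The E- and M-steps above can be combined into a short iteration. What follows is a minimal Python sketch using only the standard library; it is not the Matlab example these notes refer to. The function names (`normal_pdf`, `em_mixing_proportions`, `log_likelihood`), the two normal components, the data values, the equal starting proportions and the stopping rule are all assumptions made for illustration. It also assumes that the mixture density is strictly positive at every observation, so the E-step never divides by zero.

```python
import math


def normal_pdf(w, mean, sd):
    """Density of N(mean, sd^2) at w; stands in for a fully specified f_i."""
    return math.exp(-0.5 * ((w - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def em_mixing_proportions(data, densities, n_iter=200, tol=1e-10):
    """EM for the mixing proportions only; the component densities stay fixed."""
    g, n = len(densities), len(data)
    # f_i(w_j) does not depend on pi, so evaluate it once: dens[i][j]
    dens = [[f(w) for w in data] for f in densities]
    pi = [1.0 / g] * g  # assumed starting value pi^(0): equal proportions
    for _ in range(n_iter):
        # E-step: z_ij = pi_i f_i(w_j) / f(w_j; psi)
        mix = [sum(pi[i] * dens[i][j] for i in range(g)) for j in range(n)]
        z = [[pi[i] * dens[i][j] / mix[j] for j in range(n)] for i in range(g)]
        # M-step: new pi_i = z_i. / n
        new_pi = [sum(z[i]) / n for i in range(g)]
        converged = max(abs(a - b) for a, b in zip(new_pi, pi)) < tol
        pi = new_pi
        if converged:  # assumed stopping rule: proportions stopped moving
            break
    return pi


def log_likelihood(data, densities, pi):
    """Observed-data loglikelihood: sum_j log sum_i pi_i f_i(w_j)."""
    return sum(math.log(sum(p * f(w) for p, f in zip(pi, densities))) for w in data)


# Hypothetical data: six values near 0 and four near 4
data = [-0.3, 0.1, 0.4, -1.2, 0.8, 0.0, 3.9, 4.2, 5.1, 3.5]
densities = [lambda w: normal_pdf(w, 0.0, 1.0),
             lambda w: normal_pdf(w, 4.0, 1.0)]

pi_hat = em_mixing_proportions(data, densities)
print("estimated proportions:", pi_hat)
print("loglikelihood:", log_likelihood(data, densities, pi_hat))
```

Because the $f_i(w_j)$ do not change between iterations, they are computed once up front; each iteration then costs only one pass over a $g\times n$ table. The choice to stop on the change in the proportions is a convenience, and monitoring the increase in the loglikelihood would serve equally well.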

Example with Matlab:
Susan Holmes 2002-01-12