This is the method used for drawing a sample at random
from the empirical distribution.
I will start by giving a history and general
remarks about Monte Carlo methods for those who have
never studied them before.
What is a Monte Carlo Method?
There is not necessarily a random component in the
original problem that one wants to solve, which is usually
a problem for which there is no analytical solution.
An unknown (deterministic) parameter is expressed as a parameter
of some random distribution, which is then simulated.
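The oldest well-known example is the estimation of $\pi$
by Buffon, in his needle-on-the-floorboards experiment,
where, supposing a needle of the same length as the width between
cracks, we have:
$$P(\text{needle crosses a crack}) = \frac{2}{\pi}, \qquad \text{so} \qquad \hat{\pi} = \frac{2 \times \#\text{throws}}{\#\text{crossings}}.$$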
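A minimal sketch of Buffon's experiment as a simulation, assuming a needle length and crack spacing both equal to 1. The function name `buffon_pi` and the seed are illustrative choices, not part of the lecture. Note that the sketch uses `math.pi` to draw the needle angle, so it illustrates the principle rather than giving an independent way to compute $\pi$.

```python
import math
import random


def buffon_pi(n_throws, seed=0):
    """Estimate pi from simulated needle throws (needle length = crack spacing = 1)."""
    rng = random.Random(seed)
    crossings = 0
    for _ in range(n_throws):
        # Distance from the needle's centre to the nearest crack, uniform on [0, 1/2].
        d = rng.uniform(0.0, 0.5)
        # Acute angle between the needle and the cracks, uniform on [0, pi/2].
        angle = rng.uniform(0.0, math.pi / 2)
        # The needle crosses a crack when its half-length projection reaches it.
        if d <= 0.5 * math.sin(angle):
            crossings += 1
    # P(crossing) = 2/pi, so pi is estimated by 2 * throws / crossings.
    return 2.0 * n_throws / crossings


pi_hat = buffon_pi(200_000)
print(pi_hat)
```

The estimate converges slowly: the error shrinks like one over the square root of the number of throws.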
In physics and statistics, many of the problems Monte Carlo is
used on take the form of
estimating an integral unknown in closed form:
$$\theta = \int_0^1 f(u)\,du.$$
- The crude, or mean-value, Monte Carlo method thus
proposes to generate
$n$ numbers $u_1, \ldots, u_n$
uniformly from $(0,1)$
and take the average of the $f(u_i)$:
$$\hat{\theta} = \frac{1}{n}\sum_{i=1}^{n} f(u_i)$$
to estimate $\theta$.
- The hit-or-miss Monte Carlo method
generates random points in a bounded
rectangle and
counts the number of 'hits' or points that are
in the region whose area we want to evaluate.
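The two methods in the list above can be sketched side by side. The integrand `f(u) = u * u` is a hypothetical choice made for illustration (it satisfies $0 \le f \le 1$ on $(0,1)$ and integrates to $1/3$), and the function names are not from the lecture.

```python
import random


def f(u):
    # Hypothetical integrand with 0 <= f <= 1 on (0, 1); its integral is 1/3.
    return u * u


def crude_mc(n, rng):
    """Crude (mean-value) Monte Carlo: average f over n uniform draws on (0, 1)."""
    return sum(f(rng.random()) for _ in range(n)) / n


def hit_or_miss_mc(n, rng):
    """Hit-or-miss: fraction of n uniform points in the unit square under the curve."""
    hits = 0
    for _ in range(n):
        u, v = rng.random(), rng.random()
        if v <= f(u):
            hits += 1
    return hits / n


rng = random.Random(1)
theta_crude = crude_mc(100_000, rng)
theta_hit = hit_or_miss_mc(100_000, rng)
print(theta_crude, theta_hit)
```

Both numbers should land close to $1/3$; hit-or-miss spends two uniform draws per point where the crude method spends one.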
Which estimate is better?
This is similar to comparing statistical
estimators in general.
There are certain desirable properties that we want
estimators to have. Consistency, which ensures that
the estimate converges to the right answer
as the sample size increases, is ensured here by the properties of
Riemann sums.
Other properties of interest are:
- Unbiasedness.
- Minimal Variance.
Both of the above methods are unbiased, that is, when repeated many times
their average values are centred at the actual value
$\theta$.
So the choice between them lies in finding the one that has
the smaller variance.
The heuristic I developed in class to see that
hit-or-miss has a higher variance is based
on the idea that the extra variance comes from the added randomness of
generating both coordinates at random, instead of just
the abscissae as in crude Monte Carlo.
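More precisely, assuming $0 \le f \le 1$ on $(0,1)$, the variance of crude Monte Carlo is
$$\mathrm{var}(\hat{\theta}_{\text{crude}}) = \frac{1}{n}\int_0^1 (f(u)-\theta)^2\,du,$$
and that of hit-or-miss Monte Carlo, whose number of hits is just a Binomial$(n,\theta)$, is:
$$\mathrm{var}(\hat{\theta}_{\text{hit}}) = \frac{\theta(1-\theta)}{n}.$$
The difference between these two variances is never negative,
and is positive unless $f$ takes only the values 0 and 1:
$$\frac{\theta(1-\theta)}{n} - \frac{1}{n}\int_0^1 (f(u)-\theta)^2\,du = \frac{1}{n}\int_0^1 f(u)\,(1-f(u))\,du \ge 0.$$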
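The variance comparison can be checked empirically by repeating each estimator many times. This is a sketch under the same hypothetical integrand `f(u) = u * u`, for which the theoretical variances work out to $(4/45)/n$ for crude Monte Carlo and $(2/9)/n$ for hit-or-miss. The replication count and seed are arbitrary.

```python
import random
import statistics


def f(u):
    # Hypothetical integrand with 0 <= f <= 1 on (0, 1); its integral is 1/3.
    return u * u


def crude_mc(n, rng):
    return sum(f(rng.random()) for _ in range(n)) / n


def hit_or_miss_mc(n, rng):
    hits = sum(1 for _ in range(n) if rng.random() <= f(rng.random()))
    return hits / n


n, reps = 100, 2000
rng = random.Random(2)
crude_runs = [crude_mc(n, rng) for _ in range(reps)]
hit_runs = [hit_or_miss_mc(n, rng) for _ in range(reps)]

# Sample variance of each estimator across the replications.
var_crude = statistics.variance(crude_runs)
var_hit = statistics.variance(hit_runs)

print("crude:      ", var_crude, "theory:", (4 / 45) / n)
print("hit-or-miss:", var_hit, "theory:", (2 / 9) / n)
```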
Most improvements to Monte Carlo methods are variance-reduction
techniques.
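Suppose we have two random variables, $X$ and $Y$, that provide estimators
for $\theta$, that they have the same variance
but that they are negatively correlated. Then
$\frac{1}{2}(X+Y)$ will provide a
better estimate for $\theta$
because its variance will be smaller:
$$\mathrm{var}\left(\tfrac{1}{2}(X+Y)\right) = \tfrac{1}{2}\left(\mathrm{var}(X) + \mathrm{cov}(X,Y)\right) < \mathrm{var}(X).$$
This is the idea in antithetic resampling (see Hall, 1989).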
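A small sketch of the negative-correlation idea in the plain Monte Carlo setting, using the classical antithetic pair $f(U)$ and $f(1-U)$. The integrand `math.exp` on $(0,1)$, with integral $e - 1$, is an assumption chosen for illustration because it is monotone, which makes the pair negatively correlated.

```python
import math
import random
import statistics


def f(u):
    # Hypothetical monotone integrand; its integral over (0, 1) is e - 1.
    return math.exp(u)


rng = random.Random(3)
independent, antithetic = [], []
for _ in range(20_000):
    u, u2 = rng.random(), rng.random()
    # Average of two independent evaluations.
    independent.append(0.5 * (f(u) + f(u2)))
    # Average of an antithetic pair: the same draw u and its mirror image 1 - u.
    antithetic.append(0.5 * (f(u) + f(1.0 - u)))

var_independent = statistics.variance(independent)
var_antithetic = statistics.variance(antithetic)
theta_hat = statistics.mean(antithetic)
print(var_independent, var_antithetic, theta_hat)
```

Both averages use two function evaluations, so the drop in variance comes entirely from the negative correlation within each pair.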
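Suppose we are interested in a real-valued parameter, and that we
have ordered our original sample
$x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$. For each
resample $\mathcal{X}^*$
and statistic $\hat{\theta}^* = \hat{\theta}(\mathcal{X}^*)$
we associate $\hat{\theta}^{**}$
by taking a special permutation of the
$x_{(i)}$'s
that will make
$\mathrm{cov}(\hat{\theta}^*, \hat{\theta}^{**}) < 0$, and as small as possible.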
If the statistic is a smooth function of the mean, for instance, then the
'reversal permutation' that maps
$x_{(1)}$ into $x_{(n)}$,
$x_{(2)}$ into $x_{(n-1)}$, etc.,
is the best: the small sample values are transformed into
the larger observations, and the average of these two
estimates will give an estimate with smaller variance.
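The reversal permutation can be sketched for the simplest smooth statistic, the sample mean. The data below are a hypothetical sample of 20 exponential draws, not data from the lecture, and the number of resamples is arbitrary. Each bootstrap resample is paired with its mirror image, obtained by swapping order statistic $i$ with order statistic $n+1-i$.

```python
import random
import statistics

rng = random.Random(4)
# Hypothetical original sample, sorted so that x[0] <= ... <= x[n-1].
x = sorted(rng.expovariate(1.0) for _ in range(20))
n = len(x)


def mean(values):
    return sum(values) / len(values)


plain, averaged = [], []
for _ in range(2000):
    # Resample indices with replacement from 0..n-1.
    idx = [rng.randrange(n) for _ in range(n)]
    resample = [x[i] for i in idx]
    # Reversal permutation: each drawn order statistic is replaced by its mirror.
    mirrored = [x[n - 1 - i] for i in idx]
    t_star = mean(resample)
    t_mirror = mean(mirrored)
    plain.append(t_star)
    averaged.append(0.5 * (t_star + t_mirror))

var_plain = statistics.variance(plain)
var_antithetic = statistics.variance(averaged)
print(var_plain, var_antithetic)
```

The mirrored resample costs no extra random numbers, since it reuses the indices already drawn.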
Importance sampling is often used in simulation, and is
a method to work around the small area problem.
If we want to evaluate tail probabilities, or
very small areas, we may have very few hits
of our random number generator in that area.
However, we can
modify the random number generator to make that area
more likely, as long as we take that into account when we do
the summation.
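Importance sampling is based on the equalities, valid for any
density $g$ that is positive wherever $f$ is nonzero:
$$\int f(x)\,dx = \int \frac{f(x)}{g(x)}\,g(x)\,dx = E_g\!\left[\frac{f(X)}{g(X)}\right].$$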
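As an illustration of the small area problem, here is a sketch that estimates the normal tail probability $P(Z > 4)$, first by naive counting and then by importance sampling. The choice of a $N(c, 1)$ sampling distribution centred at the threshold is an assumption made for this example, not a recommendation from the lecture. The weight is the ratio of the two normal densities, $\phi(x)/\phi(x-c) = \exp(-cx + c^2/2)$.

```python
import math
import random


def tail_naive(c, n, rng):
    """Fraction of n standard normal draws that exceed c."""
    return sum(1 for _ in range(n) if rng.gauss(0.0, 1.0) > c) / n


def tail_importance(c, n, rng):
    """Draw from N(c, 1) and reweight each hit by phi(x) / phi(x - c)."""
    total = 0.0
    for _ in range(n):
        x = rng.gauss(c, 1.0)
        if x > c:
            # Density ratio of N(0, 1) to N(c, 1) at x.
            total += math.exp(-c * x + 0.5 * c * c)
    return total / n


rng = random.Random(5)
c, n = 4.0, 100_000
exact = 0.5 * math.erfc(c / math.sqrt(2.0))  # closed form for comparison
p_naive = tail_naive(c, n, rng)
p_is = tail_importance(c, n, rng)
print(exact, p_naive, p_is)
```

With the shifted sampler about half of the draws land in the tail, whereas the naive sampler expects only a handful of hits in 100,000 draws.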
Susan Holmes
2004-05-19