We want to estimate a parameter of the unknown distribution, and as a plug-in estimate
of that distribution we can use either the empirical cdf or a fitted parametric cdf.
In fact there is an intermediate choice: take the empirical cdf, smooth it a little,
and plug the smoothed empirical cdf into the statistic.
This is especially useful when the bootstrap distribution is too discrete,
which happens mostly when the statistic is a quantile;
the median, as we saw in the mouse data analysis, had that problem.
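As a rough illustration of the smoothed bootstrap described above, here is a minimal Python sketch. It assumes that smoothing is done with a Normal kernel, so that a draw from the smoothed empirical cdf is a resampled data point plus Normal noise of standard deviation `h`. The function name `bootstrap_medians`, the data values and the bandwidth `h=5.0` are all hypothetical choices made for the example, not taken from the mouse data.

```python
import random
import statistics

def bootstrap_medians(data, n_boot, h=0.0, seed=0):
    """Bootstrap the median.

    h = 0 gives the ordinary bootstrap (resampling from the empirical cdf).
    h > 0 adds Normal(0, h^2) noise to every resampled point, which amounts
    to sampling from a Normal-kernel smooth of the empirical cdf.
    """
    rng = random.Random(seed)
    n = len(data)
    medians = []
    for _ in range(n_boot):
        sample = [rng.choice(data) + (rng.gauss(0.0, h) if h > 0 else 0.0)
                  for _ in range(n)]
        medians.append(statistics.median(sample))
    return medians

# Hypothetical data, for illustration only.
data = [10, 27, 31, 40, 46, 50, 52, 104, 146]

plain = bootstrap_medians(data, 1000)           # ordinary bootstrap
smooth = bootstrap_medians(data, 1000, h=5.0)   # smoothed bootstrap, arbitrary h

print(len(set(plain)), "distinct bootstrap medians without smoothing")
print(len(set(smooth)), "distinct bootstrap medians with smoothing")
```

With an odd sample size the ordinary bootstrap median is always one of the original observations, so its bootstrap distribution sits on at most nine values here; the smoothed version spreads it over a continuum.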
The crosses, which are the conditional averages, are in some way a smooth of the scatter plot.
Now suppose that the x's could be all over the place: we window them and take local averages.
The extreme case is when the window covers the whole x axis; then there is only one average, and if you want you can draw a line through it.
At the other extreme, when the window is at its smallest, there is NO smoothing.
We want something gentler than a single overall average, so we reduce the window width
and only take local averages.
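The effect of the window width can be seen in a small Python sketch. The function `window_average` and the toy data are hypothetical, introduced only to show the two extremes and an intermediate window.

```python
def window_average(xs, ys, x0, half_width):
    """Average of the y's whose x lies in [x0 - half_width, x0 + half_width].

    Returns None when the window contains no point.
    """
    inside = [y for x, y in zip(xs, ys) if abs(x - x0) <= half_width]
    return sum(inside) / len(inside) if inside else None

# Toy data, for illustration only.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0, 9.0, 10.0]

whole_axis = window_average(xs, ys, 4.5, 100)  # window covers every x: one global average
local = window_average(xs, ys, 3, 1)           # averages the y's at x = 2, 3, 4
no_smoothing = window_average(xs, ys, 3, 0)    # window holds only x = 3: the raw y

print(whole_axis, local, no_smoothing)
```

Sliding `x0` along the axis with a fixed intermediate `half_width` traces out the local-average smooth.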
If, within a window, we choose to differentiate the points according to how close they are to the abscissa at which we want to estimate the value by averaging, we can use a kernel weighting function.
Points that are close are given high weights, points further away are given lighter weights, and points on the boundary of the window do not count.
The weighting function is such that the sum of all the weights is 1. If there is no difference between the weights, they are uniform and we are back to the plain local average. In fact the weighting function can be a probability density, and often we take a Normal one.
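A minimal Python sketch of kernel-weighted averaging with a Normal weighting function follows. The names `normal_weights` and `kernel_smooth`, the bandwidth `h` and the toy data are assumptions made for the example.

```python
import math

def normal_weights(xs, x0, h):
    """Normal-kernel weights centred at x0 with bandwidth h, rescaled to sum to 1."""
    raw = [math.exp(-0.5 * ((x - x0) / h) ** 2) for x in xs]
    total = sum(raw)
    return [r / total for r in raw]

def kernel_smooth(xs, ys, x0, h):
    """Weighted average of the y's, with more weight on points whose x is near x0."""
    weights = normal_weights(xs, x0, h)
    return sum(w * y for w, y in zip(weights, ys))

# Toy data, for illustration only.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0, 9.0, 10.0]

weights = normal_weights(xs, 3, 1.5)
print([round(w, 3) for w in weights])   # largest weight at x = 3
print(kernel_smooth(xs, ys, 3, 1.5))    # local weighted average at x0 = 3
```

Note that a Normal density never reaches exactly zero, so distant points get a negligible rather than a strictly zero weight; a kernel with bounded support would be needed to make boundary points count for nothing at all. A very large `h` makes the weights nearly uniform and returns essentially the overall mean.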
Here is a nice webpage on smoothing, with available Matlab software.
loess.m is available in the course directory, and
loess is a built-in function in Splus.
Matlab procedure for bootstrapping the loess curve:
% cholo holds the data: column 1 is compliance, column 2 is improvement.
% N is the number of bootstrap samples.
N=500;
predmat=zeros(N, 101);   % one fitted curve per row, evaluated on the grid 0:100
datasize=size(cholo,1);
clf;
plot(cholo(:,1), cholo(:,2), '.');
hold on;
for i=1:N
  xind=unidrnd(datasize, datasize,1);   % resample row indices with replacement
  x=cholo(xind,:);
  predmat(i,:)=loess(x(:,1), x(:,2), (0:100), .3, 1);
  plot((0:100), predmat(i,:), '-.');    % Plot a sample bootstrap curve.
end;
% Plot the 95% pointwise confidence lines.
plot((0:100), prctile(predmat, 2.5), 'r-');
plot((0:100), prctile(predmat, 97.5), 'r-');
xlabel('Compliance');
ylabel('Improvement');
axis([-5, 105, -40, 120]);
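The same resample, refit, take-percentiles pattern can be sketched in Python without the plotting. This is an illustration under stated assumptions, not a port of the course code: a Normal-kernel smoother stands in for loess, the data are simulated rather than the `cholo` data, the percentile rule is a simple linear interpolation that may differ slightly from Matlab's `prctile`, and all function names are hypothetical.

```python
import math
import random

def kernel_smooth(xs, ys, x0, h):
    """Normal-kernel weighted average of ys at x0 (a stand-in for loess)."""
    w = [math.exp(-0.5 * ((x - x0) / h) ** 2) for x in xs]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)

def percentile(values, p):
    """p-th percentile by linear interpolation between order statistics."""
    s = sorted(values)
    k = (len(s) - 1) * p / 100.0
    lo, hi = math.floor(k), math.ceil(k)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)

def bootstrap_bands(xs, ys, grid, h, n_boot=200, seed=0):
    """Pointwise 95% percentile bands for the smoothed curve on the grid."""
    rng = random.Random(seed)
    n = len(xs)
    curves = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # resample (x, y) pairs
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        curves.append([kernel_smooth(bx, by, g, h) for g in grid])
    lower = [percentile([c[j] for c in curves], 2.5) for j in range(len(grid))]
    upper = [percentile([c[j] for c in curves], 97.5) for j in range(len(grid))]
    return lower, upper

# Simulated data, for illustration only: a noisy increasing trend on [0, 100].
data_rng = random.Random(1)
xs = [data_rng.uniform(0, 100) for _ in range(50)]
ys = [0.5 * x + data_rng.gauss(0, 10) for x in xs]

grid = list(range(0, 101, 10))
lower, upper = bootstrap_bands(xs, ys, grid, h=10.0)
for g, lo, up in zip(grid, lower, upper):
    print(g, round(lo, 1), round(up, 1))
```

The bands are pointwise: each grid value gets its own 2.5% and 97.5% percentiles across the bootstrap curves, so they do not give simultaneous coverage for the whole curve.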
boott package:bootstrap R Documentation
Bootstrap-t Confidence Limits
Description:
See Efron and Tibshirani (1993) for details on this function.
Usage:
boott(x,theta, ..., sdfun=MISSING, nbootsd=25, nboott=200,
VS=FALSE, v.nbootg=100, v.nbootsd=25, v.nboott=200,
perc=c(.001,.01,.025,.05,.10,.50,.90,.95,.975,.99,.999))
Arguments:
x: a vector containing the data. Nonparametric bootstrap
sampling is used. To bootstrap from more complex data
structures (e.g. bivariate data) see the last example below.
theta: function to be bootstrapped. Takes 'x' as an argument, and
may take additional arguments (see below and last example).
...: any additional arguments to be passed to 'theta'
sdfun: optional name of function for computing standard deviation of
'theta' based on data 'x'. Should be of the form: 'sdmean <-
function(x,nbootsd,theta,...)' where 'nbootsd' is a dummy
argument that is not used. If 'theta' is the mean, for
example, 'sdmean <- function(x,nbootsd,theta,...)
{sqrt(var(x)/length(x))}' . If 'sdfun' is missing, then
'boott' uses an inner bootstrap loop to estimate the
standard deviation of 'theta(x)'
nbootsd: The number of bootstrap samples used to estimate the standard
deviation of 'theta(x)'
nboott: The number of bootstrap samples used to estimate the
distribution of the bootstrap T statistic. 200 is a bare
minimum and 1000 or more is needed for reliable alpha %
confidence points, alpha > .95 say. Total number of
bootstrap samples is 'nboott*nbootsd'.
VS: If 'TRUE', a variance stabilizing transformation is
estimated, and the interval is constructed on the
transformed scale, and then is mapped back to the original
theta scale. This can improve both the statistical
properties of the intervals and speed up the computation. See
the reference Tibshirani (1988) given below. If 'FALSE',
variance stabilization is not performed.
v.nbootg: The number of bootstrap samples used to estimate the variance
stabilizing transformation g. Only used if 'VS=TRUE'.
v.nbootsd: The number of bootstrap samples used to estimate the
standard deviation of 'theta(x)'. Only used if 'VS=TRUE'.
v.nboott: The number of bootstrap samples used to estimate the
distribution of the bootstrap T statistic. Only used if
'VS=TRUE'. Total number of bootstrap samples is
'v.nbootg*v.nbootsd + v.nboott'.
perc: Confidence points desired.
Value:
list with the following components:
confpoints: Estimated confidence points
theta, g: 'theta' and 'g' are only returned if 'VS=TRUE' was specified.
'(theta[i],g[i]), i=1,length(theta)' represents the
estimate of the variance stabilizing transformation 'g' at
the points 'theta[i]'.
References:
Tibshirani, R. (1988) "Variance stabilization and the bootstrap".
Biometrika (1988) vol 75 no 3 pages 433-44.
Hall, P. (1988) Theoretical comparison of bootstrap confidence
intervals. Ann. Statist. 16, 1-50.
Efron, B. and Tibshirani, R. (1993) An Introduction to the
Bootstrap. Chapman and Hall, New York, London.
Examples:
# estimated confidence points for the mean
x <- rchisq(20,1)
theta <- function(x){mean(x)}
results <- boott(x,theta)
# estimated confidence points for the mean,
# using variance-stabilization bootstrap-T method
results <- boott(x,theta,VS=TRUE)
results$confpoints # gives confidence points
# plot the estimated var stabilizing transformation
plot(results$theta,results$g)
# use standard formula for stand dev of mean
# rather than an inner bootstrap loop
sdmean <- function(x, ...)
{sqrt(var(x)/length(x))}
results <- boott(x,theta,sdfun=sdmean)
# To bootstrap functions of more complex data structures,
# write theta so that its argument x
# is the set of observation numbers
# and simply pass as data to boott the vector 1,2,..n.
# For example, to bootstrap
# the correlation coefficient from a set of 15 data pairs:
xdata <- matrix(rnorm(30),ncol=2)
n <- 15
theta <- function(x, xdata){ cor(xdata[x,1],xdata[x,2]) }
results <- boott(1:n,theta, xdata)
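To make the bootstrap-t construction behind this help page more concrete, here is a minimal Python sketch for the mean. It corresponds to the case where a standard-error formula is supplied (as with `sdfun=sdmean` above) rather than estimated by an inner bootstrap loop, and it does not attempt variance stabilization. The names `se_mean` and `bootstrap_t_interval`, the simple order-statistic rule for the percentiles, and the simulated data are assumptions made for the example; this is not the implementation used by `boott`.

```python
import math
import random
import statistics

def se_mean(sample):
    """Standard formula for the standard error of the mean."""
    return math.sqrt(statistics.variance(sample) / len(sample))

def bootstrap_t_interval(x, n_boot=1000, alpha=0.05, seed=0):
    """Bootstrap-t confidence interval for the mean of x."""
    rng = random.Random(seed)
    n = len(x)
    theta_hat = statistics.mean(x)
    se_hat = se_mean(x)
    t_stats = []
    for _ in range(n_boot):
        xb = [rng.choice(x) for _ in range(n)]   # resample with replacement
        se_b = se_mean(xb)
        if se_b > 0:                             # skip degenerate resamples
            t_stats.append((statistics.mean(xb) - theta_hat) / se_b)
    t_stats.sort()
    m = len(t_stats) - 1
    t_lo = t_stats[math.floor((alpha / 2) * m)]
    t_hi = t_stats[math.ceil((1 - alpha / 2) * m)]
    # The upper percentile of T sets the lower limit and vice versa.
    return theta_hat - t_hi * se_hat, theta_hat - t_lo * se_hat

# Simulated chi-square(1) data, mirroring x <- rchisq(20,1) in the R example.
data_rng = random.Random(42)
x = [data_rng.gauss(0, 1) ** 2 for _ in range(20)]

lower, upper = bootstrap_t_interval(x)
print(round(lower, 3), round(statistics.mean(x), 3), round(upper, 3))
```

For skewed data such as these, the interval is typically not symmetric about the sample mean, which is the point of studentizing each bootstrap replicate by its own standard error.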