This paper adapts algorithms used by movie and book recommender systems to find new genes related to a group of known genes. Given a query set of genes with a common function, we identify experiments in which the query genes are strongly co-expressed. We then rank all of the organism's genes by how closely they agree with the query group in the selected experiments. RNA interference knockouts confirmed two new retinoblastoma-related genes in C. elegans.
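The two-stage idea above (first select experiments, then rank genes) can be illustrated with a small sketch. It is not the paper's scoring rule. The rank normalization, the experiment score (absolute mean of the query genes' normalized values), the cutoff `top_k`, and the gene score are hypothetical simplifications chosen only to show the structure.

```python
from typing import List, Sequence


def rank_normalize(column: Sequence[float]) -> List[float]:
    """Map one experiment's values to evenly spaced scores in [-1, 1] by rank."""
    n = len(column)
    if n == 1:
        return [0.0]
    order = sorted(range(n), key=lambda g: column[g])
    scores = [0.0] * n
    for rank, g in enumerate(order):
        scores[g] = 2.0 * rank / (n - 1) - 1.0
    return scores


def recommend(expr: List[List[float]], query: Sequence[int], top_k: int) -> List[int]:
    """Rank the non-query genes; expr[g][e] is the expression of gene g in experiment e."""
    n_genes, n_exp = len(expr), len(expr[0])
    cols = [rank_normalize([expr[g][e] for g in range(n_genes)]) for e in range(n_exp)]

    # Stage 1 (hypothetical rule): keep the experiments where the query genes
    # sit together most strongly at one end of the ranking.
    exp_scores = []
    for e in range(n_exp):
        mean_q = sum(cols[e][g] for g in query) / len(query)
        exp_scores.append((abs(mean_q), e, 1.0 if mean_q >= 0 else -1.0))
    selected = sorted(exp_scores, reverse=True)[:top_k]

    # Stage 2: score each other gene by agreement with the query direction.
    qset = set(query)
    score = {g: sum(sign * cols[e][g] for _, e, sign in selected)
             for g in range(n_genes) if g not in qset}
    return sorted(score, key=lambda g: -score[g])


# Toy data: 6 genes x 4 experiments. Genes 0 and 1 form the query;
# gene 2 tracks them in experiments 0 and 1, gene 5 does the opposite.
expr = [
    [10, 0, 5, 2],
    [9, 1, 3, 6],
    [8, 2, 4, 1],
    [1, 9, 6, 3],
    [2, 8, 1, 5],
    [0, 10, 2, 4],
]
ranking = recommend(expr, query=[0, 1], top_k=2)  # gene 2 first, gene 5 last
```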
There are many computationally efficient proposals for scrambling digital nets, and they generally preserve the mean squared discrepancy. This paper shows that one alternative scrambling can increase the sampling variance, worsening the rate of convergence. Another scrambling improves the rate of convergence, at least for d = 1.
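The abstract does not name the scramblings it compares, so the sketch below only illustrates the kind of effect at stake in d = 1, using two well-known base-2 randomizations of the first n = 2^m van der Corput points: a random digital shift (every point XORed with one random bit string) and nested uniform scrambling. Neither is claimed to be the alternative studied in the paper. For the integrand f(x) = x the digitally shifted points are a grid moved by one common offset, with variance 1/(12n^2), while nested scrambling puts one independent uniform point in each cell, with variance 1/(12n^3). The precision `BITS` and all function names are illustrative assumptions.

```python
import random
import statistics

BITS = 24  # binary digits kept per point (illustrative choice)


def radical_inverse(i: int, bits: int = BITS) -> int:
    """Base-2 radical inverse of i (van der Corput), as an integer over 2**bits."""
    out = 0
    for _ in range(bits):
        out = (out << 1) | (i & 1)
        i >>= 1
    return out


def digital_shift(x: int, u: int) -> int:
    """Random digital shift: XOR every point with the same random bit string u."""
    return x ^ u


def nested_scramble(x: int, flips: dict, bits: int = BITS) -> int:
    """Nested uniform scramble in base 2: whether a digit is flipped depends on a
    random bit attached to the original digits above it (its prefix)."""
    out, prefix = 0, 1  # the leading 1 in `prefix` encodes the prefix length
    for k in range(bits - 1, -1, -1):
        bit = (x >> k) & 1
        if prefix not in flips:
            flips[prefix] = random.getrandbits(1)
        out = (out << 1) | (bit ^ flips[prefix])
        prefix = (prefix << 1) | bit
    return out


def estimator_variance(f, m: int, method: str, reps: int) -> float:
    """Empirical variance of the equal-weight estimate of the integral of f over [0, 1]."""
    n, scale = 2 ** m, float(2 ** BITS)
    base = [radical_inverse(i) for i in range(n)]
    estimates = []
    for _ in range(reps):
        if method == "shift":
            u = random.getrandbits(BITS)
            pts = [digital_shift(x, u) for x in base]
        elif method == "nested":
            flips: dict = {}
            pts = [nested_scramble(x, flips) for x in base]
        else:
            raise ValueError(method)
        estimates.append(sum(f(x / scale) for x in pts) / n)
    return statistics.variance(estimates)


def identity(x: float) -> float:
    return x


random.seed(1)
v_shift = estimator_variance(identity, 8, "shift", 200)    # near 1 / (12 * 256**2)
v_nested = estimator_variance(identity, 8, "nested", 200)  # near 1 / (12 * 256**3)
```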
This paper explores the extent to which low superposition dimension is necessary for quasi-Monte Carlo (QMC) to beat Monte Carlo (MC).
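For reference, superposition dimension is usually defined through the ANOVA decomposition f(x) = sum over subsets u of {1, ..., d} of f_u(x), whose effects have variances sigma_u^2. The 0.99 cutoff below is the customary convention in the QMC literature rather than something stated in this abstract:

```latex
\sigma^2 = \sum_{u \neq \emptyset} \sigma_u^2,
\qquad
d_S = \min\Bigl\{\, s : \sum_{0 < |u| \le s} \sigma_u^2 \ \ge\ 0.99\,\sigma^2 \Bigr\}.
```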
Quasi-regression is applied to the output of a support vector machine and to a neural network. The method allows one to peer into a black box and identify important variables and interactions. The most vexing issue is how to reconcile a decomposition derived for independent variables with a function fit to highly dependent data.
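As a rough illustration of how quasi-regression looks inside a black box: expand f in an orthonormal basis and estimate each coefficient as a sample average of f times the basis function, so no linear system is solved. Squared coefficients are then grouped by the variables they involve. This sketch assumes independent uniform inputs on [0, 1]^d (exactly the assumption the abstract says is strained by dependent training data), tensor products of shifted Legendre polynomials up to degree 2, and plain Monte Carlo points. The function names are hypothetical.

```python
import itertools
import math
import random


def legendre01(degree: int, x: float) -> float:
    """Shifted Legendre polynomials, orthonormal on [0, 1] under the uniform law."""
    t = 2.0 * x - 1.0
    if degree == 0:
        return 1.0
    if degree == 1:
        return math.sqrt(3.0) * t
    if degree == 2:
        return math.sqrt(5.0) * (3.0 * t * t - 1.0) / 2.0
    raise ValueError("this sketch only supports degrees 0 to 2")


def quasi_regression(f, dim: int, max_degree: int, n: int, seed: int = 0):
    """Estimate beta_r = E[f(X) phi_r(X)] for every tensor-product basis index r."""
    rng = random.Random(seed)
    xs = [[rng.random() for _ in range(dim)] for _ in range(n)]
    ys = [f(x) for x in xs]
    # table[i][j][k] = phi_k(x_ij), computed once and reused for every r
    table = [[[legendre01(k, xj) for k in range(max_degree + 1)] for xj in x] for x in xs]
    coefs = {}
    for r in itertools.product(range(max_degree + 1), repeat=dim):
        total = sum(y * math.prod(row[j][deg] for j, deg in enumerate(r))
                    for row, y in zip(table, ys))
        coefs[r] = total / n
    return coefs


def variance_by_subset(coefs):
    """Sum squared coefficients by the set of variables each basis function uses."""
    out = {}
    for r, b in coefs.items():
        u = tuple(j for j, deg in enumerate(r) if deg > 0)
        if u:  # the constant term carries the mean, not variance
            out[u] = out.get(u, 0.0) + b * b
    return out


def linear(x):
    return x[0] + 2.0 * x[1]


coefs = quasi_regression(linear, dim=2, max_degree=2, n=50000, seed=0)
v = variance_by_subset(coefs)  # v[(0,)] near 1/12, v[(1,)] near 1/3, v[(0, 1)] near 0
```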
It is easy and natural to combine quasi-Monte Carlo with control variates. But the best control variate coefficients can differ from their Monte Carlo values, and so can the choice of what makes a good control variate. In MC a good control variate is one that correlates with the integrand. In QMC it is better for it to correlate with some derivative or high-frequency component of the integrand.
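A minimal sketch of the mechanics, assuming a one-dimensional integrand on [0, 1], a control variate h with known mean, and the least-squares coefficient that is optimal for plain MC. Per the abstract, that coefficient (and that choice of h) need not be the best ones once the points come from a QMC rule; the sketch simply plugs van der Corput points into the same formula. Function names are hypothetical.

```python
import math
import random


def van_der_corput(i: int) -> float:
    """Base-2 radical inverse of i, a simple one-dimensional QMC sequence."""
    x, scale = 0.0, 0.5
    while i:
        if i & 1:
            x += scale
        i >>= 1
        scale *= 0.5
    return x


def cv_estimate(f, h, mu_h: float, points):
    """Control-variate estimate of the mean of f; beta is fit by least squares."""
    fv = [f(x) for x in points]
    hv = [h(x) for x in points]
    n = len(points)
    fbar, hbar = sum(fv) / n, sum(hv) / n
    sxy = sum((a - fbar) * (b - hbar) for a, b in zip(fv, hv))
    sxx = sum((b - hbar) ** 2 for b in hv)
    beta = sxy / sxx  # the MC-optimal coefficient, estimated from the sample
    return fbar - beta * (hbar - mu_h), beta


def h(x: float) -> float:
    return x  # control variate with known mean 1/2


rng = random.Random(2003)
mc_pts = [rng.random() for _ in range(1024)]
qmc_pts = [van_der_corput(i) for i in range(1024)]
mc_est, mc_beta = cv_estimate(math.exp, h, 0.5, mc_pts)
qmc_est, qmc_beta = cv_estimate(math.exp, h, 0.5, qmc_pts)
```

Both estimates target e - 1. Comparing `mc_beta` with `qmc_beta`, or trying a different h, is a direct way to explore the abstract's point that the right coefficient depends on the sampling method.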
Quasi-regression is a method of Monte Carlo approximation useful for global sensitivity analysis. This paper presents a new version, incorporating shrinkage parameters of the type used in wavelet approximation. As an example application, a black box function from machine learning is analyzed. That function is nearly a superposition of functions of one and two variables and the first variable acting alone accounts for more than half of the variance.
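The abstract does not spell out the shrinkage rule, so the sketch below uses one standard wavelet-style choice as a stand-in: soft thresholding of each quasi-regression coefficient, with a universal-style threshold equal to the coefficient's standard error times sqrt(2 log N) for N coefficients. The threshold formula and the input format (`products[r]` holding the values y_i * phi_r(x_i) whose mean is the raw coefficient) are assumptions, not the paper's method.

```python
import math
from typing import Dict, List, Tuple


def soft_threshold(b: float, lam: float) -> float:
    """Soft thresholding as used in wavelet denoising: move b toward 0 by lam."""
    return math.copysign(max(abs(b) - lam, 0.0), b)


def shrink_coefficients(
    products: Dict[Tuple[int, ...], List[float]]
) -> Dict[Tuple[int, ...], float]:
    """Shrink each raw coefficient (a sample mean) using a hypothetical universal-style threshold."""
    n_coefs = len(products)
    out = {}
    for r, vals in products.items():
        n = len(vals)
        mean = sum(vals) / n
        var = sum((v - mean) ** 2 for v in vals) / (n - 1)
        se = math.sqrt(var / n)  # standard error of the raw coefficient
        lam = se * math.sqrt(2.0 * math.log(n_coefs))
        out[r] = soft_threshold(mean, lam)
    return out


# A clearly nonzero coefficient survives; a noisy near-zero one is set to 0.
shrunk = shrink_coefficients({
    (1,): [1.0, 1.1, 0.9, 1.0],
    (2,): [0.3, -0.2, 0.25, -0.25],
})
```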
A "dimension distribution" is introduced through which various measures of effective dimension of a function can be defined. The idea is explored on some widely used quadrature test functions. Some isotropic functions are shown to be of low effective dimension, explaining the success of QMC methods on them.
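For product functions the superposition form of the dimension distribution has a closed form, which makes a compact illustration. If f(x) is the product of g_j(x_j) with each factor having mean 1 and variance tau2[j], then the ANOVA component for subset u has variance equal to the product of tau2[j] over j in u. So P(|u| = k) is proportional to the k-th coefficient of the polynomial product of (1 + tau2[j] z). The sketch assumes this product structure, uses the customary 0.99 level for superposition dimension, and its function names are hypothetical.

```python
from typing import List, Sequence


def superposition_dim_distribution(tau2: Sequence[float]) -> List[float]:
    """P(|u| = k) for k = 1..d when subset u has probability sigma_u^2 / sigma^2."""
    # Expand prod_j (1 + tau2[j] * z); coefficient k sums prod_{j in u} tau2[j] over |u| = k.
    poly = [1.0]
    for t in tau2:
        nxt = poly + [0.0]
        for k in range(len(poly)):
            nxt[k + 1] += t * poly[k]
        poly = nxt
    total = sum(poly[1:])  # sigma^2 excludes the empty set (the mean)
    return [c / total for c in poly[1:]]


def superposition_dimension(tau2: Sequence[float], level: float = 0.99) -> int:
    """Smallest s such that subsets of size <= s carry at least `level` of the variance."""
    cum = 0.0
    for k, p in enumerate(superposition_dim_distribution(tau2), start=1):
        cum += p
        if cum >= level:
            return k
    return len(tau2)


# Ten weak factors: nearly all variance comes from main effects and pairs.
d_s = superposition_dimension([0.01] * 10)
```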
@article{dimdist,
  author  = {A. B. Owen},
  title   = {The dimension distribution and quadrature test functions},
  journal = {Statistica Sinica},
  volume  = 13,
  number  = 1,
  note    = {In press},
  year    = 2003
}
Here are the figures for the plaid paper. Figures 1 and 2 are large. The compressed versions download faster, but once uncompressed they can take a long time to print or to display in a PostScript viewer.
Figure 1: 40.3 MB PostScript; 4.11 MB zip-compressed PostScript
Figure 2: 33.4 MB PostScript; 3.16 MB zip-compressed PostScript
Figures 3 and 4: 31.9 KB PostScript
The Plaid program is available for noncommercial research purposes. Plaid runs only on the Wintel platform. To get a time-limited copy, or to learn how to license Plaid for commercial research purposes, visit this link.
Owen, A. B. "Plaid User's Guide" (PostScript, PDF, HTML)