RuleFit with R
(8/10/05)
Model building : rulefit
Model manipulation : getmodel, rfrestore, rfannotate, rfmodinfo
Cross-validation : rfxval
Prediction: rfpred
Variable importance: varimp
Interaction effects : interact, twovarint, threevarint, intnull, rfnullinfo
Display rules : rules
Partial dependence plots : singleplot, pairplot
Other : rfversion
Open R
At the R command prompt enter:
> platform = "PLATFORM"
> rfhome = "RFHOME"
> source("RFHOME/rulefit.r")
> install.packages("akima", lib=rfhome)
> library(akima, lib.loc=rfhome)
Here "PLATFORM" is either the text string "windows" or "linux" depending on the running operating system, and RFHOME is a text string indicating the full path name (using forward slashes / ) of the directory where rulefit.r and rf_go.exe are stored. This will be the RuleFit home directory. (examples: rfhome = "/R_RuleFit"; rfhome = "/home/jhf/R_RuleFit")
Notes: the computer must be connected to the internet to execute the install.packages command. Only the last command is needed every time R is entered. The others need only be entered the first time provided that on exit from R the "yes" option is selected at the "Save workspace image" prompt.
RuleFit implements the model building and interpretational tools described in Predictive Learning via Rule Ensembles (FP 2004). Some familiarity with this paper is recommended. The documentation refers to sections in the paper describing the various options in detail. Also, some familiarity with the paper Gradient Directed Regularization (FP 2003) may be helpful.
The R/RuleFit interface consists of the R procedures described below. The principal procedure is rulefit. It builds the RuleFit model given the input data and various procedure parameters. This model is stored in the RuleFit home directory RFHOME and invisibly returned as a RuleFit model object (list) to R. All other RuleFit procedures reference the current model and its input data as stored in the RuleFit home directory. Every time the procedure rulefit is invoked the resulting RuleFit model overwrites the previously stored model and its input data, thereby replacing it as the current model.
Previously constructed (and saved) models and their input data can be replaced in the RuleFit home directory for analysis at a later time using the procedure rfrestore, thereby overwriting the current model and its data. This replaced model and data then become the current ones in the RuleFit directory and all RuleFit procedures (other than rulefit) will reference them until either a different previously constructed model and its data are placed in the directory, or the rulefit procedure is subsequently invoked. At any time, the current model in the RuleFit home directory can be obtained (and saved) as a named R object using the procedure getmodel. The properties of any RuleFit model object can be viewed using the procedure rfmodinfo.
With any RuleFit procedure, input predictor variables can be referenced either by their respective column numbers in the input data matrix or data frame, or by their corresponding character column variable names, if present. Character variable names can be associated with columns using the colnames feature in R or by providing them as part of an input data frame. If variable names are specified then all output will reference those names. If not, the column numbers will be used to reference the input variables.
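As a minimal sketch of this naming convention (the matrix values and the column names below are made up for illustration), names can be attached to a numeric matrix with colnames, after which a column can be selected either by number or by name:

```r
# Hypothetical 4 x 3 predictor matrix; values and names are illustrative only.
# matrix() fills column by column, so each row of numbers below is one column.
x = matrix(c(1.2, 0.7, 3.1, 2.4,
             0,   1,   1,   0,
             15,  22,  9,   30), nrow = 4, ncol = 3)
colnames(x) = c("RM", "CHAS", "LSTAT")

# A column label can be a number or a name; both select the same column.
by.number = x[, 2]
by.name   = x[, "CHAS"]
same.column = identical(by.number, by.name)
print(same.column)
```

A label such as 2 or "CHAS" is the kind of value that arguments like cat.vars and not.used accept.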
Usage:
rfmod = rulefit (x, y, wt=rep(1,nrow(x)), cat.vars=NULL, not.used=NULL, xmiss=9.0e30, rfmode="regress", model.type="both", trim.qntl=0.025, huber=0.8, max.rules=2000, tree.size=4, inter.supp=3, memory.par=0.01, samp.fract=min(0.5,(100+6*sqrt(neff))/neff), path.xval=3, path.speed=2, path.steps=50000, path.testfreq=100, path.inc=0.01, conv.fac=1.1, tree.store=10000000, cat.store=1000000)
Required arguments:
x = input predictor data matrix or data frame. Rows are observations and columns are variables. Must be a numeric matrix or a data frame.
y = input response values. For classification (rfmode="class", see below) values must only be +1 or -1. If y is a single valued scalar it is interpreted as a label (number or name) referencing a column of x. Otherwise it is a vector of length nrow(x) containing the numeric response values.
Optional arguments:
wt = observation weights. If wt is a single valued scalar it is interpreted as a label (number or name) referencing a column of x. Otherwise it is a vector of length nrow(x) containing the numeric observation weights.
cat.vars = vector of column labels (numbers or names) indicating categorical variables (factors). All variables not so indicated are assumed to be orderable numeric. If x is a data frame and cat.vars is missing, then components of type factor are treated as categorical variables. Ordered factors should be input as type numeric with appropriate numerical scores. If cat.vars is present it will override the data frame typing.
not.used = vector of column labels (numbers or names) indicating predictor variables not to be used in the model.
xmiss = predictor variable missing value flag. Must be numeric and larger than any non missing predictor variable value. Predictor variable values greater than or equal to xmiss are regarded as missing. Predictor variable data values of NA are internally set to the value of xmiss and thereby regarded as missing.
rfmode = regression / classification flag
rfmode="regress" => regression. The outcome or response variable is regarded as numerically valued and the model is used to predict the value of the response.
rfmode="class" => binary classification. The model produces a numeric score whose absolute value reflects confidence that its sign is the same as that of the response. The sign of this score can be used for prediction.
model.type = rule generation flag for numeric variables
model.type = "linear" => only linear model for orderable numeric variables (no rules). Generate rules only for unorderable categorical variables (factors), if any. See cat.vars.
model.type = "rules" => use only generated rules to fit model (no linear variables)
model.type = "both" => use both to fit model
trim.qntl = linear variable conditioning factor. Ignored for model.type = "rules". FP 2004 (Sec. 5)
huber = trimming factor for Huber loss criterion. Fraction of observations subject to squared-error loss. huber >= 1.0 => squared-error loss. Used for regression (rfmode = "regress") only. Ignored for rfmode = "class". FP 2004 (Sec. 3.4)
max.rules = approximate total number of rules generated for fitting. Note: with missing values, the actual number of rules generated may be considerably larger than max.rules.
tree.size = average number of terminal nodes in generated trees. FP 2004 (Sec. 3.3)
inter.supp = incentive factor for using fewer variables in tree based rules. FP 2004 (Sec. 8.2)
memory.par = scale multiplier (shrinkage factor) applied to each new tree when sequentially induced. FP 2004 (Sec. 2)
samp.fract = fraction of randomly chosen training observations used to produce each tree. FP 2004 (Sec. 2) Note: this quantity refers to the fraction of "effective" observations (neff) given by neff = sum(wt)^2/sum(wt^2).
path.xval = number of cross-validation iterations (folds) used to estimate lasso regularization parameter. FP 2003
path.speed = execution speed / fitting thoroughness trade-off
path.speed = 1 => full path.xval-fold cross-validation to estimate lasso regularization parameter (recommended for data sets with a small number of observations)
path.speed = 2 => one-fold cross-validation to estimate lasso regularization parameter with test set size = nrow(x)/path.xval. Use full data set to derive final model (default)
path.speed = 3 => one-fold cross-validation to estimate lasso regularization parameter with test set size = nrow(x)/path.xval. Use learning data subset to derive final model (recommended for data sets with a very large number of observations or for early exploration)
path.steps = maximum number of iterations of gradient directed stepping to estimate RuleFit lasso model. FP 2003
path.testfreq = frequency for recomputing gradient, and checking test risk in gradient directed stepping to estimate RuleFit lasso model. FP 2003
path.inc = maximum factor (step size) scaling gradient for incrementing selected coefficients at each step in gradient directed stepping to estimate RuleFit lasso model. FP 2003
conv.fac = convergence factor for early stopping in gradient directed stepping to estimate RuleFit lasso model. Iterations stop when error > conv.fac*(minimum error so far). FP 2003
tree.store = size of internal tree storage. Decrease value in response to memory allocation error. Increase value for very large values of max.rules and/or tree.size, or in response to diagnostic message or erratic program behavior.
cat.store = size of internal categorical value storage. Decrease value in response to memory allocation error. Increase value for very large values of max.rules and/or tree.size in the presence of many categorical variables (factors) with many levels, or in response to diagnostic message or erratic program behavior.
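The default for samp.fract in the usage line depends on the effective number of observations neff defined under samp.fract. The following is a small sketch of that arithmetic in plain R, assuming a made-up weight vector; it only evaluates the two formulas quoted above and does not call RuleFit:

```r
# Hypothetical observation weights (illustrative only):
# 900 observations with weight 1 and 100 observations with weight 5.
wt = c(rep(1, 900), rep(5, 100))

# Effective number of observations, as defined for samp.fract.
neff = sum(wt)^2 / sum(wt^2)

# Default sampling fraction, copied from the rulefit usage line.
samp.fract = min(0.5, (100 + 6 * sqrt(neff)) / neff)

# With equal weights, neff reduces to the number of observations.
wt.equal = rep(1, 1000)
neff.equal = sum(wt.equal)^2 / sum(wt.equal^2)

print(c(neff = neff, samp.fract = samp.fract, neff.equal = neff.equal))
```

Unequal weights reduce neff below nrow(x), which in turn raises the default sampling fraction up to its cap of 0.5.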
Output:
rfmod = RuleFit model object representing the model placed in the RuleFit home directory. Can be replaced at a later time using rfrestore.
Printed output at the command line giving lasso cross-validated error estimate, number of terms in the resulting model, and the number of lasso steps.
Examples:
rulefit(x, y); rfxyw = rulefit(x, y, w);
rfxycls = rulefit(x, 14, 33, cat.vars=c(2,4,5,7,9), not.used=c(1,3),
rfmode="class")
rfbosthouse=rulefit(bostdat,"MEDV", cat.vars="CHAS", huber=0.9,
path.steps=100000)
Comments: If the reported number of lasso steps is very close to the value of path.steps, model accuracy may be increased by enlarging the value of either path.steps or path.inc. If the reported number of lasso steps is very small (< 1500) accuracy may be increased by reducing the value of path.inc. Note that the lasso error estimate may be slightly biased low for small data sets, since the rule predictors are constructed from the training data. See rfxval for full cross-validation of the RuleFit model.
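The early stopping rule stated for conv.fac can be illustrated with a short loop. This is only a sketch of the rule as worded above, using a made-up sequence of test errors; it is not the internal RuleFit implementation, and the names test.error, min.so.far and stop.at are hypothetical:

```r
# Hypothetical test errors recorded at successive checks (made-up values).
test.error = c(1.00, 0.80, 0.65, 0.60, 0.62, 0.64, 0.67, 0.70)
conv.fac = 1.1

min.so.far = Inf
stop.at = length(test.error)   # used if the rule never triggers
for (k in seq_along(test.error)) {
  min.so.far = min(min.so.far, test.error[k])
  # Stop once the error exceeds conv.fac times the smallest error seen so far.
  if (test.error[k] > conv.fac * min.so.far) {
    stop.at = k
    break
  }
}
print(stop.at)
```

In this sequence the smallest error is 0.60, so stepping stops at the first check where the error exceeds 0.66.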
References:
Friedman, J. H. and Popescu, B. E. (2003). Gradient directed regularization.
Friedman, J. H. and Popescu, B. E. (2004). Predictive learning via rule ensembles.
getmodel: retrieve current model from RuleFit home directory
Usage:
rfmod = getmodel ()
Arguments: none
Output:
rfmod = RuleFit model object representing the model currently stored in the RuleFit home directory. Can be replaced in the home directory at a later time using rfrestore.
rfrestore: replace (change) the current model in RuleFit home directory
Usage:
rfrestore (model, x=NULL, y=NULL, wt=rep(1,nrow(x)))
Required argument:
model = RuleFit model object output from rulefit or getmodel
Optional arguments:
x = input predictor data matrix or data frame used to construct model.
y = input response values used to construct model. If y is a single valued scalar it is interpreted as a label (number or name) referencing a column of x. Otherwise it is a vector of length nrow(x) containing the numeric response values.
wt = observation weights used to construct model. If wt is a single valued scalar it is interpreted as a label (number or name) referencing a column of x. Otherwise it is a vector of length nrow(x) containing the numeric observation weights.
Output: none
Examples: rfrestore (rfmod); rfrestore (rfbosthouse, bostdat, "MEDV")
Comment: each of the optional arguments need only be included if the corresponding quantities in the RuleFit home directory have changed since the model was created, that is, if rulefit or rfrestore was invoked with different data after the model was created.
rfannotate: add text description to RuleFit model object
Usage:
rfmod = rfannotate (rfmod, "text")
Required arguments:
rfmod = RuleFit model object output from rulefit or getmodel
text = character string
Output:
rfmod = same RuleFit model object as input model
Examples:
rfbosthouse = rfannotate(rfbosthouse, "This is a RuleFit model for Boston housing data")
rfxycls = rfannotate (rfxycls, 'rulefit(x, 14, 33, cat.vars=c(2,4,5,7,9), not.used=c(1,3), rfmode="class")')
rfmodinfo: view the properties of a RuleFit model object
Usage:
rfmodinfo (model)
Required argument:
model = RuleFit model object
Output: none
Examples: rfmodinfo (rfbosthouse); rfmodinfo (getmodel ())
Comment: prints at the command line the date and time the model was created as well as all parameter values used to construct the model.
rfxval: full cross-validation of RuleFit model
Usage:
xval = rfxval (nfold=10, quiet=F)
Optional arguments:
nfold = number of folds (>= 2)
quiet = output flag
quiet = TRUE / FALSE => do/don't minimize command window and print output at command line.
Output: list
Regression:
xval$aae = average-absolute prediction error
xval$rms = root-mean-squared prediction error
Classification:
xval$omAUC = 1 - area under ROC curve
xval$errave = average error rate
xval$errpos = positive (y = +1) error rate
xval$errneg = negative (y = -1) error rate
Examples: rfxval (); xval= rfxval(20, T)
Comment: Uses current model in the RuleFit home directory. All errors are computed using the observation weights.
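To make the two regression error measures concrete, here is a hedged sketch of weighted average-absolute and root-mean-squared error in plain R. The data are made up, and the weighted-average normalization shown is an assumption; the exact formulas used inside rfxval are not given in this document:

```r
# Hypothetical responses, predictions, and observation weights.
y  = c(2.0, 3.5, 1.0, 4.0)
yp = c(2.5, 3.0, 1.0, 5.0)
wt = c(1, 1, 2, 1)

# Weighted average-absolute error (assumed form of xval$aae).
aae = sum(wt * abs(y - yp)) / sum(wt)

# Weighted root-mean-squared error (assumed form of xval$rms).
rms = sqrt(sum(wt * (y - yp)^2) / sum(wt))

print(c(aae = aae, rms = rms))
```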
rfpred: predict using the RuleFit model
Usage:
yp = rfpred (xp)
Required argument:
xp = values of the input variables for the observation(s) to be predicted. Must be a data frame if a data frame was used to construct the model in the RuleFit home directory. Otherwise it must be a numeric vector or matrix.
Output:
yp = vector of length nrow(xp) containing the output predictions for each of the observations.
Regression: yp is used to predict the response value(s).
Classification: yp is a numeric score whose absolute value reflects confidence that its sign is the same as that of the response. The sign of this score can be used for prediction.
Example: yp = rfpred (xp)
Comment: Uses current model in the RuleFit home directory.
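For classification, turning the numeric score into a predicted label is a matter of taking its sign. A minimal sketch, assuming a made-up score vector in place of real rfpred output, and assuming (arbitrarily) that a score of exactly zero is assigned to the +1 class:

```r
# Hypothetical scores, standing in for yp = rfpred(xp) with rfmode="class".
yp = c(1.7, -0.2, 0.05, -2.4)

# Predicted class label from the sign of the score (+1 or -1).
# Assigning a zero score to +1 is an assumption made for this sketch.
label = ifelse(yp >= 0, 1, -1)

# The absolute value reflects confidence in the predicted label.
confidence = abs(yp)

print(data.frame(score = yp, label = label, confidence = confidence))
```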
varimp: RuleFit model input variable importances
Usage:
vi = varimp (range=NULL, impord=T, x=NULL, wt=rep(1,nrow(x)), rth=0, plot=T, col='grey', donames=T, las=2)
Optional arguments:
range = indices of the range of variables to be plotted. If there are 100 input variables, then range=1:20 would plot the importances of the first 20 variables, and range=81:100 would plot the importances of the last 20. The default plots the first 30 variables.
impord = flag specifying order of listing and plotting variable importances.
impord = TRUE => list and display in order of descending variable importance.
impord = FALSE => list and display in data matrix column order.
x = subset of observations over which importances are to be computed. Must be a data frame if a data frame was used to construct the model in the RuleFit home directory. Otherwise it must be a numeric vector or matrix. If missing then all training observations are used. FP 2004 (Sec. 7)
wt = weights for observations stored in x. If wt is a single valued scalar it is interpreted as a label (number or name) referencing a column of x. Otherwise it is a vector of length nrow(x) containing the numeric observation weights.
rth = rule importance threshold. Variable importances are computed only using those rules whose importances are greater than rth * (largest rule importance)
plot = plotting flag.
plot = TRUE / FALSE => do/don't display barplot
col = color of barplot
donames = barplot variable label flag
donames = TRUE / FALSE => do/don't display variable labels on barplot
las = label orientation flag
las = 1 => horizontal orientation of variable labels
las = 2 => vertical orientation of variable labels
Output: list
vi$imp = vector of importances for all variables.
vi$ord = vector of data matrix column numbers corresponding to the elements of vi$imp. vi$imp[k] is the importance of variable (column number) vi$ord[k].
Examples: varimp (); varimp (31:40, impord = F, x=xhigh); vi = varimp(plot = F)
Comment: Uses current model in the RuleFit home directory.
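The pairing between vi$imp and vi$ord can be used to build a readable listing. The sketch below uses a made-up list with the same two components in place of real varimp output, and hypothetical variable names:

```r
# Hypothetical stand-in for vi = varimp(plot = F); the numbers are made up.
vi = list(imp = c(100, 62.5, 40, 3.1), ord = c(3, 1, 4, 2))

# Hypothetical column names of the input data, in column order.
var.names = c("CRIM", "NOX", "RM", "LSTAT")

# vi$imp[k] is the importance of column vi$ord[k], so index names by vi$ord.
ranked = data.frame(variable = var.names[vi$ord], importance = vi$imp)
print(ranked)
```

The first entries of vi$ord are also a convenient input for interact, for example vi$ord[1:10].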
interact: overall strengths of interaction effects for selected variables
Usage:
int = interact (vars, null.mods=NULL, nval=100, plot=T, las=2, col=c("red","yellow"), ymax=NULL)
Required argument:
vars = vector of variable identifiers (column names or numbers) specifying selected variables to be considered.
Optional arguments:
null.mods = RuleFit null-model object returned from procedure intnull. FP 2004 (Sec. 8.3)
nval = number of evaluation points used for calculation (larger values provide higher accuracy with a diminishing return; computation grows as nval^2)
plot = plotting flag.
plot = TRUE / FALSE => do/don't display barplot
las = label orientation flag
las = 1 => horizontal orientation of variable labels
las = 2 => vertical orientation of variable labels
col = foreground and background barplot colors. If null.mods is missing then interaction strengths are plotted using col[2]. If null.mods is specified then the null standard deviations are plotted in col[1] and the difference between the interaction strengths and their expected null values are plotted in col[2]. Note that the col[1] bars are plotted over the col[2] bars, so that the absence of a col[2] bar indicates that the corresponding interaction strength is less than one standard deviation above its expected null value.
ymax = specified vertical scale upper limit for barplot. If missing then maximum plotted interaction strength value is used.
Output:
If null.mods is missing:
int = vector of interaction strengths: int[k] is the interaction strength of input variable vars[k]
If null.mods is specified: (list)
int$int = vector of interaction strengths: int$int[k] is the interaction strength of input variable vars[k]
int$nullave = vector of expected null interaction strengths: int$nullave[k] is the expected null interaction strength of variable vars[k]
int$nullstd = vector of null standard deviations: int$nullstd[k] is the standard deviation of the null interaction strength of variable vars[k]
Examples:
interact (1:10); interact (vi$ord[1:10], null.mods)
int = interact(c("RM", "NOX", "PTRATIO", "LSTAT"), null.bost, ymax=0.4)
Comment: Uses current model in the RuleFit home directory. See FP 2004 (Sec. 9) for illustrations.
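When null.mods is specified, the returned list can be used to pick out the variables whose interaction strength is more than one null standard deviation above its expected null value, which is the condition under which a col[2] bar is visible in the plot. A sketch with made-up numbers in place of real interact output:

```r
# Hypothetical stand-in for int = interact(vars, null.mods); values are made up.
vars = c("RM", "NOX", "LSTAT")
int = list(int     = c(0.30, 0.08, 0.15),
           nullave = c(0.10, 0.07, 0.12),
           nullstd = c(0.05, 0.03, 0.04))

# Excess of each interaction strength over its expected null value.
excess = int$int - int$nullave

# Variables exceeding the expected null value by more than one standard deviation.
flagged = vars[excess > int$nullstd]
print(flagged)
```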
twovarint: two-variable interaction strengths of a target variable with selected other variables
Usage:
int2var = twovarint (tvar, vars, null.mods=NULL, nval=100, import=F, plot=T, las=2, col=c("red","yellow"), ymax=NULL)
Required arguments:
tvar = variable identifier (column name or number) specifying the target variable.
vars = vector of variable identifiers (column names or numbers) specifying other selected variables. Must not contain tvar.
Optional arguments:
null.mods = RuleFit null-model object returned from procedure intnull. FP 2004 (Sec. 8.3)
nval = number of evaluation points used for calculation (larger values provide higher accuracy with a diminishing return; computation grows as nval^2)
import = interaction importance flag
import = TRUE / FALSE => do/don't scale interaction strengths according to their importance to the model. FP 2004 (Sec. 8.1)
plot = plotting flag.
plot = TRUE / FALSE => do/don't display barplot
las = label orientation flag
las = 1 => horizontal orientation of variable labels
las = 2 => vertical orientation of variable labels
col = foreground and background barplot colors. If null.mods is missing then interaction strengths are plotted using col[2]. If null.mods is specified then the null standard deviations are plotted in col[1] and the difference between the interaction strengths and their expected null values are plotted in col[2]. Note that the col[1] bars are plotted over the col[2] bars, so that the absence of a col[2] bar indicates that the corresponding interaction strength is less than one standard deviation above its expected null value.
ymax = specified vertical scale upper limit for barplot. If missing then maximum plotted interaction strength value is used.
Output:
If null.mods is missing:
int2var = vector of interaction strengths: int2var[k] is the two-variable interaction strength of tvar with input variable vars[k]
If null.mods is specified: (list)
int2var$int = vector of interaction strengths: int2var$int[k] is the interaction strength of tvar with input variable vars[k]
int2var$nullave = vector of expected null interaction strengths: int2var$nullave[k] is the expected null interaction strength of tvar with variable vars[k]
int2var$nullstd = vector of null standard deviations: int2var$nullstd[k] is the standard deviation of the null interaction strength of tvar with variable vars[k]
Examples:
twovarint (6, c(1:5,7:13)); int2var = twovarint ("Var 1", c("Var 2",
"Var 3"), null.mods)
int2var= twovarint ("PTRATIO", c("RM", "NOX", "LSTAT"), null.bost, ymax=0.3)
Comment: Uses current model in the RuleFit home directory. See FP 2004 (Sec. 9) for illustrations.
threevarint: three-variable interaction strengths of two target variables and selected other variables
Usage:
int3var = threevarint (tvar1, tvar2, vars, null.mods=NULL, nval=100, import=F, plot=T, las=2, col=c("red","yellow"), ymax=NULL)
Required arguments:
tvar1 = variable identifier (column name or number) specifying the first target variable.
tvar2 = variable identifier (column name or number) specifying the second target variable. Must be different than tvar1.
vars = vector of variable identifiers (column names or numbers) specifying other selected variables. Must not contain tvar1 or tvar2.
Optional arguments:
null.mods = RuleFit null-model object returned from procedure intnull. FP 2004 (Sec. 8.3)
nval = number of evaluation points used for calculation (larger values provide higher accuracy with a diminishing return; computation grows as nval^2)
import = interaction importance flag
import = TRUE / FALSE => do/don't scale interaction strengths according to their importance to the model. FP 2004 (Sec. 8.1)
plot = plotting flag.
plot = TRUE / FALSE => do/don't display barplot
las = label orientation flag
las = 1 => horizontal orientation of variable labels
las = 2 => vertical orientation of variable labels
col = foreground and background barplot colors. If null.mods is missing then interaction strengths are plotted using col[2]. If null.mods is specified then the null standard deviations are plotted in col[1] and the difference between the interaction strengths and their expected null values are plotted in col[2]. Note that the col[1] bars are plotted over the col[2] bars, so that the absence of a col[2] bar indicates that the corresponding interaction strength is less than one standard deviation above its expected null value.
ymax = specified vertical scale upper limit for barplot. If missing then maximum plotted interaction strength value is used.
Output:
If null.mods is missing:
int3var = vector of interaction strengths: int3var[k] is the three-variable interaction strength of tvar1, tvar2, and input variable vars[k]
If null.mods is specified: (list)
int3var$int = vector of interaction strengths: int3var$int[k] is the three-variable interaction strength of tvar1, tvar2, and input variable vars[k]
int3var$nullave = vector of expected null interaction strengths: int3var$nullave[k] is the expected null three-variable interaction strength of tvar1, tvar2, and variable vars[k]
int3var$nullstd = vector of null standard deviations: int3var$nullstd[k] is the standard deviation of the null three-variable interaction strength of tvar1, tvar2, and variable vars[k]
Examples:
threevarint (5,6, c(1:4,7:13)); int3var = threevarint ("Var 1", "Var 2", c("Var 3", "Var 4"), null.mods)
int3var= threevarint ("RM", "PTRATIO", c("DIS", "NOX", "LSTAT"), null.bost,
ymax=0.2)
Comment: Uses current model in the RuleFit home directory. See FP 2004 (Sec. 9) for illustrations.
intnull: compute bootstrapped null interaction models to calibrate interaction effects
Usage:
null.mods = intnull (ntimes=10, prevnull.mods=NULL, minimized=F)
Optional arguments:
ntimes = number of null models produced
prevnull.mods = RuleFit null-model object previously produced by intnull. If missing, a new null-model object is created. If present, the new null-models will be added to those contained in the input null model object.
minimized = execution window flag.
minimized = TRUE / FALSE => do/don't minimize execution window
Output:
null.mods = RuleFit null-model object containing the generated bootstrap null models. It can be used as input to interact, twovarint, and threevarint to calibrate interaction effects.
Examples: bost.null= intnull (); bost.null= intnull(5, bost.null)
Comment: Uses current RuleFit model in the RuleFit home directory. The produced null-model object can only be used as input to interact, twovarint, threevarint or intnull when this RuleFit model and its input data are stored in the RuleFit home directory (see rfmodinfo and rfrestore). See FP 2004 (Sec. 8.3).
rfnullinfo: view identifier of RuleFit null-model object
Usage:
rfnullinfo (null.mods)
Required argument:
null.mods = RuleFit null-model object previously produced by intnull.
Output: none
Example: rfnullinfo (bost.null)
Comment: prints at the command line the number of bootstrapped null interaction models contained in null.mods, and the date and time associated with the RuleFit model that was in the RuleFit home directory at the time the null-model object was created by intnull. The null-model object can only be used as input to interact, twovarint, threevarint, or intnull when this RuleFit model and its input data are stored in the RuleFit home directory (see rfmodinfo and rfrestore).
rules: print RuleFit rules in order of importance
Usage:
rules(beg=1, end=beg+9, x=NULL, wt=rep(1,nrow(x)))
Optional arguments:
beg = first rule to be printed
end = last rule to be printed
x = subset of observations over which importances are to be computed. Must be a data frame if a data frame was used to construct the model in the RuleFit home directory. Otherwise it must be a numeric vector or matrix. If missing then all training observations are used. FP 2004 (Sec. 6)
wt = weights for observations stored in x. If wt is a single valued scalar it is interpreted as a label (number or name) referencing a column of x. Otherwise it is a vector of length nrow(x) containing the numeric observation weights.
Output: none
Examples: rules (); rules (11); rules (21, 25); rules (x=xhigh)
Comment: Uses current model in the RuleFit home directory. If a referenced variable is of type factor in a data frame used to construct the RuleFit model, then its values correspond to the codes underlying the factor levels, not the numeric representation of the labels. Otherwise they represent the actual values encoded in the input data interpreted as type numeric.
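The distinction between factor codes and the numeric reading of factor labels is easy to see in plain R. A minimal sketch with a made-up factor whose labels happen to look like numbers (the levels are given explicitly so the coding does not depend on sort order):

```r
# Hypothetical factor with numeric-looking labels.
f = factor(c("10", "20", "10", "5"), levels = c("10", "20", "5"))

# Codes underlying the factor levels: position of each value in levels(f).
codes = as.integer(f)

# Numeric representation of the labels, which is NOT what the rules print
# for factor variables according to the comment above.
label.values = as.numeric(as.character(f))

print(data.frame(label = as.character(f), code = codes, value = label.values))
```

Here the label "5" has code 3, so a printed rule would refer to it as 3 and not as 5.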
singleplot: display single variable partial dependence plots
Usage:
singleplot (vars, qntl=0.025, nval=200, nav=500, catvals=NULL, samescale=F, las=2, col="cyan")
Required argument:
vars = vector of variable identifiers (column names or numbers) specifying selected variables to be plotted.
Optional arguments:
qntl = trimming factor for plotting numeric variables. Plots are shown for variable values in the range [quantile (qntl) - quantile(1-qntl)]. (Ignored for categorical variables (factors).)
nval = maximum number of abscissa evaluation points for numeric variables. (Ignored for categorical variables (factors).)
nav = maximum number of observations used for averaging calculations. (larger values provide higher accuracy with a diminishing return; computation grows linearly with nav)
catvals = vector of names for values (levels) of categorical variable (factor). (Ignored for numeric variables or length(vars) > 1)
samescale = plot vertical scaling flag.
samescale = TRUE / FALSE => do/don't require same vertical scale for all plots.
las = label orientation flag for categorical variable plots
las = 1 => horizontal orientation of value (level) names stored in catvals (if present)
las = 2 => vertical orientation of value (level) names stored in catvals (if present)
col = color of barplot for categorical variables
Output: none
Examples: singleplot ("DIS"); singleplot (1:5); singleplot(4,
catvals=levels(boston[[4]]))
singleplot(c("CRIM","NOX","RM","PTRATIO","LSTAT"), samescale=T)
Comment: Uses current model in the RuleFit home directory. If a categorical variable is of type factor in a data frame used to construct the RuleFit model, then its values correspond to the codes underlying the factor levels, not the numeric representation of the labels. Otherwise they represent the actual values encoded in the input data interpreted as type numeric. See FP 2004 (Sec. 8.1).
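The plotting range implied by qntl for a numeric variable can be previewed with the base R quantile function. This is a sketch under the assumption that the range runs from the qntl quantile to the (1 - qntl) quantile of the variable, as described for qntl above; the data are made up and R's default quantile definition is used, which may differ in detail from what singleplot uses internally:

```r
# Hypothetical numeric predictor: 201 evenly spaced values from 0 to 100.
x = seq(0, 100, length.out = 201)

# Default trimming factor from the singleplot usage line.
qntl = 0.025

# Lower and upper ends of the plotted range.
plot.range = unname(quantile(x, c(qntl, 1 - qntl)))
print(plot.range)
```

A larger qntl trims more of each tail, which can help when a few extreme values would otherwise stretch the horizontal axis.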
pairplot: display a two variable partial dependence plot
Usage:
pairplot (var1, var2, type="image", chgvars=F, adjorg=T, qntl=0.025, nval=200, nav=500, vals1=NULL, vals2=NULL, theta=30, phi=15, col='cyan', las=2)
Required arguments:
var1= variable identifier (column name or number) specifying one of the variables to be plotted.
var2= variable identifier (column name or number) specifying the other variable to be plotted. Must not be the same as var1.
Optional arguments:
type = flag for type of plot when both var1 and var2 are numeric
type = "image" => heat map plot
type = "persp" => perspective mesh plot
type = "contour" => contour plot
chgvars = flag for changing plotting relationship when both var1 and var2 are categorical (factors)
chgvars = FALSE => plot the partial dependence on the variable (factor) with the most values (levels), for each of the respective values (levels) of the other variable (factor)
chgvars = TRUE => reverse this relationship
adjorg = origin adjustment flag
adjorg = TRUE / FALSE => do/don't adjust the origin of each partial dependence plot, conditioned on the respective values (levels) of a categorical variable (factor), to the same value (zero). Ignored when both var1 and var2 are numeric.
qntl = trimming factor for plotting numeric variables. Plots are shown for variable values in the range [quantile (qntl) - quantile(1-qntl)]. (Ignored for categorical variables (factors).)
nval = maximum number of evaluation points for numeric variables. (Ignored for categorical variables).
nav = maximum number of observations used for averaging calculations. (larger values provide higher accuracy with a diminishing return; computation grows linearly with nav)
vals1 = vector of names for values (levels) of var1 if it is categorical (factor). (Ignored if var1 is numeric)
vals2 = vector of names for values (levels) of var2 if it is categorical (factor). (Ignored if var2 is numeric)
theta, phi = angles defining the viewing direction for perspective mesh plot. theta gives the azimuthal direction and phi the colatitude. (Ignored unless both var1 and var2 are numeric and type = "persp")
col = color of barplots for two categorical variables (factors) or perspective mesh plot for two numeric variables.
las = label orientation flag for categorical variable plots
las = 1 => horizontal orientation of value (level) names stored in vals1 and/or vals2 (if present).
las = 2 => vertical orientation of value (level) names stored in vals1 and/or vals2 (if present).
Output: none
Examples: pairplot ("NOX", "RM"); pairplot (1,5,
type="persp");
pairplot ("SEX", "DOMICILE", vals1=c("MALE", "FEMALE"), vals2=c("HOUSE",
"CONDO", "TRAILER"))
pairplot(3, 14, vals2=levels(x[[14]])); pairplot(7, 14, vals1=levels(x[[7]]),
vals2=levels(x[[14]]))
Comment: Uses current model in the RuleFit home directory. If a categorical variable is of type factor in a data frame used to construct the RuleFit model, then its values correspond to the codes underlying the factor levels, not the numeric representation of the labels. Otherwise they represent the actual values encoded in the input data interpreted as type numeric.
rfversion: print date and version number of current RuleFit installation
Usage:
rfversion ()
Arguments: none
Output: none
Example: rfversion ()
www@stat.stanford.edu