Title: | Model Confidence Bounds |
---|---|
Description: | When choosing a variable selection method, it is important to consider the uncertainty of that method. The model confidence bound for variable selection identifies two nested models (the upper and lower confidence bound models) that contain the true model at a given confidence level. A good variable selection method is one whose model confidence bound at a given confidence level has the smallest width. The model uncertainty curve is a useful graphical tool for visualizing the variability of model selection and for comparing model selection procedures. A good variable selection method is one whose model uncertainty curve tends to arch towards the upper left corner. This function obtains the model confidence bound and draws the model uncertainty curve of a single model selection method, at a coverage rate equal to or slightly higher than the user-given confidence level. For details on what a model confidence bound is and how it works, see Li, Y., Luo, Y., Ferrari, D., Hu, X. and Qin, Y. (2019) Model Confidence Bounds for Variable Selection. Biometrics, 75:392-403. <DOI:10.1111/biom.13024>. Note that 'flare' is needed only if you apply the SQRT or LAD method ('mcb' provides 8 methods in total). Although 'flare' has been archived by CRAN, you can still get it from <https://CRAN.R-project.org/package=flare>, and its latest version can be used with 'mcb'. |
Authors: | Yang Li, Yichen Qin, Heming Deng |
Maintainer: | Heming Deng<[email protected]> |
License: | GPL (>= 2) |
Version: | 0.1.15 |
Built: | 2024-10-29 03:30:48 UTC |
Source: | https://github.com/cran/mcb |
This diabetes data set has n = 352 samples and p = 6 predictors: lamotrigine (ltg), total serum cholesterol (tc), total cholesterol (tch), low- and high-density lipoprotein (ldl and hdl) and glucose (glu). The response variable is a measure of disease progression one year after baseline.
Diabetes
A data frame containing 352 records.
B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression. Annals of Statistics, 32(2):407–499, 2004.
When choosing a variable selection method, it is important to consider the uncertainty of that method. The model confidence bound (MCB) for variable selection identifies two nested models (the upper and lower confidence bound models) that contain the true model at a given confidence level. A good variable selection method is one whose MCB at a given confidence level has the smallest width. The model uncertainty curve (MUC) is a useful graphical tool for visualizing the variability of model selection and for comparing model selection procedures. A good variable selection method is one whose MUC tends to arch towards the upper left corner. This function obtains the MCB and draws the MUC of a single model selection method, at a coverage rate equal to or slightly higher than the user-given confidence level.
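The following minimal base-R sketch makes the definition concrete. It assumes each bootstrap replicate yields one selected model, stored as a character vector of variable names. The helpers coverage_rate() and mcb_width() and the toy bootstrap models are hypothetical illustrations and are not functions exported by 'mcb'. A nested pair (lcb, ucb) covers a bootstrap model when every variable in lcb is in the model and every variable in the model is in ucb. The width counts the variables in ucb that are not in lcb.

```r
# Hypothetical helpers for illustration only (not part of 'mcb').
# lcb, ucb: character vectors of variable names, with lcb nested in ucb.
# boot_models: list of character vectors, one selected model per replicate.
coverage_rate <- function(lcb, ucb, boot_models) {
  stopifnot(all(lcb %in% ucb))  # the two bounds must be nested
  covered <- vapply(boot_models, function(m) {
    all(lcb %in% m) && all(m %in% ucb)  # lcb inside m, and m inside ucb
  }, logical(1))
  mean(covered)  # fraction of bootstrap models captured by the bound
}

# Width: number of variables in the UCB that are not in the LCB
mcb_width <- function(lcb, ucb) length(setdiff(ucb, lcb))

# Toy input: four made-up bootstrap selections over variables x1..x4
boot_models <- list(c("x1", "x2"), c("x1", "x2", "x3"), "x1",
                    c("x1", "x2", "x4"))
coverage_rate("x1", c("x1", "x2", "x3"), boot_models)  # 0.75 (3 of 4 models)
mcb_width("x1", c("x1", "x2", "x3"))                    # 2
```

Widening the bound (a smaller lcb or a larger ucb) can only keep or raise the coverage. This is why the method looks for the narrowest bound that still reaches the requested level.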
mcb(x, y, B=200, lambda=NA, method='Lasso', level=0.95, seed=122)
x | input matrix; each column holds the observations of one independent variable. Columns are named automatically in the order x1, x2, x3, … |
y | a one-column matrix containing the response vector. |
B | number of bootstrap replicates to perform; default value is 200. |
lambda | a user-supplied lambda value, i.e. the penalty tuning parameter for the variable selection method being tested. By default, it is computed automatically by optimization for the specific case. |
method | the variable selection method; one of 'aLasso', 'Lasso', 'SCAD', 'MCP', 'stepwise', 'LAD', 'SQRT'. Default value is 'Lasso'. |
level | a positive value between 0 and 1, analogous to the confidence level of a confidence interval; default value is 0.95. |
seed | seed for the bootstrap procedure; default value is 122. |
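The B and seed arguments control a bootstrap of the selection procedure. The hypothetical function bootstrap_selections() below is a rough sketch of that idea only. It resamples the rows B times and records the variables kept by base R's step() with a BIC penalty (k = log(n)). Backward BIC stepwise is a stand-in selector, chosen here because it needs no extra packages. It is not how 'mcb' implements its methods, and it may differ from the package's own 'stepwise' option.

```r
# Hypothetical sketch, not the 'mcb' implementation: bootstrap a selection
# procedure B times, using backward stepwise regression with a BIC penalty.
bootstrap_selections <- function(x, y, B = 200, seed = 122) {
  set.seed(seed)  # reproducible resamples (note: resets the global RNG)
  if (is.null(colnames(x))) colnames(x) <- paste0("x", seq_len(ncol(x)))
  n <- nrow(x)
  lapply(seq_len(B), function(b) {
    idx <- sample.int(n, n, replace = TRUE)  # rows drawn with replacement
    d <- data.frame(y = as.vector(y[idx]), x[idx, , drop = FALSE])
    fit <- step(lm(y ~ ., data = d), direction = "backward",
                k = log(n), trace = 0)
    attr(terms(fit), "term.labels")  # variables kept in the selected model
  })
}

# Usage on simulated data where only x1 and x2 truly affect y
set.seed(1)
x <- matrix(rnorm(100 * 4), ncol = 4)
y <- 2 * x[, 1] - x[, 2] + rnorm(100)
sel <- bootstrap_selections(x, y, B = 20)
table(unlist(sel)) / length(sel)  # bootstrap inclusion frequency per variable
```

Calling the function twice with the same seed returns identical selections, which mirrors the role of seed in mcb.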
mcb returns an object of class “mcb”. Its components can be extracted with $ (for example, result$mcb). An object of class “mcb” is a list containing at least the following components:
mcb | a list containing the bootstrap coverage rate (the one closest to the user-given confidence level) and the corresponding model confidence bound of the user-chosen variable selection method, in the form of a lower confidence bound and an upper confidence bound. |
mucplot | plot of the model uncertainty curve for the user-chosen variable selection method. |
mcbframe | a data frame containing all the information about the MCBs for the chosen variable selection method under all bootstrap coverage rates, including the width (w), lower confidence bound (lcb) and upper confidence bound (ucb) for each bootstrap coverage rate (bcr). |
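The mcbframe component lists, for each width, a bound and its bootstrap coverage rate. mcb then picks the bound whose coverage is equal to or slightly above level. The sketch below imitates that process under a simplifying assumption that is not taken from the package. Variables are ranked by bootstrap inclusion frequency, and for each width w only nested pairs of the form (top k, top k + w) are searched. The function names mcb_frame_sketch() and choose_mcb() are hypothetical, the toy bootstrap models are made up, and the package's actual search may differ.

```r
# Hypothetical sketch, not the 'mcb' algorithm.
# Coverage: share of bootstrap models m with lcb inside m and m inside ucb
covers <- function(lcb, ucb, boot_models) {
  mean(vapply(boot_models,
              function(m) all(lcb %in% m) && all(m %in% ucb), logical(1)))
}

# For each width w, keep the best pair (top k, top k + w) of ranked variables
mcb_frame_sketch <- function(boot_models, vars) {
  freq <- vapply(vars, function(v)
    mean(vapply(boot_models, function(m) v %in% m, logical(1))), numeric(1))
  ranked <- vars[order(freq, decreasing = TRUE)]
  p <- length(vars)
  rows <- lapply(0:p, function(w) {
    best <- NULL
    for (k in 0:(p - w)) {
      lcb <- ranked[seq_len(k)]
      ucb <- ranked[seq_len(k + w)]
      bcr <- covers(lcb, ucb, boot_models)
      if (is.null(best) || bcr > best$bcr)
        best <- data.frame(w = w, lcb = paste(lcb, collapse = " "),
                           ucb = paste(ucb, collapse = " "), bcr = bcr)
    }
    best
  })
  do.call(rbind, rows)  # one row per width: w, lcb, ucb, bcr
}

# Narrowest bound whose bootstrap coverage is at least `level`
choose_mcb <- function(frame, level = 0.95) {
  ok <- frame[frame$bcr >= level, ]
  ok[which.min(ok$w), ]
}

# Toy bootstrap selections, made up for illustration
boot_models <- list(c("x1", "x2"), c("x1", "x2", "x3"), "x1",
                    c("x1", "x2", "x4"))
frame <- mcb_frame_sketch(boot_models, vars = c("x1", "x2", "x3", "x4"))
choose_mcb(frame, level = 0.95)  # width 3, lcb "x1", coverage 1
```

The widest bound (empty lcb, all variables in ucb) always has coverage 1, so choose_mcb() finds a row for any level up to 1. Plotting frame$bcr against frame$w / 4 gives a curve in the spirit of the MUC, assuming the MUC shows coverage against normalized width.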
Li,Y., Luo,Y., Ferrari,D., Hu,X. and Qin,Y. (2019) Model Confidence Bounds for Variable Selection. Biometrics, 75:392-403.
library(mcb)
data(Diabetes) # load data
x <- Diabetes[, c('S1', 'S2', 'S3', 'S4', 'S5')]
y <- Diabetes[, c('Y')]
x <- data.matrix(x)
y <- data.matrix(y)
result <- mcb(x = x, y = y)
# plot of the model uncertainty curve
result$mucplot
# a list containing the bootstrap coverage rate and mcb
result$mcb
# a data frame containing all the information about MCBs
result$mcbframe
This function is a supplement to the function mcb. It is used to compare different variable selection methods and draws all their MUCs on the same canvas. A good variable selection method's MUC tends to arch towards the upper left corner.
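mcb.compare lets the user judge methods visually. If a single number is wanted, one hypothetical option, not provided by 'mcb', is the area under each MUC. This assumes the curve plots the bootstrap coverage rate against the normalized width w/p. A curve that arches towards the upper left corner reaches high coverage at small widths and therefore encloses a larger area. The helper muc_area() and the two toy curves below are made up for illustration.

```r
# Hypothetical summary, not part of 'mcb': area under a model uncertainty
# curve via the trapezoidal rule, assuming x = w / p and y = coverage rate.
muc_area <- function(w, bcr, p) {
  x <- w / p
  ord <- order(x)
  x <- x[ord]
  y <- bcr[ord]
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

# Two made-up curves over p = 4 variables
curves <- list(
  A = data.frame(w = 0:4, bcr = c(0.25, 0.5, 0.75, 1, 1)),
  B = data.frame(w = 0:4, bcr = c(0.05, 0.2, 0.4, 0.8, 1))
)
areas <- vapply(curves, function(d) muc_area(d$w, d$bcr, p = 4), numeric(1))
areas                   # A: 0.71875, B: 0.48125
names(which.max(areas)) # "A", the curve closer to the upper left corner
```

Two curves that cross can have similar areas while behaving differently at a particular width. The plotted curves therefore remain the primary comparison.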
mcb.compare(x, y, B=200, lambdas=NA, methods=NA, level=0.95, seed=122)
x | input matrix of independent variables, as in mcb. |
y | response vector, as in mcb. |
B | number of bootstrap replicates to perform; default value is 200. |
lambdas | a vector of penalty tuning parameters, one for each variable selection method. By default, the optimal value for each selection method is computed automatically. |
methods | a vector of all the variable selection methods the user wants to test and compare. Default value is c('aLasso', 'Lasso', 'SCAD', 'MCP', 'stepwise', 'LAD', 'SQRT'). |
level | user-defined confidence level, as in mcb; default value is 0.95. |
seed | seed for the bootstrap procedure, as in mcb; default value is 122. |
mcb.compare returns an object of class “mcb.compare”. An object of class “mcb.compare” is a list containing at least the following components:
mcb | a list containing the bootstrap coverage rate and the corresponding model confidence bound for each user-given variable selection method. |
mucplot | plot of the model uncertainty curves for all variable selection methods, which can be used to choose the best method. |
mcbframe | a list containing all the information about the MCBs for all variable selection methods under all available bootstrap coverage rates. |
Li,Y., Luo,Y., Ferrari,D., Hu,X. and Qin,Y. (2019) Model Confidence Bounds for Variable Selection. Biometrics, 75:392-403.
library(mcb)
data(Diabetes) # load data
x <- Diabetes[, c('S1', 'S2', 'S3', 'S4', 'S5')]
y <- Diabetes[, c('Y')]
x <- data.matrix(x)
y <- data.matrix(y)
result <- mcb.compare(x = x, y = y)
# plot of the model uncertainty curves for all variable selection methods
result$mucplot
# a list containing the bootstrap coverage rate and mcb based on Lasso
result$mcb$Lasso
# a data frame containing all the information about MCBs based on Lasso
result$mcbframe$Lasso