--- title: "Variable Importance Vignette" output: rmarkdown::html_vignette: toc: TRUE number_sections: TRUE vignette: > %\VignetteIndexEntry{Variable Importance Vignette} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` A paper that describes the variable importance measures in more detail should be available soon. # Variable Importance ## Definition L0-penalization based modified variable importance is defined in the following way $$mVI(i|X, y, \lambda) = \min_{\beta:\beta_i \neq 0} Q(\beta|X, y, \lambda) - \min_{\beta:\beta_i = 0} Q(\beta|X, y, \lambda) + \lambda |S_i|$$ where $Q(\beta|X, y, \lambda) = -2l(\beta|X, y) + \lambda ||\beta||_0$, $||\beta||_0$ is the number of nonzero elements in $\beta$, and $||S_i||$ is the number of beta parameters associated with the ith set of variables. The number of parameters in the ith set of variables is 1 for continuous variables and is the number of levels minus 1 for categorical variables. $\lambda$ is defined by the chosen metric where AIC results in $\lambda = 2$, BIC results in $\lambda = \log{(n)}$, and HQIC results in $\lambda = 2\log{(\log{(n)})}$. These variable importance values are equivalent to the traditional likelihood ratio test for beta parameters when $\lambda = 0$. However, when $\lambda > 0$, the null distribution of the variable importance values may not be chi-squared distributed. P-values for the variable importance values may be obtained from the `VariableImportance.boot()` function which uses a parametric bootstrap approach to approximate the null distribution. This process entails performing best subset selection many times over, so it is quite slow. ## Variable importance example L0-penalization based variable importance values may be calculated with the `VariableImportance()` function. The `VariableImportance()` function requires an object returned from calling the `VariableSelection()` function. The exact variable importance values are returned if a branch and bound algorithm is used with the `VariableSelection()` function. If a heuristic method is used with the `VariableSelection()` function, then approximate variable importance values based on the specified heuristic method are returned. ```{r} # Loading BranchGLM package library(BranchGLM) # Using iris dataset to demonstrate usage of VI Data <- iris Fit <- BranchGLM(Sepal.Length ~ ., data = Data, family = "gaussian", link = "identity") # Doing branch and bound selection VS <- VariableSelection(Fit, type = "branch and bound", metric = "BIC", showprogress = FALSE) # Getting variable importance VI <- VariableImportance(VS, showprogress = FALSE) VI ``` We can visualize the variable importance values with the `barplot()` function. ```{r, fig.height = 4, fig.width = 6} # Plotting variable importance oldmar <- par("mar") par(mar = c(4, 6, 3, 1) + 0.1) barplot(VI) par(mar = oldmar) ``` ### P-values We can get approximate p-values based on the L0-penalization based variable importance values from the `VariableImportance.boot()` function. This function uses a parametric bootstrap approach to create an approximate null distribution for the variable importance values. This approach is very slow, so it is not feasible to get these p-values when there are many sets of variables. ```{r} # Getting approximate null distributions set.seed(59903) myBoot <- VariableImportance.boot(VI, nboot = 1000, showprogress = FALSE) myBoot ``` We can visualize the results from `VariableImportance.boot()` with the `hist()` function or the `boxplot()` function. The `boxplot()` approach is convenient because we can look at all of the results in one plot while the `hist()` approach only contains the results for one set of variables in each plot. ```{r, fig.height = 4, fig.width = 6} # Plotting histogram of results for second set of variables hist(myBoot) # Plotting boxplots of results oldmar <- par("mar") par(mar = c(4, 6, 3, 1) + 0.1) boxplot(myBoot, las = 1) par(mar = oldmar) ```