## Glmtlp Vignette## IntroductionGlmtlp is a package that makes it incredibly easy to fit a generalized linear model via penalized maximum likelihood. The regularization path is computed for the truncated lasso at a grid of values for the regularization parameter lambda. The algorithm is fast, and can exploit sparsity in the input matrix x. It fits linear, logistic and multinomial, poisson, and Cox regression models. Most supporting functions used in a popular R package The authors of
\[ \min_{\beta_0,\beta}\frac{1}{N}\sum_{i=1}^{N}w_i l(y_i,\beta_0 + \beta'x_i) + \lambda \sum_{j=1}^{p} \min(|\beta_j|/\tau,1) \] over a grid of values of \(\lambda\) covering the entire range. Here \(l(y,\eta)\) is the negative log-likelihood contribution for observation \(i\); e.g. for the Gaussian case it is \(1/2 (y-\eta)^2\). The truncated lasso penalty is controlled by \(\tau\), and partly solves the biased issues in lasso penalty. The tuning parameter \(\lambda\) controls the overall strength of the penalty. ## InstallationTo install the stable version from CRAN, simply run the following from an R console (not available now): install.packages("glmtlp") To install the latest development builds directly from GitHub, run this instead: if (!require("devtools")) install.packages("devtools") library(devtools) devtools::install_github("ChongWu-Biostat/glmtlp") ## Quick StartThe purpose of this section is to give users a general idea of the package. In terms of usage, the package is almost the same as First, we load the library(glmtlp) We will keep most parameters as default and mainly focus on the general sense of the package. We load a set of data created beforehand for illustration. Users can either load their own data or use those saved in the package. This data is exactly same as a data set in data("QuickStartExample") The command loads an input matrix \(x\) and a response vector \(y\) from the package. Then we fit the model using the basic function glmTLP. fit = glmTLP(x,y) “fit” is an object of class plot(fit) Each curve corresponds to a variable. The above figure shows the path of its coefficient against the \(l_1\)-norm of the whole coefficient vector at as \(\lambda\) varies. The axis above indicates the number of nonzero coefficients at the current \(\lambda\), which is the effective degrees of freedom (df) for the lasso. Users may also wish to annotate the curves; this can be done by setting A summary of the print(fit) Call: glmTLP(x = x, y = y) Df %Dev Lambda [1,] 7 0.8544 1.8350000 [2,] 7 0.8623 1.6720000 [3,] 8 0.8696 1.5230000 [4,] 8 0.8761 1.3880000 [5,] 8 0.8815 1.2650000 [6,] 8 0.8859 1.1520000 [7,] 8 0.8896 1.0500000 [8,] 8 0.8927 0.9567000 [9,] 8 0.8970 0.8717000 .... It shows the number of nonzero coefficients ( We can obtain the actual coefficients at one or more \(\lambda\)'s within the range of the sequence. Note that even though coef(fit,s=0.1) 21 x 1 sparse Matrix of class "dgCMatrix" 1 (Intercept) 0.11678951 V1 1.37182915 V2 0.01835052 V3 0.75251617 V4 0.04492075 V5 -0.89167611 V6 0.60355346 V7 0.10393552 V8 0.38759151 V9 -0.01738196 V10 0.10891171 V11 0.23712445 V12 -0.05190906 V13 -0.03808770 V14 -1.14807069 V15 -0.11590021 V16 -0.03162155 V17 -0.02981730 V18 0.04185775 V19 . V20 -1.13622401 Users can also make predicitions at specific \(\lambda\)'s with new input data: nx = matrix(rnorm(10*20),10,20) predict(fit,newx=nx,s=c(0.1,0.05)) 1 2 [1,] -2.7302125 -2.7212503 [2,] -0.1261640 -0.1037789 [3,] 2.4302532 2.4263055 [4,] 4.4345692 4.4825886 [5,] -0.8703297 -0.8770905 [6,] -3.6949918 -3.6809959 [7,] -0.5279150 -0.4789763 [8,] 1.6668486 1.6533317 [9,] 0.8976858 0.9092636 [10,] -3.7293284 -3.7076979 The function
cvfit = cv.glmTLP(x, y,tau = 1)
We can plot the object. plot(cvfit) The above figure shows the cross-validation curve (red dotted line), and upper and lower standard deviation curves along the \(\lambda\) sequence (error bars). Two selected \(\lambda\)'s are indicated by the vertical dotted lines. We can view the selected \(\lambda\)'s and the corresponding coefficients. For example, cvfit$lambda.min [1] 0.3773556
Like coef(cvfit, s = "lambda.min") 21 x 1 sparse Matrix of class "dgCMatrix" 1 (Intercept) 0.146500723 V1 1.340178977 V2 . V3 0.707734699 V4 . V5 -0.847077207 V6 0.553671322 V7 0.037104906 V8 0.346252025 V9 . V10 0.007849848 V11 0.184759134 V12 . V13 . V14 -1.084353223 V15 -0.001922412 V16 . V17 . V18 . V19 . V20 -1.067892212 Like Predictions can be made based on the fitted predict(cvfit, newx = x[1:5,], s = "lambda.min") 1 [1,] -1.3591677 [2,] 2.5760596 [3,] 0.5843585 [4,] 2.0212049 [5,] 1.5709110
If you are familiar with |