Implementation of LASSO regression, Ridge regression and Elastic Net models in the R language


Original link: tecdat.cn/?p=3795

Original source: Tuoduan Data Tribe Official Account

 

Glmnet is a package for fitting generalized linear models via penalized maximum likelihood. The regularization path is computed for the lasso or elastic net penalty over a grid of values of the regularization parameter λ. The algorithm is very fast and can exploit sparsity in the input matrix x. It fits linear, logistic, multinomial, Poisson and Cox regression models. Various predictions can be made from the fitted models, and it can also fit multi-response linear regression.

glmnet solves the following problem

min over (β0, β) of  (1/N) Σᵢ wᵢ l(yᵢ, β0 + xᵢᵀβ) + λ [ (1−α)‖β‖₂²/2 + α‖β‖₁ ]

over a grid of λ values covering the entire range. Here l(yᵢ, ηᵢ) is the negative log-likelihood contribution of observation i; for the Gaussian case, for example, it is ½(yᵢ − ηᵢ)². The elastic-net penalty is controlled by α: LASSO (α = 1, the default) and ridge (α = 0). The tuning parameter λ controls the overall strength of the penalty.

As is well known, the ridge penalty shrinks the coefficients of correlated predictors toward each other, while the lasso tends to pick one of them and discard the others. The elastic net mixes the two.

 

The glmnet algorithm uses cyclic coordinate descent: it optimizes the objective function over each parameter in turn with the others held fixed, and cycles repeatedly until convergence. The algorithm can compute the solution path very quickly.

The code can handle sparse input-matrix formats as well as range constraints on the coefficients. It also includes methods for prediction and plotting, and a function for performing K-fold cross-validation.

 

Quick start

 

First, we load the glmnet package:

library(glmnet)

The default model used in the package is the Gaussian linear model, or "least squares". We load a set of pre-created data for illustration; users can load their own data or use data saved in the workspace.

The following command loads the input matrix x and the response vector y from the saved R data.
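A minimal sketch of this step, assuming the QuickStartExample data bundled with glmnet (recent versions of the package store it as a list; in older versions x and y are loaded directly):

data(QuickStartExample)
x = QuickStartExample$x   # 100 x 20 matrix of predictors
y = QuickStartExample$y   # numeric response vector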

We fit the model with glmnet.

fit = glmnet(x, y)

The coefficients can be visualized with the plot function:

plot(fit)

Each curve corresponds to one variable. It shows the path of that coefficient against the ℓ1-norm of the whole coefficient vector as λ varies. The axis above indicates the number of non-zero coefficients at the current λ, which is the effective degrees of freedom (df) of the lasso. The user may also wish to annotate the curves; this can be done by setting label = TRUE in the plot command.
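For example (label is a standard argument of glmnet's plot method):

plot(fit, label = TRUE)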

If we just type the object name, or use the print function, a summary of the path at each step is displayed:

print(fit)
## 
## Call:  glmnet(x = x, y = y) 
## 
##       Df   %Dev  Lambda
##  [1,]  0 0.0000 1.63000
##  [2,]  2 0.0553 1.49000
##  [3,]  2 0.1460 1.35000
##  [4,]  2 0.2210 1.23000
##  [5,]  2 0.2840 1.12000
##  [6,]  2 0.3350 1.02000
##  [7,]  4 0.3900 0.93300
##  [8,]  5 0.4560 0.85000
##  [9,]  5 0.5150 0.77500
## [10,]  6 0.5740 0.70600
## [11,]  6 0.6260 0.64300
## [12,]  6 0.6690 0.58600
## [13,]  6 0.7050 0.53400
## [14,]  6 0.7340 0.48700
## [15,]  7 0.7620 0.44300
## [16,]  7 0.7860 0.40400
## [17,]  7 0.8050 0.36800
## [18,]  7 0.8220 0.33500
## [19,]  7 0.8350 0.30600
## [20,]  7 0.8460 0.27800

It shows, from left to right, the number of non-zero coefficients (Df), the percent of (null) deviance explained (%Dev) and the value of λ (Lambda).

We can obtain the actual coefficients at one or more λ values within the range of the sequence:

coef(fit, s = 0.1)
## 21 x 1 sparse Matrix of class "dgCMatrix"
##                     1
## (Intercept)  0.150928
## V1           1.320597
## V2           .       
## V3           0.675110
## V4           .       
## V5          -0.817412
## V6           0.521437
## V7           0.004829
## V8           0.319416
## V9           .       
## V10          .       
## V11          0.142499
## V12          .       
## V13          .       
## V14         -1.059979
## V15          .       
## V16          .       
## V17          .       
## V18          .       
## V19          .       
## V20         -1.021874

You can also make predictions for new input data at specific values of λ; here nx is a matrix of new observations with the same 20 columns as x:

predict(fit, newx = nx, s = c(0.1, 0.05))
##             1       2
##  [1,]  4.4641  4.7001
##  [2,]  1.7509  1.8513
##  [3,]  4.5207  4.6512
##  [4,] -0.6184 -0.6764
##  [5,]  1.7302  1.8451
##  [6,]  0.3565  0.3512
##  [7,]  0.2881  0.2662
##  [8,]  2.7776  2.8209
##  [9,] -3.7016 -3.7773
## [10,]  1.1546  1.1067

The function glmnet returns a whole sequence of models, and the user has to choose among them. Cross-validation is perhaps the simplest and most widely used method for that task.

cv.glmnet is the main cross-validation function. It returns a cv.glmnet object, here called "cvfit", a list containing all the ingredients of the cross-validation fit.

We can plot the object, as sketched below.
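A minimal sketch of these two steps:

cvfit = cv.glmnet(x, y)   # 10-fold cross-validation by default
plot(cvfit)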

It includes a cross-validation curve (red dashed line) and an upper and lower standard deviation curve (error bars) along the lambda sequence. The vertical dashed lines represent two selected lambdas.

We can view the selected λ values and the corresponding coefficients. For example,

cvfit$lambda.min
## [1] 0.08307

lambda.min is the value of λ that gives the minimum mean cross-validated error. The other saved value is lambda.1se, which gives the most regularized model such that the error is within one standard error of the minimum. To use it, we simply replace lambda.min with lambda.1se above.

coef(cvfit, s = "lambda.min")
## 21 x 1 sparse Matrix of class "dgCMatrix"
##                    1
## (Intercept)  0.14936
## V1           1.32975
## V2           .      
## V3           0.69096
## V4           .      
## V5          -0.83123
## V6           0.53670
## V7           0.02005
## V8           0.33194
## V9           .      
## V10          .      
## V11          0.16239
## V12          .      
## V13          .      
## V14         -1.07081
## V15          .      
## V16          .      
## V17          .      
## V18          .      
## V19          .      
## V20         -1.04341

Note that the coefficients are expressed in sparse matrix format. The reason is that the solution along the regularization path is usually sparse, so using the sparse format is more efficient in time and space.

Predictions can be made from the fitted cv.glmnet object. Let us look at an example.

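A sketch of a call that would produce output like the block below (the first five rows of x and the choice s = "lambda.min" are illustrative):

predict(cvfit, newx = x[1:5, ], s = "lambda.min")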
##            1
## [1,] -1.3647
## [2,]  2.5686
## [3,]  0.5706
## [4,]  1.9682
## [5,]  1.4964

newx is the matrix of new input values, and s, as before, is the value(s) of λ at which predictions are made.

 

Linear regression

Linear regression here refers to two model families: gaussian, the normal distribution, and mgaussian, the multivariate normal distribution.

Normal distribution

Suppose we have observations xᵢ ∈ ℝᵖ and yᵢ ∈ ℝ, i = 1, ..., N. The objective function is

min over (β0, β) of  (1/(2N)) Σᵢ (yᵢ − β0 − xᵢᵀβ)² + λ [ (1−α)‖β‖₂²/2 + α‖β‖₁ ],

where λ ≥ 0 is the complexity parameter and 0 ≤ α ≤ 1 moves the penalty between ridge regression (α = 0) and the LASSO (α = 1).

The coordinate descent method is used to solve this problem. Specifically, suppose current estimates β̃0 and β̃ℓ (ℓ ≠ j) are available. By computing the gradient at βⱼ = β̃ⱼ and some simple algebra, the coordinate-wise update is

β̃ⱼ ← S( (1/N) Σᵢ xᵢⱼ (yᵢ − ỹᵢ⁽ʲ⁾), λα ) / ( 1 + λ(1−α) ),

where ỹᵢ⁽ʲ⁾ = β̃0 + Σ over ℓ ≠ j of xᵢℓ β̃ℓ is the fitted value excluding the contribution of xᵢⱼ, and S(z, γ) = sign(z)(|z| − γ)₊ is the soft-thresholding operator.

This formula applies when the x variables are standardized to have unit variance (the default).

glmnet provides various arguments for users to customize the fit. Here we introduce some commonly used ones; they can be specified in the glmnet function call.

  • alpha: the elastic-net mixing parameter α, with range [0, 1]; α = 1 is the lasso (the default) and α = 0 is ridge.
  • weights: observation weights; the default is 1 for each observation.
  • nlambda: the number of λ values in the sequence; the default is 100.
  • lambda: can be supplied, but is usually not; the program constructs its own sequence. When generated automatically, the sequence is determined by lambda.max and lambda.min.ratio.
  • standardize: a logical flag for standardizing the x variables before fitting the model sequence.

For example, we set α = 0.2 and give the observations in the second half twice the weight of those in the first half. To keep the display short here, we set nlambda to 20. In practice, however, it is recommended to set the number of λ values to 100 (the default) or more. The matching call is shown below.
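The call matching the printed output below is:

fit = glmnet(x, y, weights = c(rep(1, 50), rep(2, 50)), alpha = 0.2, nlambda = 20)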

We can then print the glmnet object:

print(fit)
## 
## Call:  glmnet(x = x, y = y, weights = c(rep(1, 50), rep(2, 50)), alpha = 0.2, nlambda = 20) 
## 
##       Df  %Dev  Lambda
##  [1,]  0 0.000 7.94000
##  [2,]  4 0.179 4.89000
##  [3,]  7 0.444 3.01000
##  [4,]  7 0.657 1.85000
##  [5,]  8 0.785 1.14000
##  [6,]  9 0.854 0.70300
##  [7,] 10 0.887 0.43300
##  [8,] 11 0.902 0.26700
##  [9,] 14 0.910 0.16400
## [10,] 17 0.914 0.10100
## [11,] 17 0.915 0.06230
## [12,] 17 0.916 0.03840
## [13,] 19 0.916 0.02360
## [14,] 20 0.916 0.01460
## [15,] 20 0.916 0.00896
## [16,] 20 0.916 0.00552
## [17,] 20 0.916 0.00340

This shows the call that generated the object fit, together with a three-column matrix with columns Df (the number of non-zero coefficients), %Dev (the percent deviance explained) and Lambda (the corresponding value of λ).

We can plot the fitted object, labelling each curve and plotting the coefficients against the log-lambda values.

We can also plot against %deviance explained on the training data. What we see is that toward the end of the path the deviance changes very little while the coefficients "blow up" somewhat; this plot lets us focus on the important parts of the fit. Both plots are sketched below.
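A sketch of these two plots (xvar and label are standard arguments of glmnet's plot method):

plot(fit, xvar = "lambda", label = TRUE)   # coefficients against log-lambda
plot(fit, xvar = "dev", label = TRUE)      # coefficients against %deviance explained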

We can extract the coefficients and make predictions at specific values of λ. Two commonly used arguments are:

  • s: specifies the value(s) of λ at which to extract.
  • exact: indicates whether the exact values of the coefficients at those λ are required.

A simple example is:

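A minimal sketch that would produce a two-column comparison like the one below (s = 0.01 is taken from the discussion that follows; with exact = TRUE, recent versions of glmnet also require the original data and weights in order to refit):

coef.exact = coef(fit, s = 0.01, exact = TRUE, x = x, y = y, weights = c(rep(1, 50), rep(2, 50)))
coef.apprx = coef(fit, s = 0.01, exact = FALSE)
cbind2(coef.exact, coef.apprx)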
## 21 x 2 sparse Matrix of class "dgCMatrix"
##                    1         1
## (Intercept)  0.19657  0.199099
## V1           1.17496  1.174650
## V2           .        .       
## V3           0.52934  0.531935
## V4           .        .       
## V5          -0.76126 -0.760959
## V6           0.46627  0.468209
## V7           0.06148  0.061927
## V8           0.38049  0.380301
## V9           .        .       
## V10          .        .       
## V11          0.14214  0.143261
## V12          .        .       
## V13          .        .       
## V14         -0.91090 -0.911207
## V15          .        .       
## V16          .        .       
## V17          .        .       
## V18          .        0.009197
## V19          .        .       
## V20         -0.86099 -0.863117

The left column is for exact = TRUE and the right for exact = FALSE. We can see from the above that 0.01 is not in the λ sequence, so although the differences are small, there are some. If there are no special requirements, linear interpolation (the default) is sufficient.

Users can make predictions from the fitted object. In addition to the arguments in coef, the main argument is newx, a matrix of new values of x. The type argument lets the user choose the type of prediction:

  • "link" gives the fitted values.
  • "response" is identical to "link" for the normal distribution.
  • "coefficients" computes the coefficients at the supplied values of s.

For example,

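A sketch of the call producing the output below (the first five rows of x serve as newx; for the Gaussian family "link" and "response" give the same fitted values):

predict(fit, newx = x[1:5, ], type = "response", s = 0.05)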
##            1
## [1,] -0.9803
## [2,]  2.2992
## [3,]  0.6011
## [4,]  2.3573
## [5,]  1.7520

This gives the fitted values of the first 5 observations at λ = 0.05. If multiple values of s are supplied, a matrix of predictions is produced.

Users can customize K-fold cross-validation. In addition to all the glmnet arguments, cv.glmnet has its own special arguments, including nfolds (the number of folds), foldid (user-supplied fold assignments) and type.measure (the loss used for cross-validation):

  • "deviance" or "mse" uses the squared error
  • "mae" uses the mean absolute error

For example,

cvfit = cv.glmnet(x, y, type.measure = "mse", nfolds = 20)

performs 20-fold cross-validation using the mean-squared-error criterion.

cv.glmnet can also make use of parallel computing. Here is a simple comparison; a parallel backend must be registered first, as sketched below.

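A minimal sketch of registering a backend, assuming the doParallel package (any foreach-compatible backend works):

library(doParallel)
registerDoParallel(cores = 2)   # illustrative core count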
system.time(cv.glmnet(X, Y))
##    user  system elapsed 
##   3.591   0.103   3.724 
system.time(cv.glmnet(X, Y, parallel = TRUE))
##    user  system elapsed 
##   4.318   0.391   2.700

As can be seen from the timings above, parallel computing can substantially speed up the computation.

  • "lambda.min": the λ value that attains the minimum mean cross-validated error (here, MSE).

cvfit$lambda.min
## [1] 0.08307

The corresponding coefficients are:

## 21 x 1 sparse Matrix of class "dgCMatrix"
##                    1
## (Intercept)  0.14936
## V1           1.32975
## V2           .      
## V3           0.69096
## V4           .      
## V5          -0.83123
## V6           0.53670
## V7           0.02005
## V8           0.33194
## V9           .      
## V10          .      
## V11          0.16239
## V12          .      
## V13          .      
## V14         -1.07081
## V15          .      
## V16          .      
## V17          .      
## V18          .      
## V19          .      
## V20         -1.04341

Here we use the same fold assignment for each value of α so the fits are comparable, and place all of the cross-validation curves on the same plot, as sketched below:
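A minimal sketch, assuming 10 folds and three values of alpha (1, 0.5, 0):

foldid = sample(1:10, size = length(y), replace = TRUE)
cv1  = cv.glmnet(x, y, foldid = foldid, alpha = 1)
cv.5 = cv.glmnet(x, y, foldid = foldid, alpha = 0.5)
cv0  = cv.glmnet(x, y, foldid = foldid, alpha = 0)
# overlay the three cross-validation curves
plot(log(cv1$lambda), cv1$cvm, pch = 19, col = "red", xlab = "log(Lambda)", ylab = cv1$name)
points(log(cv.5$lambda), cv.5$cvm, pch = 19, col = "grey")
points(log(cv0$lambda), cv0$cvm, pch = 19, col = "blue")
legend("topleft", legend = c("alpha = 1", "alpha = 0.5", "alpha = 0"), pch = 19, col = c("red", "grey", "blue"))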

We see that the lasso (alpha = 1) performs best here.

Coefficient upper and lower limits

Suppose we want to fit our model but constrain the coefficients to be greater than -0.7 and less than 0.5. This can be done with the upper.limits and lower.limits arguments, as sketched below.

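A minimal sketch (upper.limits and lower.limits are standard glmnet arguments; a scalar is recycled across all coefficients):

tfit = glmnet(x, y, lower.limits = -0.7, upper.limits = 0.5)
plot(tfit)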
Often we want the coefficients to be non-negative, in which case we only need to set lower.limits to 0.

Penalty factors

The penalty.factor argument allows the user to apply a separate penalty factor to each coefficient. Its default value for each parameter is 1, but other values can be specified. In particular, any variable whose penalty.factor equals zero is not penalized at all.

In many cases some variables may be so important that we want to keep them in the model at all times. This can be achieved by setting the corresponding penalty factors to 0, as sketched below:

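A minimal sketch (which three variables receive penalty factor 0 is illustrative):

p.fac = rep(1, 20)
p.fac[c(5, 10, 15)] = 0          # these variables are never penalized
pfit = glmnet(x, y, penalty.factor = p.fac)
plot(pfit, label = TRUE)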
We see from the labels that the three variables with penalty factor 0 stay in the model throughout, while the other variables follow the typical regularization path and eventually shrink to 0.

 

Custom plots

Sometimes, especially when the number of variables is small, we want to add variable labels to the plot.

We first generate some data with 10 variables, fit the glmnet model and draw a standard plot. We then label the curves with the variable names by placing each name at its coefficient value at the end of the path, as sketched below.
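A minimal sketch, with an illustrative data-generating step and label placement:

set.seed(101)
x = matrix(rnorm(100 * 10), 100, 10)
y = rnorm(100)
fit = glmnet(x, y)
plot(fit, xvar = "lambda")
# place each variable name at its coefficient value at the smallest lambda (end of the path)
beta.end = coef(fit)[-1, ncol(coef(fit))]
text(log(min(fit$lambda)), beta.end, labels = paste0("V", 1:10), cex = 0.7, pos = 4)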

 

Multivariate normal

The multi-response Gaussian family is obtained with the family = "mgaussian" option in glmnet.

Obviously, as the name suggests, y is not a vector but a matrix, and as a result the coefficients at each value of λ also form a matrix.

Here, we solve the following problem:

min over (β0, B) of  (1/(2N)) Σᵢ ‖yᵢ − β0 − Bᵀxᵢ‖₂² + λ [ (1−α)‖B‖F²/2 + α Σⱼ ‖βⱼ‖₂ ],

where βⱼ is the j-th row of the p × K coefficient matrix B. For each single predictor xⱼ, the absolute penalty on each single coefficient is replaced by a group-lasso penalty on its entire K-vector of coefficients βⱼ.

We use a set of pre-generated data for illustration.

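A sketch of loading the bundled example data (assuming the MultiGaussianExample dataset in glmnet; recent versions store it as a list):

data(MultiGaussianExample)
x = MultiGaussianExample$x
y = MultiGaussianExample$y   # matrix of responses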
We fit the data and return the object "mfit".

mfit = glmnet(x, y, family = "mgaussian")

If standardize.response = TRUE, the response variables are standardized.

To visualize the coefficients, we use the plot function, as sketched below.

Note that we set type.coef = "2norm". With this setting, a single curve is drawn per variable, with value equal to the ℓ2-norm of that variable's coefficient vector. The default is type.coef = "coef", which creates a separate coefficient plot for each response.
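A sketch of the plot call (type.coef is an argument of the plot method for multi-response fits):

plot(mfit, xvar = "lambda", label = TRUE, type.coef = "2norm")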

Using the coef function, we can extract the coefficients at the required values of λ, and make predictions with predict.

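A sketch of a prediction call that would produce an array like the one below (the s values and the five rows of newx are illustrative):

predict(mfit, newx = x[1:5, ], s = c(0.1, 0.01))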
## , , 1
## 
##           y1      y2      y3    y4
## [1,] -4.7106 -1.1635  0.6028 3.741
## [2,]  4.1302 -3.0508 -1.2123 4.970
## [3,]  3.1595 -0.5760  0.2608 2.054
## [4,]  0.6459  2.1206 -0.2252 3.146
## [5,] -1.1792  0.1056 -7.3353 3.248
## 
## , , 2
## 
##           y1      y2      y3    y4
## [1,] -4.6415 -1.2290  0.6118 3.780
## [2,]  4.4713 -3.2530 -1.2573 5.266
## [3,]  3.4735 -0.6929  0.4684 2.056
## [4,]  0.7353  2.2965 -0.2190 2.989
## [5,] -1.2760  0.2893 -7.8259 3.205

The prediction results are stored in a three-dimensional array: the first two dimensions form the prediction matrix (observations by response variables), and the third dimension indexes the values of s.

We can also perform k-fold cross-validation, and plot the resulting cv.glmnet object "cvmfit", as sketched below.
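A minimal sketch:

cvmfit = cv.glmnet(x, y, family = "mgaussian")
plot(cvmfit)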

The selected optimal values of λ are displayed:

cvmfit$lambda.min
## [1] 0.04732
cvmfit$lambda.1se
## [1] 0.1317

Logistic regression

Logistic regression is another widely used model when the response variable is categorical. If there are two possible outcomes, we use the binomial distribution; otherwise we use the multinomial.

Binomial model

For the binomial model, suppose the response takes values in G = {1, 2}, and let yᵢ = I(gᵢ = 1). We model

Pr(G = 1 | X = x) = e^(β0 + xᵀβ) / (1 + e^(β0 + xᵀβ)),

which can be written in the equivalent log-odds form

log[ Pr(G = 1 | X = x) / Pr(G = 2 | X = x) ] = β0 + xᵀβ.

The objective function for penalized logistic regression uses the negative binomial log-likelihood:

min over (β0, β) of  −[ (1/N) Σᵢ yᵢ (β0 + xᵢᵀβ) − log(1 + e^(β0 + xᵢᵀβ)) ] + λ [ (1−α)‖β‖₂²/2 + α‖β‖₁ ].

Our algorithm uses a quadratic approximation to the log-likelihood and then applies coordinate descent to the resulting penalized weighted least-squares problem. These constitute the inner and outer loops.

For illustration purposes, we load the pre-generated input matrix x and response y from a data file, as sketched below.
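A sketch using the example data bundled with glmnet (recent versions store it as a list):

data(BinomialExample)
x = BinomialExample$x
y = BinomialExample$y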

For binomial logistic regression, the dependent variable y can be a two-level factor, or a two-column matrix of counts or proportions.

The other optional arguments of glmnet for binomial regression are almost identical to those for the normal distribution. Don't forget to set the family option to "binomial".

fit = glmnet(x, y, family = "binomial")

As before, we can print and plot the fitted object, extract the coefficients at a specific λ, and make predictions.

Prediction is slightly different for logistic regression, mainly in the choice of type: "link" and "response" are no longer equivalent, and "class" is available only for classification models. In short,

  • "link" gives the linear predictors,
  • "response" gives the fitted probabilities,
  • "class" produces the class label corresponding to the largest probability,
  • "coefficients" computes the coefficients at the supplied values of s.

In the following example, we predict the class labels at λ = 0.05 and 0.01.

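A sketch of the call producing output like that below (the first five rows of x serve as newx):

predict(fit, newx = x[1:5, ], type = "class", s = c(0.05, 0.01))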
##      1   2  
## [1,] "0" "0"
## [2,] "1" "1"
## [3,] "1" "1"
## [4,] "0" "0"
## [5,] "1" "1"

For logistic regression, the type.measure options are:

  • "deviance", which uses the actual deviance,
  • "mae", which uses the mean absolute error,
  • "class", which gives the misclassification error,
  • "auc" (two-class logistic regression only), which gives the area under the ROC curve.

For example,

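A sketch of such a call, assuming the same x and y as above:

cvfit = cv.glmnet(x, y, family = "binomial", type.measure = "class")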
It uses classification error as the criterion for 10-fold cross-validation.

We plot the object and display the optimal values of λ.

cvfit$lambda.min
## [1] 0.01476
cvfit$lambda.1se
## [1] 0.02579

coef and predict work similarly to the normal-distribution case, so we omit the details and review them through a couple of examples.

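A sketch of the extraction that would produce the coefficient matrix below (the choice s = "lambda.min" is illustrative):

coef(cvfit, s = "lambda.min")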
## 31 x 1 sparse Matrix of class "dgCMatrix"
##                    1
## (Intercept)  0.24371
## V1           0.06897
## V2           0.66252
## V3          -0.54275
## V4          -1.13693
## V5          -0.19143
## V6          -0.95852
## V7           .      
## V8          -0.56529
## V9           0.77454
## V10         -1.45079
## V11         -0.04363
## V12         -0.06894
## V13          .      
## V14          .      
## V15          .      
## V16          0.36685
## V17          .      
## V18         -0.04014
## V19          .      
## V20          .      
## V21          .      
## V22          0.20882
## V23          0.34014
## V24          .      
## V25          0.66310
## V26         -0.33696
## V27         -0.10570
## V28          0.24318
## V29         -0.22445
## V30          0.11091

As mentioned earlier, the results returned here refer to the second level of the factor response.

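A sketch of a class prediction that would produce output like the block below (the first ten rows of x and s = "lambda.min" are illustrative):

predict(cvfit, newx = x[1:10, ], s = "lambda.min", type = "class")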
##       1  
##  [1,] "0"
##  [2,] "1"
##  [3,] "1"
##  [4,] "0"
##  [5,] "1"
##  [6,] "0"
##  [7,] "0"
##  [8,] "0"
##  [9,] "1"
## [10,] "1"

Multinomial model

For the multinomial model, suppose the response has K levels, G = {1, 2, ..., K}. Here we model

Pr(G = k | X = x) = e^(β0k + xᵀβk) / Σℓ e^(β0ℓ + xᵀβℓ).

Let Y be the N × K indicator response matrix, with elements yᵢℓ = I(gᵢ = ℓ). Then the elastic-net penalized negative log-likelihood becomes

min over {β0k, βk} of  −(1/N) Σᵢ [ Σₖ yᵢₖ (β0k + xᵢᵀβk) − log Σₖ e^(β0k + xᵢᵀβk) ] + λ [ (1−α)‖β‖F²/2 + α Σⱼ ‖βⱼ‖q ],

where β is the p × K matrix of coefficients, βk refers to the k-th column (for outcome category k) and βⱼ to the j-th row (the vector of K coefficients of variable j). The last penalty term is ‖βⱼ‖q, and we have two choices for q: q ∈ {1, 2}. When q = 1 this is a lasso penalty on each individual parameter; when q = 2 this is a grouped-lasso penalty on all K coefficients of a particular variable, which makes them all zero or non-zero together.

For the multinomial case, the usage is similar to logistic regression; we load a set of generated data.

Except for a few cases, the optional arguments of glmnet for multinomial logistic regression are basically similar to those for binomial regression.

A special option for multinomial regression is type.multinomial; setting type.multinomial = "grouped" allows a grouped-lasso penalty on the multinomial coefficients of each variable. This ensures that they are all in or out of the model together, just as for a multi-response Gaussian.

We fit the model and plot the result, as sketched below.
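A minimal sketch, assuming the MultinomialExample data bundled with glmnet (recent versions store it as a list):

data(MultinomialExample)
x = MultinomialExample$x
y = MultinomialExample$y
fit = glmnet(x, y, family = "multinomial", type.multinomial = "grouped")
plot(fit, xvar = "lambda", label = TRUE, type.coef = "2norm")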

We can also perform cross-validation and plot the returned object, as sketched below.
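A minimal sketch (type.multinomial is passed on to the underlying glmnet fits):

cvfit = cv.glmnet(x, y, family = "multinomial", type.multinomial = "grouped")
plot(cvfit)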

 

Predict at the optimal λ:

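A sketch of a class prediction at the cross-validated λ (the first ten rows of x and s = "lambda.min" are illustrative):

predict(cvfit, newx = x[1:10, ], s = "lambda.min", type = "class")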
##       1  
##  [1,] "3"
##  [2,] "2"
##  [3,] "2"
##  [4,] "1"
##  [5,] "1"
##  [6,] "3"
##  [7,] "3"
##  [8,] "1"
##  [9,] "1"
## [10,] "2"

 

Poisson model

Poisson regression is used to model count data under the assumption of Poisson errors, or more generally non-negative data where the mean and variance are proportional. Poisson is also a member of the exponential family of distributions. We usually model on the log scale: log μ(x) = β0 + xᵀβ.
The log-likelihood of a given observation is

l(yᵢ | β0, β) = yᵢ (β0 + xᵢᵀβ) − e^(β0 + xᵢᵀβ)   (up to a constant not involving β0, β).

As before, we optimize the penalized log-likelihood:

min over (β0, β) of  −(1/N) Σᵢ l(yᵢ | β0, β) + λ [ (1−α)‖β‖₂²/2 + α‖β‖₁ ].
Glmnet uses an outer Newton loop and an inner weighted least-squares loop (as in logistic regression) to optimize this criterion.

First we load a set of Poisson data and fit the model.

Again, we plot the coefficients. Both steps are sketched below.
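A minimal sketch of these steps, assuming the PoissonExample data bundled with glmnet (recent versions store it as a list):

data(PoissonExample)
x = PoissonExample$x
y = PoissonExample$y
fit = glmnet(x, y, family = "poisson")
plot(fit)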

As before, we can extract the coefficients and make predictions at specific values of λ with coef and predict, respectively.

For example:

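A sketch of calls that would produce output like the two blocks below (the s values are illustrative):

coef(fit, s = 1)
predict(fit, newx = x[1:5, ], type = "response", s = c(1, 0.1))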
## 21 x 1 sparse Matrix of class "dgCMatrix"
##                    1
## (Intercept)  0.61123
## V1           0.45820
## V2          -0.77061
## V3           1.34015
## V4           0.04350
## V5          -0.20326
## V6           .      
## V7           .      
## V8           .      
## V9           .      
## V10          .      
## V11          .      
## V12          0.01816
## V13          .      
## V14          .      
## V15          .      
## V16          .      
## V17          .      
## V18          .      
## V19          .      
## V20          .      
##            1       2
## [1,]  2.4944  4.4263
## [2,] 10.3513 11.0586
## [3,]  0.1180  0.1782
## [4,]  0.9713  1.6829
## [5,]  1.1133  1.9935

We can also use cross-validation to find the best λ for inference.

The options are almost the same as for the normal family, except for type.measure: "mse" stands for mean squared error and "mae" for mean absolute error.

We can plot the cv.glmnet object, as sketched below.
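A minimal sketch:

cvfit = cv.glmnet(x, y, family = "poisson")
plot(cvfit)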

We can also display the optimal λ values and the corresponding coefficients, as sketched below.

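A sketch of an extraction that would give a two-column comparison like the output below, binding the coefficients at lambda.min and lambda.1se:

cbind2(coef(cvfit, s = "lambda.min"), coef(cvfit, s = "lambda.1se"))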
## 21 x 2 sparse Matrix of class "dgCMatrix"
##                     1        2
## (Intercept)  0.031263  0.18570
## V1           0.619053  0.57537
## V2          -0.984550 -0.93212
## V3           1.525234  1.47057
## V4           0.231591  0.19692
## V5          -0.336659 -0.30469
## V6           0.001026  .      
## V7          -0.012830  .      
## V8           .         .      
## V9           .         .      
## V10          0.015983  .      
## V11          .         .      
## V12          0.030867  0.02585
## V13         -0.027971  .      
## V14          0.032750  .      
## V15         -0.005933  .      
## V16          0.017506  .      
## V17          .         .      
## V18          0.004026  .      
## V19         -0.033579  .      
## V20          0.012049  0.00993

 

Cox model

The Cox proportional hazards model is usually used to study the relationship between predictor variables and survival time.

Rather than relating survival time to X directly, the Cox proportional hazards regression model treats the hazard function as the dependent quantity. The basic form of the model is

h(t, X) = h0(t) · exp(β1X1 + β2X2 + ... + βpXp),    (1)

where β1, ..., βp are the partial regression coefficients of the covariates, parameters that must be estimated from the sample data, and h0(t) is the baseline hazard when the covariate vector X is 0, which also has to be estimated from the data. This is called the Cox proportional hazards regression model.

Because the Cox model makes no assumption about the form of h0(t), it has great flexibility in handling problems; on the other hand, in many applications we only need to estimate the coefficients β (as in factor analysis and the like), and these can be estimated even when h0(t) is unknown. In other words, the Cox model is not a fully parametric model because it contains h0(t), yet the parameters β can still be estimated from formula (1), so the Cox regression model is a semi-parametric model.

The formula can be rewritten as

log[ h(t, X) / h0(t) ] = β1X1 + β2X2 + ... + βpXp.

We use a set of pre-generated sample data; users can load their own data and follow a similar process. In this case x must be an n × p matrix of covariate values, with each row corresponding to one patient and each column to one covariate. y is an n × 2 matrix, with a "time" column and a "status" column (1 for an event, 0 for censoring).

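A sketch of loading the bundled example data and peeking at y (assuming the CoxExample dataset in glmnet; recent versions store it as a list):

data(CoxExample)
x = CoxExample$x
y = CoxExample$y
y[1:5, ]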
##         time status
## [1,] 1.76878      1
## [2,] 0.54528      1
## [3,] 0.04486      0
## [4,] 0.85032      0
## [5,] 0.61488      1

The Surv function in the survival package can create such a matrix.

We calculate the solution path under the default settings.

Plot the coefficients.
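A minimal sketch of these two steps:

fit = glmnet(x, y, family = "cox")
plot(fit)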

Extract the coefficients at a specific value of λ.

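A sketch of such an extraction (the λ value is illustrative):

coef(fit, s = 0.05)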
## 30 x 1 sparse Matrix of class "dgCMatrix"
##             1
## V1   0.37694
## V2  -0.09548
## V3  -0.13596
## V4   0.09814
## V5  -0.11438
## V6  -0.38899
## V7   0.24291
## V8   0.03648
## V9   0.34740
## V10  0.03865
## V11  .      
## V12  .      
## V13  .      
## V14  .      
## V15  .      
## V16  .      
## V17  .      
## V18  .      
## V19  .      
## V20  .      
## V21  .      
## V22  .      
## V23  .      
## V24  .      
## V25  .      
## V26  .      
## V27  .      
## V28  .      
## V29  .      
## V30  .      

The function cv.glmnet can be used to perform k-fold cross-validation for the Cox model, as sketched below.

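A minimal sketch:

cvfit = cv.glmnet(x, y, family = "cox")
plot(cvfit)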
After fitting, we can view the best lambda value and the cross-validated error graph to help evaluate our model.

As mentioned earlier, the left vertical line in the plot shows where the CV error curve reaches its minimum, and the right vertical line shows the most regularized model whose CV error is within one standard error of the minimum. We also extract the optimal λ values.

cvfit$lambda.min
## [1] 0.01594
cvfit$lambda.1se
## [1] 0.04869

We can examine the covariates in the model and view their coefficients.

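A sketch of how the objects index.min and coef.min below might be obtained (the names and steps are illustrative):

coef.min = coef(cvfit, s = "lambda.min")
active.min = which(coef.min[, 1] != 0)   # indices of the non-zero coefficients
index.min = coef.min[active.min]         # their values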
index.min
##  [1]  0.491297 -0.174601 -0.218649  0.175112 -0.186673 -0.490250  0.335197
##  [8]  0.091587  0.450169  0.115922  0.017595 -0.018365 -0.002806 -0.001423
## [15] -0.023429  0.001688 -0.008236
coef.min
## 30 x 1 sparse Matrix of class "dgCMatrix"
##              1
## V1   0.491297
## V2  -0.174601
## V3  -0.218649
## V4   0.175112
## V5  -0.186673
## V6  -0.490250
## V7   0.335197
## V8   0.091587
## V9   0.450169
## V10  0.115922
## V11  .       
## V12  .       
## V13  0.017595
## V14  .       
## V15  .       
## V16  .       
## V17 -0.018365
## V18  .       
## V19  .       
## V20  .       
## V21 -0.002806
## V22 -0.001423
## V23  .       
## V24  .       
## V25 -0.023429
## V26  .       
## V27  0.001688
## V28  .       
## V29  .       
## V30 -0.008236

 

Sparse matrices

Our package supports sparse input matrices, which allow large matrices with only a few non-zero entries to be stored and operated on efficiently.

We load a set of pre-created sample data: a 100 × 20 sparse input matrix x and a response vector y.

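A sketch using the bundled example data (assuming the SparseExample dataset in glmnet; recent versions store it as a list) and checking the class of x:

data(SparseExample)
x = SparseExample$x
y = SparseExample$y
class(x)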
## [1] "dgCMatrix"
## attr(,"package")
## [1] "Matrix"

We can fit the model as before.

fit = glmnet(x, y)

Perform cross-validation and draw the result object.

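A minimal sketch:

cvfit = cv.glmnet(x, y)
plot(cvfit)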
Predict with a new input matrix. For example,

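A sketch of such a prediction (the first five rows of x and s = "lambda.min" are illustrative):

predict(cvfit, newx = x[1:5, ], s = "lambda.min")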
##            1
## [1,]  0.3826
## [2,] -0.2172
## [3,] -1.6622
## [4,] -0.4175
## [5,] -1.3941

 

References

Jerome Friedman, Trevor Hastie and Rob Tibshirani (2008). Regularization Paths for Generalized Linear Models via Coordinate Descent.

