In many areas of scientific research, such as genetic engineering and image
analysis, observations of a response variable of interest are recorded along
with a large number of potential explanatory variables. Reliable interpretation
of such data requires variable selection, that is, identifying the features that
truly have an impact on the response. Most classical model selection procedures
fail on high-dimensional data (when there are more candidate variables than
observations, p > n) because they overfit. Regularization methods overcome this
problem by penalizing model complexity or by stopping the fitting procedure
early. For example, Lasso ( ) and Ridge Regression ( ) add an L1 and an L2
penalty, respectively, on the coefficients to the least-squares objective
function, while the regularization parameter in L2-Boosting is the number of
boosting iterations. However, choosing the right amount of regularization
remains a challenge in general. For the Lasso, cross-validation is the usual way
to choose the penalty, but because it targets prediction accuracy, the resulting
model selects too many variables, most of which are noise variables (see
Chapter 3 for an example), to be useful for identifying the signals.
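To make the contrast between the two penalties concrete, here is a minimal pure-Python sketch (an illustration only, not the implementation used in this work) that fits both by cyclic coordinate descent on the objective (1/2n)||y - Xb||^2 + lambda * P(b), with P(b) = sum_j |b_j| for the Lasso and, as an assumed scaling, P(b) = 0.5 * sum_j b_j^2 for ridge. The functions `soft_threshold`, `coordinate_descent` and `simulate`, the simulated design (n = 50, p = 100, five true signals) and the value lambda = 0.3 are all hypothetical choices. The L1 penalty can set coefficients exactly to zero, whereas the L2 penalty only shrinks them.

```python
# Illustrative sketch: Lasso vs ridge by coordinate descent.
# All names, the simulated data and lambda are hypothetical assumptions.
import random


def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def coordinate_descent(X, y, lam, penalty="l1", n_sweeps=100):
    """Minimise (1/2n)||y - X b||^2 + lam * P(b) by cyclic coordinate descent.

    penalty="l1": Lasso, P(b) = sum_j |b_j|  (exact zeros are possible)
    penalty="l2": ridge, P(b) = 0.5 * sum_j b_j^2  (coefficients are only shrunk)
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # r = y - X b, and b starts at 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            # (1/n) * x_j' (partial residual that excludes variable j)
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            if penalty == "l1":
                updated = soft_threshold(rho, lam) / col_sq[j]
            else:
                updated = rho / (col_sq[j] + lam)
            delta = updated - beta[j]
            if delta != 0.0:
                # keep the residual in sync with the new coefficient
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = updated
    return beta


def simulate(n=50, p=100, n_signal=5, seed=0):
    """Hypothetical p > n design: the first n_signal coefficients equal 2, the rest 0."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    beta_true = [2.0] * n_signal + [0.0] * (p - n_signal)
    y = [sum(X[i][j] * beta_true[j] for j in range(p)) + rng.gauss(0.0, 0.5)
         for i in range(n)]
    return X, y


X, y = simulate()
lasso = coordinate_descent(X, y, lam=0.3, penalty="l1")
ridge = coordinate_descent(X, y, lam=0.3, penalty="l2")
print("non-zero Lasso coefficients:", sum(b != 0.0 for b in lasso), "of", len(lasso))
print("non-zero ridge coefficients:", sum(b != 0.0 for b in ridge), "of", len(ridge))
```

On this p > n example the Lasso fit is sparse while the ridge coefficients are merely shrunk, which is why only the L1 penalty performs variable selection. How many noise variables the Lasso keeps depends on lambda, which is exactly the tuning problem described above.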


We use a subsampling approach, built on top of existing variable selection
methods, to determine the amount of regularization (either a single value or a
region) so that the finite-sample per-family error rate (PFER, the expected
number of falsely selected variables) is controlled under certain conditions.
We shall also see that structure estimation and variable selection can be
further improved by adding randomness to the penalty. Finally, we introduce
several model selection techniques and compare them on both simulated and real
data, which gives some intuition on how to choose among these methods in
different scenarios.
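A minimal sketch of the subsampling idea, assuming the variant known as stability selection (Meinshausen and Bühlmann, 2010); the procedure developed in this work may differ in its details. A base selector that returns q variables is run on many random subsamples of size floor(n/2), the selection frequency of each variable is recorded, and the variables whose frequency reaches a threshold pi_thr are kept. Assuming the noise variables are exchangeable and the base selector is no worse than random guessing, that construction bounds the PFER by E(V) <= q^2 / ((2 * pi_thr - 1) * p) for pi_thr in (0.5, 1], so a target PFER can be turned into a threshold. The marginal-correlation selector `top_q_by_correlation`, the other function names and all simulation settings (n = p = 200, three true signals, 100 subsamples) are hypothetical stand-ins; in practice the base selector would be, for example, the Lasso.

```python
# Sketch of subsampling-based selection in the style of stability selection.
# Function names, the base selector and the simulated data are illustrative assumptions.
import random


def simulate(n=200, p=200, n_signal=3, seed=1):
    """Hypothetical design: the first n_signal coefficients equal 2, the rest 0."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [2.0 * sum(row[:n_signal]) + rng.gauss(0.0, 0.5) for row in X]
    return X, y


def top_q_by_correlation(X, y, rows, q):
    """Hypothetical base selector: the q variables with the largest |sum_i x_ij y_i|
    over the given rows. Any selector returning q variables (e.g. the Lasso) could be used."""
    p = len(X[0])
    score = [abs(sum(X[i][j] * y[i] for i in rows)) for j in range(p)]
    return sorted(range(p), key=lambda j: score[j], reverse=True)[:q]


def selection_frequencies(X, y, q, n_subsamples=100, seed=0):
    """Fraction of random half-size subsamples on which each variable is selected."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_subsamples):
        rows = rng.sample(range(n), n // 2)  # floor(n/2) rows, drawn without replacement
        for j in top_q_by_correlation(X, y, rows, q):
            counts[j] += 1
    return [c / n_subsamples for c in counts]


def pfer_bound(q, p, pi_thr):
    """Upper bound q^2 / ((2 pi_thr - 1) p) on E(V), valid for 0.5 < pi_thr <= 1."""
    if not 0.5 < pi_thr <= 1.0:
        raise ValueError("pi_thr must lie in (0.5, 1]")
    return q * q / ((2 * pi_thr - 1) * p)


def threshold_for_pfer(q, p, target_pfer):
    """Smallest pi_thr whose bound is at most target_pfer."""
    pi_thr = 0.5 * (1 + q * q / (p * target_pfer))
    if pi_thr > 1.0:
        raise ValueError("q is too large for this PFER target; decrease q")
    return pi_thr


X, y = simulate()
q, target_pfer = 10, 1.0
pi_thr = threshold_for_pfer(q, len(X[0]), target_pfer)
freqs = selection_frequencies(X, y, q)
stable = [j for j, f in enumerate(freqs) if f >= pi_thr]
print("threshold:", pi_thr, "stable set:", stable)
print("PFER bound:", pfer_bound(q, len(X[0]), pi_thr))
```

With q = 10 and p = 200, a target PFER of 1 gives pi_thr = 0.75. The bound involves only q, p and pi_thr, so the same threshold calculation applies whichever base selector is plugged in; the selector only affects which variables reach the threshold.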
