# Structured sparsity via alternating direction methods

Journal of Machine Learning Research, no. 1 (2012): 1435-1468

Abstract

We consider a class of sparse learning problems in high dimensional feature space regularized by a structured sparsity-inducing norm that incorporates prior knowledge of the group structure of the features. Such problems often pose a considerable challenge to optimization algorithms due to the non-smoothness and non-separability of the re...

Introduction

- For feature learning problems in a high-dimensional space, sparsity in the feature vector is usually a desirable property.
- The Group Lasso model (Yuan and Lin, 2006; Bach, 2008; Roth and Fischer, 2008) assumes disjoint groups and enforces sparsity on the pre-defined groups of features
- This model has been extended to allow for groups that are hierarchical as well as overlapping (Jenatton et al, 2011; Kim and Xing, 2010; Bach, 2010) with a wide array of applications from gene selection (Kim and Xing, 2010) to computer vision (Huang et al, 2009; Jenatton et al, 2010).
- The authors consider a basic model that minimizes the squared-error loss plus a regularization term (© 2012 Zhiwei Qin and Donald Goldfarb).
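
As a rough illustration of what a group-structured sparsity-inducing norm computes, the sketch below evaluates the l1/l2 and l1/l∞ norms over possibly overlapping groups. The function name `group_norm`, the unit group weights, and the example groups are assumptions made for illustration; they are not taken from the paper.

```python
import math

def group_norm(x, groups, kind="l2"):
    """Sum of per-group norms of x; groups may overlap.

    kind="l2" gives the l1/l2 norm, kind="linf" gives the l1/l-infinity norm.
    All groups get weight 1 (an assumption of this sketch).
    """
    total = 0.0
    for g in groups:
        block = [x[j] for j in g]
        if kind == "l2":
            total += math.sqrt(sum(v * v for v in block))
        elif kind == "linf":
            total += max(abs(v) for v in block)
        else:
            raise ValueError("kind must be 'l2' or 'linf'")
    return total

x = [3.0, 4.0, 0.0]
groups = [[0, 1], [1, 2]]  # feature 1 belongs to both groups
print(group_norm(x, groups, "l2"), group_norm(x, groups, "linf"))
```

Because feature 1 appears in both groups, the penalty cannot be split into independent per-feature terms, which is the non-separability the abstract refers to.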

Highlights

- For feature learning problems in a high-dimensional space, sparsity in the feature vector is usually a desirable property
- We propose new algorithms: Alternating Linearization Method with Skipping with partial splitting (APLM-S) and FISTA with partial linearization (FISTA-p), to serve as the key building block for this framework
- In Section 3.4, we present FISTA with partial linearization (FISTA-p), a special version of FAPLM-S in which every iteration is a skipping iteration. It has a much simpler form than FAPLM-S and essentially the same iteration complexity
- Remark 3: We have shown that, with a fixed v, the iterations of ISTA with partial linearization (ISTA-p) are exactly the same as the Alternating Direction Augmented Lagrangian (ADAL) iterations
- We present FISTA-p as an accelerated version of ISTA-p
- Computational tests on several sets of synthetic test data demonstrated the relative strength of the algorithms, and through two real-world applications we compared the relative merits of these structured sparsity-inducing norms

Methods

- 3.1 Alternating Direction Augmented Lagrangian (ADAL) Method.
- The well-known Alternating Direction Augmented Lagrangian (ADAL) method (Eckstein and Bertsekas, 1992; Gabay and Mercier, 1976; Glowinski and Marroco, 1975; Boyd et al, 2010) approximately minimizes the augmented Lagrangian by minimizing (5) with respect to x and y alternately, and it updates the Lagrange multiplier v on each iteration.
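- By applying variable splitting, the problem $\min_x f(x) + \sum_{i=1}^K g_i(C_i x)$ can be transformed into the constrained problem $\min_{x, y_1, \dots, y_K} f(x) + \sum_{i=1}^K g_i(y_i)$ s.t. $y_i = C_i x$, $i = 1, \dots, K$.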
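
The alternating scheme described above can be sketched on a small instance. The code below is a minimal illustration, not the authors' implementation: it uses the standard scaled-multiplier form of the alternating direction method, takes the loss to be 0.5*||x - b||^2 so that the x-step reduces to a diagonal system, and uses the l1/l2 penalty with unit weights. The names `adal_overlapping_prox`, `group_shrink`, and the penalty parameter `rho` are assumptions of this sketch.

```python
import math

def group_shrink(z, t):
    """Block soft-thresholding: proximal operator of t * ||.||_2."""
    norm = math.sqrt(sum(c * c for c in z))
    if norm <= t:
        return [0.0] * len(z)
    scale = 1.0 - t / norm
    return [scale * c for c in z]

def adal_overlapping_prox(b, groups, lam, rho=1.0, iters=500):
    """Alternating-direction sketch for
    min_x 0.5*||x - b||^2 + lam * sum_g ||x_g||_2 with overlapping groups.

    Splitting: one copy y_g = x[g] per group, i.e. y = C x with C a stack
    of row-selection matrices. u holds the scaled multipliers.
    """
    n = len(b)
    counts = [0] * n  # diagonal of C^T C: how many groups contain each index
    for g in groups:
        for j in g:
            counts[j] += 1
    x = list(b)
    y = [[x[j] for j in g] for g in groups]
    u = [[0.0] * len(g) for g in groups]
    for _ in range(iters):
        # x-step: solve (I + rho*C^T C) x = b + rho*C^T (y - u), diagonal here
        rhs = list(b)
        for g, yg, ug in zip(groups, y, u):
            for j, yj, uj in zip(g, yg, ug):
                rhs[j] += rho * (yj - uj)
        x = [rhs[j] / (1.0 + rho * counts[j]) for j in range(n)]
        # y-step: group-wise shrinkage of C x + u
        y = [group_shrink([x[j] + uj for j, uj in zip(g, ug)], lam / rho)
             for g, ug in zip(groups, u)]
        # multiplier step: accumulate the constraint residual C x - y
        u = [[uj + x[j] - yj for j, uj, yj in zip(g, ug, yg)]
             for g, ug, yg in zip(groups, u, y)]
    return x, y

x, y = adal_overlapping_prox([3.0, 0.1, -0.2, 2.0], [[0, 1], [1, 2], [2, 3]], lam=0.5)
print([round(v, 3) for v in x])
```

The splitting is what makes each step cheap: the non-smooth, non-separable penalty becomes separable in the copies `y`, so the y-step is an independent shrinkage per group.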

Results

- Mairal et al (2010) showed that the quality of segmentation can be significantly improved by applying a group-structured regularization Ω(·) on e, where the groups are all the overlapping k × k square patches in the image
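
To make the patch-based group structure concrete, the sketch below enumerates the index sets of all overlapping k × k patches of an image. The function name `patch_groups` and the row-major pixel numbering are assumptions of this sketch; the paper's own indexing may differ.

```python
def patch_groups(height, width, k):
    """Return one index list per overlapping k x k patch of an image.

    Pixels are numbered in row-major order (an assumption of this sketch):
    pixel (r, c) has index r * width + c.
    """
    groups = []
    for top in range(height - k + 1):
        for left in range(width - k + 1):
            groups.append([(top + dr) * width + (left + dc)
                           for dr in range(k) for dc in range(k)])
    return groups

groups = patch_groups(3, 4, 2)
print(len(groups), groups[0])
```

An image of size height × width has (height - k + 1)(width - k + 1) such patches, and a pixel far enough from the border belongs to k*k of them, so the groups overlap heavily.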

Conclusion

- The authors have built a unified framework for solving sparse learning problems involving group-structured regularization, in particular, the l1/l2- or l1/l∞-regularization of arbitrarily overlapping groups of variables.
- The authors have incorporated ADAL and FISTA into the framework.
- Computational tests on several sets of synthetic test data demonstrated the relative strength of the algorithms, and through two real-world applications the authors compared the relative merits of these structured sparsity-inducing norms.
- FISTA-p and ADAL performed the best on most of the data sets, and FISTA.
- To avoid confusion with the algorithms that consist of inner-outer iterations, the authors prefix the algorithms with ‘AugLag’ here

Tables

- Table 1: Specification of the quantities used in the outer and inner stopping criteria
- Table 2: The Breast Cancer Data Set
- Table 3: Computational results for the video sequence background subtraction example. The algorithm used is FISTA-p. We used the Matlab version for the ease of generating the images. The C++ version runs at least four times faster from our experience in the previous experiments. We report the best accuracy found on the regularization path of each model. The total CPU time is recorded for computing the entire regularization path, with the specified number of different regularization parameter values
- Table 4: Numerical results for ogl set 1. For ProxGrad, Avg Sub-Iters and F(x) fields are not applicable since the algorithm is not based on an outer-inner iteration scheme, and the objective function that it minimizes is different from ours. We tested ten problems with J = 100, ..., 1000, but only show the results for three of them to save space
- Table 5: Numerical results for ogl set 2. We ran the test for ten problems with n = 1000, ..., 10000, but only show the results for three of them to save space
- Table 6: Numerical results for dct set 2 (scalability test) with l1/l2-regularization. All three algorithms were run in factorization mode with a fixed μ = μ0
- Table 7: Numerical results for dct set 2 (scalability test) with l1/l∞-regularization. The algorithm configurations are exactly the same as in Table 6
- Table 8: Numerical results for the DCT set with l1/l2-regularization. FISTA-p and ADAL were run in PCG mode with the dynamic scheme for updating μ. μ was fixed at μ0 for FISTA
- Table 9: Numerical results for the DCT set with l1/l∞-regularization. FISTA-p and ADAL were run in PCG mode. The dynamic updating scheme for μ was applied to FISTA-p, while μ was fixed at μ0 for ADAL and FISTA
- Table 10: Numerical results for Breast Cancer Data using l1/l2-regularization. In this experiment, we kept μ constant at 0.01 for ADAL. The CPU time is for a single run on the entire data set with the value of λ selected to minimize the RMSE in Figure 4

Related work

Two proximal gradient methods have been proposed to solve a close variant of (1) with an l1/l2 penalty,

$$\min_{x \in \mathbb{R}^m} L(x) + \Omega_{l_1/l_2}(x) + \lambda \|x\|_1, \qquad (3)$$

which has an additional l1-regularization term on x. Chen et al (2010) replace $\Omega_{l_1/l_2}(x)$ with a smooth approximation $\Omega_\eta(x)$ by using Nesterov's smoothing technique (Nesterov, 2005) and solve the resulting problem by the Fast Iterative Shrinkage Thresholding algorithm (FISTA) (Beck and Teboulle, 2009). The parameter η is a smoothing parameter, upon which the practical and theoretical convergence speed of the algorithm critically depends. Liu and Ye (2010) also apply FISTA to solve (3), but in each iteration, they transform the computation of the proximal operator associated with the combined penalty term into an equivalent constrained smooth problem and solve it by Nesterov's accelerated gradient descent method (Nesterov, 2005). Mairal et al (2010) apply the accelerated proximal gradient method to (1) with an l1/l∞ penalty and propose a network flow algorithm to solve the proximal problem associated with $\Omega_{l_1/l_\infty}(x)$. The method proposed by Mosci et al (2010) for solving the Group Lasso problem in Jacob et al (2009) is in the same spirit as the method of Liu and Ye (2010), but their approach uses a projected Newton method.
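
Several of the methods above build on FISTA. As a point of reference, the sketch below runs plain FISTA on an ordinary l1-regularized least-squares problem, min 0.5*||Ax - b||^2 + λ||x||_1, where the proximal step is simple soft-thresholding. It is not the smoothed variant of Chen et al (2010) or the method of Liu and Ye (2010). The function names, the fixed iteration count, and the use of the squared Frobenius norm as a step-size bound are assumptions of this sketch.

```python
import math

def soft_threshold(v, t):
    """Proximal operator of t * |.| applied to a scalar."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def fista_lasso(A, b, lam, iters=200):
    """Plain FISTA for min_x 0.5*||A x - b||^2 + lam*||x||_1 (dense lists)."""
    m, n = len(A), len(A[0])
    # Squared Frobenius norm of A: a cheap upper bound on the Lipschitz
    # constant of the gradient (largest eigenvalue of A^T A).
    L = sum(a * a for row in A for a in row)
    x = [0.0] * n
    y = list(x)  # extrapolated point
    t = 1.0
    for _ in range(iters):
        r = [sum(A[i][j] * y[j] for j in range(n)) - b[i] for i in range(m)]
        grad = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        # Proximal gradient step taken from the extrapolated point y
        x_new = [soft_threshold(y[j] - grad[j] / L, lam / L) for j in range(n)]
        # Momentum update
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        beta = (t - 1.0) / t_new
        y = [x_new[j] + beta * (x_new[j] - x[j]) for j in range(n)]
        x, t = x_new, t_new
    return x

x = fista_lasso([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.2], lam=1.0)
print([round(v, 4) for v in x])
```

With an overlapping group penalty in place of the l1 norm, the soft-thresholding line no longer has a closed form, which is why the cited methods need smoothing, an inner solver, or a network flow algorithm for the proximal step.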

Funding

- This research was supported in part by NSF Grant DMS 10-16571, ONR Grant N00014-08-1-1118 and DOE Grant DE-FG02-08ER25856
