Contents
List of Figures xi
Preface xvii
1 Introduction 1
1.1 Motivating Examples 4
1.1.1 Computer Network Diagnosis 4
1.1.2 Neuroimaging Analysis 5
1.1.3 Compressed Sensing 8
1.2 Sparse Recovery in a Nutshell 9
1.3 Statistical Learning versus Compressed Sensing 11
1.4 Summary and Bibliographical Notes 12
2 Sparse Recovery: Problem Formulations 15
2.1 Noiseless Sparse Recovery 16
2.2 Approximations 18
2.3 Convexity: Brief Review 19
2.4 Relaxations of (P0) Problem 20
2.5 The Effect of lq-Regularizer on Solution Sparsity 21
2.6 l1-norm Minimization as Linear Programming 22
2.7 Noisy Sparse Recovery 23
2.8 A Statistical View of Sparse Recovery 27
2.9 Beyond LASSO: Other Loss Functions and Regularizers 30
2.10 Summary and Bibliographical Notes 33
3 Theoretical Results (Deterministic Part) 35
3.1 The Sampling Theorem 36
3.2 Surprising Empirical Results 36
3.3 Signal Recovery from Incomplete Frequency Information 39
3.4 Mutual Coherence 40
3.5 Spark and Uniqueness of (P0) Solution 42
3.6 Null Space Property and Uniqueness of (P1) Solution 45
3.7 Restricted Isometry Property (RIP) 46
3.8 Square Root Bottleneck for the Worst-Case Exact Recovery 47
3.9 Exact Recovery Based on RIP 48
3.10 Summary and Bibliographical Notes 52
4 Theoretical Results (Probabilistic Part) 53
4.1 When Does RIP Hold? 54
4.2 Johnson-Lindenstrauss Lemma and RIP for Subgaussian Random Matrices 54
4.2.1 Proof of the Johnson-Lindenstrauss Concentration Inequality 55
4.2.2 RIP for Matrices with Subgaussian Random Entries 56
4.3 Random Matrices Satisfying RIP 59
4.3.1 Eigenvalues and RIP 60
4.3.2 Random Vectors, Isotropic Random Vectors 60
4.4 RIP for Matrices with Independent Bounded Rows and Matrices with Random Rows of Fourier Transform 61
4.4.1 Proof of URI 64
4.4.2 Tail Bound for the Uniform Law of Large Numbers (ULLN) 67
4.5 Summary and Bibliographical Notes 69
5 Algorithms for Sparse Recovery Problems 71
5.1 Univariate Thresholding is Optimal for Orthogonal Designs 72
5.1.1 l0-norm Minimization 73
5.1.2 l1-norm Minimization 74
5.2 Algorithms for l0-norm Minimization 76
5.2.1 An Overview of Greedy Methods 79
5.3 Algorithms for l1-norm Minimization (LASSO) 82
5.3.1 Least Angle Regression for LASSO (LARS) 82
5.3.2 Coordinate Descent 86
5.3.3 Proximal Methods 87
5.4 Summary and Bibliographical Notes 92
6 Beyond LASSO: Structured Sparsity 95
6.1 The Elastic Net 96
6.1.1 The Elastic Net in Practice: Neuroimaging Applications 100
6.2 Fused LASSO 107
6.3 Group LASSO: l1/l2 Penalty 109
6.4 Simultaneous LASSO: l1/l∞ Penalty 110
6.5 Generalizations 111
6.5.1 Block l1/lq-Norms and Beyond 111
6.5.2 Overlapping Groups 112
6.6 Applications 114
6.6.1 Temporal Causal Modeling 114
6.6.2 Generalized Additive Models 115
6.6.3 Multiple Kernel Learning 115
6.6.4 Multi-Task Learning 117
6.7 Summary and Bibliographical Notes 118
7 Beyond LASSO: Other Loss Functions 121
7.1 Sparse Recovery from Noisy Observations 122
7.2 Exponential Family, GLMs, and Bregman Divergences 123
7.2.1 Exponential Family 124
7.2.2 Generalized Linear Models (GLMs) 125
7.2.3 Bregman Divergence 126
7.3 Sparse Recovery with GLM Regression 128
7.4 Summary and Bibliographic Notes 136
8 Sparse Graphical Models 139
8.1 Background 140
8.2 Markov Networks 141
8.2.1 Markov Network Properties: A Closer Look 142
8.2.2 Gaussian MRFs 144
8.3 Learning and Inference in Markov Networks 145
8.3.1 Learning 145
8.3.2 Inference 146
8.3.3 Example: Neuroimaging Applications 147
8.4 Learning Sparse Gaussian MRFs 151
8.4.1 Sparse Inverse Covariance Selection Problem 152
8.4.2 Optimization Approaches 153
8.4.3 Selecting Regularization Parameter 160
8.5 Summary and Bibliographical Notes 165
9 Sparse Matrix Factorization: Dictionary Learning and Beyond 167
9.1 Dictionary Learning 168
9.1.1 Problem Formulation 169
9.1.2 Algorithms for Dictionary Learning 170
9.2 Sparse PCA 174
9.2.1 Background 174
9.2.2 Sparse PCA: Synthesis View 176
9.2.3 Sparse PCA: Analysis View 178
9.3 Sparse NMF for Blind Source Separation 179
9.4 Summary and Bibliographical Notes 182
Epilogue 185
Appendix Mathematical Background 187
A.1 Norms, Matrices, and Eigenvalues 187
A.1.1 Short Summary of Eigentheory 188
A.2 Discrete Fourier Transform 190
A.2.1 The Discrete Whittaker-Nyquist-Kotelnikov-Shannon Sampling Theorem 191
A.3 Complexity of l0-norm Minimization 192
A.4 Subgaussian Random Variables 192
A.5 Random Variables and Symmetrization in R^n 197
A.6 Subgaussian Processes 199
A.7 Dudley Entropy Inequality 200
A.8 Large Deviation for the Bounded Random Operators 202
Bibliography 203
Index 227
