Preface | p. v |
Variability, Information, and Prediction | p. 1 |
The Curse of Dimensionality | p. 3 |
The Two Extremes | p. 4 |
Perspectives on the Curse | p. 5 |
Sparsity | p. 6 |
Exploding Numbers of Models | p. 8 |
Multicollinearity and Concurvity | p. 9 |
The Effect of Noise | p. 10 |
Coping with the Curse | p. 11 |
Selecting Design Points | p. 11 |
Local Dimension | p. 12 |
Parsimony | p. 17 |
Two Techniques | p. 18 |
The Bootstrap | p. 18 |
Cross-Validation | p. 27 |
Optimization and Search | p. 32 |
Univariate Search | p. 32 |
Multivariate Search | p. 33 |
General Searches | p. 34 |
Constraint Satisfaction and Combinatorial Search | p. 35 |
Notes | p. 38 |
Hammersley Points | p. 38 |
Edgeworth Expansions for the Mean | p. 39 |
Bootstrap Asymptotics for the Studentized Mean | p. 41 |
Exercises | p. 43 |
Local Smoothers | p. 53 |
Early Smoothers | p. 55 |
Transition to Classical Smoothers | p. 59 |
Global Versus Local Approximations | p. 60 |
LOESS | p. 64 |
Kernel Smoothers | p. 67 |
Statistical Function Approximation | p. 68 |
The Concept of Kernel Methods and the Discrete Case | p. 73 |
Kernels and Stochastic Designs: Density Estimation | p. 78 |
Stochastic Designs: Asymptotics for Kernel Smoothers | p. 81 |
Convergence Theorems and Rates for Kernel Smoothers | p. 86 |
Kernel and Bandwidth Selection | p. 90 |
Linear Smoothers | p. 95 |
Nearest Neighbors | p. 96 |
Applications of Kernel Regression | p. 100 |
A Simulated Example | p. 100 |
Ethanol Data | p. 102 |
Exercises | p. 107 |
Spline Smoothing | p. 117 |
Interpolating Splines | p. 117 |
Natural Cubic Splines | p. 123 |
Smoothing Splines for Regression | p. 126 |
Model Selection for Spline Smoothing | p. 129 |
Spline Smoothing Meets Kernel Smoothing | p. 130 |
Asymptotic Bias, Variance, and MISE for Spline Smoothers | p. 131 |
Ethanol Data Example - Continued | p. 133 |
Splines Redux: Hilbert Space Formulation | p. 136 |
Reproducing Kernels | p. 138 |
Constructing an RKHS | p. 141 |
Direct Sum Construction for Splines | p. 146 |
Explicit Forms | p. 149 |
Nonparametrics in Data Mining and Machine Learning | p. 152 |
Simulated Comparisons | p. 154 |
What Happens with Dependent Noise Models? | p. 157 |
Higher Dimensions and the Curse of Dimensionality | p. 159 |
Notes | p. 163 |
Sobolev Spaces: Definition | p. 163 |
Exercises | p. 164 |
New Wave Nonparametrics | p. 171 |
Additive Models | p. 172 |
The Backfitting Algorithm | p. 173 |
Concurvity and Inference | p. 177 |
Nonparametric Optimality | p. 180 |
Generalized Additive Models | p. 181 |
Projection Pursuit Regression | p. 184 |
Neural Networks | p. 189 |
Backpropagation and Inference | p. 192 |
Barron's Result and the Curse | p. 197 |
Approximation Properties | p. 198 |
Barron's Theorem: Formal Statement | p. 200 |
Recursive Partitioning Regression | p. 202 |
Growing Trees | p. 204 |
Pruning and Selection | p. 207 |
Regression | p. 208 |
Bayesian Additive Regression Trees: BART | p. 210 |
MARS | p. 210 |
Sliced Inverse Regression | p. 215 |
ACE and AVAS | p. 218 |
Notes | p. 220 |
Proof of Barron's Theorem | p. 220 |
Exercises | p. 224 |
Supervised Learning: Partition Methods | p. 231 |
Multiclass Learning | p. 233 |
Discriminant Analysis | p. 235 |
Distance-Based Discriminant Analysis | p. 236 |
Bayes Rules | p. 241 |
Probability-Based Discriminant Analysis | p. 245 |
Tree-Based Classifiers | p. 249 |
Splitting Rules | p. 249 |
Logic Trees | p. 253 |
Random Forests | p. 254 |
Support Vector Machines | p. 262 |
Margins and Distances | p. 262 |
Binary Classification and Risk | p. 265 |
Prediction Bounds for Function Classes | p. 268 |
Constructing SVM Classifiers | p. 271 |
SVM Classification for Nonlinearly Separable Populations | p. 279 |
SVMs in the General Nonlinear Case | p. 282 |
Some Kernels Used in SVM Classification | p. 288 |
Kernel Choice, SVMs and Model Selection | p. 289 |
Support Vector Regression | p. 290 |
Multiclass Support Vector Machines | p. 293 |
Neural Networks | p. 294 |
Notes | p. 296 |
Hoeffding's Inequality | p. 296 |
VC Dimension | p. 297 |
Exercises | p. 300 |
Alternative Nonparametrics | p. 307 |
Ensemble Methods | p. 308 |
Bayes Model Averaging | p. 310 |
Bagging | p. 312 |
Stacking | p. 316 |
Boosting | p. 318 |
Other Averaging Methods | p. 326 |
Oracle Inequalities | p. 328 |
Bayes Nonparametrics | p. 334 |
Dirichlet Process Priors | p. 334 |
Pólya Tree Priors | p. 336 |
Gaussian Process Priors | p. 338 |
The Relevance Vector Machine | p. 344 |
RVM Regression: Formal Description | p. 345 |
RVM Classification | p. 349 |
Hidden Markov Models - Sequential Classification | p. 352 |
Notes | p. 354 |
Proof of Yang's Oracle Inequality | p. 354 |
Proof of Lecué's Oracle Inequality | p. 357 |
Exercises | p. 359 |
Computational Comparisons | p. 365 |
Computational Results: Classification | p. 366 |
Comparison on Fisher's Iris Data | p. 366 |
Comparison on Ripley's Data | p. 369 |
Computational Results: Regression | p. 376 |
Vapnik's sinc Function | p. 377 |
Friedman's Function | p. 389 |
Conclusions | p. 392 |
Systematic Simulation Study | p. 397 |
No Free Lunch | p. 400 |
Exercises | p. 402 |
Unsupervised Learning: Clustering | p. 405 |
Centroid-Based Clustering | p. 408 |
K-Means Clustering | p. 409 |
Variants | p. 412 |
Hierarchical Clustering | p. 413 |
Agglomerative Hierarchical Clustering | p. 414 |
Divisive Hierarchical Clustering | p. 422 |
Theory for Hierarchical Clustering | p. 426 |
Partitional Clustering | p. 430 |
Model-Based Clustering | p. 432 |
Graph-Theoretic Clustering | p. 447 |
Spectral Clustering | p. 452 |
Bayesian Clustering | p. 458 |
Probabilistic Clustering | p. 458 |
Hypothesis Testing | p. 461 |
Computed Examples | p. 463 |
Ripley's Data | p. 465 |
Iris Data | p. 475 |
Cluster Validation | p. 480 |
Notes | p. 484 |
Derivatives of Functions of a Matrix | p. 484 |
Kruskal's Algorithm: Proof | p. 484 |
Prim's Algorithm: Proof | p. 485 |
Exercises | p. 485 |
Learning in High Dimensions | p. 493 |
Principal Components | p. 495 |
Main Theorem | p. 496 |
Key Properties | p. 498 |
Extensions | p. 500 |
Factor Analysis | p. 502 |
Finding Λ and ψ | p. 504 |
Finding K | p. 506 |
Estimating Factor Scores | p. 507 |
Projection Pursuit | p. 508 |
Independent Components Analysis | p. 511 |
Main Definitions | p. 511 |
Key Results | p. 513 |
Computational Approach | p. 515 |
Nonlinear PCs and ICA | p. 516 |
Nonlinear PCs | p. 517 |
Nonlinear ICA | p. 518 |
Geometric Summarization | p. 518 |
Measuring Distances to an Algebraic Shape | p. 519 |
Principal Curves and Surfaces | p. 520 |
Supervised Dimension Reduction: Partial Least Squares | p. 523 |
Simple PLS | p. 523 |
PLS Procedures | p. 524 |
Properties of PLS | p. 526 |
Supervised Dimension Reduction: Sufficient Dimensions in Regression | p. 527 |
Visualization I: Basic Plots | p. 531 |
Elementary Visualization | p. 534 |
Projections | p. 541 |
Time Dependence | p. 543 |
Visualization II: Transformations | p. 546 |
Chernoff Faces | p. 546 |
Multidimensional Scaling | p. 547 |
Self-Organizing Maps | p. 553 |
Exercises | p. 560 |
Variable Selection | p. 569 |
Concepts from Linear Regression | p. 570 |
Subset Selection | p. 572 |
Variable Ranking | p. 575 |
Overview | p. 577 |
Traditional Criteria | p. 578 |
Akaike Information Criterion (AIC) | p. 580 |
Bayesian Information Criterion (BIC) | p. 583 |
Choices of Information Criteria | p. 585 |
Cross-Validation | p. 587 |
Shrinkage Methods | p. 599 |
Shrinkage Methods for Linear Models | p. 601 |
Grouping in Variable Selection | p. 615 |
Least Angle Regression | p. 617 |
Shrinkage Methods for Model Classes | p. 620 |
Cautionary Notes | p. 631 |
Bayes Variable Selection | p. 632 |
Prior Specification | p. 635 |
Posterior Calculation and Exploration | p. 643 |
Evaluating Evidence | p. 647 |
Connections Between Bayesian and Frequentist Methods | p. 650 |
Computational Comparisons | p. 653 |
The n>p Case | p. 653 |
When p>n | p. 665 |
Notes | p. 667 |
Code for Generating Data in Section 10.5 | p. 667 |
Exercises | p. 671 |
Multiple Testing | p. 679 |
Analyzing the Hypothesis Testing Problem | p. 681 |
A Paradigmatic Setting | p. 681 |
Counts for Multiple Tests | p. 684 |
Measures of Error in Multiple Testing | p. 685 |
Aspects of Error Control | p. 687 |
Controlling the Familywise Error Rate | p. 690 |
One-Step Adjustments | p. 690 |
Stepwise p-Value Adjustments | p. 693 |
PCER and PFER | p. 695 |
Null Domination | p. 696 |
Two Procedures | p. 697 |
Controlling the Type I Error Rate | p. 702 |
Adjusted p-Values for PFER/PCER | p. 706 |
Controlling the False Discovery Rate | p. 707 |
FDR and Other Measures of Error | p. 709 |
The Benjamini-Hochberg Procedure | p. 710 |
A BH Theorem for a Dependent Setting | p. 711 |
Variations on BH | p. 713 |
Controlling the Positive False Discovery Rate | p. 719 |
Bayesian Interpretations | p. 719 |
Aspects of Implementation | p. 723 |
Bayesian Multiple Testing | p. 727 |
Fully Bayes: Hierarchical | p. 728 |
Fully Bayes: Decision Theory | p. 731 |
Notes | p. 736 |
Proof of the Benjamini-Hochberg Theorem | p. 736 |
Proof of the Benjamini-Yekutieli Theorem | p. 739 |
References | p. 743 |
Index | p. 773 |