Principles and Theory for Data Mining and Machine Learning

By: Bertrand Clarke, Hao Helen Zhang, Ernest Fokoue


Hardcover | 30 July 2009

At a Glance

Hardcover
800 Pages

Dimensions (cm)
24.1 x 15.6 x 5.9

Hardcover

$314.09


Aims to ship in 7 to 10 business days

This book is a thorough introduction to the most important topics in data mining and machine learning. It begins with a detailed review of classical function estimation and proceeds with chapters on nonlinear regression, classification, and ensemble methods. The final chapters focus on clustering, dimension reduction, variable selection, and multiple comparisons. All these topics have undergone extraordinarily rapid development in recent years and this treatment offers a modern perspective emphasizing the most recent contributions. The presentation of foundational results is detailed and includes many accessible proofs not readily available outside original sources. While the orientation is conceptual and theoretical, the main points are regularly reinforced by computational comparisons.

Intended primarily as a graduate level textbook for statistics, computer science, and electrical engineering students, this book assumes only a strong foundation in undergraduate statistics and mathematics, and facility with using R packages. The text has a wide variety of problems, many of an exploratory nature. There are numerous computed examples, complete with code, so that further computations can be carried out readily. The book also serves as a handbook for researchers who want a conceptual overview of the central topics in data mining and machine learning.

Industry Reviews

From the reviews:

"PhD level students, and researchers and practitioners in statistical learning and machine learning. ... text assumes a thorough training in undergraduate statistics and mathematics. Computed examples that include R code are scattered through the text. There are numerous exercises, many with commentary that sets out guidelines for exploration. ... The over-riding reason for staying with the independent, symmetric unimodal error model is surely that no one book can cover everything! Within these bounds, this book gives a careful treatment that is encyclopedic in its scope." (John H. Maindonald, International Statistical Review, Vol. 79 (1), 2011)

"It is an appropriate textbook for a PhD level course and can also be used as a reference or for independent reading. ... an excellent resource for researchers and students interested in DMML. ... the authors have done an outstanding job of covering important topics and providing relevant statistical theory and computational resources. I can see myself teaching a statistical learning class using this book and comfortably recommend it to any researcher with a solid mathematical background who wants to be engaged in this field." (Jeongyoun Ahn, Journal of the American Statistical Association, Vol. 106 (493), March, 2011)

Shipping

	Standard Shipping	Express Shipping
Metro postcodes:	$9.99	$14.95
Regional postcodes:	$9.99	$14.95
Rural postcodes:	$9.99	$14.95

How to return your order

At Booktopia, we offer hassle-free returns in accordance with our returns policy. If you wish to return an item, please get in touch with Booktopia Customer Care.

Additional postage charges may be applicable.

Defective items

If there is a problem with any of the items received for your order then the Booktopia Customer Care team is ready to assist you.

For more info please visit our Help Centre.

More in Probability & Statistics

The Practice of Statistics in the Life Sciences

4th Edition - Digital Update

Paperback

RRP $206.90

$205.25

Psychology Statistics For Dummies

For Dummies

Paperback

RRP $39.95

$28.75

28% OFF

The Art of Uncertainty

How to Navigate Chance, Ignorance, Risk and Luck

Hardcover

RRP $55.00

$42.25

23% OFF

The Art of Statistics

Learning from Data

Paperback

RRP $24.99

$21.75

13% OFF

Mathematical Statistics with Applications

7th Edition

Hardcover

RRP $232.95

$183.75

21% OFF

Introductory Econometrics for Finance

4th edition

Paperback

RRP $101.95

$87.35

14% OFF

Statistics Using Stata

3rd Edition - An Integrative Approach

Paperback

RRP $125.95

$116.50

Introduction to Medical Statistics

4th edition

Paperback

RRP $70.95

$62.35

12% OFF

Business Statistics

11th Edition - A Decision Making Approach, Global Edition

Paperback

RRP $171.95

$135.95

21% OFF

The Maths Book

Big Ideas Simply Explained

Hardcover

RRP $42.99

$32.50

24% OFF

Calling Bullshit

The Art of Scepticism in a Data-Driven World

Paperback

RRP $22.99

$20.35

11% OFF

The Black Swan

The Impact of the Highly Improbable

Paperback

RRP $24.99

$21.75

13% OFF

Fooled By Randomness

The Hidden Role of Chance in Life and in the Markets

Paperback

RRP $24.99

$21.75

13% OFF

Business Research Methods

14th edition

Paperback

RRP $159.95

$141.80

11% OFF

ISE Business Statistics and Analytics in Practice

9th Edition

Paperback

RRP $154.95

$138.75

10% OFF

Multivariate Data Analysis

8th Edition

Paperback

RRP $169.95

$137.95

19% OFF

A Second Course in Statistics

7th Edition - Regression Analysis

Book with Other Items

RRP $179.95

$138.25

23% OFF

Sampling

3rd Edition - Design and Analysis

Hardcover

RRP $154.00

$119.75

22% OFF

Freakonomics Revised and Expanded Edition

A Rogue Economist Explores the Hidden Side of Everything

By: Stephen J. Dubner

$73.75

51% OFF

Statistics Without Tears

An Introduction For Non-Mathematicians

Paperback

RRP $22.99

$17.75

23% OFF

Statistics for The Behavioral Sciences

10th Edition

Paperback

RRP $194.95

$155.90

20% OFF

Think Stats

Exploratory Data Analysis

Paperback

RRP $66.50

$34.90

48% OFF

Preface	p. v
Variability, Information, and Prediction	p. 1
The Curse of Dimensionality	p. 3
The Two Extremes	p. 4
Perspectives on the Curse	p. 5
Sparsity	p. 6
Exploding Numbers of Models	p. 8
Multicollinearity and Concurvity	p. 9
The Effect of Noise	p. 10
Coping with the Curse	p. 11
Selecting Design Points	p. 11
Local Dimension	p. 12
Parsimony	p. 17
Two Techniques	p. 18
The Bootstrap	p. 18
Cross-Validation	p. 27
Optimization and Search	p. 32
Univariate Search	p. 32
Multivariate Search	p. 33
General Searches	p. 34
Constraint Satisfaction and Combinatorial Search	p. 35
Notes	p. 38
Hammersley Points	p. 38
Edgeworth Expansions for the Mean	p. 39
Bootstrap Asymptotics for the Studentized Mean	p. 41
Exercises	p. 43
Local Smoothers	p. 53
Early Smoothers	p. 55
Transition to Classical Smoothers	p. 59
Global Versus Local Approximations	p. 60
LOESS	p. 64
Kernel Smoothers	p. 67
Statistical Function Approximation	p. 68
The Concept of Kernel Methods and the Discrete Case	p. 73
Kernels and Stochastic Designs: Density Estimation	p. 78
Stochastic Designs: Asymptotics for Kernel Smoothers	p. 81
Convergence Theorems and Rates for Kernel Smoothers	p. 86
Kernel and Bandwidth Selection	p. 90
Linear Smoothers	p. 95
Nearest Neighbors	p. 96
Applications of Kernel Regression	p. 100
A Simulated Example	p. 100
Ethanol Data	p. 102
Exercises	p. 107
Spline Smoothing	p. 117
Interpolating Splines	p. 117
Natural Cubic Splines	p. 123
Smoothing Splines for Regression	p. 126
Model Selection for Spline Smoothing	p. 129
Spline Smoothing Meets Kernel Smoothing	p. 130
Asymptotic Bias, Variance, and MISE for Spline Smoothers	p. 131
Ethanol Data Example - Continued	p. 133
Splines Redux: Hilbert Space Formulation	p. 136
Reproducing Kernels	p. 138
Constructing an RKHS	p. 141
Direct Sum Construction for Splines	p. 146
Explicit Forms	p. 149
Nonparametrics in Data Mining and Machine Learning	p. 152
Simulated Comparisons	p. 154
What Happens with Dependent Noise Models?	p. 157
Higher Dimensions and the Curse of Dimensionality	p. 159
Notes	p. 163
Sobolev Spaces: Definition	p. 163
Exercises	p. 164
New Wave Nonparametrics	p. 171
Additive Models	p. 172
The Backfitting Algorithm	p. 173
Concurvity and Inference	p. 177
Nonparametric Optimality	p. 180
Generalized Additive Models	p. 181
Projection Pursuit Regression	p. 184
Neural Networks	p. 189
Backpropagation and Inference	p. 192
Barron's Result and the Curse	p. 197
Approximation Properties	p. 198
Barron's Theorem: Formal Statement	p. 200
Recursive Partitioning Regression	p. 202
Growing Trees	p. 204
Pruning and Selection	p. 207
Regression	p. 208
Bayesian Additive Regression Trees: BART	p. 210
MARS	p. 210
Sliced Inverse Regression	p. 215
ACE and AVAS	p. 218
Notes	p. 220
Proof of Barron's Theorem	p. 220
Exercises	p. 224
Supervised Learning: Partition Methods	p. 231
Multiclass Learning	p. 233
Discriminant Analysis	p. 235
Distance-Based Discriminant Analysis	p. 236
Bayes Rules	p. 241
Probability-Based Discriminant Analysis	p. 245
Tree-Based Classifiers	p. 249
Splitting Rules	p. 249
Logic Trees	p. 253
Random Forests	p. 254
Support Vector Machines	p. 262
Margins and Distances	p. 262
Binary Classification and Risk	p. 265
Prediction Bounds for Function Classes	p. 268
Constructing SVM Classifiers	p. 271
SVM Classification for Nonlinearly Separable Populations	p. 279
SVMs in the General Nonlinear Case	p. 282
Some Kernels Used in SVM Classification	p. 288
Kernel Choice, SVMs and Model Selection	p. 289
Support Vector Regression	p. 290
Multiclass Support Vector Machines	p. 293
Neural Networks	p. 294
Notes	p. 296
Hoeffding's Inequality	p. 296
VC Dimension	p. 297
Exercises	p. 300
Alternative Nonparametrics	p. 307
Ensemble Methods	p. 308
Bayes Model Averaging	p. 310
Bagging	p. 312
Stacking	p. 316
Boosting	p. 318
Other Averaging Methods	p. 326
Oracle Inequalities	p. 328
Bayes Nonparametrics	p. 334
Dirichlet Process Priors	p. 334
Polya Tree Priors	p. 336
Gaussian Process Priors	p. 338
The Relevance Vector Machine	p. 344
RVM Regression: Formal Description	p. 345
RVM Classification	p. 349
Hidden Markov Models - Sequential Classification	p. 352
Notes	p. 354
Proof of Yang's Oracle Inequality	p. 354
Proof of Lecue's Oracle Inequality	p. 357
Exercises	p. 359
Computational Comparisons	p. 365
Computational Results: Classification	p. 366
Comparison on Fisher's Iris Data	p. 366
Comparison on Ripley's Data	p. 369
Computational Results: Regression	p. 376
Vapnik's sinc Function	p. 377
Friedman's Function	p. 389
Conclusions	p. 392
Systematic Simulation Study	p. 397
No Free Lunch	p. 400
Exercises	p. 402
Unsupervised Learning: Clustering	p. 405
Centroid-Based Clustering	p. 408
K-Means Clustering	p. 409
Variants	p. 412
Hierarchical Clustering	p. 413
Agglomerative Hierarchical Clustering	p. 414
Divisive Hierarchical Clustering	p. 422
Theory for Hierarchical Clustering	p. 426
Partitional Clustering	p. 430
Model-Based Clustering	p. 432
Graph-Theoretic Clustering	p. 447
Spectral Clustering	p. 452
Bayesian Clustering	p. 458
Probabilistic Clustering	p. 458
Hypothesis Testing	p. 461
Computed Examples	p. 463
Ripley's Data	p. 465
Iris Data	p. 475
Cluster Validation	p. 480
Notes	p. 484
Derivatives of Functions of a Matrix	p. 484
Kruskal's Algorithm: Proof	p. 484
Prim's Algorithm: Proof	p. 485
Exercises	p. 485
Learning in High Dimensions	p. 493
Principal Components	p. 495
Main Theorem	p. 496
Key Properties	p. 498
Extensions	p. 500
Factor Analysis	p. 502
Finding and	p. 504
Finding K	p. 506
Estimating Factor Scores	p. 507
Projection Pursuit	p. 508
Independent Components Analysis	p. 511
Main Definitions	p. 511
Key Results	p. 513
Computational Approach	p. 515
Nonlinear PCs and ICA	p. 516
Nonlinear PCs	p. 517
Nonlinear ICA	p. 518
Geometric Summarization	p. 518
Measuring Distances to an Algebraic Shape	p. 519
Principal Curves and Surfaces	p. 520
Supervised Dimension Reduction: Partial Least Squares	p. 523
Simple PLS	p. 523
PLS Procedures	p. 524
Properties of PLS	p. 526
Supervised Dimension Reduction: Sufficient Dimensions in Regression	p. 527
Visualization I: Basic Plots	p. 531
Elementary Visualization	p. 534
Projections	p. 541
Time Dependence	p. 543
Visualization II: Transformations	p. 546
Chernoff Faces	p. 546
Multidimensional Scaling	p. 547
Self-Organizing Maps	p. 553
Exercises	p. 560
Variable Selection	p. 569
Concepts from Linear Regression	p. 570
Subset Selection	p. 572
Variable Ranking	p. 575
Overview	p. 577
Traditional Criteria	p. 578
Akaike Information Criterion (AIC)	p. 580
Bayesian Information Criterion (BIC)	p. 583
Choices of Information Criteria	p. 585
Cross Validation	p. 587
Shrinkage Methods	p. 599
Shrinkage Methods for Linear Models	p. 601
Grouping in Variable Selection	p. 615
Least Angle Regression	p. 617
Shrinkage Methods for Model Classes	p. 620
Cautionary Notes	p. 631
Bayes Variable Selection	p. 632
Prior Specification	p. 635
Posterior Calculation and Exploration	p. 643
Evaluating Evidence	p. 647
Connections Between Bayesian and Frequentist Methods	p. 650
Computational Comparisons	p. 653
The n>p Case	p. 653
When p>n	p. 665
Notes	p. 667
Code for Generating Data in Section 10.5	p. 667
Exercises	p. 671
Multiple Testing	p. 679
Analyzing the Hypothesis Testing Problem	p. 681
A Paradigmatic Setting	p. 681
Counts for Multiple Tests	p. 684
Measures of Error in Multiple Testing	p. 685
Aspects of Error Control	p. 687
Controlling the Familywise Error Rate	p. 690
One-Step Adjustments	p. 690
Stepwise p-Value Adjustments	p. 693
PCER and PFER	p. 695
Null Domination	p. 696
Two Procedures	p. 697
Controlling the Type I Error Rate	p. 702
Adjusted p-Values for PFER/PCER	p. 706
Controlling the False Discovery Rate	p. 707
FDR and other Measures of Error	p. 709
The Benjamini-Hochberg Procedure	p. 710
A BH Theorem for a Dependent Setting	p. 711
Variations on BH	p. 713
Controlling the Positive False Discovery Rate	p. 719
Bayesian Interpretations	p. 719
Aspects of Implementation	p. 723
Bayesian Multiple Testing	p. 727
Fully Bayes: Hierarchical	p. 728
Fully Bayes: Decision theory	p. 731
Notes	p. 736
Proof of the Benjamini-Hochberg Theorem	p. 736
Proof of the Benjamini-Yekutieli Theorem	p. 739
References	p. 743
Index	p. 773
Table of Contents provided by Ingram. All Rights Reserved.

Other Editions and Formats

Paperback