
DATA MINING WITH DECISION TREES (Volume 69)
Theory and Applications
Hardcover | 29 December 2007
At a Glance
264 Pages
22.86 x 15.24 x 1.6 cm
Hardcover
$273.63
or 4 interest-free payments of $68.41
Aims to ship in 7 to 10 business days
This is the first comprehensive book dedicated entirely to the field of decision trees in data mining, covering all aspects of this important technique.

Decision trees have become one of the most powerful and popular approaches in knowledge discovery and data mining: the science and technology of exploring large and complex bodies of data in order to discover useful patterns. The area is of great importance because it enables modeling and knowledge extraction from the abundance of available data. Both theoreticians and practitioners are continually seeking techniques to make the process more efficient, cost-effective and accurate. Decision trees, originally implemented in decision theory and statistics, are also highly effective tools in other areas such as data mining, text mining, information extraction, machine learning, and pattern recognition. This book invites readers to explore the many benefits that decision trees offer for data mining.
Preface | p. vii |
Introduction to Decision Trees | p. 1 |
Data Mining and Knowledge Discovery | p. 1 |
Taxonomy of Data Mining Methods | p. 3 |
Supervised Methods | p. 4 |
Overview | p. 4 |
Classification Trees | p. 5 |
Characteristics of Classification Trees | p. 8 |
Tree Size | p. 9 |
The Hierarchical Nature of Decision Trees | p. 10 |
Relation to Rule Induction | p. 11 |
Growing Decision Trees | p. 13 |
Training Set | p. 13 |
Definition of the Classification Problem | p. 14 |
Induction Algorithms | p. 16 |
Probability Estimation in Decision Trees | p. 16 |
Laplace Correction | p. 17 |
No Match | p. 18 |
Algorithmic Framework for Decision Trees | p. 18 |
Stopping Criteria | p. 19 |
Evaluation of Classification Trees | p. 21 |
Overview | p. 21 |
Generalization Error | p. 21 |
Theoretical Estimation of Generalization Error | p. 22 |
Empirical Estimation of Generalization Error | p. 23 |
Alternatives to the Accuracy Measure | p. 24 |
The F-Measure | p. 25 |
Confusion Matrix | p. 27 |
Classifier Evaluation under Limited Resources | p. 28 |
ROC Curves | p. 30 |
Hit Rate Curve | p. 30 |
Qrecall (Quota Recall) | p. 32 |
Lift Curve | p. 32 |
Pearson Correlation Coefficient | p. 32 |
Area Under Curve (AUC) | p. 34 |
Average Hit Rate | p. 35 |
Average Qrecall | p. 35 |
Potential Extract Measure (PEM) | p. 36 |
Which Decision Tree Classifier is Better? | p. 40 |
McNemar's Test | p. 40 |
A Test for the Difference of Two Proportions | p. 41 |
The Resampled Paired t Test | p. 43 |
The k-fold Cross-validated Paired t Test | p. 43 |
Computational Complexity | p. 44 |
Comprehensibility | p. 44 |
Scalability to Large Datasets | p. 45 |
Robustness | p. 47 |
Stability | p. 47 |
Interestingness Measures | p. 48 |
Overfitting and Underfitting | p. 49 |
"No Free Lunch" Theorem | p. 50 |
Splitting Criteria | p. 53 |
Univariate Splitting Criteria | p. 53 |
Overview | p. 53 |
Impurity-based Criteria | p. 53 |
Information Gain | p. 54 |
Gini Index | p. 55 |
Likelihood Ratio Chi-squared Statistics | p. 55 |
DKM Criterion | p. 55 |
Normalized Impurity-based Criteria | p. 56 |
Gain Ratio | p. 56 |
Distance Measure | p. 56 |
Binary Criteria | p. 57 |
Twoing Criterion | p. 57 |
Orthogonal Criterion | p. 58 |
Kolmogorov-Smirnov Criterion | p. 58 |
AUC Splitting Criteria | p. 58 |
Other Univariate Splitting Criteria | p. 59 |
Comparison of Univariate Splitting Criteria | p. 59 |
Handling Missing Values | p. 59 |
Pruning Trees | p. 63 |
Stopping Criteria | p. 63 |
Heuristic Pruning | p. 63 |
Overview | p. 63 |
Cost Complexity Pruning | p. 64 |
Reduced Error Pruning | p. 65 |
Minimum Error Pruning (MEP) | p. 65 |
Pessimistic Pruning | p. 65 |
Error-Based Pruning (EBP) | p. 66 |
Minimum Description Length (MDL) Pruning | p. 67 |
Other Pruning Methods | p. 67 |
Comparison of Pruning Methods | p. 68 |
Optimal Pruning | p. 68 |
Advanced Decision Trees | p. 71 |
Survey of Common Algorithms for Decision Tree Induction | p. 71 |
ID3 | p. 71 |
C4.5 | p. 71 |
CART | p. 71 |
CHAID | p. 72 |
QUEST | p. 73 |
Reference to Other Algorithms | p. 73 |
Advantages and Disadvantages of Decision Trees | p. 73 |
Oblivious Decision Trees | p. 76 |
Decision Trees Inducers for Large Datasets | p. 78 |
Online Adaptive Decision Trees | p. 79 |
Lazy Tree | p. 79 |
Option Tree | p. 80 |
Lookahead | p. 82 |
Oblique Decision Trees | p. 83 |
Decision Forests | p. 87 |
Overview | p. 87 |
Introduction | p. 87 |
Combination Methods | p. 90 |
Weighting Methods | p. 90 |
Majority Voting | p. 90 |
Performance Weighting | p. 91 |
Distribution Summation | p. 91 |
Bayesian Combination | p. 91 |
Dempster-Shafer | p. 92 |
Vogging | p. 92 |
Naive Bayes | p. 93 |
Entropy Weighting | p. 93 |
Density-based Weighting | p. 93 |
DEA Weighting Method | p. 93 |
Logarithmic Opinion Pool | p. 94 |
Gating Network | p. 94 |
Order Statistics | p. 95 |
Meta-combination Methods | p. 95 |
Stacking | p. 95 |
Arbiter Trees | p. 97 |
Combiner Trees | p. 99 |
Grading | p. 100 |
Classifier Dependency | p. 101 |
Dependent Methods | p. 101 |
Model-guided Instance Selection | p. 101 |
Incremental Batch Learning | p. 105 |
Independent Methods | p. 105 |
Bagging | p. 105 |
Wagging | p. 107 |
Random Forest | p. 108 |
Cross-validated Committees | p. 109 |
Ensemble Diversity | p. 109 |
Manipulating the Inducer | p. 110 |
Manipulation of the Inducer's Parameters | p. 111 |
Starting Point in Hypothesis Space | p. 111 |
Hypothesis Space Traversal | p. 111 |
Manipulating the Training Samples | p. 112 |
Resampling | p. 112 |
Creation | p. 113 |
Partitioning | p. 113 |
Manipulating the Target Attribute Representation | p. 114 |
Partitioning the Search Space | p. 115 |
Divide and Conquer | p. 116 |
Feature Subset-based Ensemble Methods | p. 117 |
Multi-Inducers | p. 121 |
Measuring the Diversity | p. 122 |
Ensemble Size | p. 124 |
Selecting the Ensemble Size | p. 124 |
Pre Selection of the Ensemble Size | p. 124 |
Selection of the Ensemble Size while Training | p. 125 |
Pruning - Post Selection of the Ensemble Size | p. 125 |
Pre-combining Pruning | p. 126 |
Post-combining Pruning | p. 126 |
Cross-Inducer | p. 127 |
Multistrategy Ensemble Learning | p. 127 |
Which Ensemble Method Should be Used? | p. 128 |
Open Source for Decision Tree Forests | p. 128 |
Incremental Learning of Decision Trees | p. 131 |
Overview | p. 131 |
The Motives for Incremental Learning | p. 131 |
The Inefficiency Challenge | p. 132 |
The Concept Drift Challenge | p. 133 |
Feature Selection | p. 137 |
Overview | p. 137 |
The "Curse of Dimensionality" | p. 137 |
Techniques for Feature Selection | p. 140 |
Feature Filters | p. 141 |
FOCUS | p. 141 |
LVF | p. 141 |
Using One Learning Algorithm as a Filter for Another | p. 141 |
An Information Theoretic Feature Filter | p. 142 |
An Instance Based Approach to Feature Selection - RELIEF | p. 142 |
Simba and G-flip | p. 142 |
Contextual Merit Algorithm | p. 143 |
Using Traditional Statistics for Filtering | p. 143 |
Mallows Cp | p. 143 |
AIC, BIC and F-ratio | p. 144 |
Principal Component Analysis (PCA) | p. 144 |
Factor Analysis (FA) | p. 145 |
Projection Pursuit | p. 145 |
Wrappers | p. 145 |
Wrappers for Decision Tree Learners | p. 145 |
Feature Selection as a Means of Creating Ensembles | p. 146 |
Ensemble Methodology as a Means for Improving Feature Selection | p. 147 |
Independent Algorithmic Framework | p. 149 |
Combining Procedure | p. 150 |
Simple Weighted Voting | p. 151 |
Naive Bayes Weighting using Artificial Contrasts | p. 152 |
Feature Ensemble Generator | p. 154 |
Multiple Feature Selectors | p. 154 |
Bagging | p. 156 |
Using Decision Trees for Feature Selection | p. 156 |
Limitation of Feature Selection Methods | p. 157 |
Fuzzy Decision Trees | p. 159 |
Overview | p. 159 |
Membership Function | p. 160 |
Fuzzy Classification Problems | p. 161 |
Fuzzy Set Operations | p. 163 |
Fuzzy Classification Rules | p. 164 |
Creating Fuzzy Decision Tree | p. 164 |
Fuzzifying Numeric Attributes | p. 165 |
Inducing of Fuzzy Decision Tree | p. 166 |
Simplifying the Decision Tree | p. 169 |
Classification of New Instances | p. 169 |
Other Fuzzy Decision Tree Inducers | p. 169 |
Hybridization of Decision Trees with other Techniques | p. 171 |
Introduction | p. 171 |
A Decision Tree Framework for Instance-Space Decomposition | p. 171 |
Stopping Rules | p. 174 |
Splitting Rules | p. 175 |
Split Validation Examinations | p. 175 |
The CPOM Algorithm | p. 176 |
CPOM Outline | p. 176 |
The Grouped Gain Ratio Splitting Rule | p. 177 |
Induction of Decision Trees by an Evolutionary Algorithm | p. 179 |
Sequence Classification Using Decision Trees | p. 187 |
Introduction | p. 187 |
Sequence Representation | p. 187 |
Pattern Discovery | p. 188 |
Pattern Selection | p. 190 |
Heuristics for Pattern Selection | p. 190 |
Correlation based Feature Selection | p. 191 |
Classifier Training | p. 191 |
Adjustment of Decision Trees | p. 192 |
Cascading Decision Trees | p. 192 |
Application of CREDT in Improving Information Retrieval of Medical Narrative Reports | p. 193 |
Related Works | p. 195 |
Text Classification | p. 195 |
Part-of-speech Tagging | p. 198 |
Frameworks for Information Extraction | p. 198 |
Frameworks for Labeling Sequential Data | p. 199 |
Identifying Negative Context in Non-domain-Specific Text (General NLP) | p. 199 |
Identifying Negative Context in Medical Narratives | p. 200 |
Works Based on Knowledge Engineering | p. 200 |
Works based on Machine Learning | p. 201 |
Using CREDT for Solving the Negation Problem | p. 201 |
The Process Overview | p. 201 |
Step 1: Corpus Preparation | p. 201 |
Step 1.1: Tagging | p. 202 |
Step 1.2: Sentence Boundaries | p. 202 |
Step 1.3: Manual Labeling | p. 203 |
Step 2: Patterns Creation | p. 203 |
Step 3: Patterns Selection | p. 206 |
Step 4: Classifier Training | p. 208 |
Cascade of Three Classifiers | p. 209 |
Bibliography | p. 215 |
Index | p. 243 |
Table of Contents provided by Ingram. All Rights Reserved.
ISBN: 9789812771711
ISBN-10: 9812771719
Series: Machine Perception and Artificial Intelligence
Published: 29th December 2007
Format: Hardcover
Language: English
Number of Pages: 264
Audience: College, Tertiary and University
Publisher: World Scientific Publishing Co Pte Ltd
Country of Publication: Singapore
Dimensions (cm): 22.86 x 15.24 x 1.6
Weight (kg): 0.51
Shipping
| | Standard Shipping | Express Shipping |
|---|---|---|
| Metro postcodes | $9.99 | $14.95 |
| Regional postcodes | $9.99 | $14.95 |
| Rural postcodes | $9.99 | $14.95 |
How to return your order
At Booktopia, we offer hassle-free returns in accordance with our returns policy. If you wish to return an item, please get in touch with Booktopia Customer Care.
Additional postage charges may be applicable.
Defective items
If there is a problem with any of the items received for your order, the Booktopia Customer Care team is ready to assist you.
For more info please visit our Help Centre.