List of Tables | p. xi |
List of Figures | p. xiii |
List of Algorithms | p. xv |
Foreword | p. xvii |
Acknowledgments | p. xix |
Knowledge Discovery from Data Streams | p. 1 |
Introduction | p. 1 |
An Illustrative Example | p. 2 |
A World in Movement | p. 4 |
Data Mining and Data Streams | p. 5 |
Introduction to Data Streams | p. 7 |
Data Stream Models | p. 7 |
Research Issues in Data Stream Management Systems | p. 8 |
An Illustrative Problem | p. 8 |
Basic Streaming Methods | p. 9 |
Illustrative Examples | p. 10 |
Counting the Number of Occurrences of the Elements in a Stream | p. 10 |
Counting the Number of Distinct Values in a Stream | p. 11 |
Bounds of Random Variables | p. 11 |
Poisson Processes | p. 13 |
Maintaining Simple Statistics from Data Streams | p. 14 |
Sliding Windows | p. 14 |
Computing Statistics over Sliding Windows: The ADWIN Algorithm | p. 16 |
Data Synopsis | p. 19 |
Sampling | p. 19 |
Synopsis and Histograms | p. 20 |
Wavelets | p. 21 |
Discrete Fourier Transform | p. 22 |
Illustrative Applications | p. 23 |
A Data Warehouse Problem: Hot-Lists | p. 23 |
Computing the Entropy in a Stream | p. 24 |
Monitoring Correlations Between Data Streams | p. 27 |
Monitoring Threshold Functions over Distributed Data Streams | p. 29 |
Notes | p. 30 |
Change Detection | p. 33 |
Introduction | p. 33 |
Tracking Drifting Concepts | p. 34 |
The Nature of Change | p. 35 |
Characterization of Drift Detection Methods | p. 36 |
Data Management | p. 37 |
Detection Methods | p. 38 |
Adaptation Methods | p. 40 |
Decision Model Management | p. 41 |
A Note on Evaluating Change Detection Methods | p. 41 |
Monitoring the Learning Process | p. 42 |
Drift Detection Using Statistical Process Control | p. 42 |
An Illustrative Example | p. 45 |
Final Remarks | p. 46 |
Notes | p. 47 |
Maintaining Histograms from Data Streams | p. 49 |
Introduction | p. 49 |
Histograms from Data Streams | p. 50 |
K-buckets Histograms | p. 50 |
Exponential Histograms | p. 51 |
An Illustrative Example | p. 52 |
Discussion | p. 52 |
The Partition Incremental Discretization Algorithm - PiD | p. 53 |
Analysis of the Algorithm | p. 56 |
Change Detection in Histograms | p. 56 |
An Illustrative Example | p. 57 |
Applications to Data Mining | p. 59 |
Applying PiD in Supervised Learning | p. 59 |
Time-Changing Environments | p. 61 |
Notes | p. 62 |
Evaluating Streaming Algorithms | p. 63 |
Introduction | p. 63 |
Learning from Data Streams | p. 64 |
Evaluation Issues | p. 65 |
Design of Evaluation Experiments | p. 66 |
Evaluation Metrics | p. 67 |
Error Estimators Using a Single Algorithm and a Single Dataset | p. 68 |
An Illustrative Example | p. 68 |
Comparative Assessment | p. 69 |
The 0-1 Loss Function | p. 70 |
Illustrative Example | p. 71 |
Evaluation Methodology in Non-Stationary Environments | p. 72 |
The Page-Hinkley Algorithm | p. 72 |
Illustrative Example | p. 73 |
Lessons Learned and Open Issues | p. 75 |
Notes | p. 77 |
Clustering from Data Streams | p. 79 |
Introduction | p. 79 |
Clustering Examples | p. 80 |
Basic Concepts | p. 80 |
Partitioning Clustering | p. 82 |
The Leader Algorithm | p. 82 |
Single Pass k-Means | p. 82 |
Hierarchical Clustering | p. 83 |
Micro Clustering | p. 85 |
Discussion | p. 86 |
Monitoring Cluster Evolution | p. 86 |
Grid Clustering | p. 87 |
Computing the Fractal Dimension | p. 88 |
Fractal Clustering | p. 88 |
Clustering Variables | p. 90 |
A Hierarchical Approach | p. 91 |
Growing the Hierarchy | p. 91 |
Aggregating at Concept Drift Detection | p. 94 |
Analysis of the Algorithm | p. 96 |
Notes | p. 96 |
Frequent Pattern Mining | p. 97 |
Introduction to Frequent Itemset Mining | p. 97 |
The Search Space | p. 98 |
The FP-growth Algorithm | p. 100 |
Summarizing Itemsets | p. 100 |
Heavy Hitters | p. 101 |
Mining Frequent Itemsets from Data Streams | p. 103 |
Landmark Windows | p. 104 |
The LossyCounting Algorithm | p. 104 |
Frequent Itemsets Using LossyCounting | p. 104 |
Mining Recent Frequent Itemsets | p. 105 |
Maintaining Frequent Itemsets in Sliding Windows | p. 105 |
Mining Closed Frequent Itemsets over Sliding Windows | p. 106 |
Frequent Itemsets at Multiple Time Granularities | p. 108 |
Sequence Pattern Mining | p. 110 |
Reservoir Sampling for Sequential Pattern Mining over Data Streams | p. 111 |
Notes | p. 113 |
Decision Trees from Data Streams | p. 115 |
Introduction | p. 115 |
The Very Fast Decision Tree Algorithm | p. 116 |
VFDT - The Base Algorithm | p. 116 |
Analysis of the VFDT Algorithm | p. 118 |
Extensions to the Basic Algorithm | p. 119 |
Processing Continuous Attributes | p. 119 |
Exhaustive Search | p. 119 |
Discriminant Analysis | p. 121 |
Functional Tree Leaves | p. 123 |
Concept Drift | p. 124 |
Detecting Changes | p. 126 |
Reacting to Changes | p. 127 |
Final Comments | p. 128 |
OLIN: Info-Fuzzy Algorithms | p. 129 |
Notes | p. 132 |
Novelty Detection in Data Streams | p. 133 |
Introduction | p. 133 |
Learning and Novelty | p. 134 |
Desiderata for Novelty Detection | p. 135 |
Novelty Detection as a One-Class Classification Problem | p. 135 |
Autoassociator Networks | p. 136 |
The Positive Naive-Bayes | p. 137 |
Decision Trees for One-Class Classification | p. 138 |
The One-Class SVM | p. 138 |
Evaluation of One-Class Classification Algorithms | p. 139 |
Learning New Concepts | p. 141 |
Approaches Based on Extreme Values | p. 141 |
Approaches Based on the Decision Structure | p. 142 |
Approaches Based on Frequency | p. 143 |
Approaches Based on Distances | p. 144 |
The Online Novelty and Drift Detection Algorithm | p. 144 |
Initial Learning Phase | p. 145 |
Continuous Unsupervised Learning Phase | p. 146 |
Identifying Novel Concepts | p. 147 |
Attempting to Determine the Nature of New Concepts | p. 149 |
Merging Similar Concepts | p. 149 |
Automatically Adapting the Number of Clusters | p. 150 |
Computational Cost | p. 150 |
Notes | p. 151 |
Ensembles of Classifiers | p. 153 |
Introduction | p. 153 |
Linear Combination of Ensembles | p. 155 |
Sampling from a Training Set | p. 156 |
Online Bagging | p. 157 |
Online Boosting | p. 158 |
Ensembles of Trees | p. 160 |
Option Trees | p. 160 |
Forest of Trees | p. 161 |
Generating Forest of Trees | p. 162 |
Classifying Test Examples | p. 162 |
Adapting to Drift Using Ensembles of Classifiers | p. 162 |
Mining Skewed Data Streams with Ensembles | p. 165 |
Notes | p. 166 |
Time Series Data Streams | p. 167 |
Introduction to Time Series Analysis | p. 167 |
Trend | p. 167 |
Seasonality | p. 169 |
Stationarity | p. 169 |
Time-Series Prediction | p. 169 |
The Kalman Filter | p. 170 |
Least Mean Squares | p. 173 |
Neural Nets and Data Streams | p. 173 |
Stochastic Sequential Learning of Neural Networks | p. 174 |
Illustrative Example: Load Forecast in Data Streams | p. 175 |
Similarity between Time-Series | p. 177 |
Euclidean Distance | p. 177 |
Dynamic Time-Warping | p. 178 |
Symbolic Approximation - SAX | p. 180 |
The SAX Transform | p. 180 |
Piecewise Aggregate Approximation (PAA) | p. 181 |
Symbolic Discretization | p. 181 |
Distance Measure | p. 182 |
Discussion | p. 182 |
Finding Motifs Using SAX | p. 183 |
Finding Discords Using SAX | p. 183 |
Notes | p. 184 |
Ubiquitous Data Mining | p. 185 |
Introduction to Ubiquitous Data Mining | p. 185 |
Distributed Data Stream Monitoring | p. 186 |
Distributed Computing of Linear Functions | p. 187 |
A General Algorithm for Computing Linear Functions | p. 188 |
Computing Sparse Correlation Matrices Efficiently | p. 189 |
Monitoring Sparse Correlation Matrices | p. 191 |
Detecting Significant Correlations | p. 192 |
Dealing with Data Streams | p. 192 |
Distributed Clustering | p. 193 |
Conquering the Divide | p. 193 |
Furthest Point Clustering | p. 193 |
The Parallel Guessing Clustering | p. 193 |
DGClust - Distributed Grid Clustering | p. 194 |
Local Adaptive Grid | p. 194 |
Frequent State Monitoring | p. 195 |
Centralized Online Clustering | p. 196 |
Algorithm Granularity | p. 197 |
Algorithm Granularity Overview | p. 199 |
Formalization of Algorithm Granularity | p. 200 |
Algorithm Granularity Procedure | p. 200 |
Algorithm Output Granularity | p. 201 |
Notes | p. 203 |
Final Comments | p. 205 |
The Next Generation of Knowledge Discovery | p. 205 |
Mining Spatial Data | p. 206 |
The Time Situation of Data | p. 206 |
Structured Data | p. 206 |
Where We Want to Go | p. 206 |
Resources | p. 209 |
Software | p. 209 |
Datasets | p. 209 |
Bibliography | p. 211 |
Index | p. 235 |