Knowledge Discovery from Data Streams

By: Joao Gama

Hardcover | 25 May 2010 | Edition Number 1

At a Glance

Hardcover
258 Pages

Dimensions (cm): 23.39 x 15.6 x 1.6

RRP $183.00

$134.25

27% OFF

Aims to ship in 7 to 10 business days

Since the beginning of the Internet age and the increased use of ubiquitous computing devices, the large volume and continuous flow of distributed data have imposed new constraints on the design of learning algorithms. Exploring how to extract knowledge structures from evolving and time-changing data, Knowledge Discovery from Data Streams presents a coherent overview of state-of-the-art research in learning from data streams.

The book covers the fundamentals that are imperative to understanding data streams and describes important applications, such as TCP/IP traffic, GPS data, sensor networks, and customer click streams. It also addresses several challenges of data mining in the future, when stream mining will be at the core of many applications. These challenges involve designing useful and efficient data mining solutions applicable to real-world problems. In the appendix, the author includes examples of publicly available software and online data sets.

This practical, up-to-date book focuses on the new requirements of the next generation of data mining. Although the concepts presented in the text mainly concern data streams, they are also valid for other areas of machine learning and data mining.

Industry Reviews

"... Gama is one of the leading investigators in the hottest research topic in machine learning and data mining: data streams. ... This book is the first book to didactically cover in a clear, comprehensive and mathematically rigorous way the main machine learning related aspects of this relevant research field. ... an up-to-date, broad and useful source of reference for all those interested in knowledge acquisition by learning techniques." --From the Foreword by Andre Ponce de Leon Ferreira de Carvalho, University of Sao Paulo, Brazil

List of Tables
List of Figures
List of Algorithms
Foreword
Acknowledgments
Knowledge Discovery from Data Streams
Introduction
An Illustrative Example
A World in Movement
Data Mining and Data Streams
Introduction to Data Streams
Data Stream Models
Research Issues in Data Stream Management Systems
An Illustrative Problem
Basic Streaming Methods
Illustrative Examples
Counting the Number of Occurrences of the Elements in a Stream
Counting the Number of Distinct Values in a Stream
Bounds of Random Variables
Poisson Processes
Maintaining Simple Statistics from Data Streams
Sliding Windows
Computing Statistics over Sliding Windows: The ADWIN Algorithm
Data Synopsis
Sampling
Synopsis and Histograms
Wavelets
Discrete Fourier Transform
Illustrative Applications
A Data Warehouse Problem: Hot-Lists
Computing the Entropy in a Stream
Monitoring Correlations Between Data Streams
Monitoring Threshold Functions over Distributed Data Streams
Notes
Change Detection
Introduction
Tracking Drifting Concepts
The Nature of Change
Characterization of Drift Detection Methods
Data Management
Detection Methods
Adaptation Methods
Decision Model Management
A Note on Evaluating Change Detection Methods
Monitoring the Learning Process
Drift Detection Using Statistical Process Control
An Illustrative Example
Final Remarks
Notes
Maintaining Histograms from Data Streams
Introduction
Histograms from Data Streams
K-buckets Histograms
Exponential Histograms
An Illustrative Example
Discussion
The Partition Incremental Discretization Algorithm - PiD
Analysis of the Algorithm
Change Detection in Histograms
An Illustrative Example
Applications to Data Mining
Applying PiD in Supervised Learning
Time-Changing Environments
Notes
Evaluating Streaming Algorithms
Introduction
Learning from Data Streams
Evaluation Issues
Design of Evaluation Experiments
Evaluation Metrics
Error Estimators Using a Single Algorithm and a Single Dataset
An Illustrative Example
Comparative Assessment
The 0-1 Loss Function
Illustrative Example
Evaluation Methodology in Non-Stationary Environments
The Page-Hinkley Algorithm
Illustrative Example
Lessons Learned and Open Issues
Notes
Clustering from Data Streams
Introduction
Clustering Examples
Basic Concepts
Partitioning Clustering
The Leader Algorithm
Single Pass k-Means
Hierarchical Clustering
Micro Clustering
Discussion
Monitoring Cluster Evolution
Grid Clustering
Computing the Fractal Dimension
Fractal Clustering
Clustering Variables
A Hierarchical Approach
Growing the Hierarchy
Aggregating at Concept Drift Detection
Analysis of the Algorithm
Notes
Frequent Pattern Mining
Introduction to Frequent Itemset Mining
The Search Space
The FP-growth Algorithm
Summarizing Itemsets
Heavy Hitters
Mining Frequent Itemsets from Data Streams
Landmark Windows
The LossyCounting Algorithm
Frequent Itemsets Using LossyCounting
Mining Recent Frequent Itemsets
Maintaining Frequent Itemsets in Sliding Windows
Mining Closed Frequent Itemsets over Sliding Windows
Frequent Itemsets at Multiple Time Granularities
Sequence Pattern Mining
Reservoir Sampling for Sequential Pattern Mining over Data Streams
Notes
Decision Trees from Data Streams
Introduction
The Very Fast Decision Tree Algorithm
VFDT-The Base Algorithm
Analysis of the VFDT Algorithm
Extensions to the Basic Algorithm
Processing Continuous Attributes
Exhaustive Search
Discriminant Analysis
Functional Tree Leaves
Concept Drift
Detecting Changes
Reacting to Changes
Final Comments
OLIN: Info-Fuzzy Algorithms
Notes
Novelty Detection in Data Streams
Introduction
Learning and Novelty
Desiderata for Novelty Detection
Novelty Detection as a One-Class Classification Problem
Autoassociator Networks
The Positive Naive-Bayes
Decision Trees for One-Class Classification
The One-Class SVM
Evaluation of One-Class Classification Algorithms
Learning New Concepts
Approaches Based on Extreme Values
Approaches Based on the Decision Structure
Approaches Based on Frequency
Approaches Based on Distances
The Online Novelty and Drift Detection Algorithm
Initial Learning Phase
Continuous Unsupervised Learning Phase
Identifying Novel Concepts
Attempting to Determine the Nature of New Concepts
Merging Similar Concepts
Automatically Adapting the Number of Clusters
Computational Cost
Notes
Ensembles of Classifiers
Introduction
Linear Combination of Ensembles
Sampling from a Training Set
Online Bagging
Online Boosting
Ensembles of Trees
Option Trees
Forest of Trees
Generating Forest of Trees
Classifying Test Examples
Adapting to Drift Using Ensembles of Classifiers
Mining Skewed Data Streams with Ensembles
Notes
Time Series Data Streams
Introduction to Time Series Analysis
Trend
Seasonality
Stationarity
Time-Series Prediction
The Kalman Filter
Least Mean Squares
Neural Nets and Data Streams
Stochastic Sequential Learning of Neural Networks
Illustrative Example: Load Forecast in Data Streams
Similarity between Time-Series
Euclidean Distance
Dynamic Time-Warping
Symbolic Approximation-SAX
The SAX Transform
Piecewise Aggregate Approximation (PAA)
Symbolic Discretization
Distance Measure
Discussion
Finding Motifs Using SAX
Finding Discords Using SAX
Notes
Ubiquitous Data Mining
Introduction to Ubiquitous Data Mining
Distributed Data Stream Monitoring
Distributed Computing of Linear Functions
A General Algorithm for Computing Linear Functions
Computing Sparse Correlation Matrices Efficiently
Monitoring Sparse Correlation Matrices
Detecting Significant Correlations
Dealing with Data Streams
Distributed Clustering
Conquering the Divide
Furthest Point Clustering
The Parallel Guessing Clustering
DGClust - Distributed Grid Clustering
Local Adaptive Grid
Frequent State Monitoring
Centralized Online Clustering
Algorithm Granularity
Algorithm Granularity Overview
Formalization of Algorithm Granularity
Algorithm Granularity Procedure
Algorithm Output Granularity
Notes
Final Comments
The Next Generation of Knowledge Discovery
Mining Spatial Data
The Time Situation of Data
Structured Data
Where We Want to Go
Resources
Software
Datasets
Bibliography
Index
Table of Contents provided by Ingram. All Rights Reserved.
