Foreword xxxiii
Introduction xxxvii
Chapter 1 A Data Miner Looks at SQL 1
Databases, SQL, and Big Data 2
Picturing the Structure of the Data 6
Picturing Data Analysis Using Dataflows 16
SQL Queries 21
Subqueries and Common Table Expressions Are Our Friends 36
Lessons Learned 47
Chapter 2 What’s in a Table? Getting Started with Data Exploration 49
What Is Data Exploration? 50
Excel for Charting 51
Sparklines 65
What Values Are in the Columns? 68
More Values to Explore—Min, Max, and Mode 79
Exploring String Values 81
Exploring Values in Two Columns 86
From Summarizing One Column to Summarizing All Columns 90
Lessons Learned 96
Chapter 3 How Different Is Different? 97
Basic Statistical Concepts 98
How Different Are the Averages? 105
Sampling from a Table 110
Counting Possibilities 115
Ratios and Their Statistics 128
Chi-Square 132
What Months and Payment Types Have Unusual Affinities for Which Types of Products? 140
Lessons Learned 143
Chapter 4 Where Is It All Happening? Location, Location, Location 145
Latitude and Longitude 146
Census Demographics 160
Geographic Hierarchies 172
Mapping in Excel 188
Lessons Learned 194
Chapter 5 It’s a Matter of Time 197
Dates and Times in Databases 198
Starting to Investigate Dates 204
How Long Between Two Dates? 218
Year-over-Year Comparisons 229
Counting Active Customers by Day 239
Simple Chart Animation in Excel 247
Lessons Learned 254
Chapter 6 How Long Will Customers Last? Survival Analysis to Understand Customers and Their Value 255
Background on Survival Analysis 256
The Hazard Calculation 260
Survival and Retention 269
Comparing Different Groups of Customers 280
Comparing Survival over Time 287
Important Measures Derived from Survival 293
Using Survival for Customer Value Calculations 298
Forecasting 308
Lessons Learned 314
Chapter 7 Factors Affecting Survival: The What and Why of Customer Tenure 315
Which Factors Are Important and When 316
Left Truncation 328
Time Windowing 336
Competing Risks 342
Before and After 353
Lessons Learned 366
Chapter 8 Customer Purchases and Other Repeated Events 367
Identifying Customers 368
RFM Analysis 393
Which Households Are Increasing Purchase Amounts Over Time? 404
Time to Next Event 416
Lessons Learned 420
Chapter 9 What’s in a Shopping Cart? Market Basket Analysis 421
Exploring the Products 422
Products and Customer Worth 437
Product Geographic Distribution 448
Which Customers Have Particular Products? 451
Lessons Learned 463
Chapter 10 Association Rules and Beyond 465
Item Sets 466
The Simplest Association Rules 480
One-Way Association Rules 483
Two-Way Associations 489
Extending Association Rules 499
Lessons Learned 506
Chapter 11 Data Mining Models in SQL 507
Introduction to Directed Data Mining 508
Look-Alike Models 515
Lookup Model for Most Popular Product 522
Lookup Model for Order Size 528
Lookup Model for Probability of Response 534
Naive Bayesian Models (Evidence Models) 546
Lessons Learned 559
Chapter 12 The Best-Fit Line: Linear Regression Models 561
The Best-Fit Line 562
Measuring Goodness of Fit Using R2 581
Direct Calculation of Best-Fit Line Coefficients 584
Weighted Linear Regression 592
More Than One Input Variable 600
Lessons Learned 607
Chapter 13 Building Customer Signatures for Further Analysis 609
What Is a Customer Signature? 610
Designing Customer Signatures 617
Operations to Build Customer Signatures 622
Extracting Features 639
Summarizing Customer Behaviors 644
Lessons Learned 653
Chapter 14 Performance Is the Issue: Using SQL Effectively 655
Query Engines and Performance 656
Considerations When Thinking About Performance 660
Performance: Its Meaning and Measurement 663
Performance Improvement 101 665
Using Indexes Effectively 668
When OR Is a Bad Thing 683
Pros and Cons: Different Ways of Expressing the Same Thing 686
Window Functions 694
Lessons Learned 701
Appendix Equivalent Constructs Among Databases 703
Index 731