Get to grips with machine learning techniques to achieve sparkling clean data quickly.
Key Features
- Learn how to prepare data for machine learning processes
- Understand which algorithms to choose based on prediction objectives and the properties of the data
- Learn how to interpret and evaluate the results from machine learning
Book Description
Many individuals who know how to run machine learning algorithms do not have a good sense of the statistical assumptions those algorithms make, or of how to match the properties of the data to the algorithm for the best results.
Each model choice is grounded in a full understanding of the underlying data, including feature importance and correlation, and the distribution of features and targets.
The first two parts introduce the reader to techniques for preparing data for machine learning algorithms, without being bashful about using some machine learning techniques for data cleaning, including anomaly detection and feature selection. The book then applies that general knowledge to a wide variety of machine learning tasks.
By the end, the reader will have a good understanding of popular supervised and unsupervised machine learning algorithms, how to prepare data for them, and how to evaluate them.
We have to make room for the kind of learning from data that we have been engaged in over the last half century, in which our modeling of relationships in the data and our cleaning and exploration of those data are very much in conversation. We want to retain the habits that have served us well, such as studying the distribution of variables, identifying anomalies, and examining bivariate relationships, even as we focus more and more attention on the accuracy of our predictions.
What You Will Learn
- Apply essential data cleaning and exploration techniques before running the most popular machine learning algorithms
- Perform preprocessing and feature selection, and set up the data for testing and validation
- Model continuous targets with supervised learning algorithms
- Model categorical targets with supervised learning algorithms
- Perform clustering and dimension reduction with unsupervised learning algorithms
Who This Book Is For
The primary audience for this book is professional data scientists, particularly those in the first few years of their careers, and more experienced data scientists who are relatively new to machine learning. Readers should have knowledge of the statistical concepts typically taught in an undergraduate introductory course. They should also have beginner-level experience in manipulating data programmatically.
Table of Contents
- Examining the Distribution of Features and Targets
- Examining Bivariate Relationships Between Features and Targets
- Identifying and Fixing Missing Values
- Encoding, Transforming, and Rescaling Features
- Feature Correlation and Selection
- Preparing for Model Validation
- Regression Models
- Support Vector Regression Models
- K-Nearest Neighbor Regression and Regression Tree Models
- Logistic Regression Models
- Classification Tree and Random Forest Models
- K-Nearest Neighbor Classification Models
- Support Vector Classification
- Naive Bayes Models
- Principal Component Analysis
- K-Means Clustering