Data Cleaning and Exploration with Machine Learning

Get to grips with machine learning techniques to achieve sparkling-clean data quickly

By: Michael Walker

eText | 26 August 2022 | Edition Number 1

At a Glance

Format
ePUB

eText

$49.49

Key Features

Learn how to prepare data for machine learning processes
Understand which algorithms to use, based on prediction objectives and the properties of the data
Learn how to interpret and evaluate the results from machine learning

Book Description

Many people who know how to run machine learning algorithms do not have a good sense of the statistical assumptions those algorithms make, or of how to match the properties of the data to the algorithm for the best results.

In this book, each model choice is grounded in a full understanding of the underlying data, including feature importance and correlation, and the distribution of features and targets.

The first two parts introduce the reader to techniques for preparing data for machine learning algorithms, without being bashful about using some machine learning techniques for data cleaning, including anomaly detection and feature selection. The book then applies that general knowledge to a wide variety of machine learning tasks.

By the end of the book, the reader will have a good understanding of popular supervised and unsupervised machine learning algorithms, how to prepare data for them, and how to evaluate them.

We have to make room for the kind of learning from data that we have been engaged in over the last half century, in which our modeling of relationships in the data and our cleaning and exploration of those data are very much in conversation. We want to retain the habits that have served us well, such as studying the distribution of variables, identifying anomalies, and examining bivariate relationships, even as we focus more and more attention on the accuracy of our predictions.
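
As a rough illustration of the three habits named above (studying a variable's distribution, identifying anomalies, and examining a bivariate relationship), here is a minimal sketch using only the Python standard library. The data are made up, and the helper names `summarize`, `iqr_outliers`, and `pearson` are hypothetical; none of this is taken from the book itself.

```python
import statistics


def summarize(values):
    """Return basic distribution statistics for one numeric variable."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {"mean": statistics.mean(values), "q1": q1, "median": median, "q3": q3}


def iqr_outliers(values, k=1.5):
    """Flag values more than k * IQR below Q1 or above Q3 (a common rule of thumb)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Made-up illustrative data: the last score is deliberately extreme.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [52, 55, 61, 64, 70, 73, 79, 140]

print(summarize(scores))        # distribution of one variable
print(iqr_outliers(scores))     # candidate anomalies
print(pearson(hours, scores))   # strength of the bivariate relationship
```

Checks like these are typically run before any model is fitted, so that an extreme value such as the last score is noticed and examined rather than silently shaping the predictions.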

What you will learn

Apply essential data cleaning and exploration techniques before running the most popular machine learning algorithms
Carry out preprocessing and feature selection, and set up the data for testing and validation
Model continuous targets with supervised learning algorithms
Model class targets with supervised learning algorithms
Perform clustering and dimension reduction with unsupervised learning algorithms

Who This Book Is For

The primary audience for this book is professional data scientists, particularly those in the first few years of their careers, or more experienced data scientists who are relatively new to machine learning. Readers should have knowledge of concepts in statistics typically taught in an undergraduate introductory course. They should also have beginner-level experience in manipulating data programmatically.