Data Cleaning and Exploration with Machine Learning : Get to grips with machine learning techniques to achieve sparkling-clean data quickly - Michael Walker

eText | 26 August 2022 | Edition Number 1

Get to grips with machine learning techniques to achieve sparkling-clean data quickly.

Key Features

  • Learn how to prepare data for machine learning processes
  • Understand which algorithms to use based on the prediction objectives and the properties of the data
  • Learn how to interpret and evaluate the results from machine learning
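Evaluating results usually starts with a metric that matches the target type. The following is a minimal standard-library sketch, not taken from the book: root mean squared error (RMSE) for a continuous target and accuracy for a class target. The function names `rmse` and `accuracy` and the sample values are hypothetical.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error for a continuous target."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))

def accuracy(y_true, y_pred):
    """Share of predicted class labels that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical toy predictions for illustration only
print(rmse([3.0, 5.0, 2.0], [2.0, 5.0, 4.0]))
print(accuracy(["a", "b", "a", "b"], ["a", "b", "b", "b"]))
```

RMSE is expressed in the target's own units, which makes it easier to interpret than mean squared error. Accuracy can be misleading when one class dominates, so it is usually read alongside other classification metrics.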

Book Description

Many individuals who know how to run machine learning algorithms do not have a good sense of the statistical assumptions they make and how to match the properties of the data to the algorithm for the best results.

In this book, each model choice is grounded in a full understanding of the underlying data, including feature importance and correlation, and the distribution of features and targets.
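The following is a minimal standard-library sketch of this kind of pre-modeling check, under the assumption that features are stored as plain Python lists. It summarizes one variable's distribution (mean, median, standard deviation, and moment-based skewness) and ranks features by the absolute value of their Pearson correlation with the target. The helper names `describe`, `pearson`, and `rank_by_correlation` and the toy data are hypothetical. The book itself may use other tools, such as pandas.

```python
import math
import statistics

def describe(values):
    """Summarize one variable's distribution: center, spread, and skewness."""
    n = len(values)
    mean = statistics.mean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n  # population variance
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return {
        "mean": mean,
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "skew": m3 / m2 ** 1.5 if m2 > 0 else 0.0,  # moment coefficient of skewness
    }

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / math.sqrt(sxx * syy)

def rank_by_correlation(features, target):
    """Order features by the absolute strength of their correlation with the target."""
    scores = {name: pearson(col, target) for name, col in features.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Hypothetical toy data: three features and a continuous target
features = {"x1": [1, 2, 3, 4, 5], "x2": [2, 1, 4, 3, 5], "x3": [5, 4, 3, 2, 1]}
target = [2, 4, 6, 8, 10]
print(describe([1, 2, 3, 4, 100]))
print(rank_by_correlation(features, target))
```

Ranking by absolute correlation treats strong negative relationships (such as `x3`) as just as informative as positive ones. A large positive skew, as in the `[1, 2, 3, 4, 100]` example, signals that the variable may need transformation or a closer look for outliers.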

The first two parts introduce the reader to techniques for preparing data for machine learning algorithms, without being bashful about using some machine learning techniques for data cleaning, including anomaly detection and feature selection. The book then applies that general knowledge to a wide variety of machine learning tasks.
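Anomaly detection can be done in many ways. The following is a minimal sketch of one simple machine learning approach, a k-nearest-neighbor distance score, and is not necessarily the method the book uses. Each point is scored by its mean Euclidean distance to its `k` nearest neighbors, and the highest-scoring points are flagged as candidate anomalies. The function names, the default `k=2`, and the sample points are hypothetical.

```python
import math

def knn_anomaly_scores(points, k=2):
    """Score each point by its mean Euclidean distance to its k nearest neighbors.

    Points that sit far from all of their neighbors get high scores.
    """
    if not 1 <= k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    scores = []
    for i, p in enumerate(points):
        # Distances from point i to every other point, nearest first
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

def flag_anomalies(points, k=2, top_n=1):
    """Return the indexes of the top_n highest-scoring (most isolated) points."""
    scores = knn_anomaly_scores(points, k)
    return sorted(range(len(points)), key=lambda i: scores[i], reverse=True)[:top_n]

# Hypothetical two-feature rows: four clustered points and one far-off point
points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.2), (8.0, 8.0)]
print(flag_anomalies(points, k=2, top_n=1))
```

This brute-force version compares every pair of points, so it scales quadratically with the number of rows. In practice, features are usually rescaled first so that no single feature dominates the distance.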

By the end of the book, readers will have a good understanding of popular supervised and unsupervised machine learning algorithms, how to prepare data for them, and how to evaluate them.

We have to make room for the kind of learning from data that we have been engaged in over the last half century, in which our modeling of relationships in the data and our cleaning and exploration of those data are very much in conversation. We want to retain the habits that have served us well, such as studying the distribution of variables, identifying anomalies, and examining bivariate relationships, even as we focus more and more attention on the accuracy of our predictions.
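One long-standing habit for identifying anomalies in a single variable's distribution is Tukey's interquartile range (IQR) rule. It is a classic statistical check rather than a machine learning method. The following minimal sketch flags values outside the fences `[Q1 - 1.5*IQR, Q3 + 1.5*IQR]`. The function name `iqr_outliers` and the sample values are hypothetical, and the quartiles come from Python's `statistics.quantiles` with its default method.

```python
import statistics

def iqr_outliers(values, multiplier=1.5):
    """Return the values that fall outside Tukey's fences.

    The fences are Q1 - multiplier * IQR and Q3 + multiplier * IQR.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - multiplier * iqr, q3 + multiplier * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical measurements with one suspicious reading
readings = [10, 12, 11, 13, 12, 14, 11, 95]
print(iqr_outliers(readings))
```

A flagged value is a prompt for investigation, not an automatic deletion. It may be a data-entry error, or it may be a genuine extreme case that the model needs to see.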

What you will learn

  • Essential data cleaning and exploration techniques to use before running the most popular machine learning algorithms
  • How to do preprocessing and feature selection, and how to set up the data for testing and validation
  • How to model continuous targets with supervised learning algorithms
  • How to model class targets with supervised learning algorithms
  • How to do clustering and dimension reduction with unsupervised learning algorithms
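Setting up data for validation often means k-fold cross-validation. The following is a minimal standard-library sketch, not the book's code. It shuffles row indexes once with a fixed seed, splits them into `k` folds, and yields each fold once as the test set. The function name `k_fold_indices` and its defaults are hypothetical. A related habit is to fit any preprocessing, such as imputation, scaling, or feature selection, on the training rows of each split only, so that no information from the test rows leaks into the model.

```python
import random

def k_fold_indices(n_rows, k=5, seed=0):
    """Yield (train, test) lists of row indexes for k-fold cross-validation.

    Rows are shuffled once with a fixed seed and dealt into k nearly equal folds.
    Each fold is used as the test set exactly once.
    """
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and n_rows")
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        # All rows from the other folds form the training set
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield sorted(train), sorted(test)

for train, test in k_fold_indices(10, k=5):
    print(len(train), test)
```

Fixing the seed makes the splits reproducible across runs. For class targets, a stratified variant that preserves class proportions in each fold is often preferable.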

Who This Book Is For

The primary audience for this book is professional data scientists, particularly those in the first few years of their careers, and more experienced data scientists who are relatively new to machine learning. Readers should be familiar with the statistics concepts typically taught in an introductory undergraduate course. They should also have beginner-level experience manipulating data programmatically.

Table of Contents

  1. Examining the Distribution of Features and Targets
  2. Examining Bivariate Relationships Between Features and Targets
  3. Identifying and Fixing Missing Values
  4. Encoding, Transforming, and Rescaling Features
  5. Feature Correlation and Selection
  6. Preparing for Model Validation
  7. Regression Models
  8. Support Vector Regression Models
  9. K-Nearest Neighbor Regression and Regression Tree Models
  10. Logistic Regression Models
  11. Classification Tree and Random Forest Models
  12. K-Nearest Neighbor Classification Models
  13. Support Vector Classification
  14. Naive Bayes Models
  15. Principal Component Analysis
  16. K-Means Clustering