Steps for exploratory data analysis before implementing feature engineering on a dataset given for machine learning modelling

5 min read · Nov 9, 2020

From this post you will learn how to use the following Python utilities for exploratory data analysis:

dataset.info() for missing data
dataset.isnull().sum() for missing data
sns.countplot() to plot a bar chart of counts for one variable/factor, optionally split by a second one
sns.distplot() to plot the probability distribution of a variable in the dataset
sns.FacetGrid() and sns.distplot() together to plot the relationship between more than 3 variables/factors in one go
sns.heatmap() to visualize the correlation between all the variables/factors in the dataset
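As a minimal sketch of the two missing-data checks listed above, assuming a small made-up DataFrame (the column names `age`, `fare` and `survived` and their values are hypothetical, not taken from the dataset used in this post):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; replace it with your own DataFrame.
dataset = pd.DataFrame({
    "age": [22.0, np.nan, 35.0, 58.0, np.nan],
    "fare": [7.25, 71.28, 8.05, np.nan, 13.0],
    "survived": [0, 1, 1, 0, 1],
})

# info() prints each column's dtype and non-null count,
# so columns with missing values stand out immediately.
dataset.info()

# isnull().sum() counts the missing values per column.
missing_counts = dataset.isnull().sum()
print(missing_counts)
```

For this toy data the per-column missing counts are 2 for `age`, 1 for `fare` and 0 for `survived`.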

For complex plots and visualization, Seaborn works well during data cleansing and exploratory analysis. Seaborn, like the plotting methods in Pandas, relies on matplotlib. In my experience, the Python utilities above are good enough to perform a minimal quality check and exploratory data analysis before embarking on feature engineering or building a machine learning model.

For more Seaborn example scripts, please visit: https://seaborn.pydata.org/examples/index.html

The example below shows how I used Python's Seaborn for visualization, data quality checks and…


Written by Girish Kurup
