Steps for exploratory data analysis before feature engineering on a dataset given for machine learning modelling

Girish Kurup
5 min read · Nov 9, 2020
Freepik.com

In this post you will learn how to use the following Python utilities for exploratory data analysis.

  • dataset.info() to check column types and non-null counts (missing data)
  • dataset.isnull().sum() to count the missing values in each column
  • sns.countplot() to plot a bar chart of category counts, optionally split by a second variable/factor
  • sns.distplot() to plot the distribution of each variable in the dataset (deprecated since Seaborn 0.11 in favor of sns.histplot() and sns.displot())
  • sns.FacetGrid() and sns.distplot() together to show the relationship between three or more variables/factors in one go
  • sns.heatmap() to visualize the correlation between all the numeric variables/factors in the dataset
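
As a minimal sketch of the two missing-data checks in the list, the snippet below builds a small toy DataFrame and runs both calls on it. The data and the column names (age, sex, survived) are made up for illustration and are not the dataset used in this post.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a few missing values; column names are illustrative only.
dataset = pd.DataFrame({
    "age": [22.0, 38.0, np.nan, 35.0, np.nan],
    "sex": ["male", "female", "female", None, "male"],
    "survived": [0, 1, 1, 1, 0],
})

# info() prints each column's dtype and non-null count,
# so a non-null count below the row count signals missing data.
dataset.info()

# isnull().sum() counts the missing values in each column.
missing_per_column = dataset.isnull().sum()
print(missing_per_column)
```

Columns with a non-zero count here are the ones to look at first when deciding whether to impute or drop values.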

Seaborn works well for complex plots and visualization during data cleansing and exploratory analysis. Seaborn, like Pandas plotting, relies on matplotlib. In my experience, the Python utilities above are good enough to perform a minimal quality check and exploratory data analysis before embarking on feature engineering or building a machine learning model.

For more Seaborn example scripts, please visit: https://seaborn.pydata.org/examples/index.html

The example below shows how I used Python Seaborn for visualization, data quality check and…

Girish Kurup

Passionate about writing. I am a technology and data science enthusiast. Reach me at girishkurup21@gmail.com.