Steps for feature engineering before building a machine learning model

Girish Kurup
2 min read · Nov 12, 2020

After exploratory data analysis, data quality checks, and handling of missing values, the training dataset is ready for feature engineering.

Please visit my previous post on exploratory data analysis before applying the feature engineering steps described in this post.

The example below can be found on Kaggle, where the dataset is also available.

The following are the feature engineering techniques I most often see in use for dealing with numeric and categorical values.

For dataset columns/features/factors/variables that are numeric:

  • Statistical transformation: use np.log() to normalize numeric variables. If a numeric variable does not follow a normal distribution, applying log() helps handle the skewed data; after the log transformation the distribution becomes approximately normal (see the sketch after this list).
  • Binning: use np.where() to encode some of the numeric values into buckets. Sometimes a numeric variable has too many distinct values, and grouping them into a few coarse bins makes the feature easier to model.

For dataset columns/features/factors/variables that are categorical:

  • Encoding: use pd.get_dummies() for dummy encoding of the categorical values, and don't forget to change the data type to int16.
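Here is a minimal sketch of the three steps on a toy frame. The column names (income, age, city), the binning thresholds, and the values are placeholders chosen for illustration, not the actual columns of the Kaggle dataset from the previous post.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real dataset; the columns are hypothetical.
df = pd.DataFrame({
    "income": [25000, 48000, 310000, 52000, 61000],           # skewed numeric
    "age":    [22, 35, 58, 41, 29],                           # numeric to bin
    "city":   ["Pune", "Mumbai", "Pune", "Delhi", "Mumbai"],  # categorical
})

# 1. Statistical transformation: log-transform the skewed numeric column so
#    its distribution is closer to normal (use np.log1p if zeros can occur).
df["income_log"] = np.log(df["income"])

# 2. Binning: np.where() encodes the numeric values into a few coarse buckets.
df["age_group"] = np.where(df["age"] < 30, "young",
                  np.where(df["age"] < 50, "middle", "senior"))

# 3. Encoding: dummy-encode the categorical columns and cast them to int16.
dummies = pd.get_dummies(df[["city", "age_group"]]).astype("int16")
df = pd.concat([df.drop(columns=["city", "age_group"]), dummies], axis=1)

print(df.dtypes)
```

Casting the dummy columns to int16 keeps the integer dtype suggested above while using less memory than the default 64-bit integers.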

Let's revisit the dataset that we worked on in the previous post on exploratory data analysis: a dataset with 7 columns.

Written by Girish Kurup

Passionate about writing. I am a technology and data science enthusiast. Reach me at girishkurup21@gmail.com.