Steps for feature engineering before building a machine learning model
After exploratory data analysis, data quality checks, and addressing missing values, the training dataset is ready for feature engineering.
Please visit my previous post on exploratory data analysis before applying the feature engineering described in this post.
The example below can be found on Kaggle, where the dataset is also available.
The following are the feature engineering techniques I most often use to deal with numeric and categorical values.
For dataset columns/features/factors/variables that are numeric:
- Statistical transformation: use np.log() to normalize numeric variables. If a numeric variable does not follow a normal distribution, applying a log transform helps handle skewed data; after the transformation, the distribution becomes closer to normal.
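As a minimal sketch of the log transform, assuming a hypothetical right-skewed `price` column (np.log1p is used here instead of plain np.log so that zeros are handled safely):

```python
import numpy as np
import pandas as pd

# Hypothetical skewed numeric column, e.g. a price variable
df = pd.DataFrame({"price": [10, 12, 15, 20, 35, 80, 250, 1000]})

# np.log1p computes log(1 + x), which is safe when values can be zero;
# plain np.log works for strictly positive values
df["price_log"] = np.log1p(df["price"])

# Skewness drops after the transform, i.e. the distribution is closer to normal
print(df["price"].skew())
print(df["price_log"].skew())
```

Comparing the skewness before and after is a quick sanity check that the transform actually helped.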
- Binning: use np.where() to encode some numeric values into discrete groups. Sometimes a numeric variable has so many distinct values that grouping them into bins makes the feature easier to model.
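A short sketch of binning with np.where(), using a hypothetical `age` column (pd.cut is shown as a common alternative when more than two bins are needed):

```python
import numpy as np
import pandas as pd

# Hypothetical numeric column to bin
df = pd.DataFrame({"age": [5, 17, 25, 42, 67, 80]})

# np.where encodes a two-way split: 1 if the condition holds, else 0
df["is_adult"] = np.where(df["age"] >= 18, 1, 0)

# For more than two bins, pd.cut assigns each value to a labeled interval
df["age_group"] = pd.cut(df["age"], bins=[0, 18, 65, 120],
                         labels=["child", "adult", "senior"])
print(df)
```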
For dataset columns/features/factors/variables that are categorical:
- Encoding: use pd.get_dummies() for dummy encoding of the categorical values, and don't forget to change the datatype to int16.
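A minimal sketch of dummy encoding with the int16 cast, assuming a hypothetical `city` column:

```python
import pandas as pd

# Hypothetical categorical column
df = pd.DataFrame({"city": ["Paris", "London", "Paris", "Tokyo"]})

# Dummy (one-hot) encoding; cast to int16 to keep memory usage low
dummies = pd.get_dummies(df["city"], prefix="city").astype("int16")

# Replace the original categorical column with its dummy columns
df = pd.concat([df.drop(columns="city"), dummies], axis=1)
print(df.dtypes)
```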
Let's revisit the dataset that we worked on in the previous post on exploratory data analysis: a dataset with 7 columns.