Dan | Machine Learning Engineer

@DanKornas

12 Tweets Dec 07, 2022
📚 Feature Engineering in Python 📚
Original article by @engelAnalytics from @TDataScience
#datascience #machinelearning
🧵 A quick summary 👇
@engelAnalytics @TDataScience 🔸Most of the value a data scientist adds comes from engineering new features that capture both business insights and behavior observed in the data, helping the model identify the target.
@engelAnalytics @TDataScience 🔸Feature engineering content is lacking. Google returns many similar pages about the following topics (a few of them are sketched in code after the list):
- handling missing values
- handling outliers
- binning numeric variables
- encoding categorical features
- numerical transformations
- scaling numerical features
- extracting parts of a date
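A minimal sketch of a few of the list items above, assuming a small hypothetical pandas frame (column names are illustrative, not from the original article):

import pandas as pd

# small made-up frame just to illustrate a few of the techniques listed above
df = pd.DataFrame({
    "age": [25, None, 41, 33],
    "income": [40_000, 85_000, None, 62_000],
    "signup": pd.to_datetime(["2021-01-15", "2021-06-01", "2022-03-20", "2022-11-02"]),
    "segment": ["a", "b", "a", "c"],
})

# handling missing values
df["age"] = df["age"].fillna(df["age"].median())
df["income"] = df["income"].fillna(df["income"].median())

# binning a numeric variable
df["age_bin"] = pd.cut(df["age"], bins=[0, 30, 40, 120], labels=["<30", "30-40", "40+"])

# encoding a categorical feature
df = pd.get_dummies(df, columns=["segment"])

# extracting parts of a date
df["signup_month"] = df["signup"].dt.month
df["signup_dayofweek"] = df["signup"].dt.dayofweek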
@engelAnalytics @TDataScience 🔸The problem with this content
Many blogs claim that numeric features need to be scaled before modeling. This is not true in general: tree-based methods such as XGBoost and LightGBM are invariant to scaling.
@engelAnalytics @TDataScience While these techniques are needed, there are large classes of problems where they will not be practical. One example is encoding categorical features.
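For instance, with a made-up high-cardinality column (merchant IDs here are hypothetical), the standard one-hot encoding advice quickly becomes impractical:

import pandas as pd

# made-up frame with a high-cardinality categorical column
df = pd.DataFrame({
    "merchant_id": [f"m_{i}" for i in range(10_000)],
    "amount": range(10_000),
})

# one-hot encoding turns a single column into ~10,000 columns
one_hot = pd.get_dummies(df, columns=["merchant_id"])
print(one_hot.shape)  # (10000, 10001)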
@engelAnalytics @TDataScience 🔸Feature engineering is more than this.
It's the work of recognizing that, for example, ZIP codes can be aggregated up to cities, states, DMAs, etc., and performing those aggregations. The engineering can go further by combining these aggregations with all purchases in that location.
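A rough sketch of that idea in pandas; the ZIP-to-city and city-to-state lookups below are made-up stand-ins for a real reference table:

import pandas as pd

# hypothetical purchase-level data with a raw ZIP code column
purchases = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "zip_code": ["60614", "60622", "10001", "10002"],
    "amount": [120.0, 80.0, 45.0, 200.0],
})

# illustrative ZIP -> city -> state lookups; a real pipeline would use a proper
# reference table (Census, DMA mappings, etc.)
zip_to_city = {"60614": "Chicago", "60622": "Chicago",
               "10001": "New York", "10002": "New York"}
city_to_state = {"Chicago": "IL", "New York": "NY"}

purchases["city"] = purchases["zip_code"].map(zip_to_city)
purchases["state"] = purchases["city"].map(city_to_state)

# combine the rollup with all purchases in that location, attached back to each row
purchases["city_total_spend"] = purchases.groupby("city")["amount"].transform("sum")
purchases["city_avg_spend"] = purchases.groupby("city")["amount"].transform("mean")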
@engelAnalytics @TDataScience 🔸Feature engineering with time series data
In building models for customer analytics, the individual transactions and the way they change over time are important.
@engelAnalytics @TDataScience When working with data about a single customer with multiple observations over time, one of the most common techniques is to aggregate this data through rolling windows followed by an aggregation in pandas.
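A minimal sketch of that pattern in pandas, using a made-up transactions frame (columns and window sizes are illustrative):

import pandas as pd

# hypothetical transaction-level data: one row per purchase
df = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2],
    "date": pd.to_datetime(["2022-01-03", "2022-01-10", "2022-01-24",
                            "2022-01-05", "2022-01-19"]),
    "amount": [20.0, 35.0, 15.0, 50.0, 10.0],
})

# weekly spend per customer
weekly = (
    df.set_index("date")
      .groupby("customer_id")["amount"]
      .resample("W")
      .sum()
      .rename("weekly_spend")
      .reset_index()
)

# rolling 4-week aggregation on top of the weekly series
weekly["spend_4w"] = (
    weekly.groupby("customer_id")["weekly_spend"]
          .transform(lambda s: s.rolling(4, min_periods=1).sum())
)
print(weekly)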
@engelAnalytics @TDataScience Once these weekly aggregations are created, the rate of change of these aggregations can be exceedingly powerful. From a physics point of view, this is the velocity of the underlying feature.
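Continuing the idea with a small hypothetical weekly frame, the "velocity" is just a per-customer difference or percentage change of the aggregation:

import pandas as pd

# hypothetical weekly aggregation per customer (e.g. the output of the step above)
weekly = pd.DataFrame({
    "customer_id": [1, 1, 1, 1],
    "week": pd.to_datetime(["2022-01-09", "2022-01-16", "2022-01-23", "2022-01-30"]),
    "weekly_spend": [55.0, 55.0, 70.0, 40.0],
})

# rate of change ("velocity") of the weekly aggregation
weekly["spend_delta"] = weekly.groupby("customer_id")["weekly_spend"].diff()
weekly["spend_pct_change"] = weekly.groupby("customer_id")["weekly_spend"].pct_change()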
@engelAnalytics @TDataScience 🔸Feature engineering tutorials on GitHub
rasgoml.com created a GitHub repository devoted to feature engineering tutorials. It contains feature engineering code as Jupyter notebooks.
@engelAnalytics @TDataScience GitHub repo: github.com
Some of these examples include:
🔹 Feature Profiling and EDA
🔹 Data Cleaning
🔹 Train-Test Splits
🔹 Feature Importance
🔹 Feature Selection
@engelAnalytics @TDataScience Full original article, which I highly recommend reading: towardsdatascience.com
