Dan | Machine Learning Engineer

@DanKornas

12 Tweets Dec 07, 2022
📚 Feature Engineering in Python 📚
Original article by @engelAnalytics from @TDataScience
#datascience #machinelearning
🧵 A quick summary 👇
@engelAnalytics @TDataScience 🔸Most of the value a data scientist adds comes from engineering new features that capture both business insights and behavior observed in the data, helping the model identify the target.
@engelAnalytics @TDataScience 🔸Feature engineering content is lacking. Google returns many similar pages about the following topics (a few of them are sketched in code after the list):
- handling missing values
- handling outliers
- binning numeric variables
- encoding categorical features
- numerical transformations
- scaling numerical features
- extracting parts of a date
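A minimal sketch of a few of the list items above, assuming a small hypothetical pandas frame (column names are illustrative, not from the original article):

import pandas as pd

# small made-up frame just to illustrate a few of the techniques listed above
df = pd.DataFrame({
    "age": [25, None, 41, 33],
    "income": [40_000, 85_000, None, 62_000],
    "signup": pd.to_datetime(["2021-01-15", "2021-06-01", "2022-03-20", "2022-11-02"]),
    "segment": ["a", "b", "a", "c"],
})

# handling missing values
df["age"] = df["age"].fillna(df["age"].median())
df["income"] = df["income"].fillna(df["income"].median())

# binning a numeric variable
df["age_bin"] = pd.cut(df["age"], bins=[0, 30, 40, 120], labels=["<30", "30-40", "40+"])

# encoding a categorical feature
df = pd.get_dummies(df, columns=["segment"])

# extracting parts of a date
df["signup_month"] = df["signup"].dt.month
df["signup_dayofweek"] = df["signup"].dt.dayofweek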
@engelAnalytics @TDataScience 🔸The problem with this content
Many blogs claim that numeric features need to be scaled before modeling. This is not true in general: tree-based methods such as XGBoost and LightGBM are invariant to scaling.
@engelAnalytics @TDataScience While these techniques are needed, there are large classes of problems where they will not be practical. One example is encoding categorical features.
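For instance, with a made-up high-cardinality column (merchant IDs here are hypothetical), the standard one-hot encoding advice quickly becomes impractical:

import pandas as pd

# made-up frame with a high-cardinality categorical column
df = pd.DataFrame({
    "merchant_id": [f"m_{i}" for i in range(10_000)],
    "amount": range(10_000),
})

# one-hot encoding turns a single column into ~10,000 columns
one_hot = pd.get_dummies(df, columns=["merchant_id"])
print(one_hot.shape)  # (10000, 10001)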
@engelAnalytics @TDataScience 🔸Feature engineering is more than this.
It's the work of recognizing that, for example, ZIP codes can be aggregated up to cities, states, DMAs, etc., and performing those aggregations. The engineering can go further by combining these aggregations with all purchases in that location.
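A rough sketch of that idea in pandas; the ZIP-to-city and city-to-state lookups below are made-up stand-ins for a real reference table:

import pandas as pd

# hypothetical purchase-level data with a raw ZIP code column
purchases = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "zip_code": ["60614", "60622", "10001", "10002"],
    "amount": [120.0, 80.0, 45.0, 200.0],
})

# illustrative ZIP -> city -> state lookups; a real pipeline would use a proper
# reference table (Census, DMA mappings, etc.)
zip_to_city = {"60614": "Chicago", "60622": "Chicago",
               "10001": "New York", "10002": "New York"}
city_to_state = {"Chicago": "IL", "New York": "NY"}

purchases["city"] = purchases["zip_code"].map(zip_to_city)
purchases["state"] = purchases["city"].map(city_to_state)

# combine the rollup with all purchases in that location, attached back to each row
purchases["city_total_spend"] = purchases.groupby("city")["amount"].transform("sum")
purchases["city_avg_spend"] = purchases.groupby("city")["amount"].transform("mean")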
@engelAnalytics @TDataScience 🔸Feature engineering with time series data
In building models for customer analytics, the individual transactions and the way they change over time are important.
@engelAnalytics @TDataScience When working with data about a single customer with multiple observations over time, one of the most common techniques is to aggregate this data through rolling windows followed by an aggregation in pandas.
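A minimal sketch of that pattern in pandas, using a made-up transactions frame (columns and window sizes are illustrative):

import pandas as pd

# hypothetical transaction-level data: one row per purchase
df = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2],
    "date": pd.to_datetime(["2022-01-03", "2022-01-10", "2022-01-24",
                            "2022-01-05", "2022-01-19"]),
    "amount": [20.0, 35.0, 15.0, 50.0, 10.0],
})

# weekly spend per customer
weekly = (
    df.set_index("date")
      .groupby("customer_id")["amount"]
      .resample("W")
      .sum()
      .rename("weekly_spend")
      .reset_index()
)

# rolling 4-week aggregation on top of the weekly series
weekly["spend_4w"] = (
    weekly.groupby("customer_id")["weekly_spend"]
          .transform(lambda s: s.rolling(4, min_periods=1).sum())
)
print(weekly)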
@engelAnalytics @TDataScience Once these weekly aggregations are created, the rate of change of these aggregations can be exceedingly powerful. From a physics point of view, this is the velocity of the underlying feature.
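Continuing the idea with a small hypothetical weekly frame, the "velocity" is just a per-customer difference or percentage change of the aggregation:

import pandas as pd

# hypothetical weekly aggregation per customer (e.g. the output of the step above)
weekly = pd.DataFrame({
    "customer_id": [1, 1, 1, 1],
    "week": pd.to_datetime(["2022-01-09", "2022-01-16", "2022-01-23", "2022-01-30"]),
    "weekly_spend": [55.0, 55.0, 70.0, 40.0],
})

# rate of change ("velocity") of the weekly aggregation
weekly["spend_delta"] = weekly.groupby("customer_id")["weekly_spend"].diff()
weekly["spend_pct_change"] = weekly.groupby("customer_id")["weekly_spend"].pct_change()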
@engelAnalytics @TDataScience 🔸Feature engineering tutorials on GitHub
rasgoml.com created a GitHub repository devoted to feature engineering tutorials. It contains feature engineering code as Jupyter notebooks.
@engelAnalytics @TDataScience GitHub repo: github.com
Some of these examples include:
🔹 Feature Profiling and EDA
🔹 Data Cleaning
🔹 Train-Test Splits
🔹 Feature Importance
🔹 Feature Selection
@engelAnalytics @TDataScience Full original article, which I highly recommend reading: towardsdatascience.com
