Feature Engineering in Python
Original article by @engelAnalytics from @TDataScience
#datascience #machinelearning
A quick summary
Most of a data scientist's value comes from the ability to engineer new features that capture both business insights and behavior observed in the data, helping the model identify the target.
Feature engineering content is lacking. Google returns many similar pages about:
- handling missing values
- handling outliers
- binning numeric variables
- encoding categorical features
- numerical transformations
- scaling numerical features
- extracting parts of a date
The problem with this content
Many blogs claim that numeric features need to be scaled before modeling. This is not true in general: tree-based methods such as XGBoost and LightGBM are invariant to feature scaling.
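A rough way to see why, as a sketch only: a tree picks its split from the ordering of the values, and scaling does not change that ordering. The single-split "stump" below is a toy written for illustration, not XGBoost or LightGBM, and the `income` data and function names are made up.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a binary 0/1 label array."""
    p = labels.mean()
    return 2.0 * p * (1.0 - p)

def best_split_mask(x, y):
    """Boolean mask of the rows sent left by the best single split on x."""
    order = np.argsort(x, kind="stable")
    xs, ys = x[order], y[order]
    best_score, best_i = np.inf, None
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # cannot split between equal values
        # Weighted impurity of the two children if we cut before position i.
        score = (i * gini(ys[:i]) + (len(ys) - i) * gini(ys[i:])) / len(ys)
        if score < best_score:
            best_score, best_i = score, i
    goes_left = np.zeros(len(x), dtype=bool)
    goes_left[order[:best_i]] = True
    return goes_left

# Hypothetical data: one numeric feature and a noisy binary target.
rng = np.random.default_rng(0)
income = rng.normal(50_000, 15_000, size=200)
target = (income + rng.normal(0, 10_000, size=200) > 50_000).astype(int)

# Standardize the feature (zero mean, unit variance).
income_scaled = (income - income.mean()) / income.std()

# The best split sends exactly the same rows left before and after scaling.
same_partition = np.array_equal(
    best_split_mask(income, target),
    best_split_mask(income_scaled, target),
)
print(same_partition)
```

Distance-based and gradient-based models are a different story; the sketch only covers threshold splits.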
Even when a technique is needed, there are large classes of problems where it is not practical. Encoding categorical features is one example.
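One possible reading, and it is an assumption since the thread does not spell it out, is that the trouble comes from high-cardinality columns: one-hot encoding creates one column per distinct value. The `zip_code` values below are synthetic.

```python
import pandas as pd

# Hypothetical data: a categorical column with 500 distinct values.
df = pd.DataFrame({"zip_code": [f"{z:05d}" for z in range(10000, 10500)]})

# One-hot encoding creates one indicator column per distinct category.
one_hot = pd.get_dummies(df["zip_code"], prefix="zip")

n_categories = df["zip_code"].nunique()
print(n_categories, one_hot.shape)
```

With tens of thousands of distinct values the encoded table grows just as wide, which is where aggregating the categories first starts to look attractive.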
Feature engineering is more than this.
It's the work of recognizing that, e.g., ZIP codes can be aggregated to cities, states, DMAs, etc., and then performing those aggregations. The engineering can go further by combining these aggregations with all purchases in that location.
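A minimal sketch of that idea in pandas, assuming a hand-built ZIP lookup table and a small made-up purchase table (all column names here are hypothetical):

```python
import pandas as pd

# Made-up purchases keyed by ZIP code.
purchases = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5],
    "zip_code": ["10001", "10002", "94105", "94107", "60601"],
    "amount": [20.0, 35.0, 50.0, 10.0, 15.0],
})

# Hypothetical lookup table mapping each ZIP code to coarser geographies.
zip_lookup = pd.DataFrame({
    "zip_code": ["10001", "10002", "94105", "94107", "60601"],
    "city": ["New York", "New York", "San Francisco", "San Francisco", "Chicago"],
    "state": ["NY", "NY", "CA", "CA", "IL"],
})

# Step 1: roll each ZIP code up to its city and state.
enriched = purchases.merge(zip_lookup, on="zip_code", how="left")

# Step 2: aggregate all purchases in each city.
city_stats = (
    enriched.groupby("city")["amount"]
    .agg(city_total="sum", city_mean="mean", city_purchases="count")
    .reset_index()
)

# Step 3: attach the city-level aggregates to every purchase as features.
features = enriched.merge(city_stats, on="city", how="left")
print(features[["customer_id", "city", "city_total", "city_mean"]])
```

The same pattern works for state or DMA: swap the grouping column.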
Feature engineering with time series data
In building models for customer analytics, individual transactions and the way they change over time are important.
When working with data about a single customer with multiple observations over time, one of the most common techniques is to aggregate the data with rolling windows followed by an aggregation in pandas.
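A minimal sketch of that pattern, assuming a transaction table with hypothetical columns `customer_id`, `date` and `amount`, and an arbitrary 7-day window:

```python
import pandas as pd

# Made-up transactions for two customers.
transactions = pd.DataFrame({
    "customer_id": [1, 1, 1, 1, 2, 2, 2],
    "date": pd.to_datetime([
        "2023-01-02", "2023-01-04", "2023-01-09", "2023-01-20",
        "2023-01-03", "2023-01-05", "2023-01-12",
    ]),
    "amount": [10.0, 20.0, 5.0, 40.0, 7.0, 3.0, 9.0],
})

# Time-based windows need dates in order within each customer.
transactions = transactions.sort_values(["customer_id", "date"])

# A trailing 7-day window per customer, ending at each transaction.
windows = (
    transactions.set_index("date")
    .groupby("customer_id")["amount"]
    .rolling("7D")
)

# Aggregate each window several ways to get one feature per aggregation.
features = pd.DataFrame({
    "amount_sum_7d": windows.sum(),
    "amount_mean_7d": windows.mean(),
    "txn_count_7d": windows.count(),
}).reset_index()
print(features)
```

By default the window includes the current transaction and excludes anything exactly 7 days older.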
Once these weekly aggregations are created, the rate of change of these aggregations can be exceedingly powerful. From a physics point of view, this is the velocity of the underlying feature.
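One way to sketch the velocity, assuming the weekly totals already exist with no missing weeks (the numbers and column names below are made up): take the week-over-week difference within each customer.

```python
import pandas as pd

# Made-up weekly spend per customer, one row per customer and week.
weekly = pd.DataFrame({
    "customer_id": [1, 1, 1, 1, 2, 2, 2, 2],
    "week": pd.to_datetime(
        ["2023-01-01", "2023-01-08", "2023-01-15", "2023-01-22"] * 2
    ),
    "weekly_spend": [100.0, 120.0, 90.0, 90.0, 40.0, 40.0, 60.0, 30.0],
}).sort_values(["customer_id", "week"])

# Velocity: change in weekly spend from the previous week, per customer.
# The first week of each customer has no previous week, so it is NaN.
weekly["spend_velocity"] = weekly.groupby("customer_id")["weekly_spend"].diff()
print(weekly)
```

Because the rows are exactly one week apart, the difference is already a per-week rate; gaps in the weeks would need filling first.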
Feature engineering tutorials on GitHub
rasgoml.com created a GitHub repository devoted to feature engineering tutorials. It contains feature engineering code as Jupyter notebooks.
GitHub repo: github.com
Some of these examples include:
- Feature Profiling and EDA
- Data Cleaning
- Train-Test Splits
- Feature Importance
- Feature Selection
I highly recommend reading the full original article: towardsdatascience.com