Tired of training lots of Machine Learning models and not getting better results? 😵‍💫
This is how you solve this 🧠↓
If your ML model does not work, you have at least 1 of these 2 problems:
1 → The model is too simple to capture the patterns in the training data, and you need a more powerful model (step 3).
2 → The training data has no patterns, so no model will work (steps 1 and 2).
Problem #1: "How do I know if I need a more complex model?"
If a tabular dataset is solvable (aka there are patterns between the features and the target), a boosted tree model (e.g. XGBoost) will find them.
If an XGBoost model does not work, the problem is not the model, but the data.
Problem #2: "Why are there no patterns in the training data?"
2 possible reasons:
1 → The problem is intrinsically very hard because the target is almost random (e.g. predicting crypto prices). Not much you can do here.
2 → You missed predictive features. This is solvable 😉
How? 🤔
When you generate your training data, you typically write a long SQL query against an enterprise database that
- fetches,
- aggregates,
- and merges data
from many tables.
Enterprise databases are large collections of tables...
... and chances are, you are missing some important tables in your SQL query.
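As a rough illustration of what "adding a missing table" can look like, here is a minimal sketch using Python's built-in `sqlite3` with an in-memory database. The tables (`customers`, `orders`, `support_tickets`), their columns and the churn target are all hypothetical. The idea is that the original training query only used `orders`, and the extended one also pulls a feature from `support_tickets`.

```python
import sqlite3

# In-memory database with hypothetical tables, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, churned INTEGER);
CREATE TABLE orders (customer_id INTEGER, amount REAL);
CREATE TABLE support_tickets (customer_id INTEGER, opened_at TEXT);
INSERT INTO customers VALUES (1, 0), (2, 1), (3, 0);
INSERT INTO orders VALUES (1, 20.0), (1, 35.0), (2, 15.0), (3, 50.0);
INSERT INTO support_tickets VALUES (2, '2024-01-03'), (2, '2024-01-09');
""")

# Each source table is aggregated to one row per customer BEFORE joining,
# so the joins cannot duplicate rows and inflate the counts or sums.
query = """
WITH order_stats AS (
    SELECT customer_id, COUNT(*) AS n_orders, SUM(amount) AS total_spent
    FROM orders
    GROUP BY customer_id
),
ticket_stats AS (  -- the table that was missing from the original query
    SELECT customer_id, COUNT(*) AS n_tickets
    FROM support_tickets
    GROUP BY customer_id
)
SELECT c.customer_id,
       COALESCE(o.n_orders, 0)      AS n_orders,
       COALESCE(o.total_spent, 0.0) AS total_spent,
       COALESCE(t.n_tickets, 0)     AS n_tickets,
       c.churned
FROM customers c
LEFT JOIN order_stats  o ON o.customer_id = c.customer_id
LEFT JOIN ticket_stats t ON t.customer_id = c.customer_id
ORDER BY c.customer_id
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# (1, 2, 55.0, 0, 0)
# (2, 1, 15.0, 2, 1)
# (3, 1, 50.0, 0, 0)
```

The `LEFT JOIN` plus `COALESCE` keeps customers with no tickets in the training set, with the new feature set to 0 instead of dropping the row.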
Ask around the team,
🧑‍🔬: "What features do you think are important for this problem?"
You often hear things you did not expect, which turn out to be pure gold for your ML project.
If these features are not yet in the database, talk to data engineers to see how they can be added.
Add them to the training data, and your models will start to work.
Voilà!
Wanna get more real-world Machine Learning tips and tricks?
Join my e-mail list and get precious advice right in your inbox
datamachines.xyz
Wanna become a professional Machine Learning engineer?
→ Follow me @paulabartabajo_
Wanna help?
Like/Retweet the first tweet below to spread the wisdom 🙏