Jean de Nyandwi

@Jeande_d

Jul 16, 2021
Machine Learning is complicated. And sometimes, it's not our fault. Not really!
Let's talk about it...
Let's say that you did a great job at finding good data, you prepared it reasonably well, and your model made pretty great predictions. Everything is cool at the moment!
👇
But there are times when we won't be able to prevent the worst from happening.
A model that used to make good predictions can start to make misleading predictions.
Why is that?
There are two reasons why that can happen, and they are rooted in change: a change in the data, in the model, or in both.
Data drift and model drift...but let's talk about data drift first...
Let's say that you trained an image classifier with images collected from the internet. The model did well on these images.
But later you deployed the classifier on a mobile device to recognize images taken with the phone's camera.
If the mobile camera can't capture images as good as the high-resolution internet images the model was trained on, it's very likely that the model will fail.
The data changed. This is data drift.
In data drift, the distribution of the data has changed, and that change caused the model to fail.
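In practice, one way to catch data drift is to compare the distribution of each input feature in production against its distribution in the training set. Below is a minimal sketch using a two-sample Kolmogorov–Smirnov test from scipy; the single "sharpness" feature and the synthetic numbers are illustrative assumptions, not part of the original example.

```python
# Minimal sketch: flag data drift with a two-sample KS test
# (numpy and scipy assumed available).
import numpy as np
from scipy.stats import ks_2samp

def feature_drifted(train_col, live_col, alpha=0.05):
    """True if the live distribution of a feature differs
    significantly from its training distribution."""
    _statistic, p_value = ks_2samp(train_col, live_col)
    return p_value < alpha

# Hypothetical stand-in for the image example: summarize each image
# by one "sharpness" score; mobile photos are blurrier on average.
rng = np.random.default_rng(0)
train_sharpness = rng.normal(loc=0.7, scale=0.1, size=5000)
mobile_sharpness = rng.normal(loc=0.4, scale=0.2, size=5000)

print(feature_drifted(train_sharpness, mobile_sharpness))  # True -> drift
```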
But there are times when the data does not change, yet the model still performs worse.
How does that happen?
Let's take another example, this time from Twitter itself.
The data Twitter uses to recommend my favourite content does not change.
But if I go a week or more without using Twitter in the mobile app, I will find news at the top of the feed (which I don't check often),
while at the same time, on the same account, if I use the web, I mostly get machine learning content at the top, because I use Twitter on the web often.
What happened?
The data that Twitter uses to recommend good tweets to me didn't change, but my behaviour towards the platform changed.
You will see this too: if you go a week or more without using a given service, the model that recommends things to you will decay, because your behaviour towards that service changed.
If you watch videos on YouTube/Netflix, you can notice this too, especially in times when you are using these services more or less frequently than normal.
They will fail to learn your new behaviour, and they will fail to recommend the kinds of movies you like.
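Because the data itself looks unchanged, this kind of drift usually shows up as a slow decay in prediction quality. One way to catch it is to track a rolling metric on recently labeled examples and alert when it drops below the accuracy you measured at deployment. Here is a minimal sketch; the DriftMonitor class, window size, and thresholds are my own illustrative choices, not a real library API.

```python
# Minimal sketch: flag model drift via rolling accuracy decay.
from collections import deque

class DriftMonitor:
    def __init__(self, baseline_accuracy, window=500, tolerance=0.05):
        self.baseline = baseline_accuracy   # accuracy at deployment time
        self.tolerance = tolerance          # allowed drop before alerting
        self.recent = deque(maxlen=window)  # 1 = correct, 0 = wrong

    def record(self, prediction, label):
        self.recent.append(int(prediction == label))

    def drifted(self):
        if len(self.recent) < self.recent.maxlen:
            return False  # not enough recent evidence yet
        rolling_accuracy = sum(self.recent) / len(self.recent)
        return rolling_accuracy < self.baseline - self.tolerance

monitor = DriftMonitor(baseline_accuracy=0.92)
# In production you would call monitor.record(pred, true_label) for
# each labeled example and retrain when monitor.drifted() is True.
```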
How to deal with these two drifts?
One of the ways to handle them is to retrain the model with both old and new data.
And sometimes, scheduling retraining can be a good way to deal with that. How long you should wait to retrain a model depends on the type of problem.
For services that are used frequently (and are winner-take-all), like online stores and streaming services, retraining should not wait too long.
You do not want users to be overwhelmed with stale recommendations.
Other times, retraining can be weekly, monthly, or whatever interval suits the kind of problem you are dealing with.
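In code, that can be as simple as a job that refits the model on the old data plus the newly collected data, run on whatever schedule fits the problem. Here is a minimal sketch with scikit-learn; the synthetic data and the logistic-regression model are assumptions for illustration.

```python
# Minimal retraining sketch (numpy and scikit-learn assumed installed).
# X_old/y_old stand in for the original training set, X_new/y_new for
# freshly collected production data; all numbers are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

def retrain(model, X_old, y_old, X_new, y_new):
    # Fit on old + new data so the model keeps what it learned while
    # adapting to the drifted distribution.
    X = np.concatenate([X_old, X_new])
    y = np.concatenate([y_old, y_new])
    return model.fit(X, y)

rng = np.random.default_rng(1)
X_old = rng.normal(0.0, 1.0, size=(1000, 4))
y_old = (X_old.sum(axis=1) > 0).astype(int)
X_new = rng.normal(0.5, 1.0, size=(200, 4))    # inputs drifted
y_new = (X_new.sum(axis=1) > 1.0).astype(int)  # behaviour drifted

model = retrain(LogisticRegression(max_iter=1000),
                X_old, y_old, X_new, y_new)
# Run something like this as a weekly/monthly job; the right interval
# depends on how fast your data and your users change.
```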
To summarize this thread:
Machine learning is complicated. Data can change, models can change, and either of those changes affects the predictions.
One of the ways to deal with these changes is to retrain the models with the old & new data, and if retraining is frequent, it can be scheduled.
This is the end of the thread. Thank you for reading!
I am actively writing about machine learning techniques, concepts and ideas.
You can support me by following @Jeande_d, liking this thread, or sharing it with your followers.
More to come 🙌🏻
