Pau Labarta Bajo

@paulabartabajo_

11 Tweets 2 reads Dec 07, 2022
Looking for effective ways to learn MLOps?
Forget theory and get your hands on a real-world problem 🧠
Here is a project you can build (for free) using Python 👩🏽‍💻👨‍💻↓↓↓
Let's build an ML service to predict the price of Ethereum (ETH) in the next 1 hour, using Python 🐍 and serverless tools.
You will learn a lot, AND you might even make some money 💰
These are the steps to build this system ↓
Step 1: Feature generation script 🐍
1 → fetches raw ETH/USD trade data from the Kraken API: docs.kraken.com
2 → engineers new features from the raw data (aka model inputs), and targets (aka model outputs)
3 → stores these features in the *Feature Store*
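Here is a minimal sketch of points 1 and 2, assuming the raw trades have already been fetched and parsed into (unix_timestamp, price, volume) tuples. The tuple layout, the function names, and the choice of lagged hourly returns as features are all illustrative assumptions, not the exact features of the final system.

```python
from typing import NamedTuple

SECONDS_PER_HOUR = 3600


class Candle(NamedTuple):
    hour_start: int  # Unix timestamp of the start of the hour
    open: float
    high: float
    low: float
    close: float
    volume: float


def trades_to_hourly_candles(trades):
    """Aggregate (timestamp, price, volume) trades into hourly OHLCV candles.

    Hours without any trade produce no candle (an assumption of this sketch).
    """
    buckets = {}
    for ts, price, volume in sorted(trades, key=lambda t: t[0]):
        hour = int(ts) // SECONDS_PER_HOUR * SECONDS_PER_HOUR
        buckets.setdefault(hour, []).append((price, volume))

    candles = []
    for hour in sorted(buckets):
        prices = [p for p, _ in buckets[hour]]
        total_volume = sum(v for _, v in buckets[hour])
        candles.append(
            Candle(hour, prices[0], max(prices), min(prices), prices[-1], total_volume)
        )
    return candles


def make_features_and_targets(candles, n_lags=3):
    """Build (features, target) pairs from consecutive candles.

    Features: the last `n_lags` hourly returns (lag 1 is the most recent).
    Target: the close price of the next candle.
    """
    rows = []
    for i in range(n_lags, len(candles) - 1):
        features = {
            f"return_lag_{k}": candles[i - k + 1].close / candles[i - k].close - 1
            for k in range(1, n_lags + 1)
        }
        target = candles[i + 1].close
        rows.append((features, target))
    return rows
```

Each (features, target) pair is what would then be written to the Feature Store in point 3.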
Attention 📢:
Feature engineering is CRITICAL for asset price prediction.
An open-source library like FinTa helps you quickly engineer useful trading features from your raw trading data.
github.com
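To give a feel for what a "trading feature" is, here is a hand-rolled sketch of two classic indicators: a simple moving average and an RSI. This is not FinTa's API, just plain Python written for illustration. The RSI below uses simple averages of gains and losses (often called Cutler's variant), which can differ from the smoothed version a library computes.

```python
def simple_moving_average(closes, window):
    """SMA at each position; None until `window` values are available."""
    out = []
    for i in range(len(closes)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(closes[i + 1 - window : i + 1]) / window)
    return out


def rsi_simple(closes, window=14):
    """RSI based on simple averages of gains and losses over `window` changes."""
    out = [None] * len(closes)
    # changes[j] is the price change from closes[j] to closes[j + 1]
    changes = [b - a for a, b in zip(closes, closes[1:])]
    for i in range(window, len(closes)):
        recent = changes[i - window : i]
        avg_gain = sum(c for c in recent if c > 0) / window
        avg_loss = sum(-c for c in recent if c < 0) / window
        if avg_loss == 0:
            out[i] = 100.0  # convention used here when there are no losses
        else:
            out[i] = 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)
    return out
```

Values like these, computed on the hourly close prices, become extra columns in your feature set.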
Step 2: Backfill historical (features, targets) ⏮️
To train a Machine Learning model later, you need enough historical data (features, targets) in your Feature Store.
Run the feature script for a range of past dates, to get enough training data.
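A backfill can be as simple as a loop over hourly windows. In this sketch, `run_feature_script` is a hypothetical callable standing in for the feature script from step 1, and processing one window per hour is an assumption.

```python
from datetime import datetime, timedelta, timezone


def hourly_windows(start, end):
    """Yield (window_start, window_end) pairs covering [start, end) hour by hour."""
    current = start
    while current < end:
        window_end = min(current + timedelta(hours=1), end)
        yield current, window_end
        current = window_end


def backfill(start, end, run_feature_script):
    """Run the feature script once per hourly window; return the number of runs."""
    n_runs = 0
    for window_start, window_end in hourly_windows(start, end):
        run_feature_script(window_start, window_end)
        n_runs += 1
    return n_runs
```

For example, `backfill(datetime(2022, 1, 1, tzinfo=timezone.utc), datetime(2022, 12, 1, tzinfo=timezone.utc), run_feature_script)` would replay the script for 11 months of hourly windows.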
Step 4: Model training script 🏋️
1 → fetches historical (features, targets) from the Feature Store.
2 → trains and evaluates the best ML model possible for this data, e.g. an XGBoost regressor (XGBRegressor).
3 → stores the trained model in the Model Registry.
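The evaluation part of point 2 could look roughly like this. With time series you split chronologically (never shuffle), and you compare your model against a naive baseline. The row layout of (features, target), the `last_close` feature name, and the 80/20 split are assumptions of this sketch. Any regressor's predict function can be plugged in as `predict`.

```python
def chronological_split(rows, test_fraction=0.2):
    """Split time-ordered rows into (train, test) without shuffling."""
    n_test = max(1, int(len(rows) * test_fraction))
    return rows[:-n_test], rows[-n_test:]


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def evaluate(predict, test_rows):
    """MAE of `predict` (a callable: features -> price) on (features, target) rows."""
    y_true = [target for _, target in test_rows]
    y_pred = [predict(features) for features, _ in test_rows]
    return mean_absolute_error(y_true, y_pred)


def naive_baseline(features):
    """Predict that the next price equals the last observed close."""
    return features["last_close"]
```

If your trained model does not beat `naive_baseline` on the test rows, it is not worth pushing to the Model Registry yet.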
Step 5: Automate execution of the feature script 🕰️
Create a GitHub Actions workflow to automatically run the feature script (from step 1) every hour.
GitHub Actions gives you serverless compute to run your code on a schedule. For free on public repositories.
Boom.
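A workflow file for this could look roughly like the sketch below. The file names (`feature_pipeline.py`, `requirements.txt`) and the secret name are hypothetical placeholders, so adapt them to your repo.

```yaml
# .github/workflows/feature_pipeline.yml (hypothetical path and names)
name: hourly-feature-pipeline

on:
  schedule:
    - cron: "0 * * * *"   # every hour, at minute 0 (UTC)
  workflow_dispatch:        # also allow manual runs from the GitHub UI

jobs:
  run-feature-script:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      - run: python feature_pipeline.py
        env:
          # hypothetical secret holding your Feature Store credentials
          FEATURE_STORE_API_KEY: ${{ secrets.FEATURE_STORE_API_KEY }}
```

Scheduled runs are best-effort, so expect the job to start a few minutes late at busy times.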
Step 6: Create a web app to show model predictions 👨🏽‍💻
Streamlit is a great Python library to develop and deploy web data apps.
Your app
1 → loads the model from the Model Registry and the latest features from the *Feature Store*,
2 → computes the predicted ETH/USD price and shows it on a beautiful UI.
BOOM!
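A bare-bones version of such an app might look like this. It assumes you already have the last traded price and the model's prediction in hand. How you load the model and features depends on the Feature Store and Model Registry you pick, so that part is left out.

```python
def summarize_prediction(last_price, predicted_price):
    """Format the prediction and its change versus the last price for display."""
    change_pct = (predicted_price / last_price - 1) * 100
    return {
        "value": f"${predicted_price:,.2f}",
        "delta": f"{change_pct:+.2f}%",
    }


def render(last_price, predicted_price):
    """Draw the page. Streamlit is imported lazily so the helper above
    can be used (and tested) without Streamlit installed."""
    import streamlit as st

    view = summarize_prediction(last_price, predicted_price)
    st.title("ETH/USD price, next hour")
    st.metric(label="Predicted price", value=view["value"], delta=view["delta"])
    st.caption(f"Last traded price: ${last_price:,.2f}")
```

Put a call to `render(...)` at the bottom of `app.py` (a hypothetical file name) and start it with `streamlit run app.py`.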
Bonus 🎁
You can also create another GitHub action to re-train the ML model.
Why re-train the model? 🤔
Because ML model performance decays over time, as market conditions drift away from the data the model was trained on.
The best way to mitigate this is to regularly re-train the model, like once a week.
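One simple way to see this decay is to track the model's error week by week on live predictions. This sketch assumes you log (unix_timestamp, actual_price, predicted_price) records somewhere. The record layout and the weekly grouping are illustrative choices.

```python
from collections import defaultdict

SECONDS_PER_WEEK = 7 * 24 * 3600


def weekly_mae(records):
    """Group (unix_ts, actual, predicted) records by week and return
    {week_index: mean absolute error}, ordered by week."""
    errors = defaultdict(list)
    for ts, actual, predicted in records:
        errors[int(ts) // SECONDS_PER_WEEK].append(abs(actual - predicted))
    return {week: sum(errs) / len(errs) for week, errs in sorted(errors.items())}
```

If the weekly error keeps creeping up, that is your signal that the model is going stale.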
Wanna build this system?
I am preparing a hands-on tutorial (including videos, code, and slides) to help you build a system like this.
Join my e-mail list to be notified when the tutorial is out ↓
datamachines.xyz
Every week I share real-world Data Science/Machine Learning content.
Follow me @paulabartabajo_ so you do not miss what's coming next.
Wanna help?
Like/Retweet the first tweet below to spread the wisdom ↓↓↓
