@radek@sigmoid.social (Mastodon) 🇺🇦

@radekosmulski

6 Tweets 6 reads Nov 09, 2022
3 ways to speed up your Python/pandas code by up to 10x that I learned from a recent @kaggle notebook:
zip > itertuples
`itertuples` is the fastest built-in method to iterate over the rows of a pandas DataFrame.
Zipping the DataFrame's columns directly gives you an additional speedup.
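A minimal sketch of the two row-iteration styles, assuming a toy DataFrame with hypothetical columns `a` and `b`. Both loops compute the same sum of row-wise products. The zip version walks the column Series directly and skips building a namedtuple for each row. Actual speedups depend on the data and pandas version, so time it on your own frame (for example with `timeit`).

```python
import pandas as pd

# Toy data; the column names "a" and "b" are hypothetical.
df = pd.DataFrame({"a": range(5), "b": range(10, 15)})


def sum_products_itertuples(frame: pd.DataFrame) -> int:
    total = 0
    # itertuples yields one namedtuple per row (index=False drops the index field)
    for row in frame.itertuples(index=False):
        total += row.a * row.b
    return int(total)


def sum_products_zip(frame: pd.DataFrame) -> int:
    total = 0
    # zip pairs up the column values directly, with no per-row namedtuple
    for a, b in zip(frame["a"], frame["b"]):
        total += a * b
    return int(total)


print(sum_products_itertuples(df), sum_products_zip(df))
```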
Avoid `apply` like fire.
It might seem that `apply` is the only way, but often it isn't.
Check the docs for a long list of vectorized groupby operations: pandas.pydata.org
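A minimal sketch, assuming a toy DataFrame with hypothetical `user` and `amount` columns. The `apply` version calls a Python lambda once per group, while the built-in `mean` reduction and `transform("mean")` run inside pandas' optimized groupby code and give the same numbers.

```python
import pandas as pd

# Toy data; "user" and "amount" are hypothetical column names.
df = pd.DataFrame(
    {
        "user": ["u1", "u1", "u2", "u2", "u2"],
        "amount": [10.0, 20.0, 5.0, 5.0, 20.0],
    }
)

# Slow pattern: a Python lambda is invoked once per group
slow = df.groupby("user")["amount"].apply(lambda s: s.mean())

# Built-in groupby reduction: same result, no per-group Python call
fast = df.groupby("user")["amount"].mean()

# Broadcasting a group statistic back onto every row, still without apply
df["user_mean"] = df.groupby("user")["amount"].transform("mean")

print(fast)
print(df)
```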
Don't sum Counters.
Update them in place.
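A minimal sketch of the Counter tip, using hypothetical token lists. `sum(..., Counter())` builds a brand-new Counter on every `+`, while `Counter.update` adds counts into a single existing object.

```python
from collections import Counter

# Hypothetical input: one list of tokens per document
docs = [["a", "b", "a"], ["b", "c"], ["a"]]


def count_by_summing(chunks):
    # sum() applies `+` repeatedly; each step allocates a new Counter
    return sum((Counter(chunk) for chunk in chunks), Counter())


def count_by_updating(chunks):
    total = Counter()
    for chunk in chunks:
        # update() adds the counts of `chunk` into `total` in place
        total.update(chunk)
    return total


print(count_by_updating(docs))
```

One behavioral difference worth knowing: `+` drops keys whose resulting counts are zero or negative, whereas `update` keeps them, so the two only agree when all counts are positive, as in this example.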
Why does this work?
Creating new objects is expensive.
With millions of operations, these costs add up.
Avoid creating new objects as much as you can!
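To see the object-creation argument concretely, this sketch checks object identities: `update` mutates a Counter in place, `+` allocates a new one, and `itertuples` materializes a separate namedtuple for every row. It only shows that objects are created; it does not measure how much time that costs.

```python
from collections import Counter

import pandas as pd

counts = Counter(a=1)
original_id = id(counts)

# update() mutates the existing Counter; no new Counter is allocated
counts.update(["a", "b"])
same_object = id(counts) == original_id

# `+` returns a brand-new Counter and leaves `counts` untouched
summed = counts + Counter(a=1)
new_object = summed is not counts

# itertuples builds one namedtuple per row; keeping them in a list
# holds every object alive, so each one has a distinct id
df = pd.DataFrame({"a": [1, 2, 3]})
rows = list(df.itertuples(index=False))
distinct_rows = len({id(row) for row in rows})

print(same_object, new_object, distinct_rows)
```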
Why is @kaggle so cool?
Someone on the Internet refactored my notebook, and that is how I learned (or, to be honest, was reminded of) the three techniques above! 🔥
What a great environment to learn!
You can find the refactored notebook here: kaggle.com
