Reuven M. Lerner

@reuvenmlerner

Dec 29, 2022
What's the best way to count rows in your #Python 🐍 Pandas 🐼 data frame?
1. df.count()
This has three problems: (a) It returns a value for each column, (b) it leaves out NaN, and (c) it's the slowest technique. But if you need a per-column count that ignores NaNs, go for it.
Fortunately, we have (at least) two other options:
2. df.shape[0]
This works just fine, but it's ugly.
3. len(df)
This works just fine, and in my testing, runs about 2x as fast as df.shape[0].
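Here's a minimal sketch of what the three options return. The data frame and its column names are made up for illustration; column "b" has one NaN so you can see where df.count() differs.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: 4 rows, with one NaN in column "b"
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10.0, np.nan, 30.0, 40.0]})

per_column = df.count()  # Series with the non-NaN count for each column
n_shape = df.shape[0]    # first element of the (rows, columns) tuple
n_len = len(df)          # number of rows

print(per_column)
print(n_shape, n_len)
```

Only df.count() is affected by the NaN: it reports 3 for "b", while the other two report 4 rows no matter what the cells contain.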
Which do you use?
I'll add that I got a comment on LinkedIn from someone who suggested len(df.index).
I did some experimenting, and this turns out to be even faster than len(df)!
I thought it might only be the case with a RangeIndex, but even with an index of random strings, it was faster.
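If you want to try the comparison yourself, something like this rough sketch should do it. The data frame size, the column names, the string index labels, and the number of repetitions are all assumptions made for illustration, not the setup from my testing.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data frame with a string index instead of a RangeIndex
n_rows = 10_000
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {"x": rng.random(n_rows), "y": rng.random(n_rows)},
    index=[f"row{i}" for i in range(n_rows)],
)

# The four techniques from this thread, wrapped so timeit can call them
candidates = {
    "df.count()": lambda: df.count(),
    "df.shape[0]": lambda: df.shape[0],
    "len(df)": lambda: len(df),
    "len(df.index)": lambda: len(df.index),
}

# Total seconds for 1,000 calls of each technique
timings = {
    name: timeit.timeit(func, number=1_000)
    for name, func in candidates.items()
}

# Print fastest first
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name:15s} {seconds:.6f} s per 1,000 calls")
```

The exact numbers will depend on your machine and your pandas version, so treat the ordering you see as a data point rather than a guarantee.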
