Tushar Sadhwani

@sadhlife

14 Tweets · Jan 10, 2023
🐍Python Dunder of the day
Day 5: ✨__length_hint__✨
Let's do a little obscure one this time. Did you know that dunders can have multiple words?
Anyway, take a look at this code:
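(The snippet was attached as an image; here's a minimal reconstruction based on the numbers used later in the thread, building a list of 10000 items one `append` at a time:)

```python
# Build a list of 10000 items by appending to it,
# one at a time, in a Python-level for loop.
items = []
for i in range(10000):
    items.append(i)
```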
Now look at this code that does the same thing:
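(Again a reconstruction, assuming the same 10000 items:)

```python
# Build the same list, but let list() consume
# the range object directly.
items = list(range(10000))
```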
You'd expect this second example to be a bit faster, because the iteration happens in C code rather than in a Python-level loop.
✨And you'd be right, this is faster:
But the surprising thing is, it's almost 4 times faster:
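(A sketch of how to reproduce the timing with `timeit`; the exact numbers depend on your machine and Python version, but the thread measured the `list(range(...))` version at almost 4x faster:)

```python
import timeit

# Time the append-in-a-loop version...
append_time = timeit.timeit(
    "items = []\nfor i in range(10000):\n    items.append(i)",
    number=1000,
)
# ...and the list(range(...)) version.
list_time = timeit.timeit("list(range(10000))", number=1000)

# Absolute numbers are machine-dependent; only the ratio matters here.
print(f"append loop: {append_time:.3f}s")
print(f"list(range): {list_time:.3f}s")
```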
Sure, some of that speed comes from purely C-based iteration being faster, but there's another factor here:
🐍A `list` is dynamically sized.
Think about what's actually going on in the background here:
The list obviously doesn't start with a size of 10000; that would be too much memory to reserve for an empty list.
But at the end, the list is holding 10000 items, so clearly the actual size of the list is changing in the background.
✨In fact, we can verify it with `sys.getsizeof()`:
`sys.getsizeof(foo)` returns the size of a given object in bytes.
🐍Note that internally, lists don't actually hold objects in them, just pointers to the objects, so a list of 100 ints and a list of 100 strings are the same size.
And you can see, the list starts small and grows bigger:
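(A sketch of the check; the exact byte counts are CPython implementation details and vary between versions and platforms:)

```python
import sys

items = []
print(sys.getsizeof(items))  # 56 bytes for an empty list on 64-bit CPython

for i in range(20):
    items.append(i)
    # The reported size stays flat for a few appends, then jumps:
    # the list grows its storage in chunks, not on every append.
    print(len(items), sys.getsizeof(items))
```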
What exactly happens when the size of the list changes?
The memory allocated for the list can't magically be enlarged, so how is this really working?
In reality,
✨The entire list is copied to a bigger location.
And copying over the list so many times simply wastes CPU time.
Now let's compare the sizes of the lists in the two code snippets:
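(Sketch of the comparison; the sizes in the comments are illustrative for a recent 64-bit CPython and will vary:)

```python
import sys

appended = []
for i in range(10000):
    appended.append(i)

constructed = list(range(10000))

print(sys.getsizeof(appended))     # some larger, "arbitrary" over-allocated size
print(sys.getsizeof(constructed))  # 80056 on 64-bit CPython: very close to 80000
```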
The first has an arbitrary size, but the second is interesting: it's very close to 80000 bytes ✨
Hmm, 80000. Is that number actually meaningful? Let's see if the pattern holds:
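(Sketch, checking for a `56 + 8 * n` pattern:)

```python
import sys

for n in (0, 1, 10, 100, 10000):
    # On recent 64-bit CPython builds the sizes follow 56 + 8 * n.
    print(n, sys.getsizeof(list(range(n))))
```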
Indeed! It looks like an empty list has a size of 56 bytes; that's presumably metadata.
But a list created from a pre-known size adds exactly 8 bytes for every item in it.
✨And that makes sense: every pointer on a 64-bit machine is 8 bytes! No space wasted!
You know what this means?
`list(range(10000))` seems to know that the list will contain exactly 10000 items, so it allocates 80000 bytes of array space for its contents right away.
How exactly?
✨That's where `__length_hint__` comes in.
`range()` iterators expose this dunder:
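(A quick REPL check; note the dunder lives on the iterator, while the `range` object itself relies on `__len__`, which gets checked first anyway:)

```python
>>> it = iter(range(10000))
>>> it.__length_hint__()
10000
```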
And not just for the trivial cases, even the more complicated ranges have a proper length hint defined:
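(REPL sketch:)

```python
>>> iter(range(0, 100, 7)).__length_hint__()
15
>>> iter(range(100, 0, -3)).__length_hint__()
34
```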
If an iterable defines a `__length_hint__` method, `list(iterable)` will allocate space for that many items from the beginning, so no time is wasted reallocating memory as the list is built up.
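(A hypothetical `CountDown` iterator, just to illustrate that `list()` picks up the hint; the class isn't from any library:)

```python
import sys

class CountDown:
    """Hypothetical iterator: counts down from n to 1 and
    advertises the remaining length via __length_hint__."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return self

    def __next__(self):
        if self.n <= 0:
            raise StopIteration
        value = self.n
        self.n -= 1
        return value

    def __length_hint__(self):
        return self.n

# list() asks for the length hint up front and sizes its
# internal array accordingly, so no re-allocation happens here.
items = list(CountDown(10000))
print(len(items))            # 10000
print(sys.getsizeof(items))  # 80056 on recent 64-bit CPython: 56 + 8 * 10000
```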
🐍What other objects do you think define this dunder?
For more information, check out PEP 424: https://peps.python.org/pep-0424/
