Grokking Builtin.all in Python
A coworker shared this neat code snippet on Slack:
>>> all([0, 0, 0])
False
>>> all([1, 1, 1])
True
and mused to us all:
this is annoying
– venerated colleague, circa 2021
Indeed he’s right! It is annoying. His note piqued my curiosity as to why/how this worked under the hood; in this post I take a dive into the guts of CPython searching for answers.
(EDIT: heads up, I wrote a lot of words here. If you want the simpler walkthrough without the fancy code spelunking, click here to skip to the final section)
First, a sensible alternative
Let’s get a few things out of the way before we go on.
(NOTE: Neither my colleague nor I actually knew the formal definition of all, so we expected all([0,0,0]) to return True. In particular, I was under the impression that this might actually be a bug in Python’s implementation of all(..).)
First and foremost, let’s explain why all([0,0,0]) returning False is not entirely surprising, even given our misconception of the expected behavior.
In Python, there is a concept of “truthy” and “falsy”. While there are explicitly defined True and False values (of type bool), non-boolean values are also evaluated by Python as True or False within boolean contexts.
A “boolean context”, fwiw, looks like this:
some_value = "not a boolean, lol"
if some_value:
    pass
# or
while some_value:
    break  # some_value is truthy, so the body runs; break to avoid looping forever
In the example above, some_value is clearly a str, yet when used in an if (or while!) statement, it is evaluated as True!
In general, all values in Python are “truthy” unless they are explicitly “falsy”. There are only a handful of falsy values among the common built-in types; find them enumerated here:
[] # empty list
{} # empty dict
() # empty tuple
set() # empty set
"" # empty string
range(0) # empty range
0 # int, 0
0.0 # float, 0
None # nonetype, obvi
False # boolean false
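(I had the hardest time finding a definitive list of these items on the interwebs. The closest I could find that seemed “official” was this py2.4(!!) “Python Library Reference”. The list above is from this freecodecamp article. For the record, the current Python 3 docs do cover this in the “Truth Value Testing” section of the “Built-in Types” page, which also lists other numeric zeros such as 0j, Decimal(0) and Fraction(0, 1).)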
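A quick way to double-check the list is to ask bool() directly, since bool(x) applies the same truth test that if and while use. A minimal sketch; FALSY_VALUES is just an illustrative name, and the last three entries are the extra numeric zeros from the Python 3 docs rather than from the list above:

```python
from decimal import Decimal
from fractions import Fraction

# The falsy values from the list above, plus three numeric zeros
# (0j, Decimal(0), Fraction(0, 1)) that the Python 3 docs also list.
FALSY_VALUES = [
    [], {}, (), set(), "", range(0),
    0, 0.0, None, False,
    0j, Decimal(0), Fraction(0, 1),
]

for value in FALSY_VALUES:
    # bool() applies the same truth test that if/while use.
    print(repr(value), bool(value))

# Non-empty containers and non-zero numbers are truthy, even when
# the container only holds falsy things.
print(bool("not a boolean, lol"), bool([0]), bool(-1))  # True True True
```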
Ok cool, so with this clarification, it is obvious why all([0,0,0]) evaluates to False: 0 is falsy, so it is at least plausible to surmise that there is a naive if statement somewhere that fails to account for this edge case.
Moreover, it stands to reason that all of the following will also probably evaluate to False:
all([False, False, False])
all([{}, {}, {}])
# .. etc, you get the idea
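Sure enough, a quick sanity check bears this out. The last two lines also preview the empty-iterable rule from the docs quoted just below, plus all’s sibling any:

```python
# Each list contains only falsy elements, so all() returns False.
print(all([0, 0, 0]))              # False
print(all([False, False, False]))  # False
print(all([{}, {}, {}]))           # False

# One falsy element anywhere is enough.
print(all([1, 1, 0]))              # False

# An empty iterable has nothing falsy in it, so all() returns True.
print(all([]))                     # True

# any() is the mirror image: True if at least one element is truthy.
print(any([0, 0, 0]), any([0, 0, 1]))  # False True
```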
Ok great, having established this, let’s turn to the official Python 3 docs (current at the time of writing):
all(iterable)
Return True if all elements of the iterable are true (or if the iterable is empty).
The docs also present this “equivalent” implementation:
def all(iterable):
    for element in iterable:
        if not element:
            return False
    return True
(Heads up, in part 2 of this post here, we actually formalize this logic and tack it on to python’s source!)
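The early return in that equivalent implementation has a practical consequence: all() stops pulling items from the iterable as soon as it sees a falsy one. A minimal sketch to observe this, using a hypothetical noisy() generator that logs every item it hands out, alongside the docs’ version renamed to docs_all so it does not shadow the builtin:

```python
def docs_all(iterable):
    """The docs' "equivalent" implementation, renamed to avoid shadowing all()."""
    for element in iterable:
        if not element:
            return False
    return True


def noisy(values, log):
    """Yield each value, appending it to `log` so we can see what was consumed."""
    for v in values:
        log.append(v)
        yield v


data = [1, 2, 0, 3, 4]

builtin_log, docs_log = [], []
print(all(noisy(data, builtin_log)), builtin_log)  # False [1, 2, 0]
print(docs_all(noisy(data, docs_log)), docs_log)   # False [1, 2, 0]
```

In both cases 3 and 4 are never requested from the generator.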
From the documentation, two things are obvious to me:
1: Using all([0,0,0]) to determine whether all members of an iterable are equal is kind of a bad use case. The purpose of all is only to determine whether all the values in an iterable are truthy. That’s it. Full stop. (But also, like, TIL)
2: If you do want to determine whether all the values in an iterable are indeed the same value, consider:
len(set([0,0,0])) == 1
(Note: this will not work for something like [{}, {}, {}], since dicts are not hashable. It also returns False for an empty list, since len(set([])) is 0. However, for basic use cases where we want to (safely!) determine whether a list of primitive values are all the same, the approach is IMO “good enough”)
(Note 2: if you really want to handle all cases, including unhashable types, consider the following:
inp = [{}, {}, {}]
samesame = True
for it in inp:
    samesame = inp[0] == it
    if not samesame:
        break
# samesame will be False if the items in inp are not all the same value
print(samesame)  # True
… certainly not as compact but gets the job done!)
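Amusingly, all() itself makes the Note 2 check compact. A sketch of a hypothetical all_equal() helper (my own name, not a stdlib function) that works for unhashable items and makes an explicit choice for empty input, where the len(set(...)) == 1 trick returns False:

```python
from typing import Any, Iterable


def all_equal(items: Iterable[Any]) -> bool:
    """Return True if every item compares equal (==) to the first one.

    Works for unhashable items such as dicts. Empty input returns True,
    following all()'s convention; that choice is an assumption, so adjust
    it to suit your use case.
    """
    iterator = iter(items)
    sentinel = object()
    first = next(iterator, sentinel)
    if first is sentinel:  # nothing to compare
        return True
    # all() short-circuits at the first item that differs from `first`.
    return all(item == first for item in iterator)


print(all_equal([{}, {}, {}]))  # True
print(all_equal([0, 0, 0]))     # True
print(all_equal([0, 0, 1]))     # False
print(all_equal([]))            # True
```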
Venturing into the guts of CPython
Ok, now for the meat of this story.
Let me preface this by saying: I really enjoy venturing into source code to validate behavior that I am seeing. More often than not, this is fueled by (perhaps an unreasonable…) hope that maybe I’ll notice something undocumented or interesting that could be helpful immediately or sometime in the future.
For this specific case, though, I was wondering how easy or hard it might be to submit a PR to change this behavior of all (spoiler: not all that straightforward and, more importantly, definitely not appropriate given the officially documented behavior we saw above).
Still, this was an interesting exercise and worth documenting for fun and profit (…mainly fun, though)
I started by searching for all within the cpython repo. Generally, GitHub’s code search is pretty great; the main issue here was that all is a fairly generic word that is used often both in and outside of the code (like in documentation and such).
This Stack Overflow post suggested that builtin functions could be found in the Objects folder:
However, many of the built-in types can be found in the Objects sub-directory of the Python source trunk. For example, see here for the implementation of the enumerate class or here for the implementation of the list type.
Once I started digging into the src code, I realized that there is a lot of code (surprise, surprise) in the cpython codebase. So I cloned the entire project to my local machine and started grepping the heck out of…everything in the project folder.
➜ cpython git:(master) grep -r "any(" .
Running the command above (I figured any, a function similar to all, might be a less frequently used token than all…), I ended up with a ton of results. I manually scanned through each line of the output (anyone have a better way to search for things like this? I couldn’t think of anything useful, but admittedly I didn’t try too hard) and ended up here:
./Python/bltinmodule.c:builtin_any(PyObject *module, PyObject *iterable)
That looked interesting! Looking at the function definition (ok, the definition just above builtin_any; as I mentioned, I searched for any but I wanted the source for all):
319 /*[clinic input]
320 all as builtin_all
321     iterable: object
322     /
323 Return True if bool(x) is True for all values x in the iterable.
324 If the iterable is empty, return True.
325 [clinic start generated code]*/
326
327 static PyObject *
328 builtin_all(PyObject *module, PyObject *iterable)
329 /*[clinic end generated code: output=ca2a7127276f79b3 input=1a7c5d1bc3438a21]*/
330 {
331     PyObject *it, *item;
332     PyObject *(*iternext)(PyObject *);
333     int cmp;
334
335     it = PyObject_GetIter(iterable);
336     if (it == NULL)
337         return NULL;
338     iternext = *Py_TYPE(it)->tp_iternext;
339
340     for (;;) {
341         item = iternext(it);
342         if (item == NULL)
343             break;
344         cmp = PyObject_IsTrue(item);
345         Py_DECREF(item);
346         if (cmp < 0) {
347             Py_DECREF(it);
348             return NULL;
349         }
350         if (cmp == 0) {
351             Py_DECREF(it);
352             Py_RETURN_FALSE;
353         }
354     }
355     Py_DECREF(it);
356     if (PyErr_Occurred()) {
357         if (PyErr_ExceptionMatches(PyExc_StopIteration))
358             PyErr_Clear();
359         else
360             return NULL;
361     }
362     Py_RETURN_TRUE;
363 }
(Sauce)
We found it! This is the actual implementation of the builtin all function!!
Ok cool, so let’s examine it. The lines most interesting to us are 340 onwards:
340     for (;;) {
341         item = iternext(it);
342         if (item == NULL)
343             break;
344         cmp = PyObject_IsTrue(item);
345         Py_DECREF(item);
346         if (cmp < 0) {
347             Py_DECREF(it);
348             return NULL;
349         }
350         if (cmp == 0) {
351             Py_DECREF(it);
352             Py_RETURN_FALSE;
353         }
354     }
355     Py_DECREF(it);
356     if (PyErr_Occurred()) {
357         if (PyErr_ExceptionMatches(PyExc_StopIteration))
358             PyErr_Clear();
359         else
360             return NULL;
361     }
362     Py_RETURN_TRUE;
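This is the main loop: it walks through each item in the iterable and asks whether the item is true. At the first item that is not (cmp == 0), it returns False; if the truth test itself fails (cmp < 0), it returns NULL, which is how C code propagates a Python exception. If the iterator runs out without either happening, we fall out of the loop and return True (lines 356-361 also pass along any iteration error other than StopIteration). To understand what true means, we consider line 344, which invokes PyObject_IsTrue. Using a similar approach to what I described earlier, I was able to track down the definition of PyObject_IsTrue (in Objects/object.c), reproduced below: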
1379 /* Test a value used as condition, e.g., in a while or if statement.
1380    Return -1 if an error occurred */
1381
1382 int
1383 PyObject_IsTrue(PyObject *v)
1384 {
1385     Py_ssize_t res;
1386     if (v == Py_True)
1387         return 1;
1388     if (v == Py_False)
1389         return 0;
1390     if (v == Py_None)
1391         return 0;
1392     else if (Py_TYPE(v)->tp_as_number != NULL &&
1393              Py_TYPE(v)->tp_as_number->nb_bool != NULL)
1394         res = (*Py_TYPE(v)->tp_as_number->nb_bool)(v);
1395     else if (Py_TYPE(v)->tp_as_mapping != NULL &&
1396              Py_TYPE(v)->tp_as_mapping->mp_length != NULL)
1397         res = (*Py_TYPE(v)->tp_as_mapping->mp_length)(v);
1398     else if (Py_TYPE(v)->tp_as_sequence != NULL &&
1399              Py_TYPE(v)->tp_as_sequence->sq_length != NULL)
1400         res = (*Py_TYPE(v)->tp_as_sequence->sq_length)(v);
1401     else
1402         return 1;
1403     /* if it is negative, it should be either -1 or -2 */
1404     return (res > 0) ? 1 : Py_SAFE_DOWNCAST(res, Py_ssize_t, int);
1405 }
(Sauce)
So this function, as it turns out, isn’t all that complex. It runs a series of checks against v, a pointer to a generic PyObject, which (recall from builtin_all above) is item: a member of the iterable we passed in to all.
Let’s examine this PyObject_IsTrue function line by line:
First, we inspect this (line 1386):
// ...
if (v == Py_True)
    return 1;
Recall that v, for us, is just 0. The check above is a pointer comparison (the C equivalent of Python’s v is True), so it does not match; 0 is not True. Onwards!
The next two conditions (lines 1388-1391):
// ...
if (v == Py_False)
    return 0;
if (v == Py_None)
    return 0;
also do not match: v is 0, neither False nor None.
The next check is key (lines 1392-1394, for this exercise, at least).
// ...
else if (Py_TYPE(v)->tp_as_number != NULL &&
         Py_TYPE(v)->tp_as_number->nb_bool != NULL)
    res = (*Py_TYPE(v)->tp_as_number->nb_bool)(v);
Woof. I’m not super familiar with C so this code block looks especially foreign to me.
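I tried figuring out what the definition of Py_TYPE might be, but unfortunately didn’t get too far in my grepping adventures. So, I turned again to the Python docs (…by googling, of course). (For the record: Py_TYPE is defined in Include/object.h and simply returns the object’s ob_type field, a pointer to its type object; it is roughly the C counterpart of Python’s type(v).)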
The Type Objects documentation had a lot of good info that helped clear up exactly what this particular line means:
Py_TYPE(v)->tp_as_number
Based on my interpretation of the documentation plus the preamble from this helpful blog post, every type in Python is described by an instance of the C struct _typeobject (typedef’d as PyTypeObject), which determines how the type behaves. tp_as_number is a member (basically a field in the struct) of _typeobject; it points to a table of numeric operations. It is left as NULL for types with no numeric behavior at all, and filled in for types that support numeric-style operations: obviously ints and floats, but also, for example, str, whose % formatting goes through this table.
To further clarify this, let’s take another dive into the code. We begin by hunting down the definition of _typeobject, which is graciously reproduced for our benefit in the Type Objects docs I mentioned above. I am reproducing it here for convenience, but abridging all but the relevant members:
1 typedef struct _typeobject {
2     // ... !!! removing since not needed for our analysis
3
4     /* Method suites for standard classes */
5
6     PyNumberMethods *tp_as_number;
7
8     // ... !!! removing since not needed for our analysis
9
10 } PyTypeObject;
(Sauce)
The main hint here for me was PyTypeObject; by grepping the codebase for this token, I was able to locate the following PyTypeObject:
2778 PyTypeObject PyZip_Type = {
2779 PyVarObject_HEAD_INIT(&PyType_Type, 0)
2780 "zip", /* tp_name */
2781 sizeof(zipobject), /* tp_basicsize */
2782 0, /* tp_itemsize */
2783 /* methods */
2784 (destructor)zip_dealloc, /* tp_dealloc */
2785 0, /* tp_vectorcall_offset */
2786 0, /* tp_getattr */
2787 0, /* tp_setattr */
2788 0, /* tp_as_async */
2789 0, /* tp_repr */
2790 0, /* tp_as_number */
2791 0, /* tp_as_sequence */
2792 0, /* tp_as_mapping */
2793 0, /* tp_hash */
2794 0, /* tp_call */
2795 0, /* tp_str */
2796 PyObject_GenericGetAttr, /* tp_getattro */
2797 0, /* tp_setattro */
2798 0, /* tp_as_buffer */
2799 Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_GC |
2800 Py_TPFLAGS_BASETYPE, /* tp_flags */
2801 zip_doc, /* tp_doc */
2802 (traverseproc)zip_traverse, /* tp_traverse */
2803 0, /* tp_clear */
2804 0, /* tp_richcompare */
2805 0, /* tp_weaklistoffset */
2806 PyObject_SelfIter, /* tp_iter */
2807 (iternextfunc)zip_next, /* tp_iternext */
2808 zip_methods, /* tp_methods */
2809 0, /* tp_members */
2810 0, /* tp_getset */
2811 0, /* tp_base */
2812 0, /* tp_dict */
2813 0, /* tp_descr_get */
2814 0, /* tp_descr_set */
2815 0, /* tp_dictoffset */
2816 0, /* tp_init */
2817 PyType_GenericAlloc, /* tp_alloc */
2818 zip_new, /* tp_new */
2819 PyObject_GC_Del, /* tp_free */
2820 };
(Sauce)
Line 2790 is the line we care about: note how the value is set to 0 here. But what is PyZip_Type anyway? In that same file (but further down), we observe:
2917 #define SETBUILTIN(NAME, OBJECT) \
2918 if (PyDict_SetItemString(dict, NAME, (PyObject *)OBJECT) < 0) \
2919 return NULL; \
2920 ADD_TO_ALL(OBJECT)
2921
2922 SETBUILTIN("None", Py_None);
2923 SETBUILTIN("Ellipsis", Py_Ellipsis);
2924 SETBUILTIN("NotImplemented", Py_NotImplemented);
2925 SETBUILTIN("False", Py_False);
2926 SETBUILTIN("True", Py_True);
2927 SETBUILTIN("bool", &PyBool_Type);
2928 SETBUILTIN("memoryview", &PyMemoryView_Type);
2929 SETBUILTIN("bytearray", &PyByteArray_Type);
2930 SETBUILTIN("bytes", &PyBytes_Type);
2931 SETBUILTIN("classmethod", &PyClassMethod_Type);
2932 SETBUILTIN("complex", &PyComplex_Type);
2933 SETBUILTIN("dict", &PyDict_Type);
2934 SETBUILTIN("enumerate", &PyEnum_Type);
2935 SETBUILTIN("filter", &PyFilter_Type);
2936 SETBUILTIN("float", &PyFloat_Type);
2937 SETBUILTIN("frozenset", &PyFrozenSet_Type);
2938 SETBUILTIN("property", &PyProperty_Type);
2939 SETBUILTIN("int", &PyLong_Type);
2940 SETBUILTIN("list", &PyList_Type);
2941 SETBUILTIN("map", &PyMap_Type);
2942 SETBUILTIN("object", &PyBaseObject_Type);
2943 SETBUILTIN("range", &PyRange_Type);
2944 SETBUILTIN("reversed", &PyReversed_Type);
2945 SETBUILTIN("set", &PySet_Type);
2946 SETBUILTIN("slice", &PySlice_Type);
2947 SETBUILTIN("staticmethod", &PyStaticMethod_Type);
2948 SETBUILTIN("str", &PyUnicode_Type);
2949 SETBUILTIN("super", &PySuper_Type);
2950 SETBUILTIN("tuple", &PyTuple_Type);
2951 SETBUILTIN("type", &PyType_Type);
2952 SETBUILTIN("zip", &PyZip_Type);
2953 debug = PyBool_FromLong(config->optimization_level == 0);
2954 if (PyDict_SetItemString(dict, "__debug__", debug) < 0) {
2955 Py_DECREF(debug);
2956 return NULL;
2957 }
2958 Py_DECREF(debug);
2959
2960 return mod;
2961 #undef ADD_TO_ALL
2962 #undef SETBUILTIN
(Sauce)
This provides strong evidence that the Python builtin zip() (line 2952, above) is actually implemented by PyZip_Type. As we know, we cannot use zip() in any numerical context, which explains why tp_as_number is set to 0 (in C, the integer constant 0 is a null pointer constant, so a 0 in a pointer slot means the same thing as NULL (sauce)).
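The same distinction seems to be visible from Python, assuming my reading of the slot machinery is right: a filled-in nb_bool shows up as a __bool__ slot wrapper in the type’s namespace, while zip has neither __bool__ nor __len__. A quick sketch:

```python
# float and int fill in nb_bool, which surfaces as a __bool__ slot wrapper.
print("__bool__" in vars(float))  # True
print("__bool__" in vars(int))    # True

# zip's tp_as_number is 0 (NULL), so there is no __bool__ to find...
print(hasattr(zip, "__bool__"))   # False
# ...and no length either, so PyObject_IsTrue falls through to "return 1".
print(hasattr(zip, "__len__"))    # False
print(bool(zip([])))              # True, even though it yields nothing
```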
But let’s be sure… consider the (randomly picked) float type, from line 2936:
SETBUILTIN("float", &PyFloat_Type);
If we were to peer into PyFloat_Type, we would expect to see tp_as_number defined, not as 0, but as a pointer to a PyNumberMethods struct.
We locate PyFloat_Type and, upon inspection, find:
1919 PyTypeObject PyFloat_Type = {
1920 PyVarObject_HEAD_INIT(&PyType_Type, 0)
1921 "float",
1922 sizeof(PyFloatObject),
1923 0,
1924 (destructor)float_dealloc, /* tp_dealloc */
1925 0, /* tp_vectorcall_offset */
1926 0, /* tp_getattr */
1927 0, /* tp_setattr */
1928 0, /* tp_as_async */
1929 (reprfunc)float_repr, /* tp_repr */
1930 &float_as_number, /* tp_as_number */
1931 0, /* tp_as_sequence */
1932 0, /* tp_as_mapping */
1933 (hashfunc)float_hash, /* tp_hash */
1934 0, /* tp_call */
1935 0, /* tp_str */
1936 PyObject_GenericGetAttr, /* tp_getattro */
1937 0, /* tp_setattro */
1938 0, /* tp_as_buffer */
1939 Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */
1940 float_new__doc__, /* tp_doc */
1941 0, /* tp_traverse */
1942 0, /* tp_clear */
1943 float_richcompare, /* tp_richcompare */
1944 0, /* tp_weaklistoffset */
1945 0, /* tp_iter */
1946 0, /* tp_iternext */
1947 float_methods, /* tp_methods */
1948 0, /* tp_members */
1949 float_getset, /* tp_getset */
1950 0, /* tp_base */
1951 0, /* tp_dict */
1952 0, /* tp_descr_get */
1953 0, /* tp_descr_set */
1954 0, /* tp_dictoffset */
1955 0, /* tp_init */
1956 0, /* tp_alloc */
1957 float_new, /* tp_new */
1958 .tp_vectorcall = (vectorcallfunc)float_vectorcall,
1959 };
(Sauce)
Line 1930, happily, proves us right! tp_as_number is a pointer to float_as_number, which is defined right above:
1883 static PyNumberMethods float_as_number = {
1884 float_add, /* nb_add */
1885 float_sub, /* nb_subtract */
1886 float_mul, /* nb_multiply */
1887 float_rem, /* nb_remainder */
1888 float_divmod, /* nb_divmod */
1889 float_pow, /* nb_power */
1890 (unaryfunc)float_neg, /* nb_negative */
1891 float_float, /* nb_positive */
1892 (unaryfunc)float_abs, /* nb_absolute */
1893 (inquiry)float_bool, /* nb_bool */
1894 0, /* nb_invert */
1895 0, /* nb_lshift */
1896 0, /* nb_rshift */
1897 0, /* nb_and */
1898 0, /* nb_xor */
1899 0, /* nb_or */
1900 float___trunc___impl, /* nb_int */
1901 0, /* nb_reserved */
1902 float_float, /* nb_float */
1903 0, /* nb_inplace_add */
1904 0, /* nb_inplace_subtract */
1905 0, /* nb_inplace_multiply */
1906 0, /* nb_inplace_remainder */
1907 0, /* nb_inplace_power */
1908 0, /* nb_inplace_lshift */
1909 0, /* nb_inplace_rshift */
1910 0, /* nb_inplace_and */
1911 0, /* nb_inplace_xor */
1912 0, /* nb_inplace_or */
1913 float_floor_div, /* nb_floor_divide */
1914 float_div, /* nb_true_divide */
1915 0, /* nb_inplace_floor_divide */
1916 0, /* nb_inplace_true_divide */
1917 };
(Sauce)
Tada! Our assumptions are correct and indeed the check:
// ...
else if (Py_TYPE(v)->tp_as_number != NULL && // ...
is mainly checking that v, our item in question, has numeric behavior as defined by the PyNumberMethods struct. If so, we then go on to check that:
Py_TYPE(v)->tp_as_number->nb_bool != NULL
which leads us to our second big question: what the hell is nb_bool??
To answer this question, thankfully, we don’t need to look all that far. But before looking into the code, let’s revisit our trusty docs one last time. The “Quick Reference” section (here) displays a table of the “slots” that are defined in the _typeobject struct.
Clicking on the sub-slots link under the special methods/attrs column for tp_as_number leads us to the “sub-slots” table.
Generally, this table maps each C struct field to the Python “special method” it relates to. Most useful to note is nb_bool, which corresponds to the Python __bool__ dunder method! So basically,
Py_TYPE(v)->tp_as_number->nb_bool != NULL
is simply checking that the type of v (Py_TYPE(v)) has a valid __bool__ method defined!
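Since nb_bool is where __bool__ lands, a class that defines __bool__ in Python gets consulted by all() just as float_bool is for floats. A small sketch with a hypothetical Switch class; the second, hypothetical BadSwitch class also exercises the cmp < 0 error path from builtin_all, since Python 3 insists that __bool__ return an actual bool:

```python
class Switch:
    """Hypothetical type whose truthiness is defined in Python."""

    def __init__(self, on):
        self.on = on

    def __bool__(self):
        # Must return an actual bool; CPython raises TypeError otherwise.
        return self.on


print(all([Switch(True), Switch(True)]))   # True
print(all([Switch(True), Switch(False)]))  # False


class BadSwitch:
    """Hypothetical type whose __bool__ breaks the rules."""

    def __bool__(self):
        return 1  # an int, not a bool


try:
    all([BadSwitch()])
except TypeError as exc:
    # The failed truth test (cmp < 0 in builtin_all) surfaces as an exception.
    print("TypeError:", exc)
```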
With that, let’s now look at line 1893 from our static PyNumberMethods float_as_number definition, which shows that, for PyFloat_Type at least, the nb_bool slot is filled in with float_bool:
// ...
(inquiry)float_bool, /* nb_bool */
and the definition of float_bool is available in the same floatobject.c file that defines PyFloat_Type:
static int
float_bool(PyFloatObject *v)
{
    return v->ob_fval != 0.0;
}
(Sauce)
At first glance, the purpose of this function is obvious: return 1 (true) if ob_fval is not 0.0; otherwise, return 0 (false).
But now on to a new question: what the heck is v->ob_fval?? If we were to guess, it is probably the actual numerical value of our variable v of type PyFloatObject (for instance, the number 3.14159 stored as a float).
Confirming this is easy: just look at the definition of PyFloatObject in the corresponding header, floatobject.h:
15 typedef struct {
16     PyObject_HEAD
17     double ob_fval;
18 } PyFloatObject;
This clearly demonstrates that PyFloatObject has a field called ob_fval, which is of type double. In float_bool(PyFloatObject *v), we compare this value against 0.0 to return an int that determines the truthiness or falsiness of our value.
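Because float_bool is a plain != 0.0 comparison on a C double, two edge cases fall out of it that are easy to check from Python: negative zero compares equal to 0.0 (so it is falsy), while NaN compares unequal to everything (so it is truthy). A sketch; float_bool_sketch is my own name for a Python mirror of the C function:

```python
import math


def float_bool_sketch(x: float) -> bool:
    """Python mirror of CPython's float_bool: v->ob_fval != 0.0."""
    return x != 0.0


for x in (0.0, -0.0, 3.14159, math.nan, math.inf):
    # The sketch and the real truth test should agree on every input.
    print(repr(x), float_bool_sketch(x), bool(x))
```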
Moreover, for other Py_TYPEs out there (like ints), we can repeat this exercise and end up again at some function in the nb_bool slot (corresponding to the __bool__ dunder method) that defines the logic determining whether the underlying value is “truthy” or “falsy”. For example, for Python’s range builtin (defined as PyRange_Type), the nb_bool function is defined as:
680 static int
681 range_bool(rangeobject* self)
682 {
683     return PyObject_IsTrue(self->length);
684 }
In other words, it generally works the same way as float_bool does: a static function returning an int is plugged into the nb_bool slot, but of course the implementation details are specific to the type in question (PyRange_Type, PyFloat_Type, etc.). Amusingly, range_bool just calls PyObject_IsTrue again, this time on the range’s length, which is itself a Python int object.
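Since range_bool hands the length object to PyObject_IsTrue, a range’s truthiness depends only on whether it contains any elements. A side effect I find neat: bool() keeps working on ranges so large that len() itself fails, because len() has to squeeze the length into a C Py_ssize_t. A short sketch:

```python
empty_ranges = [range(0), range(5, 5), range(10, 0)]
nonempty_ranges = [range(1), range(0, 10, 3), range(10, 0, -1)]

for r in empty_ranges + nonempty_ranges:
    # A range is truthy exactly when it has at least one element.
    print(r, bool(r))

huge = range(10**30)
print(bool(huge))  # True: range_bool tests the length object itself
try:
    len(huge)  # len() must fit the length into a C Py_ssize_t
except OverflowError as exc:
    print("len() failed:", exc)
```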
And SO, finally, we can fully parse our initial code block and explain what it is doing (in the context of all([0,0,0])):
// ...
else if (Py_TYPE(v)->tp_as_number != NULL &&
         Py_TYPE(v)->tp_as_number->nb_bool != NULL)
    res = (*Py_TYPE(v)->tp_as_number->nb_bool)(v);
1: check whether the type of v has numeric methods defined (tp_as_number is not NULL)
2: also ensure that an nb_bool function (the C-level counterpart of __bool__) is available for v
FINALLY: if (1) and (2) both hold, call that nb_bool function on v, which sets res to 1 (true), 0 (false) or a negative value to signal an error.
At this point, it becomes painfully obvious why we get the False return value that we observed for our initial example of all([0,0,0]).
(Note: I sneakily did not include the actual nb_bool implementation for the Python int type, whose type object is PyLong_Type. This is mainly because, for ints at least, the definition/code path is a little less obvious (but works the same way in principle), so for the purposes of clarity I chose to go with PyFloat_Type, which IMO is easier to follow.)
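The branches of PyObject_IsTrue we skipped (lines 1395-1402) fall back to the object’s length when there is no nb_bool, and otherwise treat the object as true. A rough Python model of the whole decision order; is_true_model is a hypothetical name, and looking up __bool__ and __len__ on type(v) is my own approximation of the C slot checks (at the Python level, both mp_length and sq_length surface as __len__):

```python
def is_true_model(v):
    """Approximate PyObject_IsTrue's decision order in pure Python."""
    # Lines 1386-1391: identity checks against the singletons.
    if v is True:
        return True
    if v is False or v is None:
        return False
    tp = type(v)  # roughly Py_TYPE(v)
    # Lines 1392-1394: tp_as_number->nb_bool, i.e. __bool__.
    bool_method = getattr(tp, "__bool__", None)
    if bool_method is not None:
        return bool_method(v)
    # Lines 1395-1400: mp_length / sq_length, i.e. __len__.
    len_method = getattr(tp, "__len__", None)
    if len_method is not None:
        return len_method(v) > 0
    # Lines 1401-1402: no way to tell, so the object counts as true.
    return True


samples = [0, 1, 0.0, -0.0, "", "x", [], [0], {}, range(0), object(), zip([])]
for s in samples:
    print(repr(s), is_true_model(s), bool(s))
```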
Tying it all together, finally!
Still, for the sake of completeness, let’s see this thought process through to completion. I’ll walk through the code path and expound on what (I am fairly sure at this point) is happening at each step.
all([0,0,0])
# ??? -- we want to replace this "???" with True or False
The builtin method all is defined here:
#define BUILTIN_ALL_METHODDEF \
{"all", (PyCFunction)builtin_all, METH_O, builtin_all__doc__},
(Sauce)
This is how the C function builtin_all is associated with the Python function all() (METH_O tells CPython that it takes exactly one positional argument). Next, we end up in the function definition for builtin_all, reproduced here for convenience:
319 /*[clinic input]
320 all as builtin_all
321     iterable: object
322     /
323 Return True if bool(x) is True for all values x in the iterable.
324 If the iterable is empty, return True.
325 [clinic start generated code]*/
326
327 static PyObject *
328 builtin_all(PyObject *module, PyObject *iterable)
329 /*[clinic end generated code: output=ca2a7127276f79b3 input=1a7c5d1bc3438a21]*/
330 {
331     PyObject *it, *item;
332     PyObject *(*iternext)(PyObject *);
333     int cmp;
334
335     it = PyObject_GetIter(iterable);
336     if (it == NULL)
337         return NULL;
338     iternext = *Py_TYPE(it)->tp_iternext;
339
340     for (;;) {
341         item = iternext(it);
342         if (item == NULL)
343             break;
344         cmp = PyObject_IsTrue(item);
345         Py_DECREF(item);
346         if (cmp < 0) {
347             Py_DECREF(it);
348             return NULL;
349         }
350         if (cmp == 0) {
351             Py_DECREF(it);
352             Py_RETURN_FALSE;
353         }
354     }
355     Py_DECREF(it);
356     if (PyErr_Occurred()) {
357         if (PyErr_ExceptionMatches(PyExc_StopIteration))
358             PyErr_Clear();
359         else
360             return NULL;
361     }
362     Py_RETURN_TRUE;
363 }
(Sauce)
On line 341, we produce our first item, 0, since our input to the original all() function was [0,0,0].
On line 344, we invoke PyObject_IsTrue for 0, which is a PyLongObject (an int, basically).
Ok, so let’s reproduce PyObject_IsTrue below (for convenience) and trace our item as it traverses the control flow:
1379 /* Test a value used as condition, e.g., in a while or if statement.
1380    Return -1 if an error occurred */
1381
1382 int
1383 PyObject_IsTrue(PyObject *v)
1384 {
1385     Py_ssize_t res;
1386     if (v == Py_True)
1387         return 1;
1388     if (v == Py_False)
1389         return 0;
1390     if (v == Py_None)
1391         return 0;
1392     else if (Py_TYPE(v)->tp_as_number != NULL &&
1393              Py_TYPE(v)->tp_as_number->nb_bool != NULL)
1394         res = (*Py_TYPE(v)->tp_as_number->nb_bool)(v);
1395     else if (Py_TYPE(v)->tp_as_mapping != NULL &&
1396              Py_TYPE(v)->tp_as_mapping->mp_length != NULL)
1397         res = (*Py_TYPE(v)->tp_as_mapping->mp_length)(v);
1398     else if (Py_TYPE(v)->tp_as_sequence != NULL &&
1399              Py_TYPE(v)->tp_as_sequence->sq_length != NULL)
1400         res = (*Py_TYPE(v)->tp_as_sequence->sq_length)(v);
1401     else
1402         return 1;
1403     /* if it is negative, it should be either -1 or -2 */
1404     return (res > 0) ? 1 : Py_SAFE_DOWNCAST(res, Py_ssize_t, int);
1405 }
(Sauce)
Again, because v (which is item) is 0 and of type PyLongObject, we know that, specifically, lines 1392-1394 in the logic above apply:
// ...
else if (Py_TYPE(v)->tp_as_number != NULL &&
         Py_TYPE(v)->tp_as_number->nb_bool != NULL)
    res = (*Py_TYPE(v)->tp_as_number->nb_bool)(v);
In this case, because our item is indeed 0, int’s nb_bool function returns 0, so res is set to 0 as well. Then, we skip to the bottom of this function, line 1404:
return (res > 0) ? 1 : Py_SAFE_DOWNCAST(res, Py_ssize_t, int);
res is 0, so the return value of this function is not 1 but instead the “safely” downcast value of 0.
(If you’re interested, you can find the definition of Py_SAFE_DOWNCAST here. Super interestingly, it turns out that this downcast is only checked (via an assert) in debug builds; in release builds it is a plain cast. This issue has proposed renaming the macro since Python 3.4!)
Ok! So PyObject_IsTrue, as invoked on line 344 of builtin_all, has returned 0. Therefore we now know that cmp on line 344 is 0:
340     for (;;) {
341         item = iternext(it);
342         if (item == NULL)
343             break;
344         cmp = PyObject_IsTrue(item);
345         Py_DECREF(item);
346         if (cmp < 0) {
347             Py_DECREF(it);
348             return NULL;
349         }
350         if (cmp == 0) {
351             Py_DECREF(it);
352             Py_RETURN_FALSE;
353         }
354     }
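Therefore, the cmp < 0 check on line 346 does not fire, and we land on line 350, where cmp == 0 holds: we release our reference to the iterator (Py_DECREF(it)) and hit Py_RETURN_FALSE, a macro that returns Python’s False object while correctly handling its reference count.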
And so, in conclusion, our initial call to all:
all([0,0,0])
# False
returns False as displayed above.
Tada!
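For completeness, the same control flow can be written as a rough Python transliteration of builtin_all. This is my own sketch, not code from CPython: builtin_all_sketch is a hypothetical name, next() stands in for tp_iternext, bool() for PyObject_IsTrue, and a propagating exception for the C code returning NULL.

```python
def builtin_all_sketch(iterable):
    """Rough Python transliteration of CPython's builtin_all (hypothetical name)."""
    it = iter(iterable)            # PyObject_GetIter; raises TypeError if not iterable
    while True:                    # for (;;)
        try:
            item = next(it)        # item = iternext(it)
        except StopIteration:      # exhausted: break out, like item == NULL
            break
        # PyObject_IsTrue(item). If the truth test raises (cmp < 0 in C),
        # the exception simply propagates, which is what "return NULL" means.
        if not bool(item):         # cmp == 0
            return False           # Py_RETURN_FALSE
    # Any exception from next() other than StopIteration has already propagated.
    return True                    # Py_RETURN_TRUE


print(builtin_all_sketch([0, 0, 0]))  # False
print(builtin_all_sketch([1, 1, 1]))  # True
```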
(PS: wondering what the difference is between Py_RETURN_FALSE and Py_False? I was too! These docs were very helpful in clarifying.)
Final remarks
So here’s a fun question to ask: why??
As in - why do this, even? I came for the possibility/thrill of contributing to python (a lang I have used+abused for a very long time to make a living) but I stayed for the learnings and…possibly new ideas for contribution??
While I have definitely ventured into python source in the past, I usually stayed in python land and rarely looked into the C code. I wasn’t planning to look as far/deep as I did tonight but I am glad that I did as I feel like I have a much better understanding of how at least some parts of python now work “under the hood”.
I wrote this post for my own sake mainly but also in the hopes that perhaps others may find my spelunking useful/interesting and yes, perhaps even fun!
(PS: I wrote a follow-up post to the ideas presented here, where I build Python from source and implement an addition to the builtin modules (that are written in C). Find the post here!)
Sidebar: WTF is tp_bool???
Ok this really bugged me for a while. I’m pretty sure I know what is going on but I’ll start from the top.
As I pored over the Python source code, I had built up a mental model of how the sub-slots generally worked. Let’s go back to our friend, PyNumberMethods float_as_number:
static PyNumberMethods float_as_number = {
float_add, /* nb_add */
float_sub, /* nb_subtract */
float_mul, /* nb_multiply */
float_rem, /* nb_remainder */
float_divmod, /* nb_divmod */
float_pow, /* nb_power */
(unaryfunc)float_neg, /* nb_negative */
float_float, /* nb_positive */
(unaryfunc)float_abs, /* nb_absolute */
(inquiry)float_bool, /* nb_bool */
0, /* nb_invert */
0, /* nb_lshift */
0, /* nb_rshift */
0, /* nb_and */
0, /* nb_xor */
0, /* nb_or */
float___trunc___impl, /* nb_int */
0, /* nb_reserved */
float_float, /* nb_float */
0, /* nb_inplace_add */
0, /* nb_inplace_subtract */
0, /* nb_inplace_multiply */
0, /* nb_inplace_remainder */
0, /* nb_inplace_power */
0, /* nb_inplace_lshift */
0, /* nb_inplace_rshift */
0, /* nb_inplace_and */
0, /* nb_inplace_xor */
0, /* nb_inplace_or */
float_floor_div, /* nb_floor_divide */
float_div, /* nb_true_divide */
0, /* nb_inplace_floor_divide */
0, /* nb_inplace_true_divide */
};
(Sauce)
Take note especially of:
// ...
(inquiry)float_bool, /* nb_bool */
Now, consider the equivalent for PyNumberMethods long_as_number:
static PyNumberMethods long_as_number = {
(binaryfunc)long_add, /*nb_add*/
(binaryfunc)long_sub, /*nb_subtract*/
(binaryfunc)long_mul, /*nb_multiply*/
long_mod, /*nb_remainder*/
long_divmod, /*nb_divmod*/
long_pow, /*nb_power*/
(unaryfunc)long_neg, /*nb_negative*/
long_long, /*tp_positive*/
(unaryfunc)long_abs, /*tp_absolute*/
(inquiry)long_bool, /*tp_bool*/
(unaryfunc)long_invert, /*nb_invert*/
long_lshift, /*nb_lshift*/
long_rshift, /*nb_rshift*/
long_and, /*nb_and*/
long_xor, /*nb_xor*/
long_or, /*nb_or*/
long_long, /*nb_int*/
0, /*nb_reserved*/
long_float, /*nb_float*/
0, /* nb_inplace_add */
0, /* nb_inplace_subtract */
0, /* nb_inplace_multiply */
0, /* nb_inplace_remainder */
0, /* nb_inplace_power */
0, /* nb_inplace_lshift */
0, /* nb_inplace_rshift */
0, /* nb_inplace_and */
0, /* nb_inplace_xor */
0, /* nb_inplace_or */
long_div, /* nb_floor_divide */
long_true_divide, /* nb_true_divide */
0, /* nb_inplace_floor_divide */
0, /* nb_inplace_true_divide */
long_long, /* nb_index */
};
(Sauce)
But note specifically:
// ...
(inquiry)long_bool, /*tp_bool*/
Wtf is tp_bool?!
This is likely due to my own ignorance/tiredness, but I searched frantically for some definition, any definition, of tp_bool in the source code. Nothing. Tried the docs. Nada.
Finally, I realized that I could probably just look at the PyNumberMethods definition, which led me to:
typedef struct {
/* Number implementations must check *both*
arguments for proper type and implement the necessary conversions
in the slot functions themselves. */
// ... SKIPPING non relevant lines
inquiry nb_bool;
// ... SKIPPING non relevant lines
} PyNumberMethods;
(Sauce)
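From this definition, it became clear that the tp_bool label is just a typo in a comment: the slot being filled in is nb_bool (the same goes for the tp_positive and tp_absolute comments right above it). Whomp, whomp.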
Probably not important enough for a PR but man did it confuse me for a while!