TAGS: python

Grokking Builtin.all in Python

March 16, 2021

A coworker shared this neat code snippet on Slack:

>>> all([0, 0, 0])
False
>>> all([1, 1, 1])
True

and mused to us all:

this is annoying
– venerated colleague, circa 2021

Indeed he’s right! It is annoying. His note piqued my curiosity as to why/how this worked under the hood; in this post I take a dive into the guts of cpython searching for answers.

(EDIT: heads up, I wrote a lot of words here. If you want the simpler walkthrough without the fancy code spelunking, skip ahead to the final section.)

First, a sensible alternative

Let’s get a few things out of the way before we go on.

(NOTE: Neither my colleague nor I actually knew about the formal definition of all - so we expected all([0,0,0]) to indeed return True and particularly, I was under the impression that this might actually be a bug in python’s implementation of all(..))

First and foremost - let’s explain why, naively, all([0,0,0]) returning False (given our misconception of expected behavior) is in fact not entirely unexpected.

In python, there is a concept of truthy and falsy. While there exist explicitly defined True and False values (of type bool, as it were), non-boolean values are also evaluated by python to True or False within boolean contexts.

A “boolean context”, fwiw, looks like this:

some_value = "not a boolean, lol"
if some_value:
    pass

# or
while some_value:
    pass

In the example above, some_value is clearly a str yet when used in an if (or while!) statement block, it is evaluated as True!

In general, all values in python are “truthy” unless they aren’t, in which case, they are “falsy”. If you can imagine, there are only a few explicit values that are defined as “falsy”, find them enumerated here:

[] # empty list
{} # empty dict
() # empty tuple
set() # empty set
"" # empty string
range(0) # empty range
0 # int, 0
0.0 # float, 0
None # nonetype, obvi
False # boolean false
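Since an if or while statement tests a value the same way the builtin bool() does, the list above is easy to double-check interactively. A small sketch; the sample values are my own picks, not an exhaustive catalogue:

```python
# A few values from the falsy list above
falsy_values = [[], {}, (), set(), "", range(0), 0, 0.0, None, False]

# A few truthy values, including containers that merely *hold* falsy things
truthy_values = ["not a boolean, lol", [0], {"a": 0}, (None,), -1, 0.1, True]

# bool() applies the same truth test that `if` and `while` use
falsy_results = [bool(v) for v in falsy_values]
truthy_results = [bool(v) for v in truthy_values]

print(falsy_results)   # every entry is False
print(truthy_results)  # every entry is True
```

Note that [0] is truthy: a non-empty list is truthy no matter what it contains.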

(I had the hardest time finding a definitive list of these items on the interwebs. The closest I could find that seemed “official” was this py2.4(!!) “Python Library Reference”. The list above is from this freecodecamp article)

Ok cool - so with this clarification, it is obvious why all([0,0,0]) evals to False: 0 is falsy, so it is at least plausible to surmise that there is a naive if statement somewhere that fails to account for this edge case.

Moreover, it stands to reason that all of the following will also probably eval to False:

all([False, False, False])
all([{}, {}, {}])
# .. etc, you get the idea

Ok great - having established this, let’s turn to the official python3 docs (current at the time of this writing):

all(iterable)
Return True if all elements of the iterable are true (or if the iterable is empty).

The docs also present this “equivalent” implementation:

def all(iterable):
    for element in iterable:
        if not element:
            return False
    return True
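To convince myself the docs' version really behaves like the builtin, here is a quick comparison. The function body is copied from the docs but renamed to docs_all (my name, so it doesn't shadow the builtin), and tracked is a hypothetical helper that records which elements actually get looked at:

```python
import builtins


def docs_all(iterable):
    """The docs' "equivalent" implementation, renamed to avoid shadowing all()."""
    for element in iterable:
        if not element:
            return False
    return True


# The docs' version and the builtin agree on a handful of inputs
cases = [[0, 0, 0], [1, 1, 1], [], [1, 0, 1], ["", "x"], [[0]]]
for case in cases:
    assert docs_all(case) == builtins.all(case), case

seen = []  # elements actually pulled from the generator


def tracked(values):
    """Hypothetical helper: yield values while recording each one."""
    for value in values:
        seen.append(value)
        yield value


result = docs_all(tracked([1, 0, 2]))
print(result, seen)  # False [1, 0]
```

Both versions stop at the first falsy element (the 2 is never requested), and both return True for an empty iterable because the loop body never runs.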

(Heads up, in part 2 of this post here, we actually formalize this logic and tack it on to python’s source!)

From the documentation, two things are obvious to me:

1: Using all([0,0,0]) to determine if all members of an iterable are equal is kind of a bad use case. The purpose of the all function is only to determine if all the values in an iterable are truthy. That’s it. Full stop. (But also, like, TIL)

2: If you do want to determine whether all the values in an iterable are indeed the same value, consider:

len(set([0,0,0])) == 1

(Note: this will not work for something like [{}, {}, {}] since dicts are not hashable. However, for basic use cases where we want to (safely!) determine whether the values in a list of primitives are all the same, the approach is IMO “good enough”)

(Note 2: if you really want to handle all cases, including unhashable types, consider the following:

inp = [{}, {}, {}]
samesame = True
for item in inp:
    # compare every element against the first one using ==
    samesame = inp[0] == item
    if not samesame:
        break
# samesame is False if the items in inp are not all the same value
print(samesame)  # True

… certainly not as compact but gets the job done!)
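Incidentally, the loop in Note 2 is itself an all() in disguise. Here is a sketch of a small helper (all_equal is a name I made up, not something from the standard library) that compares each element with the first using ==, so it also works for unhashable values like dicts:

```python
def all_equal(items):
    """Return True if every element of `items` equals the first one.

    Only == is used, so unhashable values such as dicts are fine.
    An empty iterable counts as "all equal", just like all([]) is True.
    """
    iterator = iter(items)
    try:
        first = next(iterator)
    except StopIteration:
        return True  # nothing to compare
    # all() stops at the first element that differs from `first`
    return all(item == first for item in iterator)


print(all_equal([0, 0, 0]))     # True
print(all_equal([{}, {}, {}]))  # True
print(all_equal([1, 1, 2]))     # False
print(all_equal([]))            # True

# For comparison, the set-based check cannot even hash a dict
try:
    len(set([{}, {}, {}]))
except TypeError:
    print("set() needs hashable items")
```

Because it relies on ==, all_equal([0, 0.0, False]) is True as well: all three compare equal in python (and the set-based check agrees, since they also hash the same).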

Venturing into the guts of CPython

Ok, now for the meat of this story.

Let me preface this by saying: I really enjoy venturing into source code to validate behavior that I am seeing. More often than not, this is fueled by (perhaps an unreasonable…) hope that maybe I’ll notice something undocumented or interesting that could be helpful immediately or sometime in the future.

For this specific case though, I was wondering how easy/hard it might be to submit a PR to rectify this behavior of all (spoiler: not all that straightforward and, more importantly, definitely not appropriate given the officially documented behavior we saw above)

Still, this was an interesting exercise and worth documenting for fun and profit (…mainly fun, though)

I started by searching for all within the cpython repo. Generally GitHub’s code search functionality is pretty great; the main issue here was that all is a fairly generic word that is used often both in and outside of the code (like in documentation and such).

This Stack Overflow answer suggested that builtin funcs could be found in the Objects folder:

However, many of the built-in types can be found in the Objects sub-directory of the Python source trunk. For example, see here for the implementation of the enumerate class or here for the implementation of the list type.

Once I started digging into the src code, I realized that there is a lot of code (surprise, surprise) in the cpython codebase. So I cloned the entire project to my local machine and started grepping the heck out of…everything in the project folder.

➜  cpython git:(master) grep -r "any(" .

Running the ^ above (I figured any, a function similar to all, might be a less frequently used token than all…), I ended up with a ton of results. I manually scanned through each line of the output (anyone have a better way to search for things like this? I couldn’t think of anything useful, but admittedly I didn’t try too hard) and ended up here:

./Python/bltinmodule.c:builtin_any(PyObject *module, PyObject *iterable)

That looked interesting! Looking at the func definition (ok, one definition above builtin_any - as I mentioned, I looked for any but I wanted the src for all):

319  /*[clinic input]
320  all as builtin_all
321      iterable: object
322      /
323  Return True if bool(x) is True for all values x in the iterable.
324  If the iterable is empty, return True.
325  [clinic start generated code]*/
326
327  static PyObject *
328  builtin_all(PyObject *module, PyObject *iterable)
329  /*[clinic end generated code: output=ca2a7127276f79b3 input=1a7c5d1bc3438a21]*/
330  {
331      PyObject *it, *item;
332      PyObject *(*iternext)(PyObject *);
333      int cmp;
334
335      it = PyObject_GetIter(iterable);
336      if (it == NULL)
337          return NULL;
338      iternext = *Py_TYPE(it)->tp_iternext;
339
340      for (;;) {
341          item = iternext(it);
342          if (item == NULL)
343              break;
344          cmp = PyObject_IsTrue(item);
345          Py_DECREF(item);
346          if (cmp < 0) {
347              Py_DECREF(it);
348              return NULL;
349          }
350          if (cmp == 0) {
351              Py_DECREF(it);
352              Py_RETURN_FALSE;
353          }
354      }
355      Py_DECREF(it);
356      if (PyErr_Occurred()) {
357          if (PyErr_ExceptionMatches(PyExc_StopIteration))
358              PyErr_Clear();
359          else
360              return NULL;
361      }
362      Py_RETURN_TRUE;
363  }

(Sauce)

We found it! This is the actual implementation of the builtin all method!!

Ok cool, so let’s examine it. Most interesting to us are lines 340 onwards:

340      for (;;) {
341          item = iternext(it);
342          if (item == NULL)
343              break;
344          cmp = PyObject_IsTrue(item);
345          Py_DECREF(item);
346          if (cmp < 0) {
347              Py_DECREF(it);
348              return NULL;
349          }
350          if (cmp == 0) {
351              Py_DECREF(it);
352              Py_RETURN_FALSE;
353          }
354      }
355      Py_DECREF(it);
356      if (PyErr_Occurred()) {
357          if (PyErr_ExceptionMatches(PyExc_StopIteration))
358              PyErr_Clear();
359          else
360              return NULL;
361      }
362      Py_RETURN_TRUE;

This is the main loop that walks through each item in the iterable and applies a bunch of logic to validate if it is true or not. To understand what true means, we consider line 344, which invokes the PyObject_IsTrue method. Using a similar approach to what I described earlier, I was able to track down the definition of the PyObject_IsTrue method, reproduced below:

1379  /* Test a value used as condition, e.g., in a while or if statement.
1380     Return -1 if an error occurred */
1381
1382  int
1383  PyObject_IsTrue(PyObject *v)
1384  {
1385      Py_ssize_t res;
1386      if (v == Py_True)
1387          return 1;
1388      if (v == Py_False)
1389          return 0;
1390      if (v == Py_None)
1391          return 0;
1392      else if (Py_TYPE(v)->tp_as_number != NULL &&
1393               Py_TYPE(v)->tp_as_number->nb_bool != NULL)
1394          res = (*Py_TYPE(v)->tp_as_number->nb_bool)(v);
1395      else if (Py_TYPE(v)->tp_as_mapping != NULL &&
1396               Py_TYPE(v)->tp_as_mapping->mp_length != NULL)
1397          res = (*Py_TYPE(v)->tp_as_mapping->mp_length)(v);
1398      else if (Py_TYPE(v)->tp_as_sequence != NULL &&
1399               Py_TYPE(v)->tp_as_sequence->sq_length != NULL)
1400          res = (*Py_TYPE(v)->tp_as_sequence->sq_length)(v);
1401      else
1402          return 1;
1403      /* if it is negative, it should be either -1 or -2 */
1404      return (res > 0) ? 1 : Py_SAFE_DOWNCAST(res, Py_ssize_t, int);
1405  }

(Sauce)

So this function, as it turns out, isn’t all that complex. It runs a series of checks against v, a generic PyObject pointer which, recall from builtin_all above, is item: a member of the iterable we passed in to all.
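Before going line by line, here is the overall shape of the function as a rough Python-level sketch. This is my own approximation, not CPython code: is_true and Box are hypothetical names, __bool__ stands in for the nb_bool slot, and __len__ stands in for the mp_length/sq_length slots:

```python
def is_true(v):
    """Approximate the lookup order of CPython's PyObject_IsTrue.

    Simplified: the real function also validates what __bool__ and
    __len__ return, which this sketch leaves out.
    """
    if v is True:
        return True
    if v is False or v is None:
        return False
    tp = type(v)  # like Py_TYPE(v): slots live on the type, not the instance
    if getattr(tp, "__bool__", None) is not None:  # ~ nb_bool slot
        return tp.__bool__(v)
    if getattr(tp, "__len__", None) is not None:   # ~ mp_length / sq_length
        return tp.__len__(v) > 0
    return True  # no relevant slot: objects are truthy by default


class Box:
    """Hypothetical container that only defines __len__."""

    def __init__(self, size):
        self.size = size

    def __len__(self):
        return self.size


samples = [0, 1, 0.0, 2.5, "", "x", [], [0], {}, range(0), range(3),
           object(), zip(), Box(0), Box(2)]
for sample in samples:
    assert is_true(sample) == bool(sample), sample
print("is_true agrees with bool() on", len(samples), "samples")
```

The order of the checks is the point here: a type's __bool__ wins over its __len__, and an object with neither is simply truthy.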

Let’s examine this PyObject_IsTrue method line by line:

First, we inspect this (line 1386):

// ...
if (v == Py_True)
    return 1;

Recall that v for us is just 0. So this condition does not match; 0 is not True. Onwards!

The next two conditions (lines 1388-1391):

// ...
if (v == Py_False)
    return 0;
if (v == Py_None)
    return 0;

also do not match: v is 0, which is neither False nor None.
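One subtlety worth spelling out: v == Py_False in C compares pointers, i.e. object identity, which in python is the is operator rather than ==. That's why 0 sails past these checks even though it compares equal to False (bool is a subclass of int). A quick illustration:

```python
zero = 0

# Value equality: False behaves like the integer 0
print(zero == False)           # True
print(isinstance(False, int))  # True

# Identity, the python analogue of `v == Py_False` in C
print(zero is False)           # False
print(zero is None)            # False
```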

The next check (lines 1392-1394) is the key one, for this exercise at least.

// ...
else if (Py_TYPE(v)->tp_as_number != NULL &&
            Py_TYPE(v)->tp_as_number->nb_bool != NULL)
    res = (*Py_TYPE(v)->tp_as_number->nb_bool)(v);

Woof. I’m not super familiar with C so this code block looks especially foreign to me.

I tried figuring out what the definition of Py_TYPE might be – but unfortunately didn’t get too far in my grepping adventures. So, I turned again to the python docs (…by googling, of course).

The Type Objects documentation had a lot of good info that helped clear up exactly what this particular line means:

Py_TYPE(v)->tp_as_number

Based on my interpretation of the documentation + the preamble from this helpful blog post, every type in python is described by an instance of the C struct _typeobject (typedef'd as PyTypeObject), which determines how the type behaves. tp_as_number is a member (basically a field) of the _typeobject struct. Only types that support numeric-protocol operations have this member defined and NOT set to NULL: obviously numbers like floats and bools (since they can be used as 0/1, etc), but also some non-numbers that overload operators, such as set, whose |, & and - live in these same number slots.

To further clarify this, let’s take another dive into the code. We begin by hunting down the definition of _typeobject, which is graciously reproduced for our benefit in the Type Objects docs I mentioned above. I am reproducing it here for convenience but abridging all but the relevant members:

typedef struct _typeobject {
    // ... !!! removing since not needed for our analysis

    /* Method suites for standard classes */

    PyNumberMethods *tp_as_number;

    // ... !!! removing since not needed for our analysis

} PyTypeObject;

(Sauce)

The main hint here for me was PyTypeObject; by grepping the codebase for this token, I was able to locate the following PyTypeObject:

2778  PyTypeObject PyZip_Type = {
2779      PyVarObject_HEAD_INIT(&PyType_Type, 0)
2780      "zip",                              /* tp_name */
2781      sizeof(zipobject),                  /* tp_basicsize */
2782      0,                                  /* tp_itemsize */
2783      /* methods */
2784      (destructor)zip_dealloc,            /* tp_dealloc */
2785      0,                                  /* tp_vectorcall_offset */
2786      0,                                  /* tp_getattr */
2787      0,                                  /* tp_setattr */
2788      0,                                  /* tp_as_async */
2789      0,                                  /* tp_repr */
2790      0,                                  /* tp_as_number */
2791      0,                                  /* tp_as_sequence */
2792      0,                                  /* tp_as_mapping */
2793      0,                                  /* tp_hash */
2794      0,                                  /* tp_call */
2795      0,                                  /* tp_str */
2796      PyObject_GenericGetAttr,            /* tp_getattro */
2797      0,                                  /* tp_setattro */
2798      0,                                  /* tp_as_buffer */
2799      Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_GC |
2800          Py_TPFLAGS_BASETYPE,            /* tp_flags */
2801      zip_doc,                            /* tp_doc */
2802      (traverseproc)zip_traverse,    /* tp_traverse */
2803      0,                                  /* tp_clear */
2804      0,                                  /* tp_richcompare */
2805      0,                                  /* tp_weaklistoffset */
2806      PyObject_SelfIter,                  /* tp_iter */
2807      (iternextfunc)zip_next,     /* tp_iternext */
2808      zip_methods,                        /* tp_methods */
2809      0,                                  /* tp_members */
2810      0,                                  /* tp_getset */
2811      0,                                  /* tp_base */
2812      0,                                  /* tp_dict */
2813      0,                                  /* tp_descr_get */
2814      0,                                  /* tp_descr_set */
2815      0,                                  /* tp_dictoffset */
2816      0,                                  /* tp_init */
2817      PyType_GenericAlloc,                /* tp_alloc */
2818      zip_new,                            /* tp_new */
2819      PyObject_GC_Del,                    /* tp_free */
2820  };

(Sauce)

Line 2790 is the line we care about: note how the value is set to 0 here. But what is PyZip_Type anyways? In that same file (but further down), we observe:

2917  #define SETBUILTIN(NAME, OBJECT) \
2918      if (PyDict_SetItemString(dict, NAME, (PyObject *)OBJECT) < 0)       \
2919          return NULL;                                                    \
2920      ADD_TO_ALL(OBJECT)
2921
2922      SETBUILTIN("None",                  Py_None);
2923      SETBUILTIN("Ellipsis",              Py_Ellipsis);
2924      SETBUILTIN("NotImplemented",        Py_NotImplemented);
2925      SETBUILTIN("False",                 Py_False);
2926      SETBUILTIN("True",                  Py_True);
2927      SETBUILTIN("bool",                  &PyBool_Type);
2928      SETBUILTIN("memoryview",        &PyMemoryView_Type);
2929      SETBUILTIN("bytearray",             &PyByteArray_Type);
2930      SETBUILTIN("bytes",                 &PyBytes_Type);
2931      SETBUILTIN("classmethod",           &PyClassMethod_Type);
2932      SETBUILTIN("complex",               &PyComplex_Type);
2933      SETBUILTIN("dict",                  &PyDict_Type);
2934      SETBUILTIN("enumerate",             &PyEnum_Type);
2935      SETBUILTIN("filter",                &PyFilter_Type);
2936      SETBUILTIN("float",                 &PyFloat_Type);
2937      SETBUILTIN("frozenset",             &PyFrozenSet_Type);
2938      SETBUILTIN("property",              &PyProperty_Type);
2939      SETBUILTIN("int",                   &PyLong_Type);
2940      SETBUILTIN("list",                  &PyList_Type);
2941      SETBUILTIN("map",                   &PyMap_Type);
2942      SETBUILTIN("object",                &PyBaseObject_Type);
2943      SETBUILTIN("range",                 &PyRange_Type);
2944      SETBUILTIN("reversed",              &PyReversed_Type);
2945      SETBUILTIN("set",                   &PySet_Type);
2946      SETBUILTIN("slice",                 &PySlice_Type);
2947      SETBUILTIN("staticmethod",          &PyStaticMethod_Type);
2948      SETBUILTIN("str",                   &PyUnicode_Type);
2949      SETBUILTIN("super",                 &PySuper_Type);
2950      SETBUILTIN("tuple",                 &PyTuple_Type);
2951      SETBUILTIN("type",                  &PyType_Type);
2952      SETBUILTIN("zip",                   &PyZip_Type);
2953      debug = PyBool_FromLong(config->optimization_level == 0);
2954      if (PyDict_SetItemString(dict, "__debug__", debug) < 0) {
2955          Py_DECREF(debug);
2956          return NULL;
2957      }
2958      Py_DECREF(debug);
2959
2960      return mod;
2961  #undef ADD_TO_ALL
2962  #undef SETBUILTIN

(Sauce)

This provides strong evidence that the python builtin zip() (line 2952, above) is actually backed by PyZip_Type. As we know, we cannot use zip() in any numerical context, which explains why tp_as_number is set to 0 (in C, a literal 0 used as a pointer is a null pointer constant, i.e. equivalent to NULL (sauce)).
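A fun consequence: PyZip_Type also sets tp_as_sequence and tp_as_mapping to 0, so for a zip object PyObject_IsTrue (quoted earlier) falls all the way through to its final return 1. In other words, a zip is always truthy, even when it will never yield anything. A quick check:

```python
empty_zip = zip()

# No nb_bool, mp_length or sq_length slot means PyObject_IsTrue returns 1
print(bool(empty_zip))  # True
print(list(empty_zip))  # []: nothing to iterate over, yet still truthy

# The python-level view of the missing slots
print(hasattr(zip, "__bool__"), hasattr(zip, "__len__"))  # False False

# Plain iterators behave the same way
print(bool(iter([])))  # True
```

So `if some_zip:` is never a useful emptiness check; materialize it first (e.g. with list()) if you need one.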

But let’s be sure…consider (randomly picked) the float type – from line 2936:

SETBUILTIN("float",                 &PyFloat_Type);

If we were to peer into PyFloat_Type, we would expect to see tp_as_number defined - not as 0 - but as a PyNumberMethods struct.

We locate PyFloat_Type and upon inspection:

1919  PyTypeObject PyFloat_Type = {
1920      PyVarObject_HEAD_INIT(&PyType_Type, 0)
1921      "float",
1922      sizeof(PyFloatObject),
1923      0,
1924      (destructor)float_dealloc,                  /* tp_dealloc */
1925      0,                                          /* tp_vectorcall_offset */
1926      0,                                          /* tp_getattr */
1927      0,                                          /* tp_setattr */
1928      0,                                          /* tp_as_async */
1929      (reprfunc)float_repr,                       /* tp_repr */
1930      &float_as_number,                           /* tp_as_number */
1931      0,                                          /* tp_as_sequence */
1932      0,                                          /* tp_as_mapping */
1933      (hashfunc)float_hash,                       /* tp_hash */
1934      0,                                          /* tp_call */
1935      0,                                          /* tp_str */
1936      PyObject_GenericGetAttr,                    /* tp_getattro */
1937      0,                                          /* tp_setattro */
1938      0,                                          /* tp_as_buffer */
1939      Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE,   /* tp_flags */
1940      float_new__doc__,                           /* tp_doc */
1941      0,                                          /* tp_traverse */
1942      0,                                          /* tp_clear */
1943      float_richcompare,                          /* tp_richcompare */
1944      0,                                          /* tp_weaklistoffset */
1945      0,                                          /* tp_iter */
1946      0,                                          /* tp_iternext */
1947      float_methods,                              /* tp_methods */
1948      0,                                          /* tp_members */
1949      float_getset,                               /* tp_getset */
1950      0,                                          /* tp_base */
1951      0,                                          /* tp_dict */
1952      0,                                          /* tp_descr_get */
1953      0,                                          /* tp_descr_set */
1954      0,                                          /* tp_dictoffset */
1955      0,                                          /* tp_init */
1956      0,                                          /* tp_alloc */
1957      float_new,                                  /* tp_new */
1958      .tp_vectorcall = (vectorcallfunc)float_vectorcall,
1959  };

(Sauce)

Line number 1930, happily, proves us right! tp_as_number is a pointer to float_as_number, which is defined right above:

1883  static PyNumberMethods float_as_number = {
1884      float_add,          /* nb_add */
1885      float_sub,          /* nb_subtract */
1886      float_mul,          /* nb_multiply */
1887      float_rem,          /* nb_remainder */
1888      float_divmod,       /* nb_divmod */
1889      float_pow,          /* nb_power */
1890      (unaryfunc)float_neg, /* nb_negative */
1891      float_float,        /* nb_positive */
1892      (unaryfunc)float_abs, /* nb_absolute */
1893      (inquiry)float_bool, /* nb_bool */
1894      0,                  /* nb_invert */
1895      0,                  /* nb_lshift */
1896      0,                  /* nb_rshift */
1897      0,                  /* nb_and */
1898      0,                  /* nb_xor */
1899      0,                  /* nb_or */
1900      float___trunc___impl, /* nb_int */
1901      0,                  /* nb_reserved */
1902      float_float,        /* nb_float */
1903      0,                  /* nb_inplace_add */
1904      0,                  /* nb_inplace_subtract */
1905      0,                  /* nb_inplace_multiply */
1906      0,                  /* nb_inplace_remainder */
1907      0,                  /* nb_inplace_power */
1908      0,                  /* nb_inplace_lshift */
1909      0,                  /* nb_inplace_rshift */
1910      0,                  /* nb_inplace_and */
1911      0,                  /* nb_inplace_xor */
1912      0,                  /* nb_inplace_or */
1913      float_floor_div,    /* nb_floor_divide */
1914      float_div,          /* nb_true_divide */
1915      0,                  /* nb_inplace_floor_divide */
1916      0,                  /* nb_inplace_true_divide */
1917  };

(Sauce)

Tada! Our assumptions are correct and indeed the check:

// ...
else if (Py_TYPE(v)->tp_as_number != NULL && // ...

is mainly checking to ensure that v, our item in question, has numerical properties as defined by the PyNumberMethods struct. If so, we then go on to check and ensure that:

Py_TYPE(v)->tp_as_number->nb_bool != NULL

which leads us to our second big question: what the hell is nb_bool??

To answer this question, thankfully, we don’t need to look all that far. But before looking into the code, let’s revisit our trusty docs one last time. The “Quick Reference” section (here) displays a bunch of “slots” that are defined in the _typeobject struct:

(Screenshot: the “tp slots” table from the Quick Reference)

Clicking on the sub-slots link under the special methods/attrs column for tp_as_number leads us to:

(Screenshot: the “sub-slots” table for tp_as_number)

Generally, these “subslots” list a variety of C struct members alongside the python “special method” each one relates to. In particular, and most useful to note, nb_bool relates to the python __bool__ dunder method! So basically,

Py_TYPE(v)->tp_as_number->nb_bool != NULL

is simply checking to ensure that the Py_TYPE has a valid __bool__ method defined!
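This also means python classes plug into the exact same machinery: defining __bool__ on a class fills in its nb_bool slot, so bool(), if statements and all() all end up calling it. A small sketch with a hypothetical Flag class that counts how often it is asked:

```python
class Flag:
    """Hypothetical class whose truthiness comes from a __bool__ method."""

    calls = 0  # how many times any Flag's __bool__ has run

    def __init__(self, value):
        self.value = value

    def __bool__(self):
        Flag.calls += 1
        return self.value


print(hasattr(Flag, "__bool__"))                   # True
print(all([Flag(True), Flag(False), Flag(True)]))  # False
print(Flag.calls)  # 2: all() stopped at the first falsy Flag
```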

With that, let’s now look at line 1893 from our static PyNumberMethods float_as_number definition, which shows that for PyFloat_Type at least, the nb_bool slot is defined as float_bool:

// ...
(inquiry)float_bool, /* nb_bool */

and the definition of float_bool is available in the same floatobject.c file that defines PyFloat_Type:

static int
float_bool(PyFloatObject *v)
{
    return v->ob_fval != 0.0;
}

(Sauce)

At first glance, the purpose of this function is obvious: return 1 (true) if ob_fval is not 0.0, otherwise return 0 (false).

But now on to a new question: what the heck is v->ob_fval?? If we were to guess, it is probably the actual numerical value of our variable v of type PyFloatObject (like for instance the number 3.14159 that is stored as type float).

Confirming this is easy; just look at the header that accompanies floatobject.c (floatobject.h):

typedef struct {
    PyObject_HEAD
    double ob_fval;
} PyFloatObject;

(Sauce)

This clearly demonstrates that PyFloatObject has a property called ob_fval which is of type double. In float_bool(PyFloatObject *v), we compare this value against 0.0 to return a boolean that determines the “truthy” or “falsy” –ness of our value.
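Since float_bool is literally ob_fval != 0.0, a couple of floating point corner cases fall straight out of IEEE 754 comparison rules:

```python
import math

# float_bool is `ob_fval != 0.0`, evaluated with IEEE 754 comparison rules
print(bool(0.0))        # False
print(bool(-0.0))       # False: -0.0 == 0.0, so -0.0 != 0.0 is false
print(bool(1e-300))     # True: tiny, but not zero
print(bool(math.nan))   # True: NaN compares unequal to everything
print(math.nan != 0.0)  # True
```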

Moreover, it is clear that for other Py_TYPEs out there (like ints or bools for instance), we can repeat this exercise and end up again at some function that represents nb_bool (corresponding to the __bool__ dunder method) which defines the logic that determines if the underlying value is “truthy” or “falsy”. For example, for python’s range builtin (defined as PyRange_Type), the nb_bool method is defined as:

static int
range_bool(rangeobject* self)
{
    return PyObject_IsTrue(self->length);
}

(Sauce)

In other words, it generally works the same way as float_bool does: a static C function returning an int is registered in the nb_bool slot, but of course the implementation details are specific to the type in question (PyRange_Type, PyFloat_Type, etc).
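range_bool just asks whether the range's length is truthy, so a range is truthy exactly when it is non-empty, regardless of which numbers it contains:

```python
# range_bool returns the truthiness of the range's length
print(bool(range(0)))      # False: length 0
print(bool(range(5, 5)))   # False: start == stop
print(bool(range(10, 0)))  # False: step defaults to +1, so nothing is produced
print(bool(range(0, 1)))   # True: one element, even though that element is 0

sparse = range(0, 10, 20)
print(len(sparse), bool(sparse))  # 1 True
```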

And SO, finally, we can fully parse our initial code block and explain what it is doing (in the context of all([0,0,0])):

// ...
else if (Py_TYPE(v)->tp_as_number != NULL &&
            Py_TYPE(v)->tp_as_number->nb_bool != NULL)
    res = (*Py_TYPE(v)->tp_as_number->nb_bool)(v);

1: check to see if v has numerical methods defined

2: also ensure that a proper __bool__ method is available for v

FINALLY: if (1) and (2) both hold (neither pointer is NULL), call the nb_bool function (the __bool__ equivalent) on v, which sets res to 1 (true), 0 (false), or -1 if an error occurred.
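The -1 error path is observable from python too. When nb_bool fails, PyObject_IsTrue returns a negative value, builtin_all takes its cmp < 0 branch and returns NULL, and that surfaces as an exception. Broken and NotABool below are hypothetical classes written to trigger that path:

```python
class Broken:
    """Hypothetical class whose __bool__ raises."""

    def __bool__(self):
        raise ValueError("cannot decide")


class NotABool:
    """Hypothetical class whose __bool__ returns the wrong type."""

    def __bool__(self):
        return 1  # __bool__ must return True or False


# The exception escapes bool() and, in turn, all()
try:
    all([1, Broken()])
    all_error = None
except ValueError as exc:
    all_error = exc
print(repr(all_error))  # ValueError('cannot decide')

# all() short-circuits, so Broken.__bool__ is never reached here
print(all([0, Broken()]))  # False

# Returning a non-bool from __bool__ is also reported as an error
try:
    bool(NotABool())
    bool_error = None
except TypeError as exc:
    bool_error = exc
print(type(bool_error).__name__)  # TypeError
```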

At this point, it becomes painfully obvious as to why we get the False return value that we have observed for our initial example of all([0,0,0]).

(Note: I sneakily did not include the actual nb_bool implementation for the python int type, which is defined in the PyLong_Type struct. This is mainly because for ints at least, the definition/code path is a little less obvious (but works the same way in principle) so for the purposes of clarity I chose to go with PyFloat_Type which IMO is easier to follow)

Tying it all together, finally!

Still, for the sake of completeness, let’s see this thought process through to completion. I’ll walk through the code path and expound on what (I am fairly sure at this point) is happening at each step.

all([0,0,0])
# ??? -- we want to replace this "???" with True or False

The builtin method all is defined here:

#define BUILTIN_ALL_METHODDEF    \
    {"all", (PyCFunction)builtin_all, METH_O, builtin_all__doc__},

(Sauce)

This is how the C function builtin_all is associated with the python function all() (METH_O marks it as taking exactly one positional argument). Next, we end up in the function definition for builtin_all, reproduced here for convenience:

319  /*[clinic input]
320  all as builtin_all
321      iterable: object
322      /
323  Return True if bool(x) is True for all values x in the iterable.
324  If the iterable is empty, return True.
325  [clinic start generated code]*/
326
327  static PyObject *
328  builtin_all(PyObject *module, PyObject *iterable)
329  /*[clinic end generated code: output=ca2a7127276f79b3 input=1a7c5d1bc3438a21]*/
330  {
331      PyObject *it, *item;
332      PyObject *(*iternext)(PyObject *);
333      int cmp;
334
335      it = PyObject_GetIter(iterable);
336      if (it == NULL)
337          return NULL;
338      iternext = *Py_TYPE(it)->tp_iternext;
339
340      for (;;) {
341          item = iternext(it);
342          if (item == NULL)
343              break;
344          cmp = PyObject_IsTrue(item);
345          Py_DECREF(item);
346          if (cmp < 0) {
347              Py_DECREF(it);
348              return NULL;
349          }
350          if (cmp == 0) {
351              Py_DECREF(it);
352              Py_RETURN_FALSE;
353          }
354      }
355      Py_DECREF(it);
356      if (PyErr_Occurred()) {
357          if (PyErr_ExceptionMatches(PyExc_StopIteration))
358              PyErr_Clear();
359          else
360              return NULL;
361      }
362      Py_RETURN_TRUE;
363  }

(Sauce)

On line 341, we produce our first item – 0 since our input into the original all() function was [0,0,0].

On line 344, we invoke PyObject_IsTrue for 0, which is a PyLongObject (int, basically).

Ok, so let’s reproduce PyObject_IsTrue below (for convenience) and trace our item as it traverses the control flow structure:

1379  /* Test a value used as condition, e.g., in a while or if statement.
1380     Return -1 if an error occurred */
1381
1382  int
1383  PyObject_IsTrue(PyObject *v)
1384  {
1385      Py_ssize_t res;
1386      if (v == Py_True)
1387          return 1;
1388      if (v == Py_False)
1389          return 0;
1390      if (v == Py_None)
1391          return 0;
1392      else if (Py_TYPE(v)->tp_as_number != NULL &&
1393               Py_TYPE(v)->tp_as_number->nb_bool != NULL)
1394          res = (*Py_TYPE(v)->tp_as_number->nb_bool)(v);
1395      else if (Py_TYPE(v)->tp_as_mapping != NULL &&
1396               Py_TYPE(v)->tp_as_mapping->mp_length != NULL)
1397          res = (*Py_TYPE(v)->tp_as_mapping->mp_length)(v);
1398      else if (Py_TYPE(v)->tp_as_sequence != NULL &&
1399               Py_TYPE(v)->tp_as_sequence->sq_length != NULL)
1400          res = (*Py_TYPE(v)->tp_as_sequence->sq_length)(v);
1401      else
1402          return 1;
1403      /* if it is negative, it should be either -1 or -2 */
1404      return (res > 0) ? 1 : Py_SAFE_DOWNCAST(res, Py_ssize_t, int);
1405  }

(Sauce)

Again, because v – which is item – is 0 and of type PyLongObject, we know that specifically, lines 1392-1394 in the logic above apply:

// ...
else if (Py_TYPE(v)->tp_as_number != NULL &&
            Py_TYPE(v)->tp_as_number->nb_bool != NULL)
    res = (*Py_TYPE(v)->tp_as_number->nb_bool)(v);

In this case, because our item is indeed 0, we know that res is set to 0 as well. Then, we skip to the bottom of this method, line 1404:

return (res > 0) ? 1 : Py_SAFE_DOWNCAST(res, Py_ssize_t, int);

res is set to 0 so the return value of this function is not 1 but instead the safely downcasted value of 0.

(If you’re interested, you can find the definition of Py_SAFE_DOWNCAST here. Super interestingly, it turns out that this downcast isn’t actually checked unless you are running a debug build; this issue has proposed renaming the macro since py3.4!)

Ok! So PyObject_IsTrue as it is invoked in line 344 in the builtin_all method has returned 0. Therefore we now know that cmp in line 344 is 0:

340  for (;;) {
341      item = iternext(it);
342      if (item == NULL)
343          break;
344      cmp = PyObject_IsTrue(item);
345      Py_DECREF(item);
346      if (cmp < 0) {
347          Py_DECREF(it);
348          return NULL;
349      }
350      if (cmp == 0) {
351          Py_DECREF(it);
352          Py_RETURN_FALSE;
353      }
354  }

Therefore, the code skips to line 350, where the condition cmp == 0 holds, and executes Py_RETURN_FALSE, a macro that returns python’s False object while taking care of its reference count (a “safe” false return value).
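For anyone who reads python more comfortably than C, here is the control flow we just traced as a rough python rendering. It is a reading aid under my own assumptions, not how CPython implements all(): builtin_all_sketch is a hypothetical name, and reference counting (the Py_DECREF calls) has no python equivalent, so it is simply left out:

```python
def builtin_all_sketch(iterable):
    """Rough python rendering of CPython's builtin_all control flow."""
    it = iter(iterable)          # PyObject_GetIter: TypeError if not iterable
    while True:                  # for (;;)
        try:
            item = next(it)      # item = iternext(it)
        except StopIteration:    # iternext returned NULL: iterator exhausted
            break
        # PyObject_IsTrue; if it raises (cmp < 0), the exception propagates
        cmp = bool(item)
        if not cmp:              # cmp == 0
            return False         # Py_RETURN_FALSE
    return True                  # Py_RETURN_TRUE


print(builtin_all_sketch([0, 0, 0]))  # False
print(builtin_all_sketch([1, 1, 1]))  # True
print(builtin_all_sketch([]))         # True
```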

And so, in conclusion, our initial call to all:

all([0,0,0])
# False

returns False as displayed above.

Tada!

(PS: wondering what the difference is between Py_RETURN_FALSE and Py_False? I was too! These docs were very helpful in clarifying)

Final remarks

So here’s a fun question to ask: why??

As in - why do this, even? I came for the possibility/thrill of contributing to python (a lang I have used+abused for a very long time to make a living) but I stayed for the learnings and…possibly new ideas for contribution??

While I have definitely ventured into python source in the past, I usually stayed in python land and rarely looked into the C code. I wasn’t planning to look as far/deep as I did tonight but I am glad that I did as I feel like I have a much better understanding of how at least some parts of python now work “under the hood”.

I wrote this post for my own sake mainly but also in the hopes that perhaps others may find my spelunking useful/interesting and yes, perhaps even fun!

(PS: I wrote a follow-up post to the ideas presented here, where I build python from source and implement an addition to the builtin modules (the ones written in C). Find the post here!)

Sidebar: WTF is tp_bool???

Ok, this really bugged me for a while. I’m pretty sure I know what is going on, but I’ll start from the top.

As I pored over the python source code, I had built up a mental model of how the sub-slots generally worked. Let’s go back to our friend, PyNumberMethods float_as_number:

static PyNumberMethods float_as_number = {
    float_add,          /* nb_add */
    float_sub,          /* nb_subtract */
    float_mul,          /* nb_multiply */
    float_rem,          /* nb_remainder */
    float_divmod,       /* nb_divmod */
    float_pow,          /* nb_power */
    (unaryfunc)float_neg, /* nb_negative */
    float_float,        /* nb_positive */
    (unaryfunc)float_abs, /* nb_absolute */
    (inquiry)float_bool, /* nb_bool */
    0,                  /* nb_invert */
    0,                  /* nb_lshift */
    0,                  /* nb_rshift */
    0,                  /* nb_and */
    0,                  /* nb_xor */
    0,                  /* nb_or */
    float___trunc___impl, /* nb_int */
    0,                  /* nb_reserved */
    float_float,        /* nb_float */
    0,                  /* nb_inplace_add */
    0,                  /* nb_inplace_subtract */
    0,                  /* nb_inplace_multiply */
    0,                  /* nb_inplace_remainder */
    0,                  /* nb_inplace_power */
    0,                  /* nb_inplace_lshift */
    0,                  /* nb_inplace_rshift */
    0,                  /* nb_inplace_and */
    0,                  /* nb_inplace_xor */
    0,                  /* nb_inplace_or */
    float_floor_div,    /* nb_floor_divide */
    float_div,          /* nb_true_divide */
    0,                  /* nb_inplace_floor_divide */
    0,                  /* nb_inplace_true_divide */
};

(Sauce)

Take note especially of:

// ...
(inquiry)float_bool, /* nb_bool */
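From Python, the nb_bool slot is what you reach as __bool__: for built-in types, CPython exposes each filled slot as a "slot wrapper" in the type's __dict__. The quick check below is a sketch (the Meters class is a made-up example) showing float_bool answering for float, and a user-defined __bool__ being honored by all() through the same slot.

```python
# float_bool is exposed to Python as float.__bool__, a slot wrapper.
wrapper = float.__dict__["__bool__"]
print(type(wrapper).__name__)            # wrapper_descriptor
print(wrapper(0.0), wrapper(2.5))        # False True


class Meters:
    """Made-up example type whose truthiness is defined by __bool__."""

    def __init__(self, value):
        self.value = value

    def __bool__(self):
        # For a class, CPython fills nb_bool with a generic slot function
        # that calls this method, so PyObject_IsTrue ends up here.
        return self.value != 0


print(all([Meters(1.5), Meters(3.0)]))   # True
print(all([Meters(1.5), Meters(0.0)]))   # False
```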

Now consider the equivalent for PyNumberMethods long_as_number:

static PyNumberMethods long_as_number = {
    (binaryfunc)long_add,       /*nb_add*/
    (binaryfunc)long_sub,       /*nb_subtract*/
    (binaryfunc)long_mul,       /*nb_multiply*/
    long_mod,                   /*nb_remainder*/
    long_divmod,                /*nb_divmod*/
    long_pow,                   /*nb_power*/
    (unaryfunc)long_neg,        /*nb_negative*/
    long_long,                  /*tp_positive*/
    (unaryfunc)long_abs,        /*tp_absolute*/
    (inquiry)long_bool,         /*tp_bool*/
    (unaryfunc)long_invert,     /*nb_invert*/
    long_lshift,                /*nb_lshift*/
    long_rshift,                /*nb_rshift*/
    long_and,                   /*nb_and*/
    long_xor,                   /*nb_xor*/
    long_or,                    /*nb_or*/
    long_long,                  /*nb_int*/
    0,                          /*nb_reserved*/
    long_float,                 /*nb_float*/
    0,                          /* nb_inplace_add */
    0,                          /* nb_inplace_subtract */
    0,                          /* nb_inplace_multiply */
    0,                          /* nb_inplace_remainder */
    0,                          /* nb_inplace_power */
    0,                          /* nb_inplace_lshift */
    0,                          /* nb_inplace_rshift */
    0,                          /* nb_inplace_and */
    0,                          /* nb_inplace_xor */
    0,                          /* nb_inplace_or */
    long_div,                   /* nb_floor_divide */
    long_true_divide,           /* nb_true_divide */
    0,                          /* nb_inplace_floor_divide */
    0,                          /* nb_inplace_true_divide */
    long_long,                  /* nb_index */
};

(Sauce)

But note specifically:

// ...
(inquiry)long_bool,         /*tp_bool*/

Wtf is tp_bool?!

This is likely due to my own ignorance/tiredness, but I searched frantically for some definition, any definition, of tp_bool in the source code. Nothing. Tried the docs. Nada.

Finally, I realized that I could probably just look at the PyNumberMethods definition, which led me to:

typedef struct {
    /* Number implementations must check *both*
       arguments for proper type and implement the necessary conversions
       in the slot functions themselves. */

    // ... SKIPPING non relevant lines

    inquiry nb_bool;

    // ... SKIPPING non relevant lines
    
} PyNumberMethods;

(Sauce)

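From this definition, it became clear that the tp_bool label is just a typo in a comment (as are tp_positive and tp_absolute two lines above it). Because long_as_number is initialized positionally, the comment has no effect on the code: the tenth entry, long_bool, lands in nb_bool no matter what the comment says. Whomp, whomp.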

Probably not important enough for a PR, but man, did it confuse me for a while!
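Why the typo is harmless can be modeled in a few lines of Python. C assigns positional initializers to struct fields strictly in declaration order, and comments are discarded before the compiler sees them. The sketch below covers only the first ten fields (in the order declared in PyNumberMethods); the variable names are illustrative, and the last line confirms at runtime that int's nb_bool slot really does answer truthiness.

```python
# First ten fields of PyNumberMethods, in declaration order.
NB_FIELDS = [
    "nb_add", "nb_subtract", "nb_multiply", "nb_remainder", "nb_divmod",
    "nb_power", "nb_negative", "nb_positive", "nb_absolute", "nb_bool",
]

# First ten entries of long_as_number: (function, label from its comment).
LONG_ENTRIES = [
    ("long_add", "nb_add"), ("long_sub", "nb_subtract"),
    ("long_mul", "nb_multiply"), ("long_mod", "nb_remainder"),
    ("long_divmod", "nb_divmod"), ("long_pow", "nb_power"),
    ("long_neg", "nb_negative"), ("long_long", "tp_positive"),
    ("long_abs", "tp_absolute"), ("long_bool", "tp_bool"),
]

# Positional initialization: the i-th entry fills the i-th field,
# and the comment label plays no part in it.
slots = {field: func for field, (func, _) in zip(NB_FIELDS, LONG_ENTRIES)}
mislabeled = [label for field, (_, label) in zip(NB_FIELDS, LONG_ENTRIES)
              if label != field]

print(slots["nb_bool"])   # long_bool
print(mislabeled)         # ['tp_positive', 'tp_absolute', 'tp_bool']

# Runtime check: int exposes its nb_bool slot (long_bool) as __bool__.
print(int.__bool__(0), int.__bool__(7))   # False True
```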
