TAGS: python

Extending Pythons Builtin C Modules

This post is intended to be a follow-up/continuation of my discussion around cpython and grokking (generally speaking) how all(...) works.

(For better context, I’d recommend first reading the previous post in this series, “Grokking Builtin.all in Python”)

Having understood how the builtin_all method behaves, we now turn our attention to how we might be able to either “hack” builtin_all to accommodate our intended use case (for no reason except for our own education) or more generally speaking, how we might define a new method that specifically checks to see if all items in the iterable are equivalent.

Effectively, we want to implement this logic (from the previous post):

inp = [{},{},{}]
samesame = True
for it in inp:
  samesame = inp[0] == it
  if not samesame:
    break
# samesame will be False is all items in inp are not the same val
print(samesame) # True

but now formalized as a method in python’s bltinmodule.c lib such that calling our method from python itself gives us:

samesame([0,0,0]) # True
samesame([1,0,0]) # False
samesame([0,1,0]) # False
samesame([0,0.0,0]) # False
samesame([]) # True
samesame([{},{},{}]) # True
samesame([{},{},set()]) # False

Ok?

Ok.

Let’s do this!

Step 1: Building CPython

Our first step is to actually build cpython locally so that we can make changes to the C source code and observe it in action.

To begin, I followed the steps provided here, as written in the Python Developer’s Guide which defines the steps necessary for building python from the C source code.

While they have exhaustive documentation about how to get the codebase working in various environments and a bunch of caveats for install dependencies, etc - I chose to try to find the simplest way forward I possibly could.

To that end, I only needed to run the following two commands:

./configure --with-pydebug

and then:

make -s -j2

Initially I ran this directly on my local machine…which worked! And I saw that a python.exe file (yes, on a mac) was created in my current working directory. Running ./python.exe resulted in the following:

Python 3.10.0a6+ (heads/master-dirty:6086ae7, Mar 17 2021, 19:52:03) [GCC 4.9.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>

Particularly, the Python 3.10.0a6+ (heads/master-dirty:6086ae7 label demonstrates that a “dev” build of python was successfully created and is now runnable via the ./python.exe binary!

Step 2: Dockerizing the build process

Let’s step back for a few moments.

The first thing I actually did before even looking up the dev guide docs was to try and find a docker image that builds cpython’s master branch. Disappointingly, I didn’t find anything useful. So, I wrote one (see the Dockerfile below) in the hopes that it will make it easier for others (and of course by others I also include future-Taq) in playing around and/or dev-ing from the source.

FROM gcc:latest

COPY . /usr/src/app

WORKDIR /usr/src/app

RUN ./configure --with-pydebug && make clean && make -s -j2

The first few lines are just our plain old docker usuals - build this image from the gcc:latest base image, move the cpython source into container, set working dir, etc.

The last line builds python from source initially (when image is created). Subsequent runs (once we’ve made changes to the source itself) only need make clean && make -s -j2. (We only need to re-run ./configure if new dependencies must be linked as part of our dev cycle. Also, it is strongly recommended that we always build with --with-pydebug flag enabled).

(TODO: for a future iteration: it would be great to configure some form of watcher (such as CompileDaemon in golang or chokidar in npm) to rerun make as files are changed during development. At that point, I would be comfortable packaging this up as a cpython dev image on dockerhub)

Once the Dockerfile is good and written (I stuck it directly into the top level of the cpython repo), we build the image:

docker build -t pydev:1.0 .

which will install dependencies and run ./configure, etc etc. Then, we run the container:

docker run -it -v ${PWD}/Python:/usr/src/app/Python pydev:1.0 /bin/bash

In this mode, we launch the container in interactive mode allowing us to “manually” run the make clean && make -s -j2 commands after changing source code as needed.

By mounting the Python folder from our host machine into the container, we can make code changes in our IDE of choice (outside the container) but still retain the ability to test the effects of our changes inside the docker container. (Win-win!)

Alternatively, we can also run something like this:

docker run \
    -it \
    -v ${PWD}/Python:/usr/src/app/Python pydev:1.0 \
        bash -c "make clean && make -s -j2 && ./python"

(indented for readability)

In this setup, we make changes to our source (for the purposes of this post, all changes will be made to the ./Python folder) then run the make command to rebuild source and then run the generated python binary (which puts us into the py REPL).

Unfortunately, this needs to be run “manually” everytime we want to debug changes (at least for now, without the file watcher idea in place). There is a lot of opportunities for improvement here but at the very least with the addition of this docker workflow we have a stable way to make changes to the source, mount these changes to our container and run python interactively (in the REPL shell) to test these changes.

Onwards!

Step 3: Lightly dipping our toes into C-land

I’m no C aficionado - so, I began by lightly making small incremental changes to the codebase. I made a change, rebuilt the source via the docker run command above, tested and then repeated the whole process all over again.

To start, I wanted to just prove to myself that I could make visible changes and get the output I expected.

I began with the builtin_all method - reproduced below:

319/*[clinic input]
320all as builtin_all
321    iterable: object
322    /
323Return True if bool(x) is True for all values x in the iterable.
324If the iterable is empty, return True.
325[clinic start generated code]*/
326
327static PyObject *
328builtin_all(PyObject *module, PyObject *iterable)
329/*[clinic end generated code: output=ca2a7127276f79b3 input=1a7c5d1bc3438a21]*/
330{
331    PyObject *it, *item;
332    PyObject *(*iternext)(PyObject *);
333    int cmp;
334
335    it = PyObject_GetIter(iterable);
336    if (it == NULL)
337        return NULL;
338    iternext = *Py_TYPE(it)->tp_iternext;
339
340    for (;;) {
341        item = iternext(it);
342        if (item == NULL)
343            break;
344        cmp = PyObject_IsTrue(item);
345        Py_DECREF(item);
346        if (cmp < 0) {
347            Py_DECREF(it);
348            return NULL;
349        }
350        if (cmp == 0) {
351            Py_DECREF(it);
352            Py_RETURN_FALSE;
353        }
354    }
355    Py_DECREF(it);
356    if (PyErr_Occurred()) {
357        if (PyErr_ExceptionMatches(PyExc_StopIteration))
358            PyErr_Clear();
359        else
360            return NULL;
361    }
362    Py_RETURN_TRUE;
363}

(Sauce)

But! I changed line 352 as follows:

319/*[clinic input]
320all as builtin_all
321    iterable: object
322    /
323Return True if bool(x) is True for all values x in the iterable.
324If the iterable is empty, return True.
325[clinic start generated code]*/
326
327static PyObject *
328builtin_all(PyObject *module, PyObject *iterable)
329/*[clinic end generated code: output=ca2a7127276f79b3 input=1a7c5d1bc3438a21]*/
330{
331    PyObject *it, *item;
332    PyObject *(*iternext)(PyObject *);
333    int cmp;
334
335    it = PyObject_GetIter(iterable);
336    if (it == NULL)
337        return NULL;
338    iternext = *Py_TYPE(it)->tp_iternext;
339
340    for (;;) {
341        item = iternext(it);
342        if (item == NULL)
343            break;
344        cmp = PyObject_IsTrue(item);
345        Py_DECREF(item);
346        if (cmp < 0) {
347            Py_DECREF(it);
348            return NULL;
349        }
350        if (cmp == 0) {
351            Py_DECREF(it);
352            Py_RETURN_TRUE; // Py_RETURN_FALSE;
353            // ^ THIS IS DIFFERENT NOW!
354            // Expectation: return `True` if 
355            // PyObject_IsTrue(item) evaluates to false
356        }
357    }
358    Py_DECREF(it);
359    if (PyErr_Occurred()) {
360        if (PyErr_ExceptionMatches(PyExc_StopIteration))
361            PyErr_Clear();
362        else
363            return NULL;
364    }
365    Py_RETURN_TRUE;
366}

(Sauce)

With this change, my expectation is that running the following:

all([0,0,0])

will return True instead of False as expected. And sure enough:

➜  cpython git:(master) ✗ docker run \
    -it \
    -v ${PWD}/Python:/usr/src/app/Python pydev:1.0 \
        bash -c "make clean && make -s -j2 && ./python"
# ... skipping output for the sake of brevity ... 

Python 3.10.0a6+ (heads/master-dirty:6086ae7, Mar 18 2021, 04:33:26) [GCC 4.9.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> all([0,0,0])
True
>>>

Tada! Progress!

Step 4: Implementing a custom builtin method

So finally, having proven to ourselves that we can:

– let’s make our actual change.

Again, I am no C expert so I chose to make the core implementation of my custom method, which I am calling builtin_samesame (dedicated to my former esteemed colleague and friend Omid who commonly used this refrain during our pair programming sessions at PX), as similar as possible to builtin_all.

Let’s start off by viewing the implementation:

 1static PyObject *
 2builtin_samesame(PyObject *module, PyObject *iterable)
 3/*[clinic end generated code: output=ca2a7127276f79b3 input=1a7c5d1bc3438a21]*/
 4{
 5    PyObject *it, *item, *firstitem;
 6    PyObject *(*iternext)(PyObject *);
 7
 8    it = PyObject_GetIter(iterable);
 9    if (it == NULL)
10        return NULL;
11    iternext = *Py_TYPE(it)->tp_iternext;
12
13    firstitem = iternext(it);
14    if (firstitem == NULL) {
15        Py_DECREF(it);
16        Py_RETURN_TRUE;
17    }
18
19    for (;;) {
20        item = iternext(it);
21        if (item == NULL)
22            break;
23        if (Py_TYPE(firstitem)->tp_richcompare != NULL) {
24            PyObject *res;
25            res = (*Py_TYPE(firstitem)->tp_richcompare)(firstitem, item, Py_NE);
26            if (res != Py_False) {
27                Py_DECREF(firstitem);
28                Py_DECREF(item);
29                Py_DECREF(res);
30                Py_DECREF(it);
31                Py_RETURN_FALSE;
32            }
33            Py_DECREF(res);
34        }
35        else if (item != firstitem) {
36            Py_DECREF(firstitem);
37            Py_DECREF(item);
38            Py_DECREF(it);
39            Py_RETURN_FALSE;
40        }
41    }
42    Py_DECREF(firstitem);
43    Py_DECREF(it);
44    if (PyErr_Occurred()) {
45        if (PyErr_ExceptionMatches(PyExc_StopIteration))
46            PyErr_Clear();
47        else
48            return NULL;
49    }
50    Py_RETURN_TRUE;
51}

This does look pretty hardcore but I swear it isn’t! The main differences between builtin_samesame and builtin_all are on lines 5, 13-17, 23-42. Let’s break each of these three regions down to better grok what is going on.

LINE 5

5PyObject *it, *item, *firstitem;

This one is simple - we just declare an additional PyObject type, *firstitem. The idea is if any of the remaining items in our list is NOT equal to *firstitem, we can return false immediately as we are done.

LINES 13-17

13firstitem = iternext(it);
14if (firstitem == NULL) {
15    Py_DECREF(it);
16    Py_RETURN_TRUE;
17}

Here, we grab the first item in our iterable. If it is NULL, then there is nothing to do. By definition (of our own choosing, ha) we return True.

The meaning of Py_DECREF is available here in the docs. Basically, we decrement the reference count for a not-NULL object. Once the reference count hits 0, the object is deallocated from memory (if NULL though, we end up seg faulting, ha). Because this check is explicitly for firstitem being NULL, there is no need to decrement the ref count for firstitem.

LINES 23-42

23// ...
24    if (Py_TYPE(firstitem)->tp_richcompare != NULL) {
25        PyObject *res;
26        res = (*Py_TYPE(firstitem)->tp_richcompare)(firstitem, item, Py_NE);
27        if (res != Py_False) {
28            Py_DECREF(firstitem);
29            Py_DECREF(item);
30            Py_DECREF(res);
31            Py_DECREF(it);
32            Py_RETURN_FALSE;
33        }
34        Py_DECREF(res);
35    }
36    else if (item != firstitem) {
37        Py_DECREF(firstitem);
38        Py_DECREF(item);
39        Py_DECREF(it);
40        Py_RETURN_FALSE;
41    }
42}
43Py_DECREF(firstitem);

This logic performs the bulk of the work. (It could also definitely use some TLC but for now, I’ll comment specifically on how it works as is).

Let’s get that else if out of the way first, as it is simpler:

// ...
else if (item != firstitem) {
    Py_DECREF(firstitem);
    Py_DECREF(item);
    Py_DECREF(it);
    Py_RETURN_FALSE;
}

This performs the basic != check which ought to be well defined for primitive types. With this code block, cases such as:

all([0,0,0])
all([True, False, True])

are all covered.

The first half of that conditional block is more interesting:

if (Py_TYPE(firstitem)->tp_richcompare != NULL) {
    PyObject *res;
    res = (*Py_TYPE(firstitem)->tp_richcompare)(firstitem, item, Py_NE);
    if (res != Py_False) {
        Py_DECREF(firstitem);
        Py_DECREF(item);
        Py_DECREF(res);
        Py_DECREF(it);
        Py_RETURN_FALSE;
    }
    Py_DECREF(res);
}

After looking through the handy Type Objects documentation, I was able to pinpoint tp_richcompare as the method that manages dunder methods such as __eq__, __ne__, etc. (This provides support for comparing non primitive types such as {} or set(), etc)

The docs also tells us what the function signature looks like for tp_richcompare:

PyObject *tp_richcompare(PyObject *self, PyObject *other, int op);

It also provides a handy table describing the constant definitions and their corresponding comparisons:

Constant Comparison
Py_LT <
Py_LE <=
Py_EQ ==
Py_NE !=
Py_GT >
Py_GE >=

Moreover, from looking at the Objects/dictobject.c implentation of tp_richcompare, it was easy to see:

2958static PyObject *
2959dict_richcompare(PyObject *v, PyObject *w, int op)
2960{
2961    int cmp;
2962    PyObject *res;
2963
2964    if (!PyDict_Check(v) || !PyDict_Check(w)) {
2965        res = Py_NotImplemented;
2966    }
2967    else if (op == Py_EQ || op == Py_NE) {
2968        cmp = dict_equal((PyDictObject *)v, (PyDictObject *)w);
2969        if (cmp < 0)
2970            return NULL;
2971        res = (cmp == (op == Py_EQ)) ? Py_True : Py_False;
2972    }
2973    else
2974        res = Py_NotImplemented;
2975    Py_INCREF(res);
2976    return res;
2977}

(Sauce)

that this method is static and essentially the same beast as tp_as_number (a topic of discussion from Part I ). As such, I drew on inspiration from the PyObject_IsTrue method we discussed in Part I(reproduced below for convenience) to figure out how to call tp_richcompare.

1379/* Test a value used as condition, e.g., in a while or if statement.
1380   Return -1 if an error occurred */
1381
1382int
1383PyObject_IsTrue(PyObject *v)
1384{
1385    Py_ssize_t res;
1386    if (v == Py_True)
1387        return 1;
1388    if (v == Py_False)
1389        return 0;
1390    if (v == Py_None)
1391        return 0;
1392    else if (Py_TYPE(v)->tp_as_number != NULL &&
1393             Py_TYPE(v)->tp_as_number->nb_bool != NULL)
1394        res = (*Py_TYPE(v)->tp_as_number->nb_bool)(v);
1395    else if (Py_TYPE(v)->tp_as_mapping != NULL &&
1396             Py_TYPE(v)->tp_as_mapping->mp_length != NULL)
1397        res = (*Py_TYPE(v)->tp_as_mapping->mp_length)(v);
1398    else if (Py_TYPE(v)->tp_as_sequence != NULL &&
1399             Py_TYPE(v)->tp_as_sequence->sq_length != NULL)
1400        res = (*Py_TYPE(v)->tp_as_sequence->sq_length)(v);
1401    else
1402        return 1;
1403    /* if it is negative, it should be either -1 or -2 */
1404    return (res > 0) ? 1 : Py_SAFE_DOWNCAST(res, Py_ssize_t, int);
1405}

(Sauce)

Basically, just emulate line 1394 but for tp_richcompare and pass in the required arguments as per the method signature we observed above.

Let’s reproduce our conditional block one more time for convenience:

if (Py_TYPE(firstitem)->tp_richcompare != NULL) {
    PyObject *res;
    res = (*Py_TYPE(firstitem)->tp_richcompare)(firstitem, item, Py_NE);
    if (res != Py_False) {
        Py_DECREF(firstitem);
        Py_DECREF(item);
        Py_DECREF(res);
        Py_DECREF(it);
        Py_RETURN_FALSE;
    }
    Py_DECREF(res);
}

All this line is doing:

res = (*Py_TYPE(firstitem)->tp_richcompare)(firstitem, item, Py_NE);

is invoking tp_richcompare on our firstitem (remember this is a static method so we pass in firstitem, item and the op constant, in this case Py_NE) and based on the return value we choose to either stop in our tracks or continue on with the loop.

(NOTE: the way I implemented this is confusing af. The reason why we check for res != Py_False is because we are running tp_richcompare with op = Py_NE. So, if firstitem != item is False that means the two items are equal and we can move on. For all other cases (if they are false OR if there is an error), we want to stop the loop. I know, I know, this is poorly written code - but it works and I’m feeling lazy 😎)

There’s a ton of Py_DECREFs all over the place mainly because I don’t know enough C to come up with a way to make that easier to manage. A problem for next time!

Ok so let’s put a bow on this – in conclusion, our builtin_samesame method (reproduced a final time below):

 1static PyObject *
 2builtin_samesame(PyObject *module, PyObject *iterable)
 3/*[clinic end generated code: output=ca2a7127276f79b3 input=1a7c5d1bc3438a21]*/
 4{
 5    PyObject *it, *item, *firstitem;
 6    PyObject *(*iternext)(PyObject *);
 7
 8    it = PyObject_GetIter(iterable);
 9    if (it == NULL)
10        return NULL;
11    iternext = *Py_TYPE(it)->tp_iternext;
12
13    firstitem = iternext(it);
14    if (firstitem == NULL) {
15        Py_DECREF(it);
16        Py_RETURN_TRUE;
17    }
18
19    for (;;) {
20        item = iternext(it);
21        if (item == NULL)
22            break;
23        if (Py_TYPE(firstitem)->tp_richcompare != NULL) {
24            PyObject *res;
25            res = (*Py_TYPE(firstitem)->tp_richcompare)(firstitem, item, Py_NE);
26            if (res != Py_False) {
27                Py_DECREF(firstitem);
28                Py_DECREF(item);
29                Py_DECREF(res);
30                Py_DECREF(it);
31                Py_RETURN_FALSE;
32            }
33            Py_DECREF(res);
34        }
35        else if (item != firstitem) {
36            Py_DECREF(firstitem);
37            Py_DECREF(item);
38            Py_DECREF(it);
39            Py_RETURN_FALSE;
40        }
41    }
42    Py_DECREF(firstitem);
43    Py_DECREF(it);
44    if (PyErr_Occurred()) {
45        if (PyErr_ExceptionMatches(PyExc_StopIteration))
46            PyErr_Clear();
47        else
48            return NULL;
49    }
50    Py_RETURN_TRUE;
51}

Ought to provide support for our original impression of what all(..) does - that is, return True if all items in the iterable passed in are the same.

To actually test this guy out though, we have a few more changes left to make.

First, in bltinmodule.c’s header file (./Python/clinic/bltinmodule.c.h), we add the following definition:

PyDoc_STRVAR(builtin_samesame__doc__,
"samesame($module, iterable, /)\n"
"--\n"
"\n"
"Return True if all values x in the iterable are equal.\n"
"\n"
"If the iterable is empty, return True.");

#define BUILTIN_SAMESAME_METHODDEF    \
    {"samesame", (PyCFunction)builtin_samesame, METH_O, builtin_samesame__doc__},

This is crucial as it associates the python method name “samesame” with the builtin_samesame function. Also, we have the ability to provide some documentation as well which is manifested using: help(samesame):

>>> help(samesame)
Help on built-in function samesame in module builtins:

samesame(iterable, /)
    Return True if all values x in the iterable are equal.

    If the iterable is empty, return True.

>>>

Meanwhile, back at the ranch (in the bltinmodule.c file), we add our final change (note line 2893):

2887static PyMethodDef builtin_methods[] = {
2888    {"__build_class__", (PyCFunction)(void(*)(void))builtin___build_class__,
2889     METH_FASTCALL | METH_KEYWORDS, build_class_doc},
2890    {"__import__",      (PyCFunction)(void(*)(void))builtin___import__, METH_VARARGS | METH_KEYWORDS, import_doc},
2891    BUILTIN_ABS_METHODDEF
2892    BUILTIN_ALL_METHODDEF
2893    BUILTIN_SAMESAME_METHODDEF
2894    BUILTIN_ANY_METHODDEF
2895    BUILTIN_ASCII_METHODDEF
2896    BUILTIN_BIN_METHODDEF
2897    {"breakpoint",      (PyCFunction)(void(*)(void))builtin_breakpoint, METH_FASTCALL | METH_KEYWORDS, breakpoint_doc},
2898    BUILTIN_CALLABLE_METHODDEF
2899    BUILTIN_CHR_METHODDEF
2900    BUILTIN_COMPILE_METHODDEF
2901    BUILTIN_DELATTR_METHODDEF
2902    {"dir",             builtin_dir,        METH_VARARGS, dir_doc},
2903    BUILTIN_DIVMOD_METHODDEF
2904    BUILTIN_EVAL_METHODDEF
2905    BUILTIN_EXEC_METHODDEF
2906    BUILTIN_FORMAT_METHODDEF
2907    {"getattr",         (PyCFunction)(void(*)(void))builtin_getattr, METH_FASTCALL, getattr_doc},
2908    BUILTIN_GLOBALS_METHODDEF
2909    BUILTIN_HASATTR_METHODDEF
2910    BUILTIN_HASH_METHODDEF
2911    BUILTIN_HEX_METHODDEF
2912    BUILTIN_ID_METHODDEF
2913    BUILTIN_INPUT_METHODDEF
2914    BUILTIN_ISINSTANCE_METHODDEF
2915    BUILTIN_ISSUBCLASS_METHODDEF
2916    {"iter",            (PyCFunction)(void(*)(void))builtin_iter,       METH_FASTCALL, iter_doc},
2917    BUILTIN_LEN_METHODDEF
2918    BUILTIN_LOCALS_METHODDEF
2919    {"max",             (PyCFunction)(void(*)(void))builtin_max,        METH_VARARGS | METH_KEYWORDS, max_doc},
2920    {"min",             (PyCFunction)(void(*)(void))builtin_min,        METH_VARARGS | METH_KEYWORDS, min_doc},
2921    {"next",            (PyCFunction)(void(*)(void))builtin_next,       METH_FASTCALL, next_doc},
2922    BUILTIN_OCT_METHODDEF
2923    BUILTIN_ORD_METHODDEF
2924    BUILTIN_POW_METHODDEF
2925    {"print",           (PyCFunction)(void(*)(void))builtin_print,      METH_FASTCALL | METH_KEYWORDS, print_doc},
2926    BUILTIN_REPR_METHODDEF
2927    BUILTIN_ROUND_METHODDEF
2928    BUILTIN_SETATTR_METHODDEF
2929    BUILTIN_SORTED_METHODDEF
2930    BUILTIN_SUM_METHODDEF
2931    {"vars",            builtin_vars,       METH_VARARGS, vars_doc},
2932    {NULL,              NULL},
2933};

And with this, we are in business! Re-run docker and check it:

>>> samesame([0,0,0])
True
>>> samesame([1,0,0])
False
>>> samesame([0,1,0])
False
>>> samesame([0,0.0,0])
False
>>> samesame([])
True
>>> samesame([{},{},{}])
True
>>> samesame([{},{},set()])
False
>>>

Tada!

Final Remarks

Ok cool - so now, there’s a path forward for developing with changes to the Python lang itself and then testing the changes for correctness and viability.

I should mention that I am not suggesting that samesame ought to be added to the Python API. This was more of an exercise of determining how might we do such a thing if we wanted to or needed to.

I’m particularly excited about the Dockerfile and would like to follow up…at some point, with a more mature version of it. While it is super tempting to go further messing around with python internals I’ll probably leave things here for now at least. At some point (per suggestion from another esteemed colleague), I’d certainly like to take a stab at creating a custom python type (this is more aligned with the content from Part I) just to see what that’s like.

Please find MR of the changes discussed here

Share