TAGS: python just for fun

Hacking Python With Spells From Harry Potter


This post is intended to be a follow-up/continuation of my discussion around cpython and grokking (generally speaking) how the language works “under the hood”.

For better context, I’d recommend first reading the previous posts in this series:

  1. “Grokking Builtin.all in Python”
  2. “Extending Python’s Builtin C Modules”

That being said, this post is relatively standalone, and you can probably make do ok without reading the previous parts.

Why tho?

In this post, I will describe how I extended cpython once again, this time aliasing three python syntax elements to three Harry Potter spells: lumos for type, avadakedavra for del, and accio for import.

Years ago - around 2014 probably - my buddy and (current) rustacean Apoorv Kothari mentioned a meetup he went to where someone decided to swap out python’s import keyword with accio.

There was no real rhyme or reason for doing this - but, it did lead to a deep dive of python internals and made for a fun story to tell.

Given my recent exploration of cpython, I thought it would be fun to see how far I could take this concept myself. My goal was to support accio as an alias for import but along the way (as I learned more about how parsing works in cpython), I thought to also alias lumos(...) for type(...) (a quick win) and avadakedavra for del (pretty easy once I figured out accio).

I hope to follow up with one last post at some point, in which I will add a custom operator (perhaps ! for not, or ~= for regex matching) and define a custom type (like float or something). Thanks to Phil Eaton for the operator idea, though he may get to it before me. In that follow-up I also hope to dive deeper into the specifics of the new PEG parser for CPython (PEP 617).

For the purposes of this post however, I’d like to primarily share the steps required to achieve the three aliases in question (without going too deeply into how the parser itself works). Let’s start with lumos, by far the simplest of the three.

type -> lumos

To implement lumos, I started at ./Python/clinic/bltinmodule.c.h (sauce). (Recall from the previous post that this is the file where we ultimately defined samesame, a custom function added to python’s list of builtins).

I was hoping to find something similar to:

PyDoc_STRVAR(builtin_any__doc__,
"any($module, iterable, /)\n"
"--\n"
"\n"
"Return True if bool(x) is True for any x in the iterable.\n"
"\n"
"If the iterable is empty, return False.");

#define BUILTIN_ANY_METHODDEF    \
    {"any", (PyCFunction)builtin_any, METH_O, builtin_any__doc__},

Which would help me pinpoint the definition of the actual type(...) implementation. Then, I was going to just add another line similar to:

#define BUILTIN_ANY_METHODDEF    \
    {"any", (PyCFunction)builtin_any, METH_O, builtin_any__doc__},

for my alias.

I couldn’t find anything in that header file though, so I fell back to my tried and true approach: grep the entire codebase for type. This also sucked! (Way too much info.) But, I refined my search and grepped for:

grep -rI "\"type\"" ./Python

which was super helpful as it pinpointed the line where “type” is associated with its underlying C implementation. What I was searching for was in fact in ./Python/bltinmodule.c (line 2951, reproducing the entire block below for convenience):

2917#define SETBUILTIN(NAME, OBJECT) \
2918    if (PyDict_SetItemString(dict, NAME, (PyObject *)OBJECT) < 0)       \
2919        return NULL;                                                    \
2920    ADD_TO_ALL(OBJECT)
2921
2922    SETBUILTIN("None",                  Py_None);
2923    SETBUILTIN("Ellipsis",              Py_Ellipsis);
2924    SETBUILTIN("NotImplemented",        Py_NotImplemented);
2925    SETBUILTIN("False",                 Py_False);
2926    SETBUILTIN("True",                  Py_True);
2927    SETBUILTIN("bool",                  &PyBool_Type);
2928    SETBUILTIN("memoryview",        &PyMemoryView_Type);
2929    SETBUILTIN("bytearray",             &PyByteArray_Type);
2930    SETBUILTIN("bytes",                 &PyBytes_Type);
2931    SETBUILTIN("classmethod",           &PyClassMethod_Type);
2932    SETBUILTIN("complex",               &PyComplex_Type);
2933    SETBUILTIN("dict",                  &PyDict_Type);
2934    SETBUILTIN("enumerate",             &PyEnum_Type);
2935    SETBUILTIN("filter",                &PyFilter_Type);
2936    SETBUILTIN("float",                 &PyFloat_Type);
2937    SETBUILTIN("frozenset",             &PyFrozenSet_Type);
2938    SETBUILTIN("property",              &PyProperty_Type);
2939    SETBUILTIN("int",                   &PyLong_Type);
2940    SETBUILTIN("list",                  &PyList_Type);
2941    SETBUILTIN("map",                   &PyMap_Type);
2942    SETBUILTIN("object",                &PyBaseObject_Type);
2943    SETBUILTIN("range",                 &PyRange_Type);
2944    SETBUILTIN("reversed",              &PyReversed_Type);
2945    SETBUILTIN("set",                   &PySet_Type);
2946    SETBUILTIN("slice",                 &PySlice_Type);
2947    SETBUILTIN("staticmethod",          &PyStaticMethod_Type);
2948    SETBUILTIN("str",                   &PyUnicode_Type);
2949    SETBUILTIN("super",                 &PySuper_Type);
2950    SETBUILTIN("tuple",                 &PyTuple_Type);
2951    SETBUILTIN("type",                  &PyType_Type);
2952    SETBUILTIN("zip",                   &PyZip_Type);
2953    debug = PyBool_FromLong(config->optimization_level == 0);
2954    if (PyDict_SetItemString(dict, "__debug__", debug) < 0) {
2955        Py_DECREF(debug);
2956        return NULL;
2957    }
2958    Py_DECREF(debug);
2959
2960    return mod;
2961#undef ADD_TO_ALL
2962#undef SETBUILTIN

Interestingly, type is defined as PyType_Type (a type object, just like PyFloat_Type and friends - sauce) and not as a builtin_type function as I would have expected (similar to all(...) for instance).
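That difference is visible from plain python too. Here is a minimal sketch, run on an unmodified interpreter, that compares type (registered as a type object) against all (registered as a builtin function). The variable names are mine and purely illustrative.

```python
import builtins

# `type` is itself a class, which is why it is registered as &PyType_Type.
type_is_class = isinstance(builtins.type, type)

# `float` is registered the same way (as &PyFloat_Type), so it is a class too.
float_is_class = isinstance(builtins.float, type)

# `all` is an ordinary C function exposed through a method table instead.
all_is_class = isinstance(builtins.all, type)

print(type_is_class, float_is_class, all_is_class)  # True True False
print(type(all).__name__)                           # builtin_function_or_method
```

So calling type(15) is really just calling a class, the same way float("1.5") is.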

Regardless - having tracked down the location of this definition, it was very easy to “implement” support for lumos, just add the following:

2950// ... 
2951SETBUILTIN("type",                  &PyType_Type);
2952SETBUILTIN("lumos",                  &PyType_Type);

rebuild + run:

docker build -t pydev:1.0 .
docker run -it pydev:1.0 ./python

(Check out “Step 2: Dockerizing the build process” from the last post in this series for more info.)

and then test:

>>> lumos(15)
<class 'int'>

Tada!
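If you want to play with lumos without rebuilding the interpreter, the same effect can be approximated at runtime. This is only a sketch and not what the C change does: it assumes a stock python and simply adds a second name to the builtins namespace after startup, whereas SETBUILTIN registers the name while the builtins module is being initialized.

```python
import builtins

# Assumption: stock interpreter, so `lumos` does not exist until we add it.
# Both names now point at the very same type object.
builtins.lumos = builtins.type

print(lumos(15))      # <class 'int'>
print(lumos is type)  # True
```

The alias disappears as soon as the process exits, which is the main difference from baking it into the binary.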

(Here’s the PR of these changes.)

del -> avadakedavra

Next up we have the gruesome avadakedavra implementation. This one is interesting in that to achieve this change, we must actually modify the grammar that defines python.

I went down the wrong path hard by applying my previous strategy of grepping the entire project. There are actually a LOT of places where the token del is used but as it turns out - most of the code is generated!

I eventually ended up stumbling back on to the Python Developer Guide, which has a handy Changing Cpython’s Grammar post delineating how to properly make changes to the grammar definitions.

The Changing Cpython’s Grammar post suggests starting from the ./Grammar/python.gram file; peering into that file we note that del is defined on line 73:

66simple_stmt[stmt_ty] (memo):
67    | assignment
68    | e=star_expressions { _Py_Expr(e, EXTRA) }
69    | &'return' return_stmt
70    | &('import' | 'from') import_stmt
71    | &'raise' raise_stmt
72    | 'pass' { _Py_Pass(EXTRA) }
73    | &'del' del_stmt
74    | &'yield' yield_stmt
75    | &'assert' assert_stmt
76    | 'break' { _Py_Break(EXTRA) }
77    | 'continue' { _Py_Continue(EXTRA) }
78    | &'global' global_stmt
79    | &'nonlocal' nonlocal_stmt

(Sauce)

Additionally, the del_stmt is defined further down in the same file:

132del_stmt[stmt_ty]:
133    | 'del' a=del_targets &(';' | NEWLINE) { _Py_Delete(a, EXTRA) }
134    | invalid_del_stmt

(Sauce)

And finally, invalid_del_stmt is defined again further down in the same file:

797invalid_del_stmt:
798    | 'del' a=star_expressions {
799        RAISE_SYNTAX_ERROR_INVALID_TARGET(DEL_TARGETS, a) }

(Sauce)

Since we only want to alias avadakedavra, we can take a shortcut and only duplicate the stmt definitions that actually refer to the del token itself.

As such, we must add an avadakedavra_stmt and invalid_avadakedavra_stmt since both blocks refer directly to the del keyword. However, we leave del_targets (line 133) alone since we require avadakedavra to work exactly in the same way as del works.
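One side effect worth keeping in mind: quoted strings such as 'del' in the grammar are keywords. The sketch below runs on a stock interpreter, where avadakedavra is still an ordinary identifier. My assumption (not verified in this post) is that once the grammar change is compiled in, the new spell becomes reserved as well and the assignment at the bottom stops being valid.

```python
import keyword

# On an unmodified interpreter only `del` is reserved.
print(keyword.iskeyword("del"))           # True
print(keyword.iskeyword("avadakedavra"))  # False

# Because it is not reserved here, stock python accepts it as a variable name.
avadakedavra = "just a variable on a stock build"
print(avadakedavra)
```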

Adding the updates (summarized below), we end up with:

simple_stmt[stmt_ty] (memo):
    # ... other statements
    | &'del' del_stmt
    | &'avadakedavra' avadakedavra_stmt
    # ... other statements

# ... more definitions

avadakedavra_stmt[stmt_ty]:
    | 'avadakedavra' a=del_targets &(';' | NEWLINE) { _Py_Delete(a, EXTRA) }
    | invalid_avadakedavra_stmt

# ... more definitions

invalid_avadakedavra_stmt:
    | 'avadakedavra' a=star_expressions {
        RAISE_SYNTAX_ERROR_INVALID_TARGET(DEL_TARGETS, a) }

(Sauce)
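Both the original rule and the alias call _Py_Delete, so they should produce the same AST node. A quick way to see what that node looks like is the ast module on a stock interpreter. This sketch parses a regular del statement, since an unmodified python cannot parse the spell, and the dictionary name is just an example.

```python
import ast

# Parse a plain `del` statement and inspect the node the grammar action builds.
tree = ast.parse('del unfortunate_characters["Hedwig"]')
stmt = tree.body[0]
target = stmt.targets[0]

print(type(stmt).__name__)        # Delete
print(type(target).__name__)      # Subscript
print(type(target.ctx).__name__)  # Del
```

Since the alias only changes which keyword introduces the statement, everything downstream of the parser (compiler, bytecode, runtime) stays untouched.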

According to the docs, we must now run make regen-pegen, which will regenerate ./Parser/parser.c and allow our dev-python build to recognize the new token we just defined.

Running that make target requires Python 3.9, and the docker image we have defined does not have python installed at all. Instead of modifying it, I just ran the make command within the python:3.9 base image, as follows:

docker run -it -v ${PWD}:/app -w /app python:3.9 make regen-pegen

Here, I volume mount the source into the container running our python:3.9 image and run the make target. This allows the regenerated parser.c file to be available to our host machine - meaning rebuilding our original pydev image will generate the binary with the grammar changes we defined.

So, all together these three lines should do it (note the comments for additional clarification):

# make regen-pegen to generate parser.c
docker run -it -v ${PWD}:/app -w /app python:3.9 make regen-pegen
# rebuild image with the new parser.c
docker build -t pydev:1.0 .
# run our dev build with support for avadakedavra!
docker run -it pydev:1.0 ./python

With all said and done, we now observe that we can use avadakedavra in place of del!!

>>> unfortunate_characters = {"Hedwig": True, "Mad-Eye Moody": True,}
>>> avadakedavra unfortunate_characters["Hedwig"]
>>> unfortunate_characters
{'Mad-Eye Moody': True}
>>> avadakedavra unfortunate_characters
>>> unfortunate_characters
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'unfortunate_characters' is not defined
>>>

Tada!
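For comparison, here is a very different (and much less fun) way to get a similar result without touching cpython: rewrite the source before python ever parses it. To be clear, this is not the approach taken in this post. It is a minimal sketch that uses the standard tokenize module, and the SPELLS table and translate function are hypothetical names invented for the example.

```python
import io
import tokenize

# Hypothetical alias table for this sketch; not part of cpython.
SPELLS = {"avadakedavra": "del"}

def translate(source):
    """Swap spell names for real keywords, leaving strings and comments alone."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Only NAME tokens are rewritten, so "avadakedavra" inside a string survives.
        if tok.type == tokenize.NAME and tok.string in SPELLS:
            tok = tok._replace(string=SPELLS[tok.string])
        tokens.append(tok)
    return tokenize.untokenize(tokens)

source = 'd = {"Hedwig": True}\navadakedavra d["Hedwig"]\n'
namespace = {}
exec(translate(source), namespace)
print(namespace["d"])  # {}
```

The trade-off is that this only works for code you feed through translate yourself, and it would also rename any variable that happens to be called avadakedavra. Changing the grammar makes the spell a real part of the language instead.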

(Here’s the PR of these changes.)

import -> accio

Last but not least - let’s repeat our process to define accio, which behaves exactly like import. At first glance, this feels more complex since import can be used in a multitude of ways:

import random
from random import randint, randrange
from math import *

Still, I decided to start the simplest way possible: by repeating the steps from the previous section (del -> avadakedavra). I figured if it failed, the error messages might provide meaningful hints as to how to proceed further along.

So, again in ./Grammar/python.gram, we make the following changes:

simple_stmt[stmt_ty] (memo):
    # ... other statements
    | &('import' | 'from') import_stmt
    | &('accio' | 'from') accio_stmt
    # ... other statements

# ... more definitions

accio_stmt[stmt_ty]: accio_name | accio_from
accio_name[stmt_ty]: 'accio' a=dotted_as_names { _Py_Import(a, EXTRA) }
# note below: the ('.' | '...') is necessary because '...' is tokenized as ELLIPSIS
accio_from[stmt_ty]:
    | 'from' a=('.' | '...')* b=dotted_name 'accio' c=import_from_targets {
        _Py_ImportFrom(b->v.Name.id, c, _PyPegen_seq_count_dots(a), EXTRA) }
    | 'from' a=('.' | '...')+ 'accio' b=import_from_targets {
        _Py_ImportFrom(NULL, b, _PyPegen_seq_count_dots(a), EXTRA) }

# ... more definitions

(Sauce)
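The two actions in these rules, _Py_Import and _Py_ImportFrom, correspond to the Import and ImportFrom nodes you can inspect with the ast module, and the counted dots end up in the level field. The sketch below uses a stock interpreter and the regular import keyword. The helpers describe and operator_tokens are names I made up for illustration.

```python
import ast
import io
import tokenize

def describe(source):
    """Return (node type, module, level) for a single import statement."""
    node = ast.parse(source).body[0]
    return (type(node).__name__,
            getattr(node, "module", None),
            getattr(node, "level", None))

print(describe("import random"))               # ('Import', None, None)
print(describe("from random import randint"))  # ('ImportFrom', 'random', 0)
print(describe("from . import x"))             # ('ImportFrom', None, 1)
print(describe("from .... import x"))          # ('ImportFrom', None, 4)

def operator_tokens(source):
    """List the operator tokens the tokenizer produces for a line of source."""
    readline = io.StringIO(source).readline
    return [t.string for t in tokenize.generate_tokens(readline)
            if t.type == tokenize.OP]

# Four dots arrive as one '...' token plus one '.' token, not four '.' tokens.
print(operator_tokens("from .... import x\n"))  # ['...', '.']
```

That last line is why the grammar has to accept ('.' | '...') and then count the dots afterwards, as the comment in the rule notes.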

Again, we only make the necessary changes, which in this case means defining accio_name and accio_from, the two stmts that mention the keyword directly. Once completed, we re-run the regen-pegen make target and build, resulting in:

>>> accio random
>>> random.randint(1,5)
4
>>> from random accio randint, randrange
>>> randint(1,5)
2
>>> randrange(5)
4
>>> from math accio *
>>> ceil(4.2)
5
>>>

Tada!

(Here’s the PR of these changes.)

Final Remarks

While this was certainly a magical experience, the truth of the matter is that until recently the inner workings of python were - to me at least - just as magical. I could tell you very accurately what outputs to expect for given inputs, but could not tell you for the life of me how those outputs were actually computed. (This is essentially the definition of the word magic, right?)

Having gone deeper into the internals of cpython at least, I now feel like I have a better sense of the how, which has been very fascinating to discover. There’s definitely still much left to learn / grok and I’m unsure just how much deeper I’ll take this. But, at the very least I hope my explorations here (especially the docker stuff!) can help make it easier for you, dear reader, to conduct similar exercises of your own for your own understanding and benefit.

Happy hacking, fam.

Update 03/21/2021

After connecting with my buddy Steven Li, I have made a few other additions. Generally speaking they have been quite boring - some version of what I have demonstrated above. We are hoping to port over more of the existing grammar and just maintain this as a separate project mainly for fun.

Here are the pull requests so far.

In addition to what we have above, the following aliases are now available:
