Hacking Python With Spells From Harry Potter
This post is intended to be a follow-up/continuation of my discussion around cpython and grokking (generally speaking) how the language works “under the hood”.
(For better context, I’d recommend first reading the previous post(s) in this series. That being said, this post is relatively standalone, and you can probably make do just fine without reading the previous parts.)
Why tho?
In this post, I will describe how I extended cpython once again, this time aliasing three python syntax elements to three Harry Potter spells:
- lumos (aliasing type())
- accio (aliasing import)
- avadakedavra (aliasing del)
Years ago - around 2014 probably - my buddy and (current) rustacean Apoorv Kothari mentioned a meetup he went to where someone decided to swap out python’s import keyword with accio.
There was no real rhyme or reason for doing this - but, it did lead to a deep dive of python internals and made for a fun story to tell.
Given my recent exploration of cpython, I thought it would be fun to see how far I could take this concept myself. My goal was to support accio as an alias for import, but along the way (as I learned more about how parsing works in cpython), I thought to also alias lumos(...) for type(...) (a quick win) and avadakedavra for del (pretty easy once I figured out accio).
I hope to follow up with one last post…at some point where I will also add a custom operator (perhaps ! for not or ~= for regex (thanks to Phil Eaton for the idea, though he may get to this before me)) and define a custom type (like float or something). In my follow up I hope to dive deeper into the specifics of the new PEG parser for CPython (PEP 617).
For the purposes of this post however, I’d like to primarily share the steps required to achieve the three aliases in question (without going too deeply into how the parser itself works). Let’s start with lumos, by far the simplest of the three.
type -> lumos
To implement lumos, I started at ./Python/clinic/bltinmodule.c.h (sauce). (Recall from the previous post that this is the file where we ultimately defined samesame, a custom function added to python’s list of builtins.)
I was hoping to find something similar to:
PyDoc_STRVAR(builtin_any__doc__,
"any($module, iterable, /)\n"
"--\n"
"\n"
"Return True if bool(x) is True for any x in the iterable.\n"
"\n"
"If the iterable is empty, return False.");
#define BUILTIN_ANY_METHODDEF \
{"any", (PyCFunction)builtin_any, METH_O, builtin_any__doc__},
Which would help me pinpoint the definition of the actual type(...) implementation. Then, I was going to just add another line similar to:
#define BUILTIN_ANY_METHODDEF \
{"any", (PyCFunction)builtin_any, METH_O, builtin_any__doc__},
for my alias.
I couldn’t find anything in that header file though, so I fell back to my tried-and-true “grep the entire codebase for type” approach. This also sucked! (Way too much info.) But, I refined my search and grepped for:
grep -rI "\"type\"" ./Python
which was super helpful as it pinpointed the line where “type” is associated with the underlying C object. What I was searching for was in fact in ./Python/bltinmodule.c (line 2951, reproducing entire block below for convenience):
2917#define SETBUILTIN(NAME, OBJECT) \
2918 if (PyDict_SetItemString(dict, NAME, (PyObject *)OBJECT) < 0) \
2919 return NULL; \
2920 ADD_TO_ALL(OBJECT)
2921
2922 SETBUILTIN("None", Py_None);
2923 SETBUILTIN("Ellipsis", Py_Ellipsis);
2924 SETBUILTIN("NotImplemented", Py_NotImplemented);
2925 SETBUILTIN("False", Py_False);
2926 SETBUILTIN("True", Py_True);
2927 SETBUILTIN("bool", &PyBool_Type);
2928 SETBUILTIN("memoryview", &PyMemoryView_Type);
2929 SETBUILTIN("bytearray", &PyByteArray_Type);
2930 SETBUILTIN("bytes", &PyBytes_Type);
2931 SETBUILTIN("classmethod", &PyClassMethod_Type);
2932 SETBUILTIN("complex", &PyComplex_Type);
2933 SETBUILTIN("dict", &PyDict_Type);
2934 SETBUILTIN("enumerate", &PyEnum_Type);
2935 SETBUILTIN("filter", &PyFilter_Type);
2936 SETBUILTIN("float", &PyFloat_Type);
2937 SETBUILTIN("frozenset", &PyFrozenSet_Type);
2938 SETBUILTIN("property", &PyProperty_Type);
2939 SETBUILTIN("int", &PyLong_Type);
2940 SETBUILTIN("list", &PyList_Type);
2941 SETBUILTIN("map", &PyMap_Type);
2942 SETBUILTIN("object", &PyBaseObject_Type);
2943 SETBUILTIN("range", &PyRange_Type);
2944 SETBUILTIN("reversed", &PyReversed_Type);
2945 SETBUILTIN("set", &PySet_Type);
2946 SETBUILTIN("slice", &PySlice_Type);
2947 SETBUILTIN("staticmethod", &PyStaticMethod_Type);
2948 SETBUILTIN("str", &PyUnicode_Type);
2949 SETBUILTIN("super", &PySuper_Type);
2950 SETBUILTIN("tuple", &PyTuple_Type);
2951 SETBUILTIN("type", &PyType_Type);
2952 SETBUILTIN("zip", &PyZip_Type);
2953 debug = PyBool_FromLong(config->optimization_level == 0);
2954 if (PyDict_SetItemString(dict, "__debug__", debug) < 0) {
2955 Py_DECREF(debug);
2956 return NULL;
2957 }
2958 Py_DECREF(debug);
2959
2960 return mod;
2961#undef ADD_TO_ALL
2962#undef SETBUILTIN
Interestingly, type is defined as PyType_Type (a PyObject, similar to floats, etc - sauce) and not as a builtin__type as I would have expected (similar to all(..) for instance).
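The same distinction is visible from the Python side. Here is a minimal sketch, runnable on a stock interpreter, that contrasts type (a class object) with all (a plain builtin function). The variable names are mine, purely for illustration:

```python
import types

# `type` is a class object; its own class is `type` itself.
kind_of_type = type(type)

# `all` is a C function exposed as a builtin_function_or_method.
kind_of_all = type(all)

print(kind_of_type)
print(kind_of_all)
print(isinstance(type, types.BuiltinFunctionType))
```

This is why there is no METHODDEF entry for type to copy: it is registered as an object rather than as a function.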
Regardless - having tracked down the location of this definition, it was very easy to “implement” support for lumos. Just add the following:
2950// ...
2951SETBUILTIN("type", &PyType_Type);
2952SETBUILTIN("lumos", &PyType_Type);
rebuild + run:
docker build -t pydev:1.0 .
docker run -it pydev:1.0 ./python
(Check out “Step 2: Dockerizing the build process” from the last post in this series for more info.)
and then test:
>>> lumos(15)
<class 'int'>
Tada!
(Here’s the PR of these changes.)
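If you’d like to get a feel for this without rebuilding the interpreter, a rough runtime analogue is to bind a second name in the builtins module. This is only a sketch of the idea and not what the patch does: the patch registers the name in C at startup, whereas this mutates the builtins namespace of an already running stock interpreter.

```python
import builtins

# Bind a second name to the very same object, roughly what the extra
# SETBUILTIN("lumos", &PyType_Type) line does at interpreter startup.
builtins.lumos = type

# The bare name now resolves through the builtins namespace.
result = lumos(15)

print(result)
print(lumos is type)
```

Both names point at one object, so lumos behaves exactly like type in every context, including the three-argument form that creates classes.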
del -> avadakedavra
Next up we have the gruesome avadakedavra implementation. This one is interesting in that to achieve this change, we must actually modify the grammar that defines python.
I went down the wrong path hard by applying my previous strategy of grepping the entire project. There are actually a LOT of places where the token del is used, but as it turns out - most of that code is generated!
I eventually ended up stumbling back on to the Python Developer Guide, which has a handy Changing Cpython’s Grammar post delineating how to properly make changes to the grammar definitions.
The Changing Cpython’s Grammar post suggests starting from the ./Grammar/python.gram file; peering into that file we note that del is defined on line 73:
66simple_stmt[stmt_ty] (memo):
67 | assignment
68 | e=star_expressions { _Py_Expr(e, EXTRA) }
69 | &'return' return_stmt
70 | &('import' | 'from') import_stmt
71 | &'raise' raise_stmt
72 | 'pass' { _Py_Pass(EXTRA) }
73 | &'del' del_stmt
74 | &'yield' yield_stmt
75 | &'assert' assert_stmt
76 | 'break' { _Py_Break(EXTRA) }
77 | 'continue' { _Py_Continue(EXTRA) }
78 | &'global' global_stmt
79 | &'nonlocal' nonlocal_stmt
(Sauce)
Additionally, the del_stmt is defined further down in the same file:
132del_stmt[stmt_ty]:
133 | 'del' a=del_targets &(';' | NEWLINE) { _Py_Delete(a, EXTRA) }
134 | invalid_del_stmt
(Sauce)
And finally, invalid_del_stmt is defined again further down in the same file:
797invalid_del_stmt:
798 | 'del' a=star_expressions {
799 RAISE_SYNTAX_ERROR_INVALID_TARGET(DEL_TARGETS, a) }
(Sauce)
Since we only want to alias avadakedavra, we can take a shortcut and only add stmt definitions that actually refer to the del token itself.
As such, we must add an avadakedavra_stmt and invalid_avadakedavra_stmt since both blocks refer directly to the del keyword. However, we leave del_targets (line 133) alone since we require avadakedavra to work in exactly the same way as del works.
Adding the updates (summarized below), we end up with:
simple_stmt[stmt_ty] (memo):
# ... other statements
| &'del' del_stmt
| &'avadakedavra' avadakedavra_stmt
# ... other statements
# ... more definitions
avadakedavra_stmt[stmt_ty]:
| 'avadakedavra' a=del_targets &(';' | NEWLINE) { _Py_Delete(a, EXTRA) }
| invalid_avadakedavra_stmt
# ... more definitions
invalid_avadakedavra_stmt:
| 'avadakedavra' a=star_expressions {
RAISE_SYNTAX_ERROR_INVALID_TARGET(DEL_TARGETS, a) }
(Sauce)
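Both del_stmt and avadakedavra_stmt end in the same action, _Py_Delete(a, EXTRA), so they should build the same AST node. The sketch below inspects that node on a stock interpreter using del. On the patched build, swapping in avadakedavra would presumably give an identical tree, though that is my assumption rather than something checked here. The dictionary name is made up for the example.

```python
import ast

# Parse a del statement and look at the node the parser produced.
tree = ast.parse("del spells['Hedwig']")
stmt = tree.body[0]

is_delete = isinstance(stmt, ast.Delete)
target = stmt.targets[0]

print(type(stmt).__name__)
print(type(target).__name__, type(target.ctx).__name__)
```

The keyword itself never reaches the AST. Only the Delete node and its targets do, which is why aliasing the keyword needs no changes beyond the grammar.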
According to the docs, we must now run make regen-pegen, which will regenerate ./Parser/parser.c and allow our dev-python build to recognize the new token we just defined.
In order to run that make target, we actually need python3.9 installed. The docker image we have defined does not have python installed at all. Instead of modifying it, I just ran the make command within the python:3.9 base image, as follows:
docker run -it -v ${PWD}:/app -w /app python:3.9 make regen-pegen
Here, I volume mount the source into the container running our python:3.9 image and run the make target. This makes the regenerated parser.c file available on our host machine - meaning rebuilding our original pydev image will generate the binary with the grammar changes we defined.
So, all together these three lines should do it (note the comments for additional clarification):
# make regen-pegen to generate parser.c
docker run -it -v ${PWD}:/app -w /app python:3.9 make regen-pegen
# rebuild image with the new parser.c
docker build -t pydev:1.0 .
# run our dev build with support for avadakedavra!
docker run -it pydev:1.0 ./python
With all said and done, we now observe that we can use avadakedavra in place of del!!
>>> unfortunate_characters = {"Hedwig": True, "Mad-Eye Moody": True,}
>>> avadakedavra unfortunate_characters["Hedwig"]
>>> unfortunate_characters
{'Mad-Eye Moody': True}
>>> avadakedavra unfortunate_characters
>>> unfortunate_characters
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'unfortunate_characters' is not defined
>>>
Tada!
(Here’s the PR of these changes.)
import -> accio
Last but not least - let’s repeat our process to define accio, which behaves exactly like import. At first glance, this feels more complex since import can be used in a multitude of ways:
import random
from random import randint, randrange
from math import *
Still, I decided to start the simplest way possible: by repeating the steps from the previous section (del -> avadakedavra). I figured if it failed, the error messages might provide meaningful hints as to how to proceed further along.
So, again in ./Grammar/python.gram, we make the following changes:
simple_stmt[stmt_ty] (memo):
# ... other statements
| &('import' | 'from') import_stmt
| &('accio' | 'from') accio_stmt
# ... other statements
# ... more definitions
accio_stmt[stmt_ty]: accio_name | accio_from
accio_name[stmt_ty]: 'accio' a=dotted_as_names { _Py_Import(a, EXTRA) }
# note below: the ('.' | '...') is necessary because '...' is tokenized as ELLIPSIS
accio_from[stmt_ty]:
| 'from' a=('.' | '...')* b=dotted_name 'accio' c=import_from_targets {
_Py_ImportFrom(b->v.Name.id, c, _PyPegen_seq_count_dots(a), EXTRA) }
| 'from' a=('.' | '...')+ 'accio' b=import_from_targets {
_Py_ImportFrom(NULL, b, _PyPegen_seq_count_dots(a), EXTRA) }
# ... more definitions
(Sauce)
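The two actions in these rules, _Py_Import and _Py_ImportFrom, correspond to the Import and ImportFrom AST nodes, and the dot count from _PyPegen_seq_count_dots ends up as the node’s level. Here is a small sketch on a stock interpreter (so it uses import rather than accio) that shows how each form is represented. The describe helper and the spells name are hypothetical, for illustration only.

```python
import ast

def describe(source):
    """Summarize the first statement of `source` as a plain tuple."""
    node = ast.parse(source).body[0]
    names = [alias.name for alias in node.names]
    if isinstance(node, ast.Import):
        return ("Import", names)
    # ImportFrom carries the module name and the number of leading dots.
    return ("ImportFrom", node.module, node.level, names)

plain = describe("import random")
named = describe("from random import randint, randrange")
star = describe("from math import *")
relative = describe("from .. import spells")

for summary in (plain, named, star, relative):
    print(summary)
```

The relative form has no module name and a level of 2, which matches the second accio_from alternative, where NULL is passed as the module.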
We only make the necessary changes, which in this case translates to updating the accio_name and accio_from stmts. Once completed, we re-run the regen-pegen make target and build, resulting in:
>>> accio random
>>> random.randint(1,5)
4
>>> from random accio randint, randrange
>>> randint(1,5)
2
>>> randrange(5)
4
>>> from math accio *
>>> ceil(4.2)
5
>>>
Tada!
(Here’s the PR of these changes.)
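If you want to play with all three spells without building cpython at all, one very different technique is to rewrite the tokens before handing the source to a stock interpreter. To be clear, this is not how the changes above work: nothing here touches the grammar, and the translate function and SPELLS table are names I made up for this sketch.

```python
import io
import tokenize

# Hypothetical mapping from spell to the python syntax it aliases.
SPELLS = {"accio": "import", "avadakedavra": "del", "lumos": "type"}

def translate(source):
    """Return `source` with every spell NAME token swapped for its alias."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and tok.string in SPELLS:
            tok = tok._replace(string=SPELLS[tok.string])
        tokens.append(tok)
    return tokenize.untokenize(tokens)

spellbook = (
    "accio math\n"
    "kind = lumos(math.pi)\n"
    "spells = {'Hedwig': True}\n"
    "avadakedavra spells['Hedwig']\n"
)

namespace = {}
exec(translate(spellbook), namespace)
print(namespace["kind"], namespace["spells"])
```

Only NAME tokens are rewritten, so a string such as 'accio' is left alone. A variable named accio would be rewritten too.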
Final Remarks
While this was certainly a magical experience, the truth of the matter is until recently the inner workings of python - to me at least - certainly were magical. I could tell you very accurately what outputs to expect based on inputs provided but could not tell you for the life of me how these outputs were actually computed. (This is essentially the definition of the word magic, right?)
Having gone deeper into the internals of cpython at least, I now feel like I have a better sense of the how, which has been very fascinating to discover. There’s definitely still much left to learn / grok and I’m unsure just how much deeper I’ll take this. But, at the very least I hope my explorations here (especially the docker stuff!) can help make it easier for you, dear reader, to conduct similar exercises of your own for your own understanding and benefit.
Happy hacking, fam.
Update 03/21/2021
After connecting with my buddy Steven Li, I have made a few other additions. Generally speaking they have been quite boring - some version of what I have demonstrated above. We are hoping to port over more of the existing grammar and just maintain this as a separate project, mainly for fun.
Here are the pull requests so far.
In addition to what we have above, the following aliases are now available: