Navigation
- README
- Pure-Python feature set
- Syntactic macro feature set
- Examples of creating dialects using `mcpyrate`
- REPL server
- Troubleshooting
- Design notes
- Essays
- Additional reading
- Contribution guidelines
Our extensions to the Python language are built on `mcpyrate`, from the PyPI package `mcpyrate`.
Because in Python, macro expansion occurs at import time, Python programs whose main module uses macros, such as our unit tests that contain usage examples, cannot be run directly by `python3`. Instead, run them via the `macropython` bootstrapper, included in `mcpyrate`.
Our macros expect a from-import style for detecting uses of `unpythonic` constructs, even when those constructs are regular functions. For example, the function `curry` is detected from its bare name. So if you intend to use these macros, then, for regular imports from `unpythonic`, use `from unpythonic import ...` and avoid renaming (`as`).
This document doubles as the API reference, but despite maintenance on a best-effort basis, may occasionally be out of date in places. In case of conflicts in documentation, believe the unit tests first; specifically the code, not necessarily the comments. Everything else (comments, docstrings and this guide) should agree with the unit tests. So if something fails to work as advertised, check what the tests do - and optionally file an issue on GitHub so that the documentation can be fixed.
Changed in v0.15.0. To run macro-enabled programs, use the `macropython` bootstrapper from `mcpyrate`.
This document is up-to-date for v0.15.0.
- `let`, `letseq`, `letrec` as macros; proper lexical scoping, no boilerplate.
  - `dlet`, `dletseq`, `dletrec`, `blet`, `bletseq`, `bletrec`: decorator versions
  - Caution on name resolution and scoping
  - `let_syntax`, `abbrev`: syntactic local bindings; splice code at macro expansion time.
  - Bonus: barebones `let`: pure AST transformation of `let` into a `lambda`.
- `multilambda`: supercharge your lambdas; multiple expressions, local variables.
- `namedlambda`: auto-name your lambdas by assignment.
- `fn`: underscore notation (quick lambdas) for Python
- `quicklambda`: expand quick lambdas first
- `envify`: make formal parameters live in an unpythonic `env`
- `autocurry`: automatic currying for Python
- `lazify`: call-by-need for Python
- `tco`: automatic tail call optimization for Python
- `continuations`: call/cc for Python
- `prefix`: prefix function call syntax for Python
- `autoreturn`: implicit `return` in tail position, like in Lisps.
- `forall`: nondeterministic evaluation with monadic do-notation for Python.
- `cond`: the missing `elif` for `a if p else b`
- `aif`: anaphoric if, the test result is `it`.
- `autoref`: implicitly reference attributes of an object
- `unpythonic.test.fixtures`: a test framework for macro-enabled Python
  - Overview
  - Testing syntax quick reference
  - Expansion order
  - `with test`: test blocks
  - `the`: capture the value of interesting subexpressions
  - Test sessions and testsets
  - Producing unconditional failures, errors, and warnings
  - Advanced: building a custom test framework
  - Why another test framework?
  - Etymology and roots
- `dbg`: debug-print expressions with source code
- The xmas tree combo: notes on the macros working together.
- Emacs syntax highlighting for `unpythonic.syntax` and `mcpyrate`.
- This is semantics, not syntax!
Macros that introduce new ways to bind identifiers.
Changed in v0.15.3. Added support for the walrus operator `:=` for env-assignment. This is the new preferred syntax to establish let-bindings. All old syntaxes are still supported for backward compatibility.
Changed in v0.15.0. Added support for env-assignment syntax in the bindings subform. For consistency with other env-assignments, this is now the preferred syntax to establish let-bindings. Additionally, the old lispy syntax now also accepts brackets, for consistency with the use of brackets for macro invocations.
These macros provide properly lexically scoped `let` constructs, with no boilerplate:
from unpythonic.syntax import macros, let, letseq, letrec
let[x := 17, # parallel binding, i.e. bindings don't see each other
y := 23][
print(x, y)]
letseq[x := 1, # sequential binding, i.e. Scheme/Racket let*
y := x + 1][
print(x, y)]
letrec[evenp := (lambda x: (x == 0) or oddp(x - 1)), # mutually recursive binding, sequentially evaluated
oddp := (lambda x: (x != 0) and evenp(x - 1))][
print(evenp(42))]
Even with just one binding, the syntax remains the same:
let[x := 21][2 * x]
There must be at least one binding; `let[][...]` is a syntax error, since Python's parser rejects an empty subscript slice.
Bindings are established using standard assignment expression syntax, `name := value`. The let-bindings can be rebound in the body with the same syntax, e.g. `x := 42`.
The old `unpythonic` env-assignment syntax, `name << value`, is also supported for backward compatibility. This was the preferred syntax in v0.15.0 to v0.15.2.
NOTE: All let-bindings must be established in the bindings subform. If you absolutely need to establish more bindings in the body, see the sequencing construct `do[]` and its syntax `local[x := 42]`.
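A minimal sketch of establishing an extra binding in the body via the implicit sequencing construct (Python 3.10+ syntax shown; on 3.8/3.9, parenthesize each binding as noted below):

```python
from unpythonic.syntax import macros, let, do, local

# The let binds x; the do[] establishes a new local y inside the body.
result = let[x := 2][do[local[y := 21],
                        x * y]]
assert result == 42
```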
NOTE: Language support for using an assignment expression inside a subscript without parenthesizing it was added in Python 3.10. The syntax accepted when running on Python 3.8 or 3.9 is:
let[(x := 17),
(y := 23)][
print(x, y)]
That is, Python 3.8 and 3.9 require parentheses around each let binding if you use the new `:=` syntax, because syntactically, the bindings subform looks like a subscript. The unit tests use this syntax so that they work on 3.8 and 3.9. But for new code using Python 3.10 or later, it is preferable to omit the parentheses to improve readability.
The same syntax for the bindings subform is used by:
- `let`, `letseq`, `letrec` (expressions)
- `dlet`, `dletseq`, `dletrec`, `blet`, `bletseq`, `bletrec` (decorators)
  - As of v0.15.0, it is possible to use `@dlet(...)` instead of `@dlet[...]` in Python 3.8 and earlier.
- `let_syntax`, `abbrev` (expression mode)
The following Haskell-inspired, perhaps more pythonic alternative syntaxes are also available:
let[[x := 21,
y := 17,
z := 4] in
x + y + z]
let[x + y + z,
where[x := 21,
y := 17,
z := 4]]
let[[x := 21] in 2 * x]
let[2 * x, where[x := 21]]
These syntaxes take no macro arguments; both the let-body and the bindings are placed inside the `...` in `let[...]`.
Note the bindings subform is always enclosed by brackets.
The `where` operator, if used, must be macro-imported. It may only appear at the top level of the let-where form, separating the body and the bindings subforms. In any invalid position, `where` is considered a syntax error at macro expansion time.
Semantically, these do the exact same thing as the original lispy syntax: the bindings are evaluated first, and then the body is evaluated with the bindings in place. The purpose of the second variant (the let-where) is just readability; sometimes it looks clearer to place the body expression first, and only then explain what the symbols in it mean.
- These syntaxes are valid for all expression forms of `let`, namely: `let[]`, `letseq[]`, `letrec[]`, `let_syntax[]` and `abbrev[]`. The decorator variants (`dlet` et al., `blet` et al.) and the block variants (`with let_syntax`, `with abbrev`) support only the formats where the bindings subform is given in the macro arguments part, because there the body is in any case placed differently (it's the body of the function being decorated).
- In the first variant above (the let-in), note that even there, the bindings block needs the brackets. This is due to Python's precedence rules; `in` binds more strongly than the comma (which makes sense almost everywhere else), so to make the `in` refer to all of the bindings, the bindings block must be bracketed. If the `let` expander complains that your code does not look like a `let` form and you have used let-in, check your brackets.
- In the second variant (the let-where), note the comma between the body and `where`; it is compulsory to make the expression syntactically valid Python. (It's however semi-easyish to remember, since English also requires the comma for a where-expression. It's not only syntactically valid Python, it is also syntactically valid English, at least for mathematicians.)
Changed in v0.15.0.
Beginning with v0.15.0, the env-assignment syntax presented above is the preferred syntax to establish let-bindings, for consistency with other env-assignments. This serves as a reminder that let variables live in an `env`, which is created by the `let` form.
There is also an alternative, lispy notation for the bindings subform, where each name-value pair is given using brackets:
let[[x, 42], [y, 9001]][...]
let[[[x, 42], [y, 9001]] in ...]
let[..., where[[x, 42], [y, 9001]]]
# one-binding special case: outer brackets not needed
let[x, 42][...]
let[[x, 42] in ...]
let[..., where[x, 42]]
This is similar in spirit to the notation used in v0.14.3 and earlier.
Actually, for backwards compatibility, we still support some use of parentheses instead of brackets in the bindings subform. The following formats, used in versions of `unpythonic` up to v0.14.3, are still accepted:
let((x, 42), (y, 9001))[...]
let[((x, 42), (y, 9001)) in ...]
let[..., where((x, 42), (y, 9001))]
# one-binding special case: outer parentheses not needed
let(x, 42)[...]
let[(x, 42) in ...]
let[..., where(x, 42)]
Even though an expr macro invocation itself is always denoted using brackets, as of `unpythonic` v0.15.0 parentheses can still be used to pass macro arguments, hence `let(...)[...]` is still accepted. The code that interprets the AST for the let-bindings accepts both lists and tuples for each key-value pair, and the top-level container for the bindings subform in a let-in or let-where can be either a list or a tuple, so whether brackets or parentheses are used does not matter there, either.
Still, brackets are now the preferred delimiter, for consistency between the bindings and body subforms.
We plan to drop support for parentheses to pass macro arguments in the future, when Python 3.9 becomes the minimum Python version supported. The reason we will wait that long is that up to Python 3.8, decorators cannot be subscripted. Up to Python 3.8, `@dlet[x, 42]` is rejected by Python's parser, whereas `@dlet(x, 42)` is accepted.
The issue has been fixed in Python 3.9. If you target only 3.9 and later, please prefer brackets to pass macro arguments.
The `let` constructs can use a multiple-expression body. The syntax to activate multiple-expression mode is an extra set of brackets around the body (like in `multilambda`):
let[x := 1,
y := 2][[ # note extra [
y := x + y,
print(y)]]
let[[x := 1,
y := 2] in
[y := x + y, # body starts here
print(y)]]
let[[y := x + y,
print(y)], # body ends here
where[x := 1,
y := 2]]
The let macros implement this by inserting a `do[...]` (see below). In a multiple-expression body, a separate internal definition context exists for local variables that are not part of the `let`; see the `do` macro for details.
Only the outermost set of extra brackets is interpreted as a multiple-expression body. The rest are interpreted as usual, as lists. If you need to return a literal list from a `let` form with only one body expression, double the brackets on the body part:
let[x := 1,
y := 2][[
[x, y]]]
let[[x := 1,
y := 2] in
[[x, y]]]
let[[[x, y]],
where[x := 1,
y := 2]]
The outermost brackets delimit the `let` form itself, the middle ones activate multiple-expression mode, and the innermost ones denote a list.
Only brackets are affected; parentheses are interpreted as usual, so returning a literal tuple works as expected:
let[x := 1,
y := 2][
(x, y)]
let[[x := 1,
y := 2] in
(x, y)]
let[(x, y),
where[x := 1,
y := 2]]
The main difference between the `let` family and Python's own named expressions (a.k.a. the walrus operator, added in Python 3.8) is that `x := 42` does not create a scope, but `let[x := 42][...]` does. The walrus operator assigns to the name `x` in the scope it appears in, whereas in the `let` expression, the `x` only exists in that expression.
As of v0.15.3, this is somewhat complicated by the fact that now the syntax `x := 42` can be used to rebind let variables. See the unit test examples for `@dlet` above, at the beginning of the `let` section.
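A small sketch of the scoping difference (Python 3.10+ syntax shown):

```python
from unpythonic.syntax import macros, let

(x := 42)          # Python's walrus assigns `x` in the enclosing scope...
assert x == 42     # ...so `x` remains visible here.

assert let[n := 42][2 * n] == 84
# ...whereas `n` exists only inside the let expression; it is not defined out here.
```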
`let` and `letrec` expand into the `unpythonic.lispylet` constructs, implicitly inserting the necessary boilerplate: the `lambda e: ...` wrappers, quoting variable names in definitions, and transforming `x` to `e.x` for all `x` declared in the bindings. Assignment syntax `x := 42` transforms to `e.set('x', 42)`. The implicit environment parameter `e` is actually named using a gensym, so lexically outer environments automatically show through. `letseq` expands into a chain of nested `let` expressions.
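As a rough illustration of the pieces involved (not the literal expansion, which goes through `unpythonic.lispylet` and a gensymmed environment name):

```python
from unpythonic import env

e = env(x=21)                        # the let form creates an environment holding the bindings
assert (lambda e: 2 * e.x)(e) == 42  # the body is wrapped in `lambda e: ...`, with x read as e.x
assert e.set("x", 42) == 42          # an `x := 42` in the body becomes e.set('x', 42); set returns the value
```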
All the `let` macros respect lexical scope, so this works as expected:
letrec[z := 1][[
print(z),
letrec[z := 2][
print(z)]]]
The `z` in the inner `letrec` expands to the inner environment's `z`, and the `z` in the outer `letrec` to the outer environment's `z`.
Similar to `let`, `letseq`, `letrec`, these macros sugar the corresponding `unpythonic.lispylet` constructs, with the `dletseq` and `bletseq` constructs existing only as macros. They expand to nested `dlet` or `blet`, respectively.
Lexical scoping is respected; each environment is internally named using a gensym. Nesting is allowed.
Examples:
from unpythonic.syntax import macros, dlet, dletseq, dletrec, blet, bletseq, bletrec
@dlet[x := 0] # up to Python 3.8, use `@dlet(x := 0)` instead (decorator subscripting was added in 3.9)
def count():
(x := x + 1) # update `x` in let env
return x
assert count() == 1
assert count() == 2
@dletrec[evenp := (lambda x: (x == 0) or oddp(x - 1)),
oddp := (lambda x: (x != 0) and evenp(x - 1))]
def f(x):
return evenp(x)
assert f(42) is True
assert f(23) is False
@dletseq[x := 1,
x := x + 1,
x := x + 2]
def g(a):
return a + x
assert g(10) == 14
# block versions: the def takes no arguments, runs immediately, and is replaced by the return value.
@blet[x := 21]
def result():
return 2*x
assert result == 42
@bletrec[evenp := (lambda x: (x == 0) or oddp(x - 1)),
oddp := (lambda x: (x != 0) and evenp(x - 1))]
def result():
return evenp(42)
assert result is True
@bletseq[x := 1,
x := x + 1,
x := x + 2]
def result():
return x
assert result == 4
CAUTION: assignment to the let environment uses the assignment expression syntax `name := value`. The assignment statement `name = value` creates a local variable, as usual - shadowing any variable with the same name from the `let`.
The write of a `name := value` always occurs to the lexically innermost environment (as seen from the write site) that has that `name`. If no lexically surrounding environment has that `name`, then the expression remains untransformed, and means binding a new lexical variable in the nearest enclosing scope, as per Python's standard rules.
CAUTION: formal parameters of a function definition, local variables, and any names declared as `global` or `nonlocal` in a given lexical scope shadow names from an enclosing `let` environment. Mostly, this applies to the entirety of that lexical scope. This is modeled after Python's standard scoping rules.
As an exception to the rule, for the purposes of the scope analysis performed by `unpythonic.syntax`, creations and deletions of lexical local variables take effect from the next statement, and remain in effect for the lexically remaining part of the current scope. This allows `x = ...` to see the old bindings on the RHS, and allows the client code to restore access to a surrounding env's `x` (by deleting a local `x` shadowing it) when desired.
To clarify, here is a sampling from the unit tests:
@dlet[x := "the env x"]
def f():
return x # No lexical variable `x` exists; this refers to the env `x`.
assert f() == "the env x"
@dlet[x := "the env x"]
def f():
x = "the local x" # The lexical variable shadows the env `x`.
return x
assert f() == "the local x"
@dlet[x := "the env x"]
def f():
return x
x = "the unused local x" # This appears *lexically after* the read access on the previous line.
assert f() == "the env x"
@dlet[x := "the env x"]
def test15():
def inner():
(x := "updated env x") # noqa: F841, this writes to the let env since there is no `x` in an intervening scope, according to Python's standard rules.
inner()
return x
assert test15() == "updated env x"
@dlet[x := "the env x"]
def test16():
def inner():
x = "the inner x" # noqa: F841, unused on purpose, for testing. An assignment *statement* does NOT write to the let env.
inner()
return x
assert test16() == "the env x"
@dlet[x := "the env x"]
def test17():
x = "the local x" # This lexical variable shadows the env x.
def inner():
# The env x is shadowed. Since we don't say `nonlocal x`, this creates a new lexical variable scoped to `inner`.
(x := "the inner x") # noqa: F841, unused on purpose, for testing.
inner()
return x
assert test17() == "the local x"
x = "the global x"
@dlet[x := "the env x"]
def f():
global x
return x
assert f() == "the global x"
@dlet[x := "the env x"]
def f():
x = "the local x"
del x # deleting a local, ok!
return x
assert f() == "the env x"
try:
x = "the global x"
@dlet[x := "the env x"]
def f():
global x
del x # ignored by unpythonic's scope analysis, deletion of globals is too dynamic
return x # trying to refer to the deleted global x
f()
except NameError:
pass
else:
assert False, "should have tried to access the deleted global x"
The name resolution behavior described above does not fully make sense, because to define things this way is to conflate static (lexical) and dynamic (run-time) concepts. This feature unfortunately got built before I understood the matter clearly.
Python itself performs name resolution purely lexically, which is arguably the right thing to do. In any given lexical scope, an identifier such as `x` always refers to the same variable. Whether that variable has been initialized, or has already been deleted, is another matter, which has to wait until run time - but `del x` will not cause the identifier `x` to point to a different variable for the remainder of the same scope, like `delete[x]` does in the body of an `unpythonic` `let[]` or `do[]`.
To be technically correct, in Python, an identifier `x` refers to a name, not to a "variable". Python, like Lisp, has names and values.
Roughly, an identifier is a certain kind of token in the source code text - something that everyday English calls a "name". However, in programming, a name is technically the key component of a key-value pair that is stored in a particular environment.
Very roughly speaking, an environment is just a place to store such pairs, for the purposes of "the variables subsystem" of the language. There are important details, such as that each activation of a function (think: "a particular call of the function") will create a new environment instance, to hold the local variables of that activation; this detail allows lexical closures to work. The piece of bookkeeping for this is termed an activation record. But the important point here is, an environment stores name-value pairs.
An identifier refers to a name. Scoping rules concern themselves with the details of mapping identifiers to names. In lexical scoping (like in Python), the position of the identifier in the source code text determines the search order of environments for the target name, when resolving a particular instance of an identifier in the source code text. Python uses the LEGB ordering (local, enclosing, global, builtin).
Finally, values are the run-time things names point to. They are the value component of the key-value pair.
In this simple example:
def outer():
x = 17
def inner():
x = 23
- The piece of source code text `x` is an identifier.
- The outer `x` and the inner `x` are names, both of which have the textual representation `x`.
  - Which one of these the identifier `x` refers to depends on where it appears.
- The integers `17` and `23` are values.
Note that classically, names have no type; values do.
Nowadays, a name may have a type annotation, which reminds the programmer about the type of value that is safe to bind to that particular name. In other words, the code that defines that name (e.g. as a function parameter) promises (in the sense of a contract) that the code knows how to behave if a value of that type is bound to that name (e.g. by passing such a value as a function argument that will be bound to that name).
Here the type may be a concrete nominal type such as `int`, or, for example, it may represent a particular interface (such as the types in `collections.abc`), or it may allow multiple mutually exclusive options (a union).
By default, Python treats type annotations as a form of comments; to actually statically type-check Python, Mypy can be used.
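A tiny illustration of that (plain Python, nothing unpythonic-specific):

```python
def repeat(s: str, n: int = 2) -> str:  # the annotations document intent; Python does not enforce them
    return s * n

repeat(3)  # runs fine at run time (3 * 2 == 6), but a static checker such as Mypy would flag it
```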
Compare the name/value concept to the concept of a variable in the classical sense, such as in C, or `cdef` in Cython. In such low-level HLLs, a variable is a named, fixed memory location, with a static data type determining how to interpret the bits at that memory location. The contents of the memory location can be changed, hence "variable" is an apt description.
Note v0.15.0. Now that we use `mcpyrate` as the macro expander, `let_syntax` and `abbrev` are not really needed. We are keeping them mostly for backwards compatibility, and because they exercise a different feature set in the macro expander, making the existence of these constructs particularly useful for system testing.
To define macros in the same module that uses them, see multi-phase compilation in the compiler documentation. Using run-time compiler access, you can even create a macro definition module at run time (e.g. from a quasiquoted block) and inject it into `sys.modules`, allowing other code to import and use those macros. See the compiler tests for examples.
To rename existing macros, you can as-import them. As of `unpythonic` v0.15.0, doing so for `unpythonic.syntax` constructs is not recommended, though, because there is still a lot of old analysis code in the macro implementations that may scan for the original name. This may or may not be fixed in a future release.
These constructs allow you to locally splice code at macro expansion time. It is almost like inlining functions.
from unpythonic.syntax import macros, let_syntax, block, expr
def verylongfunctionname(x=1):
return x
# works as an expr macro
y = let_syntax[f := verylongfunctionname][[ # extra brackets: implicit do in body
print(f()),
f(5)]]
assert y == 5
y = let_syntax[f[a] << verylongfunctionname(2*a)][[ # template with formal parameter "a"; note `<<` here, since the walrus `:=` does not allow a subscripted target
print(f[2]),
f[3]]]
assert y == 6
y = let_syntax[[f := verylongfunctionname] in
[print(f()),
f(5)]]
y = let_syntax[[print(f()),
f(5)],
where[f := verylongfunctionname]]
y = let_syntax[[f[a] << verylongfunctionname(2*a)] in
[print(f[2]),
f[3]]]
y = let_syntax[[print(f[2]),
f[3]],
where[f[a] << verylongfunctionname(2*a)]]
# works as a block macro
with let_syntax:
# with block as name:
# with block[a0, ...] as name:
with block[a, b, c] as makeabc: # capture a block of statements
lst = [a, b, c]
makeabc(3 + 4, 2**3, 3 * 3)
assert lst == [7, 8, 9]
# with expr as name:
# with expr[a0, ...] as name:
with expr[n] as nth: # capture a single expression
lst[n]
assert nth(2) == 9
with let_syntax:
with block[a] as twice:
a
a
with block[x, y, z] as appendxyz:
lst += [x, y, z]
lst = []
twice(appendxyz(7, 8, 9))
assert lst == [7, 8, 9]*2
After macro expansion completes, `let_syntax` has zero run-time overhead; it completely disappears in macro expansion.
The `expr` and `block` operators, if used, must be macro-imported. They may only appear in `with expr` and `with block` subforms at the top level of a `with let_syntax` or `with abbrev`. In any invalid position, `expr` and `block` are both considered a syntax error at macro expansion time.
There are two kinds of substitutions:
- Bare name and template. A bare name substitution has no parameters. A template substitution has positional parameters. (Named parameters, `*args`, `**kwargs` and default values are not supported.)
- When used as an expr macro, the formal parameter declaration is placed where it belongs; on the name side (LHS) of the binding. In the above example, `f[a]` is a template with a formal parameter `a`. But when used as a block macro, the formal parameters are declared on the `block` or `expr` "context manager" due to syntactic limitations of Python. To define a bare name substitution, just use `with block as ...:` or `with expr as ...:` with no macro arguments.
- In the body of `let_syntax`, a bare name substitution is invoked by name (just like a variable). A template substitution is invoked like an expr macro. Any instances of the formal parameters of the template get replaced by the argument values from the use site, at macro expansion time.
- Note each instance of the same formal parameter (in the definition) gets a fresh copy of the corresponding argument value. In other words, in the example above, each `a` in the body of `twice` separately expands to a copy of whatever code was given as the macro argument `a`.
- When used as a block macro, there are furthermore two capture modes: block of statements, and single expression. The single expression can be an explicit `do[]`, if multiple expressions are needed. When invoking substitutions, keep in mind Python's usual rules regarding where statements or expressions may appear.
- (If you know about Python ASTs, do not worry about the `ast.Expr` wrapper needed to place an expression in a statement position; this is handled automatically.)
HINT: If you get a compiler error that some sort of statement was encountered where an expression was expected, check your uses of `let_syntax`. The most likely reason is that a substitution is trying to splice a block of statements into an expression position.
Expansion of this macro is a two-step process:
- First, template substitutions.
- Then, bare name substitutions, applied to the result of the first step.
This design is to avoid accidental substitutions into formal parameters of templates (that would usually break the template, resulting at best in a mysterious error, and at worst silently doing something unexpected), if the name of a formal parameter happens to match one of the currently active bare name substitutions.
Within each step, the substitutions are applied in definition order:
- If the bindings are `[x := y, y := z]`, then an `x` at the use site transforms to `z`. So does a `y` at the use site.
- But if the bindings are `[y := z, x := y]`, then an `x` at the use site transforms to `y`, and only an explicit `y` at the use site transforms to `z`.
- Even in block templates, arguments are always expressions, because invoking a template uses the subscript syntax. But names and calls are expressions, so a previously defined substitution (whether a bare name or an invocation of a template) can be passed as an argument just fine. Definition order is then important; consult the rules above.
Nesting `let_syntax` is allowed. Lexical scoping is respected. Inner definitions of substitutions shadow outer ones.
When used as an expr macro, all bindings are registered first, and then the body is evaluated. When used as a block macro, a new binding (substitution declaration) takes effect from the next statement onward, and remains active for the lexically remaining part of the `with let_syntax` block.
The `abbrev` macro is otherwise exactly like `let_syntax`, but it expands outside-in. Hence, it has no lexically scoped nesting support, but it has the power to locally rename also macros, because the `abbrev` itself expands before any macros invoked in its body. This allows things like:
abbrev[m := macrowithverylongname][
m[tree1] if m[tree2] else m[tree3]]
abbrev[[m := macrowithverylongname] in
m[tree1] if m[tree2] else m[tree3]]
abbrev[m[tree1] if m[tree2] else m[tree3],
where[m := macrowithverylongname]]
which is sometimes useful when writing macros. But using `mcpyrate`, note that you can just as-import a macro if you need to rename it.
CAUTION: `let_syntax` is essentially a toy macro system within the real macro system. The usual caveats of macro systems apply. Especially, `let_syntax` and `abbrev` support absolutely no form of hygiene. Be very, very careful to avoid name conflicts.
The `let_syntax` macro is meant for simple local substitutions where the elimination of repetition can shorten the code and improve its readability, in cases where the final "unrolled" code should be written out at compile time. If you need to do something complex (or indeed save a definition and reuse it somewhere else, non-locally), write a real macro directly in `mcpyrate`.
This was inspired by Racket's `let-syntax` and `with-syntax` forms.
As a bonus, we provide classical simple `let` and `letseq`, wholly implemented as AST transformations, providing true lexical variables, but no multi-expression body support. Just like in some Lisps, this version of `letseq` (Scheme/Racket `let*`) expands into a chain of nested `let` expressions, which expand to lambdas.
These are provided in the separate module `unpythonic.syntax.simplelet`, and are not part of the `unpythonic.syntax` macro API. For simplicity, they support only the lispy list syntax in the bindings subform (using brackets, specifically!), and no haskelly syntax at all:
from unpythonic.syntax.simplelet import macros, let, letseq
let[[x, 42], [y, 23]][...]
let[[x, 42]][...]
letseq[[x, 1], [x, x + 1]][...]
letseq[[x, 1]][...]
Starting with Python 3.8, assignment (rebinding) is possible also in these barebones `let` constructs via the walrus operator. For example:
assert let[[x, 42]][x] == 42
assert let[[x, 42]][(x := 5)] == 5
However, this only works for variables created by the innermost `let` (viewed from the point where the assignment happens), because `nonlocal` is a statement and so cannot be used in expressions.
Macros that run multiple expressions, in sequence, in place of one expression.
Changed in v0.15.3. Env-assignments now use the walrus syntax `x := 42`. The old syntax `x << 42` is still supported for backward compatibility.
We provide an `expr` macro wrapper for `unpythonic.do` and `unpythonic.do0`, with some extra features.
This essentially allows writing imperative code in any expression position. For an `if-elif-else` conditional, see `cond`; for loops, see the functions in the module `unpythonic.fploop` (`looped` and `looped_over`).
from unpythonic.syntax import macros, do, local, delete
y = do[local[x := 17],
print(x),
x := 23,
x]
print(y) # --> 23
a = 5
y = do[local[a := 17],
print(a), # --> 17
delete[a],
print(a), # --> 5
True]
Local variables are declared and initialized with `local[var := value]`, where `var` is a bare name. To explicitly denote "no value", just use `None`. The syntax `delete[...]` allows deleting a `local[...]` binding. This uses `env.pop()` internally, so a `delete[...]` returns the value the deleted local variable had at the time of deletion. (This also means that if you manually use the `do()` function in some code without macros, you can `env.pop(...)` in a do-item if needed.)
The `local[]` and `delete[]` declarations may only appear at the top level of a `do[]`, `do0[]`, or implicit `do` (extra bracket syntax, e.g. for the body of a `let` form). In any invalid position, `local[]` and `delete[]` are considered a syntax error at macro expansion time.
A `local` declaration comes into effect in the expression following the one where it appears, capturing the declared name as a local variable for the lexically remaining part of the `do`. In a `local`, the RHS still sees the previous bindings, so this is valid (although maybe not readable):
result = []
let[lst := []][[result.append(lst), # the let "lst"
local[lst := lst + [1]], # LHS: do "lst", RHS: let "lst"
result.append(lst)]] # the do "lst"
assert result == [[], [1]]
Already declared local variables are updated with `var := value`. Updating variables in lexically outer environments (e.g. a `let` surrounding a `do`) uses the same syntax.
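A minimal sketch of updating an outer `let` variable from inside a `do` (Python 3.10+ syntax shown):

```python
from unpythonic.syntax import macros, let, do, local

out = let[x := 1][do[x := 2,           # no `local`, so this rebinds the let `x`
                     local[y := 40],   # a do-local variable
                     x + y]]
assert out == 42
```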
The reason we require local variables to be declared is to allow write access to lexically outer environments.
- Assignments are recognized anywhere inside the `do`; but note that any `let` constructs nested inside the `do`, that define variables of the same name, will (inside the `let`) shadow those of the `do` - as expected of lexical scoping.
- The boilerplate needed by the underlying `unpythonic.do` form (notably the `lambda e: ...` wrappers) is inserted automatically. The expressions in a `do[]` are only evaluated when the underlying `unpythonic.do` actually runs.
- When running, `do` behaves like `letseq`; assignments above the current line are in effect (and have been performed in the order presented). Re-assigning to the same name later overwrites.
CAUTION: `do[]` supports local variable deletion, but the `let[]` constructs do not, by design. When `do[]` is used implicitly with the extra bracket syntax, any `delete[]` refers to the scope of the implicit `do[]`, not any surrounding `let[]` scope.
Macros that introduce additional features for Python's lambdas.
Multiple expressions: use `[...]` to denote a multiple-expression body. The macro implements this by inserting a `do`.
Local variables: available in a multiple-expression body. For details on usage, see `do`.
from unpythonic.syntax import macros, multilambda, let
with multilambda:
echo = lambda x: [print(x), x]
assert echo("hi there") == "hi there"
count = let[x := 0][
lambda: [x := x + 1, # x belongs to the surrounding let
x]]
assert count() == 1
assert count() == 2
test = let[x := 0][
lambda: [x := x + 1,
local[y := 42], # y is local to the implicit do
(x, y)]]
assert test() == (1, 42)
assert test() == (2, 42)
myadd = lambda x, y: [print("myadding", x, y),
local[tmp := x + y],
print("result is", tmp),
tmp]
assert myadd(2, 3) == 5
# only the outermost set of brackets denote a multi-expr body:
t = lambda: [[1, 2]]
assert t() == [1, 2]
In the second example, returning `x` separately is redundant, because the assignment to the let environment already returns the new value, but it demonstrates the usage of multiple expressions in a lambda.
Changed in v0.15.0. When `namedlambda` encounters a lambda definition it cannot infer a name for, it instead injects source location info into the name, provided that the AST node for that particular `lambda` has a line number. The result looks like `<lambda at some/path/mymod.py:201>`.
Who said lambdas have to be anonymous?
from unpythonic.syntax import macros, namedlambda
with namedlambda:
f = lambda x: x**3 # assignment: name as "f"
assert f.__name__ == "f"
gn, hn = let[x := 42, g := None, h := None][[
g := (lambda x: x**2), # env-assignment: name as "g"
h := f, # still "f" (no literal lambda on RHS)
(g.__name__, h.__name__)]]
assert gn == "g"
assert hn == "f"
foo = let[[f7 := (lambda x: x)] in f7] # let-binding: name as "f7"
def foo(func1, func2):
assert func1.__name__ == "func1"
assert func2.__name__ == "func2"
foo(func1=lambda x: x**2, # function call with named arg: name as "func1"
func2=lambda x: x**2) # function call with named arg: name as "func2"
# dictionary literal
d = {"f": lambda x: x**2, # literal string key in a dictionary literal: name as "f"
"g": lambda x: x**2} # literal string key in a dictionary literal: name as "g"
assert d["f"].__name__ == "f"
assert d["g"].__name__ == "g"
Lexically inside a `with namedlambda` block, any literal `lambda` that is assigned to a name using one of the supported assignment forms is named to have the name of the LHS of the assignment. The name is captured at macro expansion time.
Decorated lambdas are also supported, as is a `curry` (manual or auto) where the last argument is a lambda. The latter is a convenience feature, mainly for applying parametric decorators to lambdas. See the unit tests for detailed examples.
The naming is performed using the function `unpythonic.namelambda`, which will return a modified copy with its `__name__`, `__qualname__` and `__code__.co_name` changed. The original function object is not mutated.
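A hedged sketch of calling it directly (assuming the curried call convention; check the docstring of `unpythonic.namelambda` for the exact signature):

```python
from unpythonic import namelambda

# namelambda("square") is assumed to return a renamer that produces a renamed copy.
square = namelambda("square")(lambda x: x**2)
assert square.__name__ == "square"
assert square(3) == 9
```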
Supported assignment forms:
- Single-item assignment to a local name, `f = lambda ...: ...`
- Named expressions (a.k.a. walrus operator, Python 3.8+), `f := lambda ...: ...`. Added in v0.15.0.
- Expression-assignment to an unpythonic environment, `f := (lambda ...: ...)`, and the old syntax `f << (lambda ...: ...)`.
  - Env-assignments are processed lexically, just like regular assignments. This should not cause problems, because left-shifting by a literal lambda most often makes no sense (whence, that syntax is almost guaranteed to mean an env-assignment).
- Let-bindings, `let[[f := (lambda ...: ...)] in ...]`, using any let syntax supported by unpythonic (here using the haskelly let-in with env-assign style bindings just as an example).
- Named argument in a function call, as in `foo(f=lambda ...: ...)`. Added in v0.14.2.
- In a dictionary literal `{...}`, an item with a literal string key, as in `{"f": lambda ...: ...}`. Added in v0.14.2.
Support for other forms of assignment may or may not be added in a future version. We will maintain a list here; but if you want the gritty details, see the `_namedlambda` syntax transformer in `unpythonic.syntax.lambdatools`.
Changed in v0.15.0. Up to 0.14.x, the `f[]` macro used to be provided by `macropy`, but now that we use `mcpyrate`, we provide this ourselves. Note that the name of the construct is now `fn[]`.
The syntax `fn[...]` creates a lambda, where each underscore `_` in the `...` part introduces a new parameter:
from unpythonic.syntax import macros, fn
from unpythonic.syntax import _ # optional, makes IDEs happy
double = fn[_ * 2] # --> double = lambda x: x * 2
mul = fn[_ * _] # --> mul = lambda x, y: x * y
The macro does not descend into any nested `fn[]`, to allow the macro expander itself to expand those separately.
We have named the construct `fn`, because `f` is often used as a function name in code examples, local temporaries, and similar. Also, `fn[]` is a less ambiguous abbreviation for a syntactic construct that means function, while remaining shorter than the equivalent `lambda`.
The underscore `_` itself is not a macro. The `fn` macro treats the underscore magically, just like MacroPy's `f`, but anywhere else the underscore is available to be used as a regular variable.
The underscore does not need to be imported for `fn[]` to recognize it, but if you want to make your IDE happy, there is a symbol named `_` in `unpythonic.syntax` you can import to silence any "undefined name" errors regarding the use of `_`. It is a regular run-time object, not a macro. It is available in `unpythonic.syntax` (not at the top level of `unpythonic`) because it is basically an auxiliary syntactic construct, with no meaningful run-time functionality of its own.
(It could be made into a `@namemacro` that triggers a syntax error when it appears in an improper context, as many auxiliary constructs in similar roles already do starting with v0.15.0. But it was decided that in this particular case, it is more valuable to have the name `_` available for other uses in other contexts, because it is a standard dummy name in Python. The lambdas created using `fn[]` are likely short enough that not automatically detecting misplaced underscores does not cause problems in practice.)
Because in `mcpyrate` macros can be as-imported, you can rename `fn` at import time to have any name you want. The `quicklambda` block macro (see below) respects the as-import. You must also import the macro `fn` if you use `quicklambda`, because `quicklambda` internally queries the expander to determine the name(s) the macro `fn` is currently bound to. If the `fn` macro is not bound to any name, `quicklambda` will do nothing.
It is sufficient that `fn` has been macro-imported by the time the `with quicklambda` expands. So it is possible, for example, for a dialect template to macro-import just `quicklambda` and inject an invocation for it, and leave macro-importing `fn` to the user code. The `Lispy` variant of the Lispython dialect does exactly this.
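A hedged sketch of the as-import (the alias name `lam` is arbitrary):

```python
from unpythonic.syntax import macros, quicklambda, fn as lam  # as-import the fn macro
from unpythonic.syntax import _  # optional, makes IDEs happy

with quicklambda:
    double = lam[2 * _]   # quicklambda queries the expander, so it recognizes the alias
    assert double(10) == 20
```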
To be able to transform correctly, the block macros in `unpythonic.syntax` that transform lambdas (e.g. `multilambda`, `tco`) need to see all `lambda` definitions written with Python's standard `lambda`.
However, the `fn` macro uses the syntax `fn[...]`, which (to the analyzer) does not look like a lambda definition. The `quicklambda` block macro changes the expansion order, forcing any `fn[...]` lexically inside the block to expand before any other macros do.
Any expression of the form `fn[...]`, where `fn` is any name bound in the current macro expander to the macro `unpythonic.syntax.fn`, is understood as a quick lambda. (In plain English, this respects as-imports of the macro `fn`.)
Example - a quick multilambda:
from unpythonic.syntax import macros, multilambda, quicklambda, fn, local
from unpythonic.syntax import _ # optional, makes IDEs happy
with quicklambda, multilambda:
func = fn[[local[x := _],
local[y := _],
x + y]]
assert func(1, 2) == 3
This is of course rather silly, as an unnamed formal parameter can only be mentioned once. If we are giving names to them, a regular `lambda` is shorter to write. A more realistic combo is:
with quicklambda, tco:
def g(x):
return 2 * x
func1 = fn[g(3 * _)] # tail call
assert func1(10) == 60
func2 = fn[3 * g(_)] # no tail call
assert func2(10) == 60
When a function whose definition (`def` or `lambda`) is lexically inside a `with envify` block is entered, it copies references to its arguments into an unpythonic `env`. At macro expansion time, all references to the formal parameters are redirected to that environment. This allows rebinding, from an expression position, names that were originally the formal parameters.
Wherever could that be useful? For an illustrative caricature, consider PG's accumulator puzzle.
The Python 3 solution:
def foo(n):
def accumulate(i):
nonlocal n
n += i
return n
return accumulate
This avoids allocating an extra place to store the accumulator `n`. The Python 3.8+ solution, using the new walrus operator, is one line shorter:
def foo(n):
def accumulate(i):
nonlocal n
return (n := n + i)
return accumulate
This is rather clean, but still needs the `nonlocal` declaration, which is available as a statement only.
If you want optimal bytecode, these two are the best solutions of the puzzle in Python.
But what if we want to shorten the source code even more, for readability? We could make `accumulate` a lambda. But then, to rebind the `n` that lives in an enclosing scope - because Python does not support doing that from an expression position - we must make it live in an `unpythonic` `env`.
Let's use the let-over-lambda idiom:
def foo(n0):
return let[[n := n0] in
(lambda i: (n := n + i))]
This is already shorter, but the `let` is used only for (in effect) storing the passed-in value of `n0`; we do not place any other variables into the `let` environment. Considering the source text already introduces a name `n0` which is just used to initialize `n`, that's an extra element that could be eliminated.
Enter the `envify` macro, which automates this:
with envify:
def foo(n):
return lambda i: (n := n + i)
Note this does not work without `envify`, because then the assignment expression will create a local variable (local to the lambda) instead of rebinding the outer existing `n`.
Combining with `autoreturn` yields the fewest-source-code-elements optimal solution to the accumulator puzzle:
with autoreturn, envify:
def foo(n):
lambda i: (n := n + i)
The `with` block adds a few elements, but if desired, it can be refactored into the definition of a custom dialect using `mcpyrate`. See dialect examples.
To boldly go where Python without macros just won't. Changing the rules by code-walking and making significant rewrites.
Changed in v0.15.0. The macro is now named `autocurry`, to avoid shadowing the `curry` function.
from unpythonic.syntax import macros, autocurry
from unpythonic import foldr, composerc as compose, cons, nil
with autocurry:
def add3(a, b, c):
return a + b + c
assert add3(1)(2)(3) == 6
assert add3(1, 2)(3) == 6
assert add3(1)(2, 3) == 6
assert add3(1, 2, 3) == 6
mymap = lambda f: foldr(compose(cons, f), nil)
double = lambda x: 2 * x
print(mymap(double, (1, 2, 3)))
# The definition was auto-curried, so this works here too.
assert add3(1)(2)(3) == 6
Lexically inside a `with autocurry` block:
- All function calls and function definitions (`def`, `lambda`) are automatically curried, somewhat like in Haskell, or in `#lang spicy`.
- Function calls are autocurried, and run `unpythonic.curry` in a special mode that no-ops on uninspectable functions (triggering a standard function call with the given args immediately) instead of raising `TypeError` as usual.
CAUTION: Some built-ins are uninspectable or may report their call signature incorrectly; in those cases, `curry` may fail, occasionally in mysterious ways. When inspection fails, `curry` raises `ValueError`, like `inspect.signature` does.
Manual uses of the `curry` decorator (on both `def` and `lambda`) are detected, and in such cases the macro skips adding the decorator.
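For example, a minimal sketch of that detection:

```python
from unpythonic.syntax import macros, autocurry
from unpythonic import curry

with autocurry:
    @curry                      # manual curry detected; the macro does not add another one
    def add3(a, b, c):
        return a + b + c
    assert add3(1)(2)(3) == 6
```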
Changed in v0.15.0. The `lazy[]` macro, which is used together with `with lazify`, used to be provided by `macropy` up to `unpythonic` v0.14.3. But now that we use `mcpyrate`, we provide a `lazy[]` macro and an underlying `Lazy` class ourselves. For details, see the separate section about `lazy[]` and `lazyrec[]` below.
Also known as lazy functions. Like lazy/racket, but for Python. Note if you want lazy sequences instead, Python already provides those; just use the generator facility (and decorate your gfunc with `unpythonic.gmemoize` if needed).
Lazy function example:
from unpythonic.syntax import macros, lazify
with lazify:
def my_if(p, a, b):
if p:
return a # b never evaluated in this code path
else:
return b # a never evaluated in this code path
assert my_if(True, 23, 1/0) == 23
assert my_if(False, 1/0, 42) == 42
def g(a, b):
return a
def f(a, b):
return g(2*a, 3*b)
assert f(21, 1/0) == 42
In a `with lazify` block, function arguments are evaluated only when actually used, at most once each, and in the order in which they are actually used (regardless of the ordering of the formal parameters that receive them). Delayed values (promises) are automatically evaluated (forced) on access. Automatic lazification applies to arguments in function calls and to let-bindings, since they play a similar role. No other binding forms are auto-lazified.
Automatic lazification uses the `lazyrec[]` macro (see below), which recurses into certain types of container literals, so that the lazification will not interfere with unpacking.
Note `my_if` in the example is a regular function, not a macro. Only the `with lazify` is imbued with any magic. Essentially, the above code expands into:
from unpythonic.syntax import macros, lazy
from unpythonic import force
def my_if(p, a, b):
if force(p):
return force(a)
else:
return force(b)
assert my_if(lazy[True], lazy[23], lazy[1/0]) == 23
assert my_if(lazy[False], lazy[1/0], lazy[42]) == 42
def g(a, b):
return force(a)
def f(a, b):
return g(lazy[2*force(a)], lazy[3*force(b)])
assert f(lazy[21], lazy[1/0]) == 42
plus some clerical details to allow mixing lazy and strict code. This second example relies on the magic of closures to capture f's `a` and `b` into the `lazy[]` promises.
Like with `continuations`, no state or context is associated with a `with lazify` block, so lazy functions defined in one block may call those defined in another.
Lazy code is allowed to call strict functions and vice versa, without requiring any additional effort.
Comboing `lazify` with other block macros in `unpythonic.syntax` is supported, including `autocurry` and `continuations`. See the meta section of this README for the correct ordering.
For more details, see the docstring of `unpythonic.syntax.lazify`.
Inspired by Haskell, Racket's `(delay)` and `(force)`, and lazy/racket.
CAUTION: The functions in the module `unpythonic.fun` are lazify-aware (so that e.g. `curry` and `compose` work with lazy functions), as are `call` and `callwith` in the module `unpythonic.funutil`, but a large part of `unpythonic` is not. Keep in mind that any call to a strict (regular Python) function will evaluate all of its arguments.
Changed in v0.15.0. Previously, the `lazy[]` macro was provided by MacroPy. Now that we use `mcpyrate`, which doesn't provide it, we provide it ourselves, in `unpythonic.syntax`. We now also provide the underlying `Lazy` class ourselves.
Note that a lazy value (an instance of `Lazy`) no longer has a `__call__` operator; instead, it has a `force()` method. However, the preferred way is to use the top-level function `force`, which abstracts away this detail.
The `force` function was previously exported in `unpythonic.syntax`; now it is available in the top-level namespace of `unpythonic`. This follows the general convention that regular functions live in the top-level `unpythonic` package, while macros (and in general, syntactic constructs) live in `unpythonic.syntax`.
We provide the macros `unpythonic.syntax.lazy`, which explicitly lazifies a single expression, and `unpythonic.syntax.lazyrec`, which can be used to lazify expressions inside container literals, recursively.
Essentially, `lazy[...]` achieves the same result as `memoize(lambda: ...)`, with the practical difference that the `lazify` subsystem expects the `lazy[...]` notation in its analyzer, and will not recognize `memoize(lambda: ...)` as a delayed value.
A `lazy[]` promise `p` is evaluated by calling `force(p)` or `p.force()`. In `unpythonic`, the promise datatype (`Lazy`) does not have a `__call__` method, because the word `force` better conveys the intent.
It is preferable to use the `force` top-level function instead of the `.force` method, because the function will also pass through any non-promise value, whereas (obviously) a non-promise value will not have a `.force` method. Using the function, you can `force` a value just to be sure, without caring whether that value was a promise. The `force` function is available in the top-level namespace of `unpythonic`.
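A minimal sketch of explicit use:

```python
from unpythonic.syntax import macros, lazy
from unpythonic import force

p = lazy[6 * 7]                   # create a promise; the expression is not evaluated yet
assert force(p) == 42             # evaluated on demand
assert force("hello") == "hello"  # a non-promise value passes through unchanged
```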
The `lazyrec[]` macro allows code like `tpl = lazyrec[(1*2*3, 4*5*6)]`. Each item becomes wrapped with `lazy[]`, but the container itself is left alone, to avoid interfering with its unpacking. Because `lazyrec[]` is a macro and must work by names only, it supports a fixed set of container types: `list`, `tuple`, `set`, `dict`, `frozenset`, `unpythonic.frozendict`, `unpythonic.box`, and `unpythonic.cons` (specifically, the constructors `cons`, `ll` and `llist`).
The `unpythonic` containers must be from-imported for `lazyrec[]` to recognize them. Either use `from unpythonic import xxx` (recommended), where `xxx` is a container type, or import the `containers` subpackage by `from unpythonic import containers`, and then use `containers.xxx`. (The analyzer only looks inside at most one level of attributes. This may change in the future.)
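A minimal sketch:

```python
from unpythonic.syntax import macros, lazyrec
from unpythonic import force

tpl = lazyrec[(1 * 2 * 3, 4 * 5 * 6)]  # each item is wrapped with lazy[]; the tuple itself is not
assert force(tpl) == (6, 120)          # force descends into the container
```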
Observe that the analysis in `lazyrec[]` must work by names only, because in an eager language any lazification must be performed as a syntax transformation before the code actually runs. Hence, the analysis must be performed statically - and locally, because `lazyrec[]` is an expr macro. Fexprs (along with a new calculus to go with them) are the clean, elegant solution, but this requires redesigning the whole language from the ground up. Of course, if you are fine with a language not particularly designed for extensibility, and lazy evaluation is your top requirement, you could just use Haskell.
Changed in v0.15.0. The functions `force1` and `force` now live in the top-level namespace of `unpythonic`, no longer in `unpythonic.syntax`.
This is mainly useful if you `lazy[]` or `lazyrec[]` something explicitly, and want to compute its value outside a `with lazify` block.
We provide the functions `force1` and `force`. Using `force1`, if `x` is a `lazy[]` promise, it will be forced, and the resulting value is returned. If `x` is not a promise, `x` itself is returned, à la Racket. The function `force`, in addition, descends into containers (recursively). When an atom `x` (i.e. anything that is not a container) is encountered, it is processed using `force1`.
Mutable containers are updated in-place; for immutables, a new instance is created, but as a side effect the promise objects in the input container will be forced. Any container with a compatible `collections.abc` interface is supported. (See `unpythonic.mogrify` for details.) In addition, as special cases, `unpythonic.box` and `unpythonic.cons` are supported.
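A small sketch of the difference:

```python
from unpythonic.syntax import macros, lazy
from unpythonic import force, force1

p = lazy[6 * 7]
assert force1(p) == 42                       # forces a single promise
assert force1("hi") == "hi"                  # a non-promise is passed through as-is
assert force((lazy[1], lazy[2])) == (1, 2)   # force also descends into containers
```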
Why do we auto-lazify in certain kinds of binding constructs, but not in others? Function calls and let-bindings have one feature in common: both are guaranteed to bind only new names. Even if a name that uses the same identifier is already in scope, they are distinct; the new binding will shadow the old one. Auto-lazification of all assignments, on the other hand, in a language that allows mutation is dangerous, because then this superficially innocuous code will fail:
from unpythonic.syntax import macros, lazify
with lazify:
a = 10
a = 2 * a
print(a) # 20, right?
If we chose to auto-lazify assignments, then the example would expand to:
from unpythonic.syntax import macros, lazy
from unpythonic import force
a = lazy[10]
a = lazy[2 * force(a)]
print(force(a))
Scan that again: in the second assignment, the `lazy[]` sets up a promise, which will force `a` at the time when the containing promise is forced, but at that time the name `a` points to a promise, which will force...
The fundamental issue is that `a = 2 * a` is an imperative update. Therefore, to avoid this infinite loop trap for the unwary, assignments are not auto-lazified. Note that if we use two different names, this works just fine:
from unpythonic.syntax import macros, lazy
from unpythonic import force
a = lazy[10]
b = lazy[2 * force(a)]
print(force(b))
because now at the time when `b` is forced, the name `a` still points to the value we intended it to. That is, code that is normalized to static single assignment (SSA) form could be auto-lazified.
If you are sure you have new definitions and not imperative updates, you can just manually use `lazy[]` (or `lazyrec[]`, as appropriate) on the RHS. Or if it is fine to use eager evaluation, just omit the `lazy[]`, thus allowing Python to evaluate the RHS immediately.
Besides function calls (which bind the parameters of the callee to the argument values of the call) and assignments, there are many other binding constructs in Python. For a full list, see here, or locally here, in the function `get_names_in_store_context`. Particularly noteworthy in the context of lazification are the `for` loop and the `with` context manager.
In Python's `for`, the loop counter is an imperatively updated single name. In many use cases a rapid update is desirable for performance reasons, and in any case, the whole point of the loop is (almost always) to read the counter (and do something with the value) at least once per iteration. So it is much simpler, faster, and equally correct not to lazify there.
In `with`, the whole point of a context manager is that it is eagerly initialized when the `with` block is entered, and finalized when the block exits. Since our lazy code can transparently use both bare values and promises (due to the semantics of our `force1`), and the context manager would have to be eagerly initialized anyway, we have chosen not to lazify there.
To borrow a term from PG's On Lisp, to make `lazify` pay-as-you-go, a special mode in `unpythonic.trampolined` is automatically enabled by `with lazify` to build lazify-aware trampolines, in order to avoid a drastic performance hit (~10x) in trampolines built for regular strict code.
The idea is that the mode is enabled while any function definitions in the `with lazify` block run, so they get a lazify-aware trampoline when the `trampolined` decorator is applied. This should be determined lexically, but that is complicated to do, because the decorator is applied at run time; so we currently enable the mode for the dynamic extent of the `with lazify`. Usually this is close enough. The main case where this can behave unexpectedly is:
from unpythonic.syntax import macros, lazify
from unpythonic import trampolined
@trampolined # strict trampoline
def g():
...
def make_f():
@trampolined # which kind of trampoline is this?
def f():
...
return f
f1 = make_f() # f1 gets the strict trampoline
with lazify:
@trampolined # lazify-aware trampoline
def h():
...
f2 = make_f() # f2 gets the lazify-aware trampoline
TCO chains with an arbitrary mix of lazy and strict functions should work as long as the first function in the chain has a lazify-aware trampoline, because the chain runs under the trampoline of the first function. The trampolines of any tail-called functions are skipped by the TCO machinery.
Tail-calling from a strict function into a lazy function should work, because all arguments are evaluated at the strict side before the call is made.
But tail-calling `strict -> lazy -> strict` will fail in some cases. The second strict callee may get promises instead of values, because the strict trampoline does not have the `maybe_force_args` (the mechanism `with lazify` uses to force the args when lazy code calls into strict code).
The reason we have this hack is that it allows the performance of strict code using `unpythonic`'s TCO machinery, not even caring that a `lazify` exists, to be unaffected by the additional machinery used to support automatic lazy-strict interaction.
This is the macro that applies tail call optimization (TCO) automatically. See the manual section on `trampolined` and `jump` for what TCO is and where it is useful.
Using `with tco`, there is no need to manually use `trampolined` or `jump`:
from unpythonic.syntax import macros, tco
with tco:
def fact(n, acc=1):
if n == 0:
return acc
return fact(n - 1, n * acc)
print(fact(4)) # 24
fact(5000) # no crash
with tco:
evenp = lambda x: (x == 0) or oddp(x - 1)
oddp = lambda x: (x != 0) and evenp(x - 1)
assert evenp(10000) is True
with tco:
def evenp(x):
if x == 0:
return True
return oddp(x - 1)
def oddp(x):
if x != 0:
return evenp(x - 1)
return False
assert evenp(10000) is True
All function definitions (def
and lambda
) lexically inside the with tco
block undergo TCO transformation. The functions are automatically @trampolined
, and any tail calls in their return values are converted to jump(...)
for the TCO machinery. Here return value is defined as:
-
In a
def
, the argument expression ofreturn
, or of a call to a known escape continuation. -
In a
lambda
, the whole body, as well as the argument expression of a call to a known escape continuation.
What is considered a known escape continuation is explained below, in the section TCO and call_ec
.
To find the tail position inside a compound return value, we recursively handle any combination of a if p else b
, and
, or
; and from unpythonic.syntax
, do[]
, let[]
, letseq[]
, letrec[]
. Support for do[]
includes also any multilambda
blocks that have already expanded when tco
is processed. The macros aif[]
and cond[]
are also supported, because they expand into a combination of let[]
, do[]
, and a if p else b
.
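For example, both branches of a conditional expression are in tail position, so both recursive calls in this small sketch get the TCO transformation:
from unpythonic.syntax import macros, tco
with tco:
    def evenp(x):
        return True if x == 0 else oddp(x - 1)
    def oddp(x):
        return False if x == 0 else evenp(x - 1)
    assert evenp(10000) is True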
CAUTION: In an and
/or
expression, only the last item of the whole expression is in tail position. This is because in general, it is impossible to know beforehand how many of the items will be evaluated.
CAUTION: In a def
you still need the return
; it marks a return value. If you want tail position to imply a return
, use the combo with autoreturn, tco
(on autoreturn
, see below).
TCO is based on a strategy similar to MacroPy's tco
macro, but using unpythonic's TCO machinery, and working together with the macros introduced by unpythonic.syntax
. The semantics are slightly different; by design, unpythonic
requires an explicit return
to mark tail calls in a def
. A call that is strictly speaking in tail position, but lacks the return
, is not TCO'd, and Python's implicit return None
then shuts down the trampoline, returning None
as the result of the TCO chain.
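As a sketch of the pitfall:
from unpythonic.syntax import macros, tco
with tco:
    def countdown(n):
        if n > 0:
            countdown(n - 1)   # tail position, but no `return`: this call is not TCO'd
        # Python's implicit `return None` then shuts down the trampoline.
    assert countdown(3) is None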
The tco
macro detects and skips any with continuations
blocks inside the with tco
block, because continuations
already implies TCO. This is done for the specific reason of allowing the Lispython dialect to use with continuations
, because the dialect itself implies a with tco
for the whole module. Hence, in that dialect, the user code has no way to exit the TCO context.
The tco
and continuations
macros actually share a lot of the code that implements TCO; continuations
, for its TCO processing, just hooks into some callbacks to make additional AST edits.
This is mainly of interest for lambdas, which have no return
, and for "multi-return" from a nested function.
It is important to recognize a call to an escape continuation as such, because the argument given to an escape continuation is essentially a return value. If this argument is itself a call, it needs the TCO transformation to be applied to it.
For escape continuations in tco
and continuations
blocks, only basic uses of call_ec
are supported, for automatically extracting names referring to an escape continuation. Basic use is defined as either of these two cases:
from unpythonic import call_ec
# use as decorator
@call_ec
def result(ec):
...
# use directly on a literal lambda (effectively, as a decorator)
result = call_ec(lambda ec: ...)
When macro expansion of the with tco
block starts, names of escape continuations created anywhere lexically within the with tco
block are captured, provided that the creation takes place using one of the above basic use patterns.
In addition, the literal function names ec
, brk
and throw
are always understood as referring to an escape continuation. The name ec
is the customary name for the parameter of a function passed to call_ec
. The name brk
is the customary name for the break continuation created by @breakably_looped
and @breakably_looped_over
. The name throw
is understood as referring to the function unpythonic.throw
.
Obviously, having a name of ec
, brk
or throw
is not by itself sufficient to make a function into an escape continuation, even though tco
(and continuations
) will think of it as such. The function also needs to actually implement some kind of an escape mechanism. An easy way to get an escape continuation, where this has already been done for you, is to use call_ec
. Another such mechanism is the catch
/throw
pair.
See the docstring of unpythonic.syntax.tco
for details.
Where control flow is your playground.
We provide genuine multi-shot continuations for Python. Compare generators and coroutines, which are resumable functions, or in other words, single-shot continuations. In single-shot continuations, once execution passes a certain point, it cannot be rewound. Multi-shot continuations can be emulated using single-shot continuations, but this makes the execution time O(n**2)
, because when we want to restart again at an already passed point, the execution must start from the beginning, replaying the whole history. In contrast, we implement continuations that can natively resume execution multiple times from the same point.
CAUTION: This feature has some limitations, and is mainly intended for experimenting with, and teaching, multi-shot continuations in a Python setting. Particularly:
-
There are seams between continuation-enabled code and regular Python code. (This happens with any feature that changes the semantics of only a part of a program.)
-
There is no
dynamic-wind
: Scheme's generalization oftry/finally
, which beside thefinally
exit hook, has an entry hook for when control jumps back into the block from outside it. -
Interaction of continuations with exceptions is not fully thought out.
-
Interaction with async functions is not even implemented. For this reason, an
async def
orawait
appearing inside awith continuations
block is considered a syntax error. -
The implicit
cc
parameter might not be a good idea in the long run.- This design suffers from the same lack of transparency, whence the same potential for bugs, as the implicit
this
in many languages (e.g. C++ and JavaScript). - Because
cc
is declared implicitly, it is easy to forget that every function definition anywhere inside thewith continuations
block introduces its owncc
parameter.- Particularly, also a
lambda
is a function definition. - This introduces a bug when one introduces an inner function, and attempts to use the outer
cc
inside the inner function body, forgetting that inside the inner function, the namecc
points to the inner function's owncc
.- The correct pattern is to
outercc = cc
in the outer function, and then useoutercc
inside the inner function body.
- The correct pattern is to
- Not introducing its own
this
was precisely why the arrow function syntax was introduced to JavaScript in ES6.
- Particularly, also a
- Python gets
self
right in that while it is conveniently passed implicitly, it must be declared explicitly, eliminating the transparency issue. - On the other hand, a semi-explicit
cc
, like Python'sself
, was tried in an early version of this continuations subsystem, and it led to a lot of boilerplate.- It is especially bad that to avoid easily avoidable bugs regarding passing in the wrong arguments,
cc
effectively must be a keyword parameter, necessitating the user to writedef f(x, *, cc)
. Not having to type out the, *, cc
is much nicer, albeit not as pythonic.
- It is especially bad that to avoid easily avoidable bugs regarding passing in the wrong arguments,
- This design suffers from the same lack of transparency, whence the same potential for bugs, as the implicit
If you are new to continuations, see the short and easy Python-based explanation of the basic idea.
This continuations system in unpythonic
began as a very loose pythonification of Paul Graham's continuation-passing macros, chapter 20 in On Lisp.
The approach differs from native continuation support (such as in Scheme or Racket) in that the continuation is captured only where explicitly requested with call_cc[]
. This lets most of the code work as usual, while performing the continuation magic where explicitly desired.
As a consequence of the approach, our continuations are delimited in the very crude sense that the captured continuation ends at the end of the body where the currently dynamically outermost call_cc[]
was invoked. Notably, in unpythonic
, a continuation eventually terminates and returns a value (provided that the code contained in the continuation itself terminates), without hijacking the rest of the whole-program execution.
Hence, if porting some code that uses call/cc
from Racket to Python, in the Python version the call_cc[]
may need to be placed further out to capture the relevant part of the computation. For example, see amb
in the demonstration below; a Scheme or Racket equivalent usually has the call/cc
placed inside the amb
operator itself, whereas in Python we must place the call_cc[]
at the call site of amb
, so that the continuation captures the remainder of the call site.
Observe that while our outermost call_cc
already somewhat acts like a prompt (in the sense of delimited continuations), we are currently missing the ability to set a prompt wherever (inside code that already uses call_cc
somewhere) and make the continuation terminate there. So what we have right now is something between proper delimited continuations and classic whole-computation continuations - not really co-values, but not really delimited continuations, either.
(TODO: If I interpret the wiki page right, our call_cc
performs the job of reset
; the called function forms the body of the reset
. The cc
argument passed into the called function performs the job of shift
.)
For various possible program topologies that continuations may introduce, see these clarifying pictures.
For full documentation, see the docstring of unpythonic.syntax.continuations
. The unit tests [1] [2] [3] [4] may also be useful as usage examples.
Note on debugging: If a function containing a call_cc[]
crashes below a line that has a call_cc[]
invocation, the stack trace will usually have the continuation function somewhere in it, containing the line number information, so as usual, you can pinpoint the source code line where the error occurred. For a function f
, continuation definitions created by call_cc[]
invocations within its body are named f_cont_<some_uuid>
.
Be aware that especially in complex block macro combos (e.g. with lazify, autocurry, continuations
), the other block macros may have spit out many internal function calls that, at run time, get called after the relevant stack frame that points to the actual user program. So check the stack trace as usual, but check further up than usual.
Using the with step_expansion
macro from mcpyrate.debug
may help in understanding what the macro-expanded code actually looks like.
Note on exceptions: Raising an exception, or signaling and restarting, will partly unwind the call stack, so the continuation from the level that raised the exception will be cancelled. This is arguably exactly the expected behavior.
Demonstration of continuations:
from unpythonic.syntax import macros, continuations, call_cc
with continuations:
# basic example - how to call a continuation manually:
k = None # a kontinuation is konventionally kalled k
def setk(*args, cc):
global k
k = cc
return args
def doit():
lst = ['the call returned']
*more, = call_cc[setk('A')]
return lst + list(more)
print(doit())
print(k('again'))
print(k('thrice', '!'))
# McCarthy's amb operator - yes, the real thing - in Python:
stack = []
def amb(lst, cc):
if not lst:
return fail()
first, *rest = tuple(lst)
if rest:
ourcc = cc # important: use any name other than "cc"
stack.append(lambda: amb(rest, cc=ourcc))
return first
def fail():
if stack:
f = stack.pop()
return f()
# Pythagorean triples
def pt():
z = call_cc[amb(range(1, 21))]
y = call_cc[amb(range(1, z+1))]
x = call_cc[amb(range(1, y+1))]
if x*x + y*y != z*z:
return fail()
return x, y, z
print(pt())
print(fail()) # ...outside the dynamic extent of pt()!
print(fail())
print(fail())
print(fail())
print(fail())
print(fail())
Code within a with continuations
block is treated specially.
Roughly:
Each function definition (
def
orlambda
) in awith continuations
block has an implicit formal parametercc
, even if not explicitly declared in the formal parameter list.
- The continuation machinery will set the default value of
cc
to the default continuation (identity
), which just returns its argument(s).
- The default value allows these functions to be called also normally without passing a
cc
. In effect, the function will then return normally.- If
cc
is not declared explicitly, it is implicitly declared as a by-name-only parameter namedcc
, and the default value is set automatically.- If
cc
is declared explicitly, the default value is set automatically ifcc
is in a position that can accept a default value, and no default has been set by the user.
- Positions that can accept a default value are the last positional parameter that has no default, and a by-name-only parameter in any syntactically allowed position.
- Having a hidden parameter is somewhat magic, but overall improves readability, as this allows declaring
cc
only where actually explicitly needed.
- CAUTION: Usability trap: in nested function definitions, each
def
andlambda
comes with its own implicitcc
.
- In the above
amb
example, the local variable is namedourcc
, so that the continuation passed in from outside (into thelambda
, by closure) will have a name different from thecc
implicitly introduced by thelambda
itself.- This is possibly subject to change in a future version (pending the invention of a better API), but for now just be aware of this gotcha.
- Beside
cc
, there's also a mechanism to keep track of the captured tail of a computation, which is important to have edge cases work correctly. See the note on pcc (parent continuation) in the docstring ofunpythonic.syntax.continuations
, and the pictures.In a function definition inside the
with continuations
block:
- Most of the language works as usual; especially, any non-tail function calls can be made as usual.
return value
orreturn Values(...)
is actually a tail-call intocc
, passing the given value(s) as arguments.
- As in other parts of
unpythonic
, returning aValues
means returning multiple-return-values and/or named-return-values.
- This is important if the return value is received by the assignment targets of a
call_cc[]
. If you get aTypeError
concerning the arguments of a function with a name ending in_cont_<some_uuid>
, check yourcall_cc[]
invocations and thereturn
in the call_cc'd function.- Changed in v0.15.0. Up to v0.14.3, multiple return values used to be represented as a
tuple
. Now returning atuple
means returning one value that is a tuple.return func(...)
is actually a tail-call intofunc
, passing along (by default) the current value ofcc
to become itscc
.
- Hence, the tail call is inserted between the end of the current function body and the start of the continuation
cc
.- To override which continuation to use, you can specify the
cc=...
kwarg, as inreturn func(..., cc=mycc)
, as was done in theamb
example above.
- The
cc
argument, if passed explicitly, must be passed by name.
- CAUTION: This is not enforced, as the machinery does not analyze positional arguments in any great detail. The machinery will most likely break in unintuitive ways (or at best, raise a mysterious
TypeError
) if this rule is violated.- The function
func
must be defined in a
block, so that it knows what to do with the named argumentcc
.
- Attempting to tail-call a regular function breaks the TCO chain and immediately returns to the original caller (provided the function even accepts a
cc
named argument; if not, you will get aTypeError
).- Be careful:
xs = list(args); return xs
andreturn list(args)
mean different things.
- Because
list(args)
is a function call,return list(args)
will attempt to tail-calllist
as a continuation-enabled function (which it is not, you will get aTypeError
), before passing its result into the current continuation.- Using
return xs
instead will pass an inert data value into the current continuation.- TCO is automatically applied to these tail calls. The TCO processing of
continuations
uses the exact same machinery as thetco
macro, performing some additional AST edits via hooks.The
call_cc[]
statement essentially splits its use site into before and after parts, where the after part (the continuation) can be run a second and further times, by later calling the callable that represents the continuation. This makes a computation resumable from a desired point.
- The continuation is essentially a closure.
- Just like in Scheme/Racket, only the control state is checkpointed by
call_cc[]
; any modifications to mutable data remain.- Assignment targets can be used to get the return value of the function called by
call_cc[]
.- Just like in Scheme/Racket's
call/cc
, the values that get bound to thecall_cc[]
assignment targets on second and further calls (when the continuation runs) are the arguments given to the continuation when it is called (whether implicitly or manually).- A first-class reference to the captured continuation is available in the function called by
call_cc[]
, as itscc
argument.
- The continuation itself is a function that takes positional arguments, plus a named argument
cc
.
- The call signature for the positional arguments is determined by the assignment targets of the
call_cc[]
.- The
cc
parameter is there only so that a continuation behaves just like any continuation-enabled function when tail-called, or when later used as the target of anothercall_cc[]
.- Basically everywhere else,
cc
points to the identity function - the default continuation just returns its argument(s).
- This is unlike in Scheme or Racket, which implicitly capture the continuation at every expression.
- Inside a
def
,call_cc[]
generates a tail call, thus terminating the original (parent) function. Hencecall_ec
does not combo withwith continuations
.- At the top level of the
with continuations
block,call_cc[]
generates a normal call. In this case there is no return value for the block (for the continuation, either), because the use site of thecall_cc[]
is not inside a function.
-
Unlike generators,
call_cc[]
allows resuming also multiple times from an earlier checkpoint, even after execution has already proceeded further. Generators can be easily built on top ofcall/cc
. Python version, Racket version.- The Python version is a pattern that could be packaged into a macro with
mcpyrate
; the Racket version has been packaged as a macro. - Both versions are just demonstrations for teaching purposes. In production code, use the language's native functionality.
- Python's built-in generators have no restriction on where
yield
can be placed, and provide better performance. - Racket's standard library provides generators.
- Python's built-in generators have no restriction on where
- The Python version is a pattern that could be packaged into a macro with
-
Unlike exceptions, which only perform escapes,
call_cc[]
allows to jump back at an arbitrary time later, also after the dynamic extent of the original function where thecall_cc[]
appears. Escape continuations are a special case of continuations, so exceptions can be built on top ofcall/cc
.- As explained in detail by Matthew Might, exceptions are fundamentally based on (escape) continuations; the "unwinding the call stack" mental image is "not even wrong".
So if all you want is generators or exceptions (or even resumable exceptions a.k.a. conditions), then a general call/cc
mechanism is not needed. The point of call/cc
is to provide the ability to resume more than once from the same, already executed point in the program. In other words, call/cc
is a general mechanism for bookmarking the control state.
However, its usability leaves much to be desired. This has been noted e.g. in Oleg Kiselyov: An argument against call/cc and John Shutt: Guarded continuations. For example, Shutt writes:
The traditional Scheme device for acquiring a first-class continuation object is call/cc, which calls a procedure and passes to that procedure the continuation to which that call would normally return. Frankly, this was always a very clumsy way to work with continuations; one might almost suspect it was devised as an "esoteric programming language" feature, akin to INTERCAL's COME FROM statement.
To keep things relatively straightforward, our call_cc[]
is only allowed to appear at the top level of:
- the
with continuations
block itself - a
def
inside that block
Nested defs are ok; here top level only means the top level of the currently innermost def
.
If you need to place call_cc[]
inside a loop, use @looped
et al. from the module unpythonic.fploop
; this has the loop body represented as the top level of a def
. Keep in mind that only the control state is bookmarked.
Multiple call_cc[]
statements in the same function body are allowed. These essentially create nested closures.
In any invalid position, call_cc[]
is considered a syntax error at macro expansion time.
Syntax:
In unpythonic
, call_cc
is a statement, with the following syntaxes:
x = call_cc[f(...)]
*xs = call_cc[f(...)]
x0, ... = call_cc[f(...)]
x0, ..., *xs = call_cc[f(...)]
call_cc[f(...)]
x = call_cc[f(...) if p else g(...)]
*xs = call_cc[f(...) if p else g(...)]
x0, ... = call_cc[f(...) if p else g(...)]
x0, ..., *xs = call_cc[f(...) if p else g(...)]
call_cc[f(...) if p else g(...)]
NOTE: *xs
may need to be written as *xs,
in order to explicitly make the LHS into a tuple. The variant without the comma seems to work when run from a .py
file with the macropython
bootstrapper from mcpyrate
, but fails in code run interactively in the mcpyrate
REPL.
NOTE: f()
and g()
must be literal function calls. Sneaky trickery (such as calling indirectly via unpythonic.call
or unpythonic.curry
) is not supported. This limitation is for simplicity; the call_cc[]
invocation needs to patch the cc=...
kwarg of the call being made.
The prefix
and curry
macros, however, are supported; just order the block macros as in The xmas tree combo.
Assignment targets:
-
To destructure positional multiple-values (from a
Values
return value of the function called by thecall_cc
), use a tuple assignment target (comma-separated names, as usual). Destructuring named return values from acall_cc
is currently not supported due to syntactic limitations. -
The last assignment target may be starred. It is transformed into the vararg (a.k.a.
*args
, star-args) of the continuation function created by thecall_cc
. It will capture a whole tuple, or any excess items, as usual. -
To ignore the return value of the
call_cc
'd function, just omit the assignment part. This is useful iff
was called only to perform its side-effects. The classic side effect is to stashcc
somewhere for later use.
Conditional variant:
-
p
is any expression. It is evaluated at run time, as usual. When the result is truthy,f(...)
is called, and when falsey,g(...)
is called. -
Each of
f(...)
,g(...)
may beNone
. ANone
skips the function call, proceeding directly to the continuation. Upon skipping, all assignment targets (if any are present) are set toNone
. The starred assignment target (if present) gets the empty tuple.
The main use case of the conditional variant is for things like:
with continuations:
k = None
def setk(cc):
global k
k = cc
def dostuff(x):
call_cc[setk() if x > 10 else None] # update stashed continuation only if x > 10
...
Main differences to call/cc
in Scheme and Racket:
Compared to Scheme/Racket, where call/cc
will capture also expressions occurring further up in the call stack, our call_cc
may need to be placed differently (further out, depending on what needs to be captured) due to the delimited nature of the continuations implemented here.
Scheme and Racket implicitly capture the continuation at every position, whereas we do it explicitly, only at the use sites of the call_cc[]
macro.
Also, since there are limitations to where a call_cc[]
may appear, some code may need to be structured differently to do some particular thing, if porting code examples originally written in Scheme or Racket.
Unlike call/cc
in Scheme/Racket, our call_cc
takes a function call as its argument, not just a function reference. Also, there is no need for it to be a one-argument function; any other args can be passed in the call. The cc
argument is filled implicitly and passed by name; any others are passed exactly as you write in the invocation.
CAUTION: Do not use with tco
inside a with continuations
block; continuations
already implies TCO. The continuations
macro makes no attempt to skip with tco
blocks inside it.
If you want to use multilambda
inside a with continuations
block, it needs to go on the outside:
from unpythonic.syntax import macros, continuations, multilambda
with multilambda, continuations:
f = lambda x: [print(x), x**2]
assert f(42) == 1764
This works, because the continuations
macro understands already expanded let[]
and do[]
, and multilambda
generates and expands a do[]
. (Any explicit use of do[]
in a lambda body or in a return
is also ok; recall that macros expand from inside out.)
Similarly, if you want to use quicklambda
inside a with continuations
block, place it on the outside:
from unpythonic.syntax import macros, continuations, quicklambda, fn
with quicklambda, continuations:
g = fn[_**2]
assert g(42) == 1764
This ordering makes the f[...]
notation expand into standard lambda
notation before continuations
is expanded.
To enable both of these, use with quicklambda, multilambda, continuations
(although the usefulness of this combo may be questionable).
An escape continuation ec
is a continuation, too. How can we use cc
to escape?
Pretty much by the definition of a continuation, in a with continuations
block, a trick that should at first glance produce an escape is to set cc
to the cc
of the caller, and then return the desired value. There is however a subtle catch, due to the way we implement continuations.
First, consider this basic strategy, without any macros:
from unpythonic import call_ec
def double_odd(x, ec):
if x % 2 == 0: # reject even "x"
ec("not odd")
return 2 * x
@call_ec
def result1(ec):
y = double_odd(42, ec)
z = double_odd(21, ec)
return z
@call_ec
def result2(ec):
y = double_odd(21, ec)
z = double_odd(42, ec)
return z
assert result1 == "not odd"
assert result2 == "not odd"
Here ec
is the escape continuation of the result1
/result2
block, due to the placement of the call_ec
.
Now, can we use the same strategy with the general continuation machinery?
from unpythonic.syntax import macros, continuations, call_cc
with continuations:
def double_odd(x, ec, cc):
if x % 2 == 0:
cc = ec
return "not odd"
return 2 * x
def main1(cc):
# cc actually has a default (`identity`), so it's ok to not pass anything as cc here.
y = double_odd(42, ec=cc) # y = "not odd"
z = double_odd(21, ec=cc) # we could tail-call, but let's keep this similar to the first example.
return z
def main2(cc):
y = double_odd(21, ec=cc)
z = double_odd(42, ec=cc)
return z
assert main1() == 42
assert main2() == "not odd"
The cc
inside double_odd
is the implicitly passed cc
... which, naively, should represent the continuation of the current call into double_odd
. So far, so good.
However, because the example contains no call_cc[]
statements, the actual value of cc
, anywhere in this example, is always just identity
. Scan that again: in this example, cc
is not the actual continuation, because no continuation captures were requested.
Even though we pass the cc
of main1
/main2
as an explicit argument "ec
" to use as an escape continuation (like the first example does with ec
), it is still identity
- and hence cannot perform an escape.
We must call_cc[]
to request a capture of the continuation, hence populating cc
with something useful:
from unpythonic.syntax import macros, continuations, call_cc
with continuations:
def double_odd(x, ec, cc):
if x % 2 == 0:
cc = ec
return "not odd"
return 2 * x
def main1(cc):
y = call_cc[double_odd(42, ec=cc)] # <-- the only change is adding the call_cc[]
z = call_cc[double_odd(21, ec=cc)] # <--
return z
def main2(cc):
y = call_cc[double_odd(21, ec=cc)] # <--
z = call_cc[double_odd(42, ec=cc)] # <--
return z
assert main1() == "not odd"
assert main2() == "not odd"
This variant performs as expected.
There is also a second, even subtler catch; instead of setting cc = ec
and returning a value, as we did, just tail-calling ec
with that same value does not do what we want. Why? Because - as explained in the rules of the continuations
macro, above - a tail-call is inserted between the end of the function, and whatever continuation cc
currently points to.
Most often that is exactly what we want, but in this particular case, it causes both continuations to run, in sequence. But if, instead of performing a tail call to the ec
, we set cc = ec
, then the function's original cc
argument (the one supplied by call_cc[]
) is discarded, hence that continuation never runs - and we get the effect we want, replacing the cc
by the ec
.
Such subtleties arise essentially from the difference between a language that natively supports continuations (Scheme, Racket) and one that has continuations hacked on top of it as macros performing a CPS conversion only partially (like Python with unpythonic.syntax
, or Common Lisp with PG's continuation-passing macros). The macro approach works, but the programmer needs to be careful.
In unpythonic
specifically, a continuation is just a function. (As John Shutt has pointed out, in general this is not true. The calculus underlying the language becomes much cleaner if continuations are defined as a separate control flow mechanism orthogonal to function application. Continuations are not intrinsically a whole-computation device, either.)
The continuation function must be able to take as many positional arguments as the previous function in the TCO chain is trying to pass into it. Keep in mind that:
-
In
unpythonic
, multiple return values (and named return values) are represented as aValues
object. So if your function doesreturn Values(a, b)
, and that is being fed into the continuation, this implies that the continuation must be able to take two positional arguments.Changed in v0.15.0. Up to v0.14.3, a
tuple
used to represent multiple-return-values; now it denotes a single return value that is a tuple. TheValues
type allows not only multiple return values, but also named return values. Named return values are fed as kwargs. -
At the end of any function in Python, at least an implicit bare
return
always exists. It will try to pass in the valueNone
to the continuation, so a continuation must be able to accept one positional argument.- This is handled automatically for continuations created by
call_cc[]
. If no assignment targets are given,call_cc[]
automatically creates one ignored positional argument that defaults toNone
.
- This is handled automatically for continuations created by
If there is an arity mismatch, Python will raise TypeError
as usual. The actual error message may be unhelpful due to macro transformations. Look for a mismatch between a return
and the call signature of a function used as a continuation. Most often, this is the f
in a cc=f
.
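For instance, in this small sketch the arities line up: the call_cc'd function returns two positional values, so the continuation (whose call signature comes from the assignment targets) takes two positional arguments:
from unpythonic.syntax import macros, continuations, call_cc
from unpythonic import Values
with continuations:
    def getpoint(cc):
        return Values(3, 4)          # two positional return values...
    def main(cc):
        x, y = call_cc[getpoint()]   # ...matching the two assignment targets
        return x * x + y * y
    assert main() == 25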
Usually, a function to be used as a continuation is defined inside the with continuations
block. This automatically introduces the implicit cc
parameter, and in general makes the source code undergo the transformations needed by the continuation machinery.
However, as the only exception to this rule, if the continuation is meant to act as the endpoint of the TCO chain - i.e. terminating the chain and returning to the original top-level caller - then it may be defined outside the with continuations
block. Recall that in a with continuations
block, returning an inert data value (i.e. not making a tail call) transforms into a tail-call into the cc
(with the given data becoming its argument(s)); it does not set the cc
argument of the continuation being called, or even require that it has a cc
parameter that could accept one.
These observations make unpythonic.identity
eligible as a continuation, even though it is defined elsewhere in the library and it has no cc
parameter.
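A sketch (the explicit cc=identity is redundant here, since identity is also the default, but it shows that a strict function with no cc parameter can terminate the chain):
from unpythonic.syntax import macros, continuations
from unpythonic import identity
with continuations:
    def f(x):
        return 2 * x                 # returning a data value tail-calls into `cc`
    assert f(21) == 42               # the default cc is `identity`
    assert f(21, cc=identity) == 42  # explicitly using `identity` as the endpoint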
Finally, note that a function that has no cc
parameter cannot be used as the target of an explicit tail-call inside a with continuations
block, since a tail-call there will attempt to supply a cc
argument to the function being tail-called. Likewise, it cannot be used as the function called by a call_cc[]
, since this will also attempt to supply a cc
argument.
Strictly speaking, True
. The implementation is very different (much more than just exposing a hidden parameter), not to mention it has to be a macro, because it triggers capture - something that would not need to be requested for separately, had we converted the whole program into CPS.
The selective capture approach is however more efficient when we implement the continuation system in Python, indeed on Python (in the sense of On Lisp), since we want to run most of the program the usual way with no magic attached. This way there is no need to sprinkle absolutely every statement and expression with a def
or a lambda
. (Not to mention Python's lambda
is underpowered due to the existence of some language features only as statements, so we would need to use a mixture of both, which is already unnecessarily complicated.) Function definitions are not intended as the only control flow construct in Python, so the compiler likely would not optimize heavily enough (i.e. eliminate almost all of the implicitly introduced function definitions), if we attempted to use them as such.
Continuations only need to come into play when we explicitly request for one (ZoP §2); this avoids introducing any more extra function definitions than needed.
The name is nevertheless call_cc
, because the resulting behavior is close enough to call/cc
. Instead of call with current continuation, we could retcon the name to mean call with captured continuation.
Note that our implementation provides a rudimentary form of delimited continuations. See Oleg Kiselyov: Undelimited continuations are co-values rather than functions. Delimited continuations return a value and can be composed, so they at least resemble functions (even though they are not, strictly speaking, actually functions), whereas undelimited continuations do not even return. For two different debunkings of the continuations-are-functions myth, approaching the problem from completely different angles, see the above post by Oleg Kiselyov, and John Shutt: Continuations and term-rewriting calculi.
Racket provides a thought-out implementation of delimited continuations and prompts to control them.
As for a function call in call_cc[...]
vs. just a function reference: Typical lispy usage of call/cc
uses an inline lambda, with the closure property passing in everything except cc
, but in Python def
is a statement. A technically possible alternative syntax would be:
with call_cc(f): # this syntax not supported!
def f(cc):
...
but the expr macro variant provides better options for receiving multiple return values, and perhaps remains closer to standard Python.
The call_cc[]
explicitly suggests that these are (almost) the only places where the cc
argument obtains a non-default value. It also visually indicates the exact position of the checkpoint, while keeping to standard Python syntax.
(Almost: As explained above, a tail call passes along the current value of cc
, and cc
can be set manually.)
Write Python almost like Lisp!
Lexically inside a with prefix
block, any literal tuple denotes a function call, unless quoted. The first element is the operator, the rest are arguments. Bindings of the let
macros and the top-level tuple in a do[]
are left alone, but prefix
recurses inside them (in the case of let-bindings, on each RHS).
The rest is best explained by example:
from unpythonic.syntax import macros, prefix, q, u, kw
from unpythonic import apply
with prefix:
(print, "hello world")
# quote operator q locally turns off the function-call transformation:
t1 = (q, 1, 2, (3, 4), 5) # q takes effect recursively
x = 42
t2 = (q, 17, 23, x) # unlike in Lisps, x refers to its value even in a quote
(print, t1, t2)
# unquote operator u locally turns the transformation back on:
t3 = (q, (u, print, 42), (print, 42), "foo", "bar")
assert t3 == (q, None, (print, 42), "foo", "bar")
# quotes nest; call transformation made when quote level == 0
t4 = (q, (print, 42), (q, (u, u, print, 42)), "foo", "bar")
assert t4 == (q, (print, 42), (None,), "foo", "bar")
# Be careful:
try:
(x,) # in a prefix block, this means "call the 0-arg function x"
except TypeError:
pass # 'int' object is not callable
(q, x) # OK!
# give named args with kw(...) [it's syntax, not really a function!]:
def f(*, a, b):
return (q, a, b)
# in one kw(...), or...
assert (f, kw(a="hi there", b="Tom")) == (q, "hi there", "Tom")
# in several kw(...), doesn't matter
assert (f, kw(a="hi there"), kw(b="Tom")) == (q, "hi there", "Tom")
# in case of duplicate name across kws, rightmost wins
assert (f, kw(a="hi there"), kw(b="Tom"), kw(b="Jerry")) == (q, "hi there", "Jerry")
# give *args with unpythonic.apply, like in Lisps:
lst = [1, 2, 3]
def g(*args):
return args
assert (apply, g, lst) == (q, 1, 2, 3)
# lst goes last; may have other args first
assert (apply, g, "hi", "ho", lst) == (q, "hi" ,"ho", 1, 2, 3)
If you use the q
, u
and kw()
operators, they must be macro-imported. The q
, u
and kw()
operators may only appear in a tuple inside a prefix block. In any invalid position, any of them is considered a syntax error at macro expansion time.
The prefix
macro comboes with autocurry
for an authentic Listhell programming experience:
from unpythonic.syntax import macros, autocurry, prefix, q, u, kw
from unpythonic import foldr, composerc as compose, cons, nil, ll
with prefix, autocurry: # important: apply prefix first, then autocurry
mymap = lambda f: (foldr, (compose, cons, f), nil)
double = lambda x: 2 * x
(print, (mymap, double, (q, 1, 2, 3)))
assert (mymap, double, (q, 1, 2, 3)) == ll(2, 4, 6)
See also the Listhell dialect, which pre-packages that combo.
CAUTION: The prefix
macro is experimental and not intended for use in production code.
Changed in v0.15.0. If the item in tail position is a function definition or class definition, return the thing that was defined. This functionality being missing in earlier versions was an oversight.
In Lisps, a function implicitly returns the value of the expression in tail position along the code path being executed. That is, "the last value" is automatically returned when the function terminates normally. No return
keyword is needed.
Python's lambda
also already behaves like this; the whole body is just one expression, whose value will be returned.
However, def
requires a return
, even in tail position. Enter the autoreturn
macro:
from unpythonic.syntax import macros, autoreturn
with autoreturn:
def f():
...
"I'll just return this"
assert f() == "I'll just return this"
def g(x):
...
if x == 1:
"one"
elif x == 2:
"two"
else:
"something else"
assert g(1) == "one"
assert g(2) == "two"
assert g(42) == "something else"
Each def
or async def
function definition lexically within the with autoreturn
block is examined.
Any explicit return
statements are left alone, so return
can still be used as usual. This is especially useful if you want to return early (before execution reaches the tail position).
To find and transform the statement(s) in tail position, we look at the last statement within the function definition. If it is:
-
An expression
expr
, it is transformed intoreturn expr
. -
A function or class definition, a return statement is appended to return that function/class. Added in v0.15.0.
-
An
if
/elif
/else
block, the transformation is applied recursively to the last item in each of its branches.- CAUTION: If the final
else
of anif
/elif
/else
is omitted, as often in Python, then only theelse
item is in tail position with respect to the function definition - likely not what you want. So withautoreturn
, the finalelse
should be written out explicitly, to include theelse
branch into theif
/elif
/else
statement.
- CAUTION: If the final
-
A
with
orasync with
block, the transformation is applied recursively to the last item in its body. -
A
try
/except
/else
/finally
block:- If an
else
clause is present, the transformation is applied recursively to the last item in it; otherwise, to the last item in thetry
clause. These are the positions that indicate a normal return (i.e. no exception was raised). - In both cases, the transformation is applied recursively to the last item in each of the
except
clauses. - The
finally
clause is not transformed; it is intended as a finalizer (e.g. to release resources) that runs after the interesting value is already being returned bytry
,else
orexcept
.
- If an
CAUTION: for
, async for
, while
are currently not analyzed; effectively, these are defined as always returning None
. If the last item in your function body is a loop, use an explicit return.
CAUTION: With autoreturn
enabled, functions no longer return None
by default; the whole point of this macro is to change the default return value. The default return value becomes None
only if the tail position contains a statement other than if
, with
, async with
or try
.
If you wish to omit return
in tail calls, autoreturn
comboes with tco
. For the correct invocation order, see the xmas tree combo.
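A minimal sketch of the combo, using the with autoreturn, tco ordering mentioned earlier in the tco section:
from unpythonic.syntax import macros, autoreturn, tco
with autoreturn, tco:
    def fact(n, acc=1):
        if n == 0:
            acc                    # tail position: implicitly returned
        else:
            fact(n - 1, n * acc)   # tail call: no `return` needed, and TCO'd
    assert fact(5) == 120
    fact(5000)  # no crash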
For code using conditions and restarts: there is no special integration between autoreturn
and the conditions-and-restarts subsystem of unpythonic
. However, these should work together, because:
- The
with restarts
form is just awith
block, so it gets theautoreturn
treatment. - The handlers in a
with handlers
form are either separately defined functions, or lambdas.- Lambdas need no
autoreturn
. - If you
def
the handler functions in awith autoreturn
block (either the same one or a different one; this does not matter), they will get theautoreturn
treatment.
- Lambdas need no
- The
with handlers
form itself is justwith
block, so it also gets theautoreturn
treatment.
Changed in v0.15.3. Env-assignment now uses the assignment expression syntax x := range(3)
. The old syntax x << range(3)
is still supported for backward compatibility.
This is essentially a macro implementation of Haskell's do-notation for Python, specialized to the List monad.
The forall[]
expr macro behaves the same as the multiple-body-expression tuple comprehension unpythonic.forall
, but the macro is implemented purely by AST transformation, using real lexical variables.
The implementation is generic and very short; if interested, see the module unpythonic.syntax.forall
. Compare the module unpythonic.amb
, which implements the same functionality with a source code generator and eval
, without macros. The macro implementation is both shorter and more readable; this is effectively a textbook example of a situation where macros are the clean solution.
from unpythonic.syntax import macros, forall
from unpythonic.syntax import insist, deny # regular functions, not macros
out = forall[y := range(3),
x := range(3),
insist(x % 2 == 0),
(x, y)]
assert out == ((0, 0), (2, 0), (0, 1), (2, 1), (0, 2), (2, 2))
# pythagorean triples
pt = forall[z := range(1, 21), # hypotenuse
x := range(1, z+1), # shorter leg
y := range(x, z+1), # longer leg
insist(x*x + y*y == z*z),
(x, y, z)]
assert tuple(sorted(pt)) == ((3, 4, 5), (5, 12, 13), (6, 8, 10),
(8, 15, 17), (9, 12, 15), (12, 16, 20))
Assignment, with List-monadic magic, is var := iterable
. It is only valid at the top level of the forall
(e.g. not inside any possibly nested let
).
insist
and deny
are not macros; they are just the functions from unpythonic.amb
, re-exported for convenience.
The error raised by an undefined name in a forall[]
section is NameError
.
Small macros that are not essential but make some things easier or simpler.
With cond
, lambdas too can have multi-branch conditionals, yet remain human-readable:
from unpythonic.syntax import macros, cond
answer = lambda x: cond[x == 2, "two",
x == 3, "three",
"something else"]
print(answer(42))
Syntax is cond[test1, then1, test2, then2, ..., otherwise]
. A missing otherwise
branch is considered a syntax error at macro expansion time.
Any part of cond
may have multiple expressions by surrounding it with brackets:
cond[[pre1, ..., test1], [post1, ..., then1],
[pre2, ..., test2], [post2, ..., then2],
...
[postn, ..., otherwise]]
This is just the extra bracket syntax that denotes an implicit do[]
. To denote a single expression that is a literal list, double the brackets: [[1, 2, 3]]
. Just like in a let[]
form, the outer brackets enable multiple-expression mode, and then the inner brackets denote a list. The multiple-expression mode is allowed also when there is just one expression.
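For instance, a small sketch mixing single-expression and multiple-expression parts (the print calls stand in for arbitrary side effects):
from unpythonic.syntax import macros, cond
answer = lambda x: cond[[print("checking 2"), x == 2], "two",
                        [print("checking 3"), x == 3], "three",
                        "something else"]
assert answer(3) == "three"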
Inspired by the cond
form of many Lisps. There is some variation between Lisp dialects on whether cond
or if
is preferable if the dialect provides both. For example, in Racket, cond
is the preferred construct for writing conditionals.
Changed in v0.15.0. The it
helper macro may only appear in the then
and otherwise
branches of an aif[]
. Anywhere else, it is considered a syntax error at macro expansion time.
In linguistics, an anaphor is an expression that refers to another, such as the English word "it". Anaphoric macros are a lispy take on the concept. An anaphoric macro may, for example, implicitly define an it
that the user code can then use, with the meaning defined by the macro. This is sometimes a useful technique to shorten code, but it can also make code unreadable by hiding definitions, so it should be used sparingly.
Particularly, the anaphoric if is a classic macro, where it
is automatically bound to the result of the test. We provide that macro as aif[]
.
Concerning readability, the anaphoric if is relatively harmless, because it is almost obvious from context that the only it
that makes sense for a human to refer to is the test expression.
from unpythonic.syntax import macros, aif, it
aif[2 * 21,
print(f"it is {it}"),
print("it is falsey")]
Syntax is aif[test, then, otherwise]
. The magic identifier it
(which must be imported as a macro) refers to the test result while (lexically) inside the then
and otherwise
branches of an aif[]
, and anywhere else is considered a syntax error at macro expansion time.
Any part of aif
may have multiple expressions by surrounding it with brackets:
aif[[pre, ..., test],
[post_true, ..., then], # "then" branch
[post_false, ..., otherwise]] # "otherwise" branch
This is just the extra bracket syntax that denotes an implicit do[]
. To denote a single expression that is a literal list, double the brackets: [[1, 2, 3]]
. Just like in a let[]
form, the outer brackets enable multiple-expression mode, and then the inner brackets denote a list. The multiple-expression mode is allowed also when there is just one expression.
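A small sketch, with a multiple-expression test part:
from unpythonic.syntax import macros, aif, it
result = aif[[print("computing..."), 2 * 21],
             f"it is {it}",
             "it is falsey"]
assert result == "it is 42"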
If interested, compare with a Racket implementation; aif
is probably the simplest macro that relies on either the lack of macro hygiene or intentional breaking thereof.
CAUTION: This is a really, really bad idea that comes with serious readability and security implications. Python does not provide this construct itself, for good reason. Details below. Use with care, if at all.
Ever wish you could with(obj)
to say x
instead of obj.x
to read attributes of an object? Enter the autoref
block macro:
from unpythonic.syntax import macros, autoref
from unpythonic import env
e = env(a=1, b=2)
c = 3
with autoref(e):
assert a == 1 # a --> e.a
assert b == 2 # b --> e.b
assert c == 3 # no c in e, so just c
The transformation is applied for names in Load
context only, including names found inside Attribute
or Subscript
AST nodes, so things like a[1]
and a.x
are also valid (looking up a
in e
).
Names in Store
or Del
context are not redirected. To write to or delete attributes of o
, explicitly refer to o.x
, as usual.
Nested autoref blocks are allowed (lookups are lexically scoped).
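For example, a sketch of nesting (assuming a lookup tries the innermost autoref'd object first, then the outer ones, and finally regular name resolution):
from unpythonic.syntax import macros, autoref
from unpythonic import env
e1 = env(a=1)
e2 = env(b=2)
with autoref(e1):
    with autoref(e2):
        assert b == 2     # found in e2
        assert a == 1     # not in e2, found in e1
        assert e1.a == 1  # explicit attribute access works as usual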
Reading with autoref
can be convenient e.g. for data returned by SciPy's .mat
file loader.
See the unit tests for more usage examples.
This is similar to the JavaScript with
construct, which is nowadays deprecated. See also the ES6 reference on with
.
NOTE: The JavaScript with
and the Python with
have nothing in common except the name.
CAUTION: The with
construct of JavaScript was deprecated for security reasons. Since the autoref'd object will hijack all name lookups, use with autoref
only with an object you trust! In most Python code, this does not matter, as we are all adults here, but this may matter if a Python object arrives from an untrusted source in a networked app.
CAUTION: with autoref
complicates static code analysis or makes it outright infeasible. It is impossible to statically know whether something that looks like a bare name in the source code is actually a true bare name, or a reference to an attribute of the autoref'd object. That status can also change at any time, since the lookup is dynamic, and attributes can be added and removed dynamically.
Added in v0.14.3.
We provide a lightweight, mcpyrate
-based test framework for testing macro-enabled Python:
from unpythonic.syntax import macros, test, test_raises, fail, error, warn, the
from unpythonic.test.fixtures import session, testset, terminate, returns_normally
def f():
raise RuntimeError("argh!")
def g(a, b):
return a * b
fail["this line should be unreachable"]
count = 0
def counter():
global count
count += 1
return count
with session("simple framework demo"):
with testset():
test[2 + 2 == 4]
test_raises[RuntimeError, f()]
test[returns_normally(g(2, 3))]
test[g(2, 3) == 6]
# Use `the[]` (or several) in a `test[]` to declare what you want to inspect if the test fails.
test[counter() < the[counter()]]
with testset("outer"):
with testset("inner 1"):
test[g(6, 7) == 42]
with testset("inner 2"):
test[None is None]
with testset("inner 3"): # an empty testset is considered 100% passed.
pass
with testset("inner 4"):
warn["This testset not implemented yet"]
with testset("integration"):
try:
import blargly
except ImportError:
error["blargly not installed, cannot test integration with it."]
else:
... # blargly integration tests go here
with testset(postproc=terminate):
test[2 * 2 == 5] # fails, terminating the nearest dynamically enclosing `with session`
test[2 * 2 == 4] # not reached
By default, running this script through the macropython
wrapper (from mcpyrate
) will produce an ANSI-colored test report in the terminal. To see what the output actually looks like, and for actual runnable examples, see unpythonic
's own automated tests.
If you want to turn coloring off (e.g. for the purposes of redirecting stderr to a file), see the TestConfig
bunch of constants in unpythonic.test.fixtures
.
The following is an overview of the framework. For details, look at the docstrings of the various constructs in unpythonic.test.fixtures
(which provides much of this), those of the testing macros, and finally, the automated tests of unpythonic
itself. Tests can be found in subfolders named tests
: regular code, macros, dialects.
Examples of how to test code using conditions and restarts can be found in unpythonic.tests.test_conditions
.
Examples of how to test macro utilities (e.g. syntax transformer functions that operate on ASTs) can be found in unpythonic.syntax.tests.test_letdoutil
.
NOTE: If you want to compartmentalize macro expansion in your tests (so that an error during macro expansion will not crash your test unit), mcpyrate
offers more than one way to invoke the macro expander at run time (of your test unit), depending on what exactly you want to do. One is the mcpyrate.metatools.expand
family of macros, and another are the functions in the module mcpyrate.compiler
. See the mcpyrate
user manual: specifically on metatools
(and quasiquoting) and on compiler
. The tests of mcpyrate
itself provide some examples on how to use compiler
.
All testing macros are provided in the module unpythonic.syntax
. All regular functions related to testing are provided in the module unpythonic.test.fixtures
.
We provide the low-level syntactic constructs test[]
, test_raises[]
and test_signals[]
, with the usual meanings. The last one is for testing code that uses unpythonic.signal
and its sisters (related to conditions and restarts à la Common Lisp); see the module unpythonic.conditions
, and the user manual section on conditions and restarts.
By default, the test[expr]
macro asserts that the value of expr
is truthy. If you want to assert only that expr
runs to completion normally, use test[returns_normally(expr)]
. Here returns_normally
is a regular function, which is available in the module unpythonic.test.fixtures
.
All three testing constructs also come in block variants, with test
, with test_raises[exctype]
, with test_signals[exctype]
.
As usual in test frameworks, the testing constructs behave somewhat like assert
, with the difference that a failure or error will not abort the whole unit, unless explicitly asked to do so. There is no return value; upon success, the testing constructs return None
. Upon failure (test assertion not satisfied) or error (unexpected exception or signal), the failure or error is reported, and further tests continue running.
All the variants of the testing constructs catch any uncaught exceptions and signals from inside the test expression or block. Any unexpected uncaught exception or signal is considered an error.
Because unpythonic.test.fixtures
is, by design, a minimalistic no-framework (cf. "NoSQL"), it is up to you to define - in your custom test runner - whether having any failures, errors or warnings should lead to the whole test suite failing. Whether the program's exit code is zero is important e.g. for GitHub's CI workflows.
For example, in unpythonic
's own tests, warnings do not cause the test suite to fail, but errors and failures do. The very short runtests.py
(just under 60 SLOC) is a complete test runner using unpythonic.test.fixtures
.
Imports - complete list:
from unpythonic.syntax import (macros, test, test_raises, test_signals,
fail, error, warn, the, expand_testing_macros_first)
from unpythonic.test.fixtures import (session, testset, returns_normally,
catch_signals, terminate)
Overall structure of typical unit test module:
from unpythonic.syntax import macros, test, test_raises, the
from unpythonic.test.fixtures import session, testset
def runtests():
with testset("something 1"):
test[...]
test_raises[TypeError, ...]
test_raises[ValueError, ...]
...
with testset("something 2"):
...
...
if __name__ == '__main__': # pragma: no cover
with session(__file__):
runtests()
The if-main idiom allows running this test module individually, but it is tagged with # pragma: no cover
, so that the coverage reporter will not yell about it when the module is run by the test runner as part of the complete test suite (which, incidentally, is also a good opportunity to measure coverage).
If you want to ensure that testing macros expand before anything else - including your own code-walking block macros (when you have tests inside the body of a with
block that invokes a code-walking block macro) - import the macro expand_testing_macros_first
, and put a with expand_testing_macros_first
around the affected code. (See Expansion order, below.)
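A sketch, using tco merely as a stand-in for any code-walking block macro of your own:
from unpythonic.syntax import macros, test, tco, expand_testing_macros_first
from unpythonic.test.fixtures import session, testset
with session("expansion order demo"):
    with testset():
        with expand_testing_macros_first:
            with tco:
                test[2 + 2 == 4]  # expands before `tco` transforms the block body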
Sessions and testsets:
with session(name):
with testset(name):
...
with testset(name):
...
with testset(name): # nested testset
...
with testset(name):
with catch_signals(False):
...
Each name
above is human-readable and optional. The purpose of the naming feature is to improve scannability of the testing report, and of the unit test source code, for the human reader.
Note that even if name
is omitted, the parentheses are still mandatory, because session
and testset
are just garden variety context managers that must be instantiated in order for them to perform their jobs.
A session implicitly introduces a top-level testset, for convenience - so if you only have a few tests and don't want to group them, you do not need to use with testset
at all.
Testsets can be nested arbitrarily deep.
The function terminate
, when called, exits the test session immediately. Usually it is not needed, but it is provided for convenience.
Additional tools for code using conditions and restarts:
The catch_signals
context manager controls the signal barrier of with testset
and the test
family of syntactic constructs. It is provided for writing tests for code that uses conditions and restarts.
Used as with catch_signals(False)
, it disables the signal barrier for the dynamic extent of the block. When the barrier is disabled, an uncaught signal (in the sense of unpythonic.signal
and its sisters) is not considered as an error. This can be useful, because sometimes leaving a signal uncaught is the right thing to do. See unpythonic.tests.test_conditions
for examples.
The with catch_signals
construct can be nested. Used as with catch_signals(True)
, it re-enables the barrier, if currently disabled, for the dynamic extent of that inner with catch_signals
block.
When a with catch_signals
block exits, the previous state of the signal barrier is automatically restored.
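As a sketch of the nesting behavior (the test bodies are elided here):

from unpythonic.test.fixtures import catch_signals, testset

with testset("conditions and restarts"):
    with catch_signals(False):     # uncaught signals are not errors within this dynamic extent
        ...                        # tests that intentionally leave some signals uncaught
        with catch_signals(True):  # re-enable the barrier for this inner block only
            ...
    # when the outer catch_signals block exits, the previous barrier state is restored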
Expression forms - complete list:
test[expr]
test[expr, message]
test[returns_normally(expr)]
test[returns_normally(expr), message]
test_raises[exctype, expr]
test_raises[exctype, expr, message]
test_signals[exctype, expr]
test_signals[exctype, expr, message]
fail[message]
error[message]
warn[message]
Inside a test[]
, the helper macro the[]
is available to mark one or more interesting subexpressions inside expr
, for failure and error reporting. An expr
may contain an arbitrary number of the[]
. By default, if expr
is a comparison, the leftmost term is implicitly marked (so that e.g. test[x < 3]
will automatically report the value of x
if the test fails); otherwise nothing. The default is only used when there is no explicit the[]
inside expr
.
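For example, a small sketch:

from unpythonic.syntax import macros, test, the

x = 5
test[x < 3]                     # fails; the leftmost term of the comparison is implicitly marked, so the value of x is reported
test[2 * the[x] < the[x ** 2]]  # passes; on failure, both explicitly marked values would be reported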
The constructs test_raises
, test_signals
, fail
, error
and warn
do not support the[]
.
Tests can be nested; this is sometimes useful as an explicit signal barrier.
Note that the testing constructs error[]
and warn[]
, which are macros, have nothing to do with the functions with the same name in the module unpythonic.conditions
. The macros are part of the test framework; the functions with the same name are the signaling protocols of the conditions-and-restarts system. Each system follows its own customary naming conventions, so this name clash is unfortunate but hard to avoid.
Block forms - complete list:
with test:
body
...
# no `return`; assert just that the block completes normally
with test:
body
...
return expr # assert that `expr` is truthy
with test[message]:
body
...
with test[message]:
body
...
return expr
with test_raises[exctype]:
body
...
with test_raises[exctype, message]:
body
...
with test_signals[exctype]:
body
...
with test_signals[exctype, message]:
body
...
In with test
, the the[]
helper macro is available. It can be used to mark any number of expressions and/or subexpressions in the block body.
The constructs with test_raises
, with test_signals
do not support the[]
.
Tests can be nested; this is sometimes useful as an explicit signal barrier.
Changed in v0.15.0. The testing macros now expand outside-in; this allows mcpyrate.debug.step_expansion
to treat them as a separate step. In v0.14.3, which introduced the test framework, they used to be two-pass macros.
Your test macro invocations may get partially expanded code, if those invocations reside in the body of an invocation of a block macro that also expands outside-in:
with yourblockmacro: # outside-in
test[...]
Here the ...
may be edited by yourblockmacro
before test[]
sees it. (It likely will be edited, since this pattern will commonly appear in the tests for yourblockmacro
, where the whole point is to have the ...
depend on what yourblockmacro
outputs.)
If you need testing macros to expand before anything else even in this scenario (so you can more clearly see where in the unexpanded source code a particular expression in a failing/erroring test came from), you can do this:
from unpythonic.syntax import macros, expand_testing_macros_first
with expand_testing_macros_first:
with yourblockmacro:
test[...]
The expand_testing_macros_first
macro is itself a code-walking block macro that does as it says on the tin. The testing macros are identified by scanning the bindings of the current macro expander; names do not matter, so it respects as-imports.
This does imply that yourblockmacro
will then receive the expanded form of test[...]
as input, but that's macros for you. You will have to choose which is more important: seeing the unexpanded code in error messages, or receiving unexpanded test[]
expressions in yourblockmacro
.
Test blocks are meant for testing code that requires Python statements, i.e. code that does not fit into Python's expression sublanguage.
In unpythonic.test.fixtures
, a test block is implicitly lifted into a function. Hence, any local variables assigned to inside the block remain local to the implicit function. Use Python's nonlocal
and global
keywords, if needed.
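A sketch of what this implies for assignments (the name count is illustrative):

from unpythonic.syntax import macros, test

count = 0
with test:
    global count   # the block body runs inside an implicit function; rebinding a module-level name needs `global`
    count += 1     # without the declaration, `count` would be treated as local to the implicit function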
By default, a with test
block asserts just that it completes normally. If you instead want to assert that an expression is truthy, use return expr
to terminate the implicit function and return the value of the desired expr
. The return value is passed to the test asserter for checking that it is truthy.
Another way to view the default behavior is that the with test
macro injects a return True
at the end of the block to terminate the implicit function, if there is no explicit return
. This is actually how the default behavior is implemented.
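For example, a sketch that asserts on a computed value:

from unpythonic.syntax import macros, test

def runtests():
    with test["the accumulator ends up with the expected items"]:
        acc = []
        for k in range(3):
            acc.append(k)
        return acc == [0, 1, 2]   # the truthiness of this value is what gets asserted

runtests()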
The with test_raises[exctype]
and with test_signals[exctype]
blocks assert that the block raises (respectively, signals) the declared exception type. These blocks are implicitly lifted into functions, too, but they do not check the return value. For them, not raising/signaling the declared exception type is considered a test failure. Raising/signaling some other (hence unexpected) exception type is considered an error.
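A sketch of the block variant of test_raises:

from unpythonic.syntax import macros, test_raises

with test_raises[ZeroDivisionError, "dividing by zero should raise"]:
    x = 10
    x / 0   # raises ZeroDivisionError, so the test passes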
The point of unpythonic.test.fixtures
is to make testing macro-enabled Python as frictionless as reasonably possible. Thus we provide this convenience feature.
Inside a test[]
expression, or anywhere within the code in a with test
block, the the[]
macro can be used to declare any number of subexpressions as interesting, for capturing the source code and value into the test failure message, which is shown if the test fails. Each the[]
captures one subexpression (as many times as it is evaluated, in the order evaluated).
Because test macros expand outside-in, the source code is captured before any nested inside-out macros expand. (Many macros defined by unpythonic
expand inside-out.) The value is captured at run time as a side effect just after the value has been evaluated.
By default (if no explicit the[]
is present), test[]
implicitly inserts a the[]
for the leftmost term if the top-level expression is a comparison (common use case), and otherwise does not capture anything.
When nothing is captured, if the test fails, the value of the whole expression is shown. Of course, you will then already know the value is falsey, but there is still the possibly useful distinction of whether it is, say, False
, None
, 0
or []
.
A test[]
or with test
can have any number of subexpressions marked as the[]
. It is possible to even nest a the[]
inside another the[]
, if you need the value of some subexpression as well as one of its subexpressions. The captured values are gathered, in the order they were evaluated (by Python's standard evaluation rules), into a list that is shown upon test failure.
Following Python's standard evaluation rules implies that short-circuiting expressions such as test[all(pred(the[x]) for x in iterable)]
may actually capture fewer x
than iterable
has, because all
will terminate evaluation after the first failing term (i.e. the first one for which pred(x)
is falsey). Similarly, expressions involving the logical and
and or
operators (which in Python are short-circuiting) might not evaluate all of the terms before the test is known to fail (and evaluation terminates).
In case of nested test[]
or nested with test
, each the[...]
is understood as belonging to the lexically innermost surrounding test.
The the[]
mechanism is smart enough to skip reporting trivialities for literals, such as (1, 2, 3) = (1, 2, 3)
in test[4 in the[(1, 2, 3)]]
, or 4 = 4
in test[4 in (1, 2, 3)]
. In the second case, note the implicit the[]
on the LHS, because in
is a comparison operator.
If nothing but such trivialities were captured, the failure message will instead report the value of the whole expression. The captures still remain inspectable in the exception instance.
To make testing/debugging macro code more convenient, the the[]
mechanism automatically unparses an AST value into its source code representation for display in the test failure message. This is meant for debugging macro utilities, to which a test case hands some quoted code (i.e. code lifted into its AST representation using mcpyrate's q[]
macro). See unpythonic.syntax.tests.test_letdoutil
for some examples. Note the unparsing is done for display only; the raw value remains inspectable in the exception instance.
CAUTION: The source code is back-converted from the AST representation; hence its surface syntax may look slightly different to the original (e.g. extra parentheses). See mcpyrate.unparse
.
CAUTION: The name of the the[]
construct was inspired by Common Lisp, but that is where the similarities end. The THE
construct of Common Lisp is a return-type declaration (pythonistas would say return-type annotation), meant as a hint for the compiler to produce performance-optimized compiled code. See chapter 32 in Practical Common Lisp by Peter Seibel. In contrast, our the[]
captures a value for test reporting. The only common factors are the name, and that neither construct changes the semantics of the marked code, much. In unpythonic.test.fixtures
, the reason for picking this name was that, when the source code is read as English, it barely disturbs the flow - specifically to suggest, between the lines, that it barely changes the semantics. The reasoning behind CL's THE
may be similar, but I have not researched its etymology.
The with session()
in the example session above is optional. The human-readable session name is also optional, used for display purposes only. The session serves two roles: it provides an exit point for terminate
, and defines an implicit top-level testset
.
Tests can optionally be grouped into testsets. Each testset
tallies passed, failed and errored tests within it, and displays the totals when the context exits. Testsets can be named and nested.
It is useful to have at least one testset
(the implicit top-level one established by with session
is fine), because the testset
mechanism forms fully one half of the test framework. It is technically possible to use the testing macros without a testset
, but that is only intended for building alternative test frameworks.
Testsets also provide an option to locally install a postproc
handler that gets a copy of each failure or error in that testset (and by default, any of its inner testsets), after the failure or error has been printed. In nested testsets, the dynamically innermost postproc
wins. A failure is an instance of unpythonic.test.fixtures.TestFailure
, an error is an instance of unpythonic.test.fixtures.TestError
, and a warning is an instance of unpythonic.test.fixtures.TestWarning
. All three inherit from unpythonic.test.fixtures.TestingException
. Besides the human-readable message, these exception types contain attributes with programmatically inspectable information about what happened.
If you want to set a default global postproc
, which is used when no local postproc
is in effect, this is configured in the TestConfig
bunch of constants in unpythonic.test.fixtures
.
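A sketch of a local postproc, assuming (check the docstrings for the exact API) that the handler is passed via a postproc argument of testset, and that it receives the failure/error instance:

from unpythonic.test.fixtures import session, testset, TestFailure

collected = []
def record(err):                    # called for each failure/error in the testset, after it has been printed
    if isinstance(err, TestFailure):
        collected.append(err)

with session("demo"):
    with testset("important", postproc=record):
        ...                         # tests go here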
The with testset
construct comes with one other important feature. The nearest dynamically enclosing with testset
catches any stray exceptions or signals that occur within its dynamic extent, but outside a test construct.
In case of an uncaught signal, the error is reported, and the testset resumes.
In case of an uncaught exception, the error is reported, and the testset terminates, because the exception model does not support resuming.
Catching of uncaught signals, in both the low-level test
constructs and the high-level testset
, can be disabled using with catch_signals(False)
. This is useful in testing code that uses conditions and restarts; sometimes allowing a signal (e.g. from unpythonic.warn
in the conditions-and-restarts system) to remain uncaught is the right thing to do.
The helper macros fail[message]
, error[message]
and warn[message]
unconditionally produce a test failure, a test error, or a testing warning, respectively. This can be useful:
- fail[...] if that expression should be unreachable when the code being tested works properly.
- error[...] if some part of your tests is unable to run.
- warn[...] if some tests are temporarily disabled and need future attention, e.g. for syntactic compatibility to make the code run for now on an old Python version.
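For example, a small sketch (the classify function and the warning text are purely illustrative):

from unpythonic.syntax import macros, test, fail, warn

def classify(x):
    return "even" if x % 2 == 0 else "odd"

with test:
    result = classify(4)
    if result not in ("even", "odd"):
        fail["this branch should be unreachable when classify works properly"]

warn["some tests in this module are temporarily disabled"]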
Currently (v0.14.3), warnings produced by warn[]
are not counted in the total number of tests run. But you can still get the warning count from the separate counter unpythonic.test.fixtures.tests_warned
(see unpythonic.box
; basically you can b.get()
or unbox(b)
to read the value currently inside a box).
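For example, a sketch of reading the counter:

from unpythonic.test.fixtures import tests_warned

print(tests_warned.get())   # the number of warnings emitted so far, read from the box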
If unpythonic.test.fixtures
does not fit your needs and you want to experiment with creating your own framework, the test asserter macros are reusable. Their implementations can be found in unpythonic.syntax.testingtools
. They refer to a few objects in unpythonic.test.fixtures
; consider these a common ground that is not strictly part of the surrounding framework.
Start by reading the docstring of the test
macro, which documents some low-level details.
Set up a condition handler to intercept test failures and errors. These will be signaled via cerror
, using the conditions and restarts mechanism. See the module unpythonic.conditions
. Report the failure/error in any way you desire, and then invoke the proceed
restart (from your condition handler) to let testing continue.
Look at the implementation of testset
as an example.
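A minimal sketch, assuming handlers and invoke are the relevant parts of the API in unpythonic.conditions (see that module for the exact interface):

from unpythonic.syntax import macros, test
from unpythonic.conditions import handlers, invoke
from unpythonic.test.fixtures import TestFailure, TestError

def report(condition):
    print(condition)     # report the failure or error in any way you desire
    invoke("proceed")    # invoke the `proceed` restart so that testing continues

with handlers((TestFailure, report), (TestError, report)):
    test[1 + 1 == 3]     # fails; report() runs, then execution continues
    test[2 + 2 == 4]     # passes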
Because unpythonic
is effectively a language extension, the standard options were not applicable.
The standard library's unittest
fails with unpythonic
due to technical reasons related to unpythonic
's unfortunate choice of module names. The unittest
framework crashes if a module in a library exports anything that has the same name as the module itself, and the library's top-level init then from
-imports that construct into its namespace. This causes the module reference, which was implicitly brought in by the from-import itself, to be overwritten with what was explicitly imported: a reference to the construct that has the same name as the module. This is bad naming on my part, yes, but as of v0.15.0, I see no reason to cross that particular bridge yet.
Also, in my opinion, unittest
is overly verbose to use; automated tests are already a particularly verbose kind of program, even if the testing syntax is minimal. Eliminating extra verbosity encourages writing more tests.
Pytest, on the other hand, provides compact syntax by hijacking the assert statement, but its import hook (to provide that syntax) cannot coexist with a macro expander, which also needs to install a (different) import hook. Pytest is also fairly complex.
The central functional requirement for whatever would be used for testing unpythonic
was to be able to easily deal with macro-enabled Python. No hoops to jump through, compared to testing regular Python, in order to be able to test all of unpythonic
(including unpythonic.syntax
) in a uniform way.
Also, if I was going to build my own framework, it would be nice for it to work seamlessly with code that uses conditions and restarts - since those are part of unpythonic
, but not standard Python.
Simple and minimalistic would be a bonus. As of v0.15.0, the whole test framework is about 1.8k SLOC, counting docstrings, comments and blanks; under 700 SLOC if counting only active code lines. Add another 1k SLOC (all) / 200 SLOC (active code lines) for the machinery that implements conditions and restarts.
The framework will likely still evolve a bit as I find more holes in the UX - which so far has led to features such as the[]
and AST value auto-unparsing - but most of the desired functionality is already present and working fine. For example, I consider pytest-style implicit fixtures and a central test discovery system to be outside the scope of this framework. They do make test code shorter, but are perhaps slightly too much magic.
It is clear that unpythonic.test.fixtures
is not going to replace pytest
, nor does it aim to do so - any more than Chuck Moore's Forth-based VLSI tools were intended to replace the commercial VLSI offerings.
What we have is small, simple, custom-built for its purpose (works well with macro-enabled Python; integrates with conditions and restarts), arguably somewhat pedagogic (demonstrates how to build a test framework in under 700 active SLOC), and importantly, works just fine.
A test fixture is defined as an environment used to consistently test some item, device, or piece of software. In automated tests, it is typically a piece of code that is reused within the test suite of a project, to perform initialization and/or teardown tasks common to several test cases.
A test framework can be reused across many different projects, and the error-catching and reporting code, if anything, is something that is shared across all test cases. Also, following our naming scheme, the framework had to be called unpythonic.test.something
, and fixtures
just happened to fit the theme.
Inspired by Julia's standard-library Test
package, and chapter 9 of Peter Seibel's Practical Common Lisp.
Changed in v0.15.0. We now use the mcpyrate
macro expander instead of macropy
. Updated the REPL note below.
Also, dbgprint_expr
is now a dynvar.
Changed in 0.14.2. The dbg[]
macro now works in the REPL, too. You can use mcpyrate.repl.console
(a.k.a. macropython -i
in the shell) or the IPython extension mcpyrate.repl.iconsole
.
DRY out your quick-and-dirty (qnd) debug-printing code. Both block and expression variants are provided:
from unpythonic.syntax import macros, dbg
with dbg:
x = 2
print(x) # --> [file.py:5] x: 2
with dbg:
x = 2
y = 3
print(x, y, 17 + 23) # --> [file.py:10] x: 2, y: 3, (17 + 23): 40
print(x, y, 17 + 23, sep="\n") # --> [file.py:11] x: 2
# [file.py:11] y: 3
# [file.py:11] (17 + 23): 40
z = dbg[25 + 17] # --> [file.py:15] (25 + 17): 42
assert z == 42 # surrounding an expression with dbg[...] doesn't alter its value
In the block variant, just like in nb
, a custom print function can be supplied as the first positional argument. This avoids transforming any uses of built-in print
:
prt = lambda *args, **kwargs: print(*args)
with dbg[prt]:
x = 2
prt(x) # --> ('x',) (2,)
print(x) # --> 2
with dbg[prt]:
x = 2
y = 17
prt(x, y, 1 + 2) # --> ('x', 'y', '(1 + 2)'), (2, 17, 3)
The reference to the custom print function (i.e. the argument to the dbg
block) must be a bare name. Support for methods may or may not be added in a future version.
In the expr variant, to customize printing, just assign a function to the dynvar dbgprint_expr
via with dyn.let(dbgprint_expr=...)
. If no custom printer is set, a default implementation is used.
For details on implementing custom debug print functions, see the docstrings of unpythonic.syntax.dbgprint_block
and unpythonic.syntax.dbgprint_expr
, which provide the default implementations.
CAUTION: The source code is back-converted from the AST representation; hence its surface syntax may look slightly different to the original (e.g. extra parentheses). See mcpyrate.unparse
.
Inspired by the dbg macro in Rust.
Stuff that didn't fit elsewhere.
Mix regular code with math-notebook-like code in a .py
file. To enable notebook mode, with nb
:
from unpythonic.syntax import macros, nb
from sympy import symbols, pprint
with nb:
2 + 3
assert _ == 5
_ * 42
assert _ == 210
with nb[pprint]:
x, y = symbols("x, y")
x * y
assert _ == x * y
3 * _
assert _ == 3 * x * y
Expressions at the top level auto-assign the result to _
, and auto-print it if the value is not None
. Only expressions do that; for any statement that is not an expression, _
retains its previous value.
A custom print function can be supplied as the first positional argument to nb
. This is useful with SymPy (combined with a LaTeX input method, so that α, β, γ, ... can be used as actual variable names).
Obviously not intended for production use, although it is very likely to work anywhere.
Is this just a set of macros, a language extension, or a compiler for a new language that just happens to be implemented in mcpyrate
, à la On Lisp? All of the above, really.
The macros in unpythonic.syntax
are designed to work together, but some care needs to be taken regarding the order in which they expand. This complexity unfortunately comes with any pick-and-mix-your-own-language kit, because some features inevitably interact. For example, it is possible to lazify continuation-enabled code, but running the transformations the other way around produces nonsense.
The correct xmas tree invocation is:
with prefix, autoreturn, quicklambda, multilambda, envify, lazify, namedlambda, autoref, autocurry, tco:
...
Here tco
can be replaced with continuations
, if needed.
We have taken into account that:
- Outside-in:
prefix
,autoreturn
,quicklambda
,multilambda
- Two-pass:
envify
,lazify
,namedlambda
,autoref
,autocurry
,tco
/continuations
The dialect examples use this ordering.
For simplicity, the block macros make no attempt to prevent invalid combos, unless there is a specific technical reason to do that for some particular combination. Be careful; e.g. do not nest several with tco
blocks (lexically); that will not work.
As an example of a specific technical reason, the tco
macro skips already expanded with continuations
blocks lexically contained within the with tco
. This allows the Lispython dialect to support continuations
.
The AST edits performed by the block macros are designed to run in the following order (leftmost first):
prefix > nb > autoreturn, quicklambda > multilambda > continuations or tco > ...
... > autocurry > namedlambda, autoref > lazify > envify
The let_syntax
(and abbrev
) block may be placed anywhere in the chain; just keep in mind what it does.
The dbg
block can be run at any position after prefix
and before tco
(or continuations
). It must be able to see function calls in Python's standard format, for detecting calls to the print function.
The correct ordering for block macro invocations - which is the actual user-facing part - is somewhat complicated by the fact that some of the above are two-pass macros. Consider this artificial example, where mac
is a two-pass macro:
with mac:
with cheese:
...
The invocation with mac
is lexically on the outside, so the macro expander sees it first. The expansion order then becomes:
- First pass (outside in) of
with mac
. - Explicit recursion by
with mac
. This expands thewith cheese
. - Second pass (inside out) of
with mac
.
So, for example, even though lazify
must perform its AST edits after autocurry
, it happens to be a two-pass macro. The first pass (outside in) only performs some preliminary analysis; the actual lazification happens in the second pass (inside out). So the correct invocation comboing these two is with lazify, autocurry
. Similarly, with lazify, continuations
is correct, even though the CPS transformation must occur first; these are both two-pass macros that perform their edits in the inside-out pass.
Further details on individual block macros can be found in our notes on macros.
Example combo in the single-line format:
with autoreturn, lazify, tco:
...
The same combo in the multiline format:
with autoreturn:
with lazify:
with tco:
...
In MacroPy (which was used up to v0.14.3), there sometimes were differences between the behavior of the single-line and multi-line invocation format, but in mcpyrate
(which is used by v0.15.0 and later), they should behave the same.
With mcpyrate
, there is still a minor difference if there are at least three nested macro invocations, and a macro is scanning the tree for another macro invocation; then the tree looks different depending on whether the single-line or the multi-line format was used. The differences in that are as one would expect knowing how with
statements look like in the Python AST. The reason the difference manifests only for three or more macro invocations is that mcpyrate
pops the macro that is being expanded before it hands over the tree to the macro code; hence if there are only two, the inner tree will have only one "context manager" in its with
.
NOTE to the curious, and to future documentation maintainers: To see if something is a two-pass macro, grep the codebase for expander.visit_recursively
; that is the explicit recursion mentioned above, and means that within that function, anything below that line will run in the inside-out pass. See the mcpyrate
manual.
This Elisp snippet can be used to add syntax highlighting for keywords specific to mcpyrate
and unpythonic.syntax
to your Emacs setup:
(defun my/unpythonic-syntax-highlight-setup ()
"Set up additional syntax highlighting for `unpythonic.syntax' and `mcpyrate` in Python mode."
;; adapted from code in dash.el
(let ((new-keywords '("test" "test_raises" "test_signals" "fail" "the"
"error" "warn" ; both testing macros and condition signaling protocols
"signal" "cerror" "handlers" "restarts" ; not macros, but in a role similar to exception handling constructs in the conditions/restarts system.
"let" "dlet" "blet"
"letseq" "dletseq" "bletseq"
"letrec" "dletrec" "bletrec"
"let_syntax" "abbrev"
"where"
"do" "local" "delete"
"continuations" "call_cc"
"autocurry" "lazify" "envify" "tco" "prefix" "autoreturn" "forall"
"multilambda" "namedlambda" "quicklambda"
"cond" "aif" "autoref" "dbg" "nb"
"macros" "dialects" "q" "u" "n" "a" "s" "t" "h")) ; mcpyrate
(special-variables '("it"
"dyn")))
(font-lock-add-keywords 'python-mode `((,(concat "\\_<" (regexp-opt special-variables 'paren) "\\_>")
1 font-lock-variable-name-face)) 'append)
;; "(\\s-*" maybe somewhere?
(font-lock-add-keywords 'python-mode `((,(concat "\\_<" (regexp-opt new-keywords 'paren) "\\_>")
1 font-lock-keyword-face)) 'append)
))
(add-hook 'python-mode-hook 'my/unpythonic-syntax-highlight-setup)
Known issue: For some reason, during a given session, this takes effect only starting with the second Python file opened. The first Python file opened during a session is shown with the default Python syntax highlighting. This is probably something to do with the initialization order of font-lock and whichever python-mode
is being used.
Tested with anaconda-mode
.
If you use the Spacemacs kit, the right place to insert the snippet is into the function dotspacemacs/user-config
. Here's my spacemacs.d for reference; the snippet is in prettify-symbols-config.el
, and it is invoked from dotspacemacs/user-config
in init.el
.
In a basic Emacs setup, the snippet goes into the ~/.emacs
startup file, or if you have an .emacs.d/
directory, then into ~/.emacs.d/init.el
.
Strictly speaking, True
. We just repurpose Python's existing syntax to give it new meanings. However, in the Racket reference, a syntax designates a macro, in contrast to a procedure (regular function). We provide syntaxes in this particular sense. The name unpythonic.syntax
is also shorter to type than unpythonic.semantics
, less obscure, and close enough to convey the intended meaning.
If you want custom syntax proper, or want to package a set of block macros as a custom language that extends Python, then you may be interested in our sister project mcpyrate
.