8

Statements and State

All my life, my heart has yearned for a thing I cannot name. André Breton, Mad Love

The interpreter we have so far feels less like programming a real language and more like punching buttons on a calculator. “Programming” to me means building up a system out of smaller pieces. We can’t do that yet because we have no way to bind a name to some data or function. We can’t compose software without a way to refer to the pieces.

To support bindings, our interpreter needs internal state. When you define a variable at the beginning of the program and use it at the end, the interpreter has to hold on to the value of that variable in the meantime. So in this chapter, we will give our interpreter a brain that can not just process, but remember.

A brain, presumably remembering stuff.

State and statements go hand in hand. Since statements, by definition, don’t evaluate to a value, they need to do something else to be useful. That something is called a side effect. It could mean producing user-visible output or modifying some state in the interpreter that can be detected later. The latter makes them a great fit for defining variables or other named entities.

In this chapter, we’ll do all of that. We’ll define statements that produce output (print) and create state (var). We’ll add expressions to access and assign to variables. Finally, we’ll add blocks and local scope. That’s a lot to stuff into one chapter, but we’ll chew through it all one bite at a time.

8 . 1Statements

We start by extending Lox’s grammar with statements. They aren’t very different from expressions. We start with the two simplest kinds:

  1. An expression statement lets you place an expression where a statement is expected. They exist to evaluate expressions that have side effects. You may not notice them, but you use them all the time in C, Python, Python, and other languages. Any time you see a function or method call followed by a ;, you’re looking at an expression statement.

  2. A print statement evaluates an expression and displays the result to the user. I admit it’s weird to bake printing right into the language instead of making it a library function. Doing so is a concession to the fact that we’re building this interpreter one chapter at a time and want to be able to play with it before it’s all done. To make print a library function, we’d have to wait until we had all of the machinery for defining and calling functions before we could witness any side effects.

New syntax means new grammar rules. In this chapter, we finally gain the ability to parse an entire Lox script. Since Lox is an imperative, dynamically typed language, the “top level” of a script is simply a list of statements. The new rules are:

programstatement* EOF ;

statementexprStmt
               | printStmt ;

exprStmtexpression ";" ;
printStmt"print" expression ";" ;

The first rule is now program, which is the starting point for the grammar and represents a complete Lox script or REPL entry. A program is a list of statements followed by the special “end of file” token. The mandatory end token ensures the parser consumes the entire input and doesn’t silently ignore erroneous unconsumed tokens at the end of a script.

Right now, statement only has two cases for the two kinds of statements we’ve described. We’ll fill in more later in this chapter and in the following ones. The next step is turning this grammar into something we can store in memorysyntax trees.

8 . 1 . 1Statement syntax trees

There is no place in the grammar where both an expression and a statement are allowed. The operands of, say, + are always expressions, never statements. The body of a while loop is always a statement.

Since the two syntaxes are disjoint, we don’t need a single base class that they all inherit from. Splitting expressions and statements into separate class hierarchies help us find dumb mistakes like passing a statement to a method that expects an expression (granted you configured your IDE to catch such things).

That means a new base class for statements. As our elders did before us, we will use the cryptic name “Stmt”. Just as before, we will define them using dataclasses.

# lox/ast.py after Expr

class Stmt:
    """Abstract Base Class for statements"""

@dataclass
class Expression(Stmt):
    expression: Expr

@dataclass
class Print(Stmt):
    expression: Expr

We will also create a special Stmt node that represents whole programs:

# lox/ast.py after Stmt
@dataclass
class Program(Stmt):
    statements: list[Stmt]

8 . 1 . 2Parsing statements

The parser’s parse() function that parses and returns a single expression was a temporary hack to get the last chapter up and running. Now that our grammar has the correct starting rule, program, we can turn parse() into the real deal.

# lox/parser.py replace the parse() function
from lox.stmt import Stmt

def parse(tokens: list[Token]) -> Stmt:
    parser = Parser(tokens)
    statements = []
    while not parser.is_at_end():
        statements.append(parser.statement())
    return Program(statements)

This parses a series of statements, as many as it can find until it hits the end of the input. This is a pretty direct translation of the program rule into recursive descent style.

A program is a list of statements, and we parse one of those statements using this method:

# lox/parser.py Parser method
def statement(self) -> Stmt:
    match self.peek().type:
        case "PRINT":
            return self.print_statement()
        case _:
            return self.expression_statement()

A little bare bones, but we’ll fill it in with more statement types later. We determine which specific statement rule is matched by looking at the current token. A print token means it’s obviously a print statement.

If the next token doesn’t look like any known kind of statement, we assume it must be an expression statement. That’s the typical final fallthrough case when parsing a statement, since it’s hard to proactively recognize an expression from its first token.

Each statement kind gets its own method. First print:

# lox/parser.py Parser method
def print_statement(self) -> Print:
    self.consume("PRINT", "Expect 'print' keyword.")
    value = self.expression()
    self.consume("SEMICOLON", "Expect ';' after value.")
    return Print(value)

Since we usually call print_statement() from a within statement(), we could simply consume the first token knowning that it would be a print. Parsing the whole statement without such assumptions, makes things easier to isolate for testing and reuse, albeit with at a very small cost in performance. We parse the subsequent expression, consume the terminating semicolon, and emit the syntax tree.

If we didn’t match a print statement, we must have one of these:

# lox/parser.py Parser method
def expression_statement(self) -> Expression:
    expr = self.expression()
    self.consume("SEMICOLON", "Expect ';' after expression.")
    return Expression(expr)

Similar to the previous method, we parse an expression followed by a semicolon. We wrap that Expr in a Stmt of the right type and return it.

8 . 1 . 3Executing statements

We’re running through the previous couple of chapters in microcosm, working our way through the front end. Our parser can now produce statement syntax trees, so the next and final step is to interpret them. As in expressions, we use singledistpatch, but now we have a slightly different interface: statements do not produce returning values, so we need to adapt our eval-like function to account for this.

Rather than patching eval to handle Stmt arguments, we will create an entirely different funtion called exec. This is consistent with Python’s own nomeclature in which the builtin eval handles Python code that represents expressions and exec handles Python code that represent whole programs and statements.

# lox/interpreter.py
@singledispatch
def exec(stmt: Stmt, env: Env) -> None:
    msg = f"exec not implemented for {type(stmt)}"
    raise TypeError(msg)

We have two statement types, and we need an implementation method for each. The easiest is expression statements.

# lox/interpreter.py after exec()
@exec.register
def _(stmt: Expression, env: Env) -> None:
    eval(stmt.expression, env)

We evaluate the inner expression using our existing eval() method and discard the value.

The print statement’s visit method isn’t much different.

# lox/interpreter.py after exec()
@exec.register
def _(stmt: Print, env: Env) -> None:
    value = eval(stmt.expression, env)
    print(stringify(value))

Before discarding the expression’s value, we convert it to a string using the stringify() function we introduced in the last chapter and then dump it to stdout.

We must provide the implementation of exec(Program) to allow our interpreter to handle whole programs.

# lox/interpreter.py after exec()
@exec.register
def _(stmt: Program, env: Env) -> None:
    for child in stmt.statements:
        exec(child, env)

We also need to adjust the Lox.interpret() method to call exec() instead of eval().

# lox/__main__.py Lox method
# Replace the implementation
def interpret(self, statement: Stmt):
    try:
        exec(statement, self.environment)
    except LoxRuntimeError as error:
        self.report_error(error, code=70)

This require a slight change to the imports:

# lox/__main__.py replace lox.interpreter import
from lox.interpreter import eval, exec, stringify

Basically just plumbing the new syntax through. OK, fire up the interpreter and give it a try. At this point, it’s worth sketching out a little Lox program in a text file to run as a script. Something like:

print "one";
print true;
print 2 + 1;

It almost looks like a real program! Note that the REPL, too, now requires you to enter a full statement instead of a simple expression. Don’t forget your semicolons.

8 . 2Global Variables

Now that we have statements, we can start working on state. Before we get into all of the complexity of lexical scoping, we’ll start off with the easiest kind of variablesglobals. We need two new constructs.

  1. A variable declaration statement brings a new variable into the world.

    var beverage = "espresso";
    

    This creates a new binding that associates a name (here “beverage”) with a value (here, the string "espresso").

  2. Once that’s done, a variable expression accesses that binding. When the identifier “beverage” is used as an expression, it looks up the value bound to that name and returns it.

    print beverage; // "espresso".
    

Later, we’ll add assignment and block scope, but that’s enough to get moving.

8 . 2 . 1Variable syntax

As before, we’ll work through the implementation from front to back, starting with the syntax. Variable declarations are statements, but they are different from other statements, and we’re going to split the statement grammar in two to handle them. That’s because the grammar restricts where some kinds of statements are allowed.

The clauses in control flow statementsthink the then and else branches of an if statement or the body of a whileare each a single statement. But that statement is not allowed to be one that declares a name. This is OK:

if (monday) print "Ugh, already?";

But this is not:

if (monday) var beverage = "espresso";

We could allow the latter, but it’s confusing. What is the scope of that beverage variable? Does it persist after the if statement? If so, what is its value on days other than Monday? Does the variable exist at all on those days?

Code like this is weird, so C, Python, and friends all disallow it. It’s as if there are two levels of “precedence” for statements. Some places where a statement is allowedlike inside a block or at the top levelallow any kind of statement, including declarations. Others allow only the “higher” precedence statements that don’t declare names.

To accommodate the distinction, we add another rule for kinds of statements that declare names.

programdeclaration* EOF ;

declarationvarDecl
               | statement ;

statementexprStmt
               | printStmt ;

Declaration statements go under the new declaration rule. Right now, it’s only variables, but later it will include functions and classes. Any place where a declaration is allowed also allows non-declaring statements, so the declaration rule falls through to statement. Obviously, you can declare stuff at the top level of a script, so program routes to the new rule.

The rule for declaring a variable looks like:

varDecl"var" IDENTIFIER ( "=" expression )? ";" ;

Like most statements, it starts with a leading keyword. In this case, var. Then an identifier token for the name of the variable being declared, followed by an optional initializer expression. Finally, we put a bow on it with the semicolon.

To access a variable, we define a new kind of primary expression.

primary"true" | "false" | "nil"
               | NUMBER | STRING
               | "(" expression ")"
               | IDENTIFIER ;

That IDENTIFIER clause matches a single identifier token, which is understood to be the name of the variable being accessed.

These new grammar rules get their corresponding syntax trees. Over in the AST generator, we add a new statement node for a variable declaration.

# lox/ast.py
@dataclass
class Var(Stmt):
    name: Token
    initializer: Expr

It stores the name token so we know what it’s declaring, along with the initializer expression. (If there isn’t an initializer, that field is null.)

Then we add an expression node for accessing a variable.

# lox/ast.py
@dataclass
class Variable(Expr):
    name: Token

It’s simply a wrapper around the token for the variable name. That’s it.

8 . 2 . 2Parsing variables

Before we parse variable statements, we need to shift around some code to make room for the new declaration rule in the grammar. The top level of a program is now a list of declarations, so the entrypoint method to the parser changes.

Hey, do you remember way back in that earlier chapter when we put the infrastructure in place to do error recovery? We are finally ready to hook that up. We replace the entry point to the parser with this:

# lox/parser.py the while loop in the parse()
    ...
    while not parser.is_at_end():
        try:
            statements.append(parser.declaration())
        except LoxSyntaxError:
            parser.synchronize()
    ...

This declaration() method is the method we call repeatedly when parsing a series of statements in a block or a script, so it’s the right place to synchronize when the parser goes into panic mode. We capture any LoxSyntaxError exceptions and initialize the error recovery. The try/except is used to keep parsing even when we hit an error, and if an error occurs, this gets it back to trying to parse the beginning of the next statement or declaration.

After the parsing is done, we need to announce all errors, if any, to the user. We do this by checking the self.errors list and raising an LoxStaticError when appropriate. This requires an import:

# lox/parser.py at the top
from lox.errors import LoxStaticError

And a small verification just before returning:

# lox/parser.py parse() before the return
    ...
    if parser.errors:
        raise LoxStaticError(parser.errors)
    ...

The real parsing happens inside the declaration() method. First, it looks to see if we’re at a variable declaration by looking for the leading var keyword. If not, it falls through to the existing statement() method that parses print and expression statements.

# lox/parser.py Parser method
def declaration(self) -> Stmt:
    match self.peek().type:
        case "VAR":
            return self.var_declaration()
        case _:
            return self.statement()

Remember how statement() tries to parse an expression statement if no other statement matches? And expression() reports a syntax error if it can’t parse an expression at the current token? That chain of calls ensures we report an error if a valid declaration or statement isn’t parsed.

When the parser matches a var token, it branches to:

# lox/parser.py Parser method
def var_declaration(self) -> Var:
    self.consume("VAR", "Expect 'var' keyword.")
    name = self.consume("IDENTIFIER", "Expect variable name.")

    if self.match("EQUAL"):
        initializer = self.expression()
    else:
        initializer = Literal(None)

    self.consume("SEMICOLON",
                 "Expect ';' after variable declaration.")
    return Var(name, initializer)

As always, the recursive descent code follows the grammar rule. It first consumes the var keyword and then the identifier token for the variable name.

Then, if it sees an = token, it knows there is an initializer expression and parses it. Otherwise, it creates an initializer that represents null. Finally, it consumes the required semicolon at the end of the statement. All this gets wrapped in a Var syntax tree node and we’re groovy.

Parsing a variable expression is even easier. In primary(), we look for an identifier token.

# lox/parser.py Parser.primary if chain
    if self.match("IDENTIFIER"):
        return Variable(self.previous())

That gives us a working front end for declaring and using variables. All that’s left is to feed it into the interpreter. Before we get to that, we need to talk about where variables live in memory.

8 . 3Environments

The bindings that associate variables to values need to be stored somewhere. Ever since the Lisp folks invented parentheses, this data structure has been called an environment.

An environment containing two bindings.

You can think of it like a map where the keys are variable names and the values are the variable’s, uh, values. In fact, that’s how we’ll implement it in Python. We could stuff that map and the code to manage it right into Interpreter, but since it forms a nicely delineated concept, we’ll pull it out into its own class.

Start a new file and add:

# lox/env.py
# Replace the Env declaration
from dataclasses import dataclass, field

@dataclass
class Env[T]:
    values: dict[str, T] = field(default_factory=dict)