Lua Parser in Lua

Parses to an abstract syntax tree representation. Call tostring() on the AST to get equivalent Lua code.

Works for versions 5.1, 5.2, 5.3, 5.4 and LuaJIT. I broke <=5.2 compatibility when I resorted to throwing objects for parse error reporting.

AST also contains some functions like flatten() for use with optimizing / auto-inlining Lua.

See the tests folder for example usage.


Parser = require 'parser'
This will return the parser class.

result, msg = Parser.parse(data[, source, version, useluajit])
This parses the code in data and returns an ast._block object. It is shorthand for Parser(data, source, version, useluajit).tree. The Parser object has a few more functions, which are used internally while parsing.
source is a description of the source, e.g. the filename, which is included in some nodes (functions) for information on where they are declared.
version is a string such as '5.3' or '5.4', corresponding to your Lua version.
On success, returns result. On a parse error, returns false, plus msg describing what went wrong.
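A minimal sketch of calling Parser.parse and handling both outcomes, based only on the return convention described here. parseOrReport is a hypothetical helper, not part of the library; the file name and version string are placeholders, and the last few lines only run if require 'parser' actually resolves on your system.

```lua
-- Hypothetical helper (not part of the library): wraps Parser.parse and
-- turns a failed parse into nil plus a readable message.
local function parseOrReport(Parser, code, source, version)
  local tree, msg = Parser.parse(code, source, version)
  if not tree then
    return nil, ('parse error in %s: %s'):format(source or '?', tostring(msg))
  end
  return tree
end

-- Only exercised when the library is installed and found by require.
local ok, Parser = pcall(require, 'parser')
if ok and type(Parser) == 'table' and Parser.parse then
  local tree, err = parseOrReport(Parser, 'local x = 1 + 2', 'example.lua', '5.4')
  -- tostring() on the tree regenerates Lua code; on failure print the message.
  print(tree and tostring(tree) or err)
end
```

Passing the Parser class in as an argument keeps the helper easy to try out with a stand-in object when the library is not on the package path.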

ast = require 'parser.lua.ast'
This is the AST (abstract syntax tree) library. It holds a collection of AST classes, each representing a different token in the Lua syntax.

n = ast.node() =
This is the superclass of all AST classes.

Each has the following properties:

n.type = returns the type of the node, coinciding with the classname in the ast library with underscore removed.

n.span = source code span information (from and to subtables each with source, line and col fields)

n:copy() = returns a copy of the node.

n:flatten(func, varmap) = flattens / inlines the contents of all function calls of this function. Used for performance optimizations.

n:toLua() = generates Lua code. Same as the node's __tostring.

n:serialize(apply) = apply a to-string serialization function to the AST.
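As an illustration of the n.span layout (from and to subtables, each with source, line and col), here is a small sketch that formats a span for diagnostics. formatSpan is a hypothetical helper, and the node below is a plain table standing in for a real AST node, so nothing here depends on the library being installed.

```lua
-- Hypothetical helper: renders n.span as "source:line:col-line:col".
local function formatSpan(n)
  local s = n.span
  if not (s and s.from and s.to) then return '<no span>' end
  return ('%s:%d:%d-%d:%d'):format(
    tostring(s.from.source), s.from.line, s.from.col, s.to.line, s.to.col)
end

-- A plain table shaped like a node, for illustration only.
local mock = {
  type = 'var',
  span = {
    from = { source = 'example.lua', line = 3, col = 7 },
    to = { source = 'example.lua', line = 3, col = 9 },
  },
}
print(formatSpan(mock)) -- prints example.lua:3:7-3:9
```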

ast.node subclasses:

n = ast._block(...) = a block of code in Lua.
... is a list of initial child stmt nodes to populate the block node with.
n.type == 'block'.
n[1] ... n[#n] = nodes of statements within the block.

n = ast._stmt() = a statement-node parent-class.

n = ast._assign(vars, exprs) =
An assignment operation.
Subclass of _stmt.
n.type == 'assign'.
Represents the assignment of n.exprs to n.vars.

n = ast._do(...) =
A do ... end block.
Subclass of _stmt.
n.type == 'do'.
n[1] ... n[#n] = nodes of statements within the block.

n = ast._while(cond, ...) =
A while cond do ... end block.
Subclass of _stmt.
n.type == 'while'.
n.cond holds the condition expression.
n[1] ... n[#n] = nodes of statements within the block.

n = ast._repeat(cond, ...) =
A repeat ... until cond block.
Subclass of _stmt.
n.type == 'repeat'.
n.cond holds the condition expression.
n[1] ... n[#n] = nodes of statements within the block.

n = ast._if(cond, ...) =
An if cond then ... elseif ... else ... end block.
Subclass of _stmt.
n.type == 'if'.
n.cond holds the condition expression of the first if statement.
All subsequent arguments must be ast._elseif objects, optionally with a final ast._else object.
n.elseifs holds the ast._elseif objects.
n.elsestmt optionally holds the final ast._else.

n = ast._elseif(cond, ...) =
An elseif cond then ... block.
Subclass of _stmt.
n.type == 'elseif'.
n.cond holds the condition expression of the elseif statement.
n[1] ... n[#n] = nodes of statements within the block.

n = ast._else(...) =
An else ... block.
n.type == 'else'.
n[1] ... n[#n] = nodes of statements within the block.
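The relationship between an if node and its elseif / else parts can be sketched as follows. ifConditions is a hypothetical helper that only reads the documented fields n.cond, n.elseifs and n.elsestmt. The nodes are plain tables standing in for real AST nodes, and the name field on the var stand-ins is an assumption made for the printout.

```lua
-- Hypothetical helper: lists the branches of an if node in source order.
local function ifConditions(n)
  assert(n.type == 'if', 'expected an if node')
  local out = { { kind = 'if', cond = n.cond } }
  for _, e in ipairs(n.elseifs or {}) do
    out[#out + 1] = { kind = 'elseif', cond = e.cond }
  end
  if n.elsestmt then
    out[#out + 1] = { kind = 'else' } -- an else branch has no condition
  end
  return out
end

-- Plain tables shaped like: if a then ... elseif b then ... else ... end
local mockIf = {
  type = 'if',
  cond = { type = 'var', name = 'a' },
  elseifs = { { type = 'elseif', cond = { type = 'var', name = 'b' } } },
  elsestmt = { type = 'else' },
}
for _, b in ipairs(ifConditions(mockIf)) do
  print(b.kind, b.cond and b.cond.name or '')
end
```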

n = ast._foreq(var, min, max, step, ...) =
A for var=min,max[,step] do ... end block.
Subclass of _stmt.
n.type == 'foreq'.
n.var = the variable node.
n.min = the min expression.
n.max = the max expression.
n.step = the optional step expression.
n[1] ... n[#n] = nodes of statements within the block.

n = ast._forin(vars, iterexprs, ...)
A for var1,...varN in expr1,...exprN do ... end block.
Subclass of _stmt.
n.type == 'forin'.
n.vars = table of variables of the for-in loop.
n.iterexprs = table of iterator expressions of the for-in loop.
n[1] ... n[#n] = nodes of statements within the block.

n = ast._function(name, args, ...)
A function [name](arg1, ...argN) ... end block.
Subclass of _stmt.
n.type == 'function'.
n.name = the function name. This is optional. Omit name to represent a lambda function (which technically becomes an expression and not a statement...).
n.args = table of arguments. This does get modified: each argument is assigned .param = true and .index = its position in the argument list.
n[1] ... n[#n] = nodes of statements within the block.

n = ast._local(exprs)
A local ... statement.
Subclass of _stmt.
n.type == 'local'
n.exprs = list of expressions to be declared as locals.
Expects its member-expressions to be either functions or assigns.

n = ast._return(...)
A return ... statement.
Subclass of _stmt.
n.type == 'return'
n.exprs = list of expressions to return.

n = ast._break(...)
A break statement.
Subclass of _stmt.
n.type == 'break'

n = ast._call(func, ...)
A func(...) function-call expression.
n.type == 'call'
n.func = expression of the function to call.
n.args = list of argument expressions to pass into the function-call.

n = ast._nil()
A nil literal expression.
n.type == 'nil'.
n.const == true.

n = ast._boolean()
The parent class of the true/false AST nodes.

n = ast._true()
A true boolean literal expression.
n.type == 'true'.
n.const == true.
n.value == true.
ast._boolean:isa(n) evaluates to true

n = ast._false()
A false boolean literal expression.
n.type == 'false'.
n.const == true.
n.value == false.
ast._boolean:isa(n) evaluates to true

n = ast._number(value)
A numeric literal expression.
n.type == 'number'.
n.value = the numerical value.

n = ast._string(value)
A string literal expression.
n.type == 'string'.
n.value = the string value.

n = ast._vararg()
A vararg ... expression.
n.type == 'vararg'.
For use within function arguments, assignment expressions, function calls, etc.

n = ast._table(...)
A table { ... } expression.
n.type == 'table'.
n[1] ... n[#n] = expressions of the table.
If the expression in n[i] is an ast._assign then an entry is added into the table as key = value. If it is not an ast._assign then it is inserted as a sequenced entry.
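The keyed-versus-sequenced rule can be sketched like this. splitTableEntries is a hypothetical helper, and the table node is a plain Lua table standing in for a real ast._table node; only the type field and the array slots are used.

```lua
-- Hypothetical helper: separates key = value entries from sequenced entries.
local function splitTableEntries(n)
  assert(n.type == 'table', 'expected a table node')
  local keyed, sequenced = {}, {}
  for i = 1, #n do
    local entry = n[i]
    if entry.type == 'assign' then
      keyed[#keyed + 1] = entry
    else
      sequenced[#sequenced + 1] = entry
    end
  end
  return keyed, sequenced
end

-- Plain tables shaped like the constructor { 10, x = 1, 20 }.
local mockTable = {
  type = 'table',
  { type = 'number', value = 10 },
  { type = 'assign' },
  { type = 'number', value = 20 },
}
local keyed, sequenced = splitTableEntries(mockTable)
print(#keyed, #sequenced)
```

The position of an entry in the sequenced list matches the integer index it would receive in the constructed table.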

n = ast._var(name)
A variable reference expression.
n.type == 'var'.
n.name = the variable name.

n = ast._par(expr)
A ( ... ) parenthesis expression.
n.type == 'par'.
n.expr = the expression within the parenthesis.

n = ast._index(expr, key)
An expr[key] expression, i.e. an __index-metatable operation.
n.type == 'index'.
n.expr = the expression to be indexed.
n.key = the expression of the index key.

n = ast._indexself(expr, key)
An expr:key expression, to be used as the expression of an ast._call node for member-function calls. This is Lua's shorthand for inserting self as the first argument.
n.type == 'indexself'.
n.expr = the expression to be indexed.
n.key = the key to index. Must be a plain Lua string (not an ast._string node).

Binary operations:

node type   Lua operator   Lua version
_add        +
_sub        -
_mul        *
_div        /
_mod        %
_concat     ..
_lt         <
_le         <=
_gt         >
_ge         >=
_eq         ==
_ne         ~=
_and        and
_or         or
_idiv       //             5.3+
_band       &              5.3+
_bxor       ~              5.3+
_bor        |              5.3+
_shl        <<             5.3+
_shr        >>             5.3+

n[1] ... n[#n] = a table of the arguments of the operation.

Unary operations:

node type   Lua operator   Lua version
_unm        -
_not        not
_len        #
_bnot       ~              5.3+

n[1] = the single argument of the operation.
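To show how the operator nodes compose, here is a small constant-folding sketch over plain tables shaped like the nodes described in this section. evalConst is a hypothetical helper. It assumes n.type is the class name without the underscore (for example 'add'), that operands sit in n[1] ... n[#n], and that a number node's value converts cleanly with tonumber.

```lua
-- Arithmetic binary and unary operators, keyed by node type.
local binops = {
  add = function(a, b) return a + b end,
  sub = function(a, b) return a - b end,
  mul = function(a, b) return a * b end,
  div = function(a, b) return a / b end,
  mod = function(a, b) return a % b end,
}
local unops = {
  unm = function(a) return -a end,
}

-- Hypothetical helper: evaluates an expression tree of numeric constants.
local function evalConst(n)
  if n.type == 'number' then
    local v = tonumber(n.value)
    if not v then error('number node without a numeric value') end
    return v
  end
  if n.type == 'par' then return evalConst(n.expr) end
  local un = unops[n.type]
  if un then return un(evalConst(n[1])) end
  local bin = binops[n.type]
  if bin then
    -- Fold left to right over however many operands the node holds.
    local acc = evalConst(n[1])
    for i = 2, #n do acc = bin(acc, evalConst(n[i])) end
    return acc
  end
  error('cannot fold node of type ' .. tostring(n.type))
end

-- Plain tables shaped like the expression (1 + 2) * -3.
local expr = {
  type = 'mul',
  { type = 'par', expr = {
    type = 'add',
    { type = 'number', value = 1 },
    { type = 'number', value = 2 },
  } },
  { type = 'unm', { type = 'number', value = 3 } },
}
print(evalConst(expr)) -- prints -9
```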

Extra functions:

Some more useful functions in AST:

  • ast.copy(node) = equivalent of node:copy()
  • ast.flatten(node, func, varmap) = equivalent of node:flatten(func, varmap)
  • ast.refreshparents
  • ast.traverse
  • ast.nodeclass(type, parent, args)
  • ast.tostringmethod = this specifies the serialization method. It is used to look up the serializer stored in ast.tostringmethods


TODO:

  • Option for parsing LuaJIT -i number suffixes.
  • Speaking of LuaJIT, it has different edge case syntax for 2.0.5, 2.1.0, and whether 5.2-compat is enabled or not. It isn't passing the minify_tests.lua.
  • How about flags to turn each feature on and off, plus a function to auto-detect flag sets based on the Lua VERSION string or by running some local load() tests?
  • Make all node allocation routed through Parser:node to give the node a .parser field to point back to the parser - necessary for certain AST nodes that need to tell what parser keywords are allowed. I do this where necessary but I should do it always.
    • I've also made this keyword test optional since in some rare projects (vec-lua for one) I am inserting AST nodes for the sake of a portable AST that I can inject as inline'd code, but without a parser, so I don't have a proper enumeration of keywords. So for now I'm making ast node .parser optional and the keyword test bypassed if .parser isn't present. I'll probably make it a hard constraint later when I rework vec-lua.
    • It seems like a quick fix to just convert all a.b into a['b'] ... but Lua's function-declaration syntax doesn't support function a['b']:c() as an equivalent of function a.b:c() (the bracket form is only valid in expressions) ... so converting everything from dot to bracket index could break some regenerated Lua scripts.
  • To preserve spacing and comments (useful for my langfix transpiler), instead of using ast fields which are tokens, I should use token-references as fields and allow them to be replaced ... maybe ...
  • I'm very tempted to switch the AST index names to remove the preceding underscore. Pro of keeping it: the keywords become valid Lua names. Pro of removing it: the AST index matches the keyword that the AST node represents ...


While I was at it, I added a require() replacement for parsing Lua scripts and registering callbacks, so any other script can say "require 'parser.load_xform':insert(function(tree) ... modify the parse tree ... end)" and voila, Lua preprocessor in Lua!

minify_tests.txt taken from the tests at

I tested this by parsing itself, then using the parsed & reconstructed version to parse itself, then using the parsed & reconstructed version to parse the parsed & reconstructed version, then using the 2x parsed & reconstructed version to parse itself.

