ChunkBake is a line-oriented assembler
for Lua 5 virtual machine instructions. It produces a Lua binary chunk
from
assembly code, which can then be loaded and run on a Lua 5 virtual
machine. The project is hosted on LuaForge,
at: http://luaforge.net/projects/chunkbake/.
Currently, Lua 5.0.2 (instructions and binary chunk format) is
supported. Lua 5.1 support is planned.
This reference guide assumes you know what Lua 5 virtual machine
instructions do and how a binary chunk is put together.
If you want to learn about Lua 5 virtual machine instructions, one
document you can read
is "A No-Frills
Introduction to Lua 5 VM Instructions."
ChunkBake currently sports a very simple interface:
usage: %s [options] [filenames]
options:
-h, --help prints
usage information
--quiet do
not display warnings
-o <file>
specify file name to write output listing
--
stop handling arguments
example:
>%s myscript.asm -o myscript.out
If an output file is not explicitly specified using the -o
option, the basename of the source file is appended with .out
to form the output filename. The format of the binary chunk file that
is generated should conform to that of the platform the script is
running on. ChunkBake uses Lua to generate a small binary chunk and
takes those parameters as defaults. If this somehow fails (and it
should not), the defaults are set for the x86 platform. For other
platforms, this is untested but should work. The binary chunk format
can also be customized using assembler directives, overriding the
platform defaults.
ChunkBake is a classic line-oriented assembler, meaning that a
statement (which can either be an instruction or an assembler
directive) must be in a single logical line. Usually each logical line
occupies one physical line, but the line continuation character '\
'
can be used to split a long line into multiple lines. A statement can
contain a directive or a mnemonic, plus associated operands or
parameters.
ChunkBake's lexer is derived from Lua's lexer. Mnemonics (e.g. MOVE
,)
directives (e.g. .function
) and key names for header
properties are not case-sensitive, but other declared symbols are
case-sensitive.
The lexer recognizes and handles the usual line-ending styles
automatically: LF
, CR
, CRLF
.
Only Lua 5.0.2 opcodes are supported at this time. The mnemonics follow the symbol names defined in the Lua 5 sources (their syntax will be described in detail later):
MOVE LOADK LOADBOOL LOADNIL
GETGLOBAL SETGLOBAL GETUPVAL SETUPVAL
GETTABLE SETTABLE NEWTABLE SELF
ADD SUB MUL DIV POW UNM NOT CONCAT
EQ LT LE TEST
JMP CALL TAILCALL RETURN
FORLOOP TFORLOOP TFORPREP
SETLIST SETLISTO CLOSE CLOSURE
Most numbers allowed by the assembler are integers. For constants, a
number must be prefixed immediately by a '#
' (pound or
sharp) symbol, e.g. #12, #0.123 #-123 #1.2e34
. When
specifying constants, the number may have a fractional part and an
exponent part. To differentiate numbers and numerical constants, this
assembler calls numerical constants "immediates" (this is a pretty
common term in assembly language.)
Expressions are supported, and the details are discussed in a later
section. The '#
' is still used to denote a constant,
in order to unambiguously differentiate constants from local stack
locations.
The lexer recognizes hexadecimal numbers with the '0x
'
prefix as well. Examples are 0x1234
, 0XBEEF
.
[[...]]
form
(where the delimiter pair can be nested) can be used as
well. The Lua 5.1 extension to the long string form is not yet
supported. Single- or double-quoted strings can contain the following
escape
sequences for control characters:
\n
, \a
, \b
, \f
, \n
,
\r
, \t
, \v
, the \ddd
style works for 8-bit numbers in general. Single-quote and double quote
characters can be escaped as well.
A '\
' at the end of a line is a generic line continuation
character, so
the following are equivalent:
move 0 1
move \
0 \
1
A comment to the end of the line can use either '--
' or
';
':
-- a comment
; a comment
A block comment uses the Lua 5 '--[[...]]
' style:
--[[ this is a block comment
more comments here
--]]
Locals in the current stack frame, which works like registers in a register window, can use several styles. The following are equivalent:
1 ; local
1 on current stack frame
r1 ; use this if you like thinking in terms of
registers
R1 ; a capital 'R' probably looks better
$1 ; the '$' prefix is used in some assemblers
Mnemonics cannot be used as user-declared symbols or labels. Any
symbol with a '.
' as the first character (e.g. .local
)
is considered to be an assembler directive. The directive names
currently used by the assembler are:
.header
.function .func
.end
.param .local .upvalue .const
The assembler has five predefined keywords (case-sensitive) that
are used as operands in certain instructions, directives or operators:
nil true false and or
Labels are explicitly identified when a symbol has a ':
'
suffix. Labels can also be placed at the beginning of most statements
without the ':
' suffix. All labels or symbols must be
defined before they can be used, except for labels used in jumps, which
can be forward-referenced. Examples:
FOO: move R0 R1
BAR move $2 $3
BAZ .function
As of version 0.7.0, the symbolic name table is unified, and the
same name can no longer be used for different objects. This avoids
ambiguity, for example, when a constant and a local have the same name.
Forward-referenced code labels cannot be used in expressions,
because all expressions are currently evaluated immediately, except for
the special case of a single forward-referenced code label.
Expressions in ChunkBake are broadly similar to Lua 5 expressions.
However, since ChunkBake exposes the more complex low-level internals
of Lua 5, there are a few complications. In ChunkBake, a number of
extensions to the expression parser has been made in order to make it
possible for
the user to flexibly write expression in a sane manner.
Some of the major issues that have cropped up in the implementation
of
the
expression parser are:
The implementation of expression parsing in ChunkBake as of version 0.7.0 is still experimental; if there are better ways to do things, please do tell me your views. Some use cases may not work. The behaviour of the expression parser is expected to be firmed up as the assembler matures and test coverage is more complete.
The ChunkBake expression parser is based on the normal Lua 5
expression parser, with important extensions. The set of
operators are similar, except that (a) not
in Lua is
changed to ~
, since not
is already an
instruction mnemonic, and (b) the #
is now a unary
operator that turns numbers into immediates, or numerical constants.
The initial implementation of the expression parser performs immediate evaluation, therefore all symbols must be defined before
they are used in an expression. A special exception is when the
expression is a single jump label; this is the only case that can be
forward-referenced as it does not involve any unary or binary
operations. As such, jumping both forwards or backwards by specifying a
single code label works. Example:
jmp FOO --
single code label forward-referenced, WORKS
jmp FOO+1 -- forward-referenced code label in expression,
FAILS
move R1, R0
FOO:
return 0 0
The immediate operator '#' turns a number into an immediate, or a
numerical constant. The operator has no effect on immediates or
strings, which are already constants, and is illegal for other values.
'#' is the highest priority operator.
If '#' operates on a constant symbol, the constant value is returned.
Normally, a symbol would return its integer index value and the '#'
turns
that into an immediate integer. The special behaviour of '#' for
constant symbols allows the user to use either the constant's index or
the constant's value in an unambiguous manner. Example:
.const MOO #0xBEEF --
MOO's index is 0 and value is 0xBEEF
loadk R2, MOO -- loads register 2 with
constant at index 0
loadk R3, #MOO+1 -- loads register 3 with a new
constant that
-- has the value of #0xBEEF+1
A number and an immediate, after an operation with a binary
operator, returns an immediate, i.e. the number is automatically promoted into an immediate. When a
string is used in an arithmetic operation, it is coerced into an
immediate, not a number.
There is currently no mechanism to turn an immediate into a number,
but the best solution, intrinsic functions, is a future to-do. Without
intrinsic functions, the next best solution is to rewrite the
expression, e.g. by factoring out a common part of the expression.
A weak form of dynamic typing, by tracking the dominant symbol type of an expression, is
the currently implemented solution to the problem of mixing different
symbol types. Thus the
mixing of different symbol types is allowed, as long as certain rules
are followed. If symbols are mixed in an ambiguous manner, the
assembler will warn the user of the situation. Also, different operands
prefer different symbol types. If an expression returns a result with a
symbol type that is not preferred by the assembler, a warning is
generated. So, like dynamic typing
in Lua, it is up to the user to avoid confusion, as a full-blown
type-casting system is deemed too extreme.
The symbol types in ChunkBake are:
Symbol Type |
Description |
---|---|
function | integer index of function
prototype |
const | integer index of constant in
constant pool of a function |
upvalue | integer index of upvalue in
upvalue list |
local | integer corresponding to a
register number on the stack |
code | integer corresponding to the
location of an instruction |
(none) |
(no symbol
type checking is performed on operations and results) |
The user does not actually see these symbol types, as they work in
the background. If an expression attempts something potentially
dangerous, the warning is generated. The register notation (r1
,
R2
or $3
) produces an integer with the
"local" symbol type, since the notation is an overt indicator of the
intention of the user to use the number as a register.
There is also a neutral number type, which has no symbol type and
can be used anywhere. To mix different symbol types, select a dominant
symbol type and put the other different symbols in parentheses. Parentheses and
equates (using the .equ
directive) produces
symbol-neutral numbers.
Here are examples of symbol mixing (the move
instruction
prefers a result of type "local"):
.local FOO
-- value of FOO = 0, symbol type = "local"
.const BAR, #47 -- value of BAR = 0, symbol type = "const"
-- (Note: value of #BAR = #47)
BAZ:
-- value of BAZ = 1, symbol type = "code"
move 0, BAR -- warning, BAR is a constant, not
a register/local
move 0, (BAR) -- okay, (BAR) is now symbol-neutral,
(BAR)=0
move 1, BAZ+2 -- warning, BAZ is a code label
move 1, (BAZ)+2 -- okay, (BAZ)+2 = 3, a neutral number
move 2, 0+1 -- okay, operand is a
neutral number
move 2, FOO+BAR -- warning, BAR is a constant
move 2, FOO+(BAR) -- okay, dominant symbol type is "local"
move 2, R1+(BAZ) -- okay, R1 has symbol type of
"local" too
move 2, (BAR)+(BAZ) -- okay, expression is a neutral number
Although such expressions are not expected to be common, it is
important that the assembler does not disallow them. Somewhere down the
line, a user might require to write such tricky expressions. So instead
of putting up unnecessary barriers,
there is
only a clearly specified behaviour with
respect to symbol type mixing, plus a visual
cue as to which symbol is dominant. The user has the choice of
making expressions simple and clear, and the user also has the
choice of making expressions very complex.
An important use of symbol type tracking is for RK(x)-type operands.
The assembler needs to add MAXSTACK
to constant indices
(this is for Lua 5.0.2, for Lua 5.1, a 1-bit flag is used),
so it has to know the dominant symbol type of the expression. If the
dominant symbol type is "const", then MAXSTACK
is added,
since it is implied that the expression is meant to be used as a
constant pool index. Example:
.const FOO, #24
.const BAR, #42
add R2, R0, #FOO -- operand C is a constant, a
lookup gives FOO
-- C calculated as 0+MAXSTACK
add R2, RO, BAR -- operand C is a constant,
symbol type = "const"
-- C calculated as 1+MAXSTACK
add R2, R0, (BAR) -- operand C is a local,
(BAR) is symbol-neutral
-- C calculated as 1
For certain instructions with operands used to specify a boolean or
a skip condition (LT
, EQ
, LE
,
and TEST
) the expression for that operand must be a
neutral number. This forces the user to add parentheses so that there
is a visual cue as to what kind of number is produced.
ChunkBake has the same set of operators as Lua 5, except for the
change of not
to ~
, and the addition of the
immediate operator, #
. In addition, parentheses can be
used. Parentheses always produces numbers that are symbol-neutral. The
loss of symbol type information in this way may not be ideal, because a
user may wish to preserve symbol type information for a very complex
expression. Practically however, a user who writes such a complex
expression had better
know very well what he or she is doing.
Versions of ChunkBake prior to 0.7.0 uses the '..' operator as a
range operator (it is really more of a lexical element, or token, as it
does not result in any operation being performed) for certain
instructions. In order to avoid confusion and to allow the expression
parser to work, the range operator has been changed to '...' so that
string concatenation can use '..' just like in Lua 5. This may break
your older assembly programs.
The following is a list of operator classes and their behaviour:
Class |
Operator |
Operands |
Result |
---|---|---|---|
Immediate |
# |
Number, immediate, string |
Immediate, string |
Arithmetic |
* / + - ^ (unary)
|
Number, immediate, string (warning if different symbol type) (strings are coerced, error possible) |
Number, immediate, string (symbol type preserved) |
String |
.. |
Number, immediate, string (non-strings are coerced) |
String |
Equality |
== ~= |
Valid for any type |
Boolean |
Comparison |
< <=
> >= |
Number, immediate, string (both operands must be of the same type) |
Boolean |
Unary logical |
~ (unary)
|
Valid for any type | Boolean |
Binary logical |
and or |
Valid for any type | Depends on operation (symbol type preserved) |
The and
and or
operators work like their
counterparts in Lua 5, so the type of the result depends on whether the
first operand or the second operand is chosen as the result. Immediate,
arithmetic, string and comparison operators cannot use nil
,
true
or false
as operands.
The precedence of the operators is the same as that of Lua 5, and in
addition, the new immediate operator #
has the highest
priority. Note that a boolean result cannot be used directly as a
constant when the assembler is targeted to Lua 5.0.2 binary chunks.
Instructions that has operands that specify a boolean accepts a boolean
expression, though. Lua 5.1 will allow a boolean true or
false to be indexed as a constant, so the assembler's behaviour may
change in the future.
Tip for users: when using expressions, it is better to explicitly
separate operands with commas. Although commas are optional, writing
expressions without commas is quite dangerous. For example, two
operands, "1 -1
", will actually be interpreted as a single
operand of value 0 (1-1=0
!) so it is better to write it as
"1,-1
". In addition, "1--1
" is interpreted as
1 plus a line comment.
Future planned extensions to expressions are: intrinsic constants
and intrinsic functions, and possibly bitwise operators.
There are three kinds of directives:
.header
is used to set global binary chunk header
parameters and global characteristics..function
is used to set a function header's
characteristics, and is paired with .end
to contain the
assembly code for a single function.version=0x50, size_t=2
.param
is for declaring named parameters. The
assembler will keep track of the number of parameters and update the numparams
function header field automatically for you..local
is for declaring named locals. You may
specify the exact stack location for the declared local, instead of
letting the assembler allocate the next available location..upvalue
is for declaring named upvalues. You may
specify the exact index for the declared upvalue, instead of
letting the assembler allocate the next available index. .const
is for declaring named constants. Constant
handling is special, as the assembler accepts constants used directly
as operands in an instruction. This greatly simplifies assembler code
in many cases. The choice whether to declare explicitly or
automatically is up to you.
.equ
is the assembler's version of the classic
assembler equate directive. Equates live in a global namespace, as
opposed to resource directives, which live a namespace local to a
function. Labels associated with equates are also global. Equates can
be redefined.As ChunkBake provide sane defaults, there is often little need to
use header directives unless you are generating a non-standard binary
chunk. Likewise, you can choose not to specify constants explicitly
using resource directives and let the assembler handle it for you.
Most value operands accept expressions, as long as the result of the
expression is valid.
In the following sections, each directive will be described in more
detail.
.header [key=<string>|<number>[,]]...
The .header
directive is an optional directive; you
only need to use it to enforce a value for a particular global header
parameter. If you are generating a generic binary chunk targeted
towards the platform the assembler is running on, you will not need to
use it at all.
Apart from being able to customize the global header elements of a
Lua 5 binary chunk, the .header
directive is also able to
set a number of other parameters that governs the generation of the
binary chunk, e.g. the maximum number of allowed upvalues can be
changed.
Parameters are specified as key-value pairs, optionally separated by
a comma. If a statement is very long, it can be split into multiple .header
directives, or the line can be split into several physical lines using
the '\
' line continuation character. If a key is specified
a number of times, the last value is taken as the final valid value.
The .header
directive can only be used before the
definition of the first function prototype.
Keys are case-insensitive names, while values can be numbers
(usually integers) or strings. For example:
.header signature="\27Foo"
The directive changes the binary chunk signature from the usual
"\27Lua" to "\27Foo". In most cases, illegal values are flagged as
errors. The assembler
allows binary chunk signatures that are not 4 bytes in length; a
warning message is generated instead of an error.
The parameters that can be changed are as follows:
Key |
Type/Unit |
Example |
Default (x86) |
---|---|---|---|
signature | string |
signature = "ABCD" |
"\27Lua" |
version |
number |
version = 0x55 |
0x50 |
endianness |
number, 0 or 1 |
endianness = 0 |
1 |
int |
number, bytes |
int = 8 |
4 |
size_t |
number, bytes | size_t = 6 |
4 |
instruction | number, bytes | instruction = 6 |
4 |
size_op | number, bits | size_op =
8 |
6 |
size_a | number, bits | size_a =
9 |
8 |
size_b | number, bits | size_b =
10 |
9 |
size_c | number, bits | size_c =
10 |
9 |
number_type1 | string, data type |
number_type = "single" |
"double" |
maxstack | number | maxstack = 100 |
250 |
maxvars | number | maxvars =
50 |
200 |
maxupvalues | number | maxupvalues = 10 |
32 |
maxparams | number | maxparams = 50 |
100 |
fpf | number | fpf =
16 |
32 |
The first 11 keys in the list are almost equivalent to their
counterpart fields in a standard Lua 5 binary chunk header. The last 5
keys are constants you probably shouldn't touch unless you are running
a Lua binary that has non-standard constraints and you know what to do
and what the keys mean.
Note that the Lua binary chunk format may change with different
versions of Lua. The upcoming Lua 5.1 has a slightly different binary
chunk from that of Lua 5.0.2, so beware when changing version
.
Errors may be generated if you run afoul of internal constraints,
e.g. the instruction opcode field SIZE_OP
must exceed 6
bits. Here is an example of a custom .header
directive:
.header signature = "Dude", \
version = 0x57, \
endianness = 0, \
int
= 6, size_t = 7, \
instruction = 5, \
SIZE_OP=8, SIZE_A=10, \
SIZE_B=11, SIZE_C = 11, \
number_type = "single"
If you don't want to change the detected default parameters, you
don't need to use .header
at all.
.func
|.function [key=<string>|<number>[,]]...
.end
Each function must be started by a single .function
directive and ended by a .end
directive. .func
,
an alias of .function
can also be used. There can be only
one top-level function and functions must have at least one valid
instruction. Thus the simplest assembly listing that can be written is:
.func
return 0 0
.end
The key-value pairs for customizing a function are:
Key |
Type/Unit |
Example |
---|---|---|
source_name | string |
source_name="foo.lua" |
line_defined | number |
line_defined=10 |
numparams | number |
numparams=2 |
is_vararg | number, 0 or 1 |
is_vararg=1 |
maxstacksize | number |
maxstacksize=10 |
Unlike .header
parameters, you usually don't need to
customize a
.function
directive. Normally, the assembler will fill in
the parameters for you. The source_name
parameter is set
by the assembler for the top-level function, as is line_defined
.
numparams
is automatically calculated when you use the .param
directive, and the assembler keeps track of stack locations used in
order to set maxstacksize
, so usually it is wiser to
leave them be.
For functions that accept a variable number of arguments, is_vararg
should be set to 1, and the implicit parameter arg
will
be in stack position 0. An example of a function that accepts a
variable number of arguments is printf.asm
in the /test
directory.
Here is an example of function header customization:
.function source_name =
"Morpheus", \
line_defined = 1234, \
numparams=6, is_vararg=1, \
maxstacksize=47
So, for normal use, you won't need to specify anything for the .function
directive, except for is_vararg
when you need functions
that accept a variable number of arguments.
Functions can be named using labels, so that names can be used
instead of numbers when writing a closure
instruction.
The following are two functions named ONE
and TWO
:
ONE .function
return 0 0
.end
TWO: .function
return 0 0
.end
Next comes directives that are used to declare named resources for a
function.
.param <symbol>
.param
declares a register location as a function
parameter, or argument, and assigns a name to the parameter. Since
parameters occupy the lowest register numbers, the assembler assigns
the locations for you automatically, starting from location 0. You are
not allowed to declare any local variables before parameter
declarations. The numparams
field in the function header
is automatically updated by the assembler.
.function
.param Var
.param Pos
add 2, Var, Pos
; more instructions here
.end
In the example above, Var
is assigned register
location 0 while Pos
is assigned register location 1. The
add
instruction adds parameters Var
and Pos
and saves the result in register 2. You can still use 0 and 1 as the
parameter locations in an instruction; the usage of symbolic names and
numbers is up to the user.
A parameter declaration need not be placed immediately after .function
.
You only need to ensure that all local variables are declared after
parameter declarations, and that symbolic names names are declared
before they are used.
.local <symbol> [[,] <number>]
.local
declares local variable names. You can either
let the assembler assign the next lowest available location that does
not have a symbolic name associated with it (either from a .param
or .local
declaration.) Alternatively, the register
number can be explicitly specified.
.function
.param Foo
.local Bar
.local Baz, 4
; rest of function goes here
.end
Since location 0 is occupied by Foo
, the first .local
directive associates the name Bar
to location 1. The next
.local
directive explicitly assigns the name Baz
to location 4. The comma separator is optional. Now, if there is
another .local
directive in the function without a
specified location number, location 2 will be used.
Locals must be declared after all parameters have been declared. Other than that, locals can be declared anywhere inside a function, as long as symbolic names are declared before they are used.
.upvalue
<symbol> [[,] <number>]
The syntax of .upvalue
is exactly like the syntax of .local
.
Assignment of upvalue numbers is always explicit, so each upvalue must
be defined using this directive. The
assembler sets the number of upvalues (which is a byte-sized field) in
the function header for you.
.function
.upvalue Larry
.upvalue Moe 2
; rest of function goes here
.end
Upvalue number 0 is assigned the name Larry
, while
upvalue number 2 is assigned the name Moe
. The operands
of the directive can be optionally separated by a comma. When the
upvalue list is written out, non-declared values in the upvalue list is
assigned the name "(none)
" so that loaders don't complain
of zero-length names.
.const [<symbol> [,]]
<number>|<immediate>|<string>|nil [[,] <number>]
Constants are declared using the .const
directive. For
constants, both the constant number and the symbolic name of the
constant is optional. Leaving out the constant number forces automatic
constant number assignment. If no name is associated with a constant,
it must be accessed using the constant number.
The constant may be a string, an immediate number (with a leading '#
'),
a number or nil
. A number is interpreted as an immediate,
but if you are
not comfortable with the ambiguity, you can always declare numbers
using the immediate syntax.
Here are some unnamed constant declarations with automatic constant number assignment:
.function
.const 7.8e-9 ;
constant number 0, a number, 7.8e-9
.const #12345 ;
constant number 1, a number, 12345
.const "Eastwood" ; constant number 2, a
string, "Eastwood"
.const
nil ;
constant number 3, a nil
; rest of the function goes here
Constants can be declared anywhere within a function as long as
symbolic names are declared before use. Here are more constant
declarations:
.const FOO
#12.34 ; FOO is a constant
with value 12.34
.const BAR, "Eastwood" ; BAR is a constant with value
"Eastwood"
.const Beef, #0xBEEF, 3 ; constant number 3 is named Beef,
value 0xBEEF
.const #0xDEAD 2
; constant number 2 has value 0xDEAD
Commas between operands is optional. Constant declarations with
identical values are not merged; as far as the assembler is concerned,
it is dealing with separate constants.
In order to simplify the writing of Lua assembly code, it is
possible to write code without a single .const
declaration. The assembler allows constants to be used directly as
operands wherever there is an RK(x)
operand or a Kst(x)
operand. This is also
why numbers need to be disambiguated from numerical constants
(immediates). When constants are used directly as instruction operands,
the assembler will automatically assign constant numbers to the
constants and duplicates constants will use the same constant number.
The user is given the ultimate control as to how constants are
declared or used in functions in the assembly code. The intent of the
handling system for constants is that important variables can be
declared as named symbols, while trivial constants can be used directly
as operands in instructions.
.equ
<symbol> [,]
<expression>
.equ
is a generic mechanism for setting an expression's
result to a symbol, much like the classic assembler equ
directive. Equates and labels associated with them are the only symbols
in the global namespace; all
other symbols are local to a particular function prototype. Evaluation
of expressions is immediate, so local symbols can be used, and in
addition equate
symbols can be redefined. Equates can appear anywhere in a program;
they affect output only when used as symbols in expressions.
Equates that produces numbers do not have any associated symbol type
information, so they are an alternative to using parentheses in
expressions when mixing different symbol types. As with languages with
dynamic data types, the onus is on the user to code responsibly.
.equ SIZE 800*600
; a number
.equ BIG (SIZE > 32000) ; a
boolean result
.equ MODE "Large"
.equ Version, 4.7
; a number
.equ ID, MODE..Version ; a string
...
.function
.const MySize #SIZE/1000 ; MySize=#480
.const MyID ID
; MyID="Large4.7"
...
loadbool R1, BIG, 0
; loads a boolean (true)
...
.end
Labels can be used for .param
, .local
, .upvalue
,
.const
, .function
and .equ
.
For .function
labels, the top-level function cannot have
one, since the label is local to the parent
function prototype and the top-level function doesn't have a
parent. For .equ
, the label is in the global namespace,
just like the equate directive.
Since you can name a function's resources
using the declarations only, labels are a kind of aliasing mechanism.
For example:
Neo: .const Anderson, "Thomas
Anderson"
In the example above, both Anderson
and Neo
points to the same constant number that holds the string "Thomas
Anderson". Although mechanism is available to the user, it is probably
a good idea to use it only when there are good reasons to do so.
R(x)
: A register location in the current function's
stack frame. E.g. R(A)
means that opcode field A
specifies a register location. A register can be written in a few ways:
For example, register 2 can be written as 2
, $2
or R2
. It can also be a name that refers to a parameter
or a local variable. The name can be defined as a label or in a .param
or .local
directive.RK(x)
: The operand may be a register location or it
may point to a constant in the constant pool of the function. A
register location is specified in the same manner as in an R(x)
operand. Constant numbers can be explicitly specified, but note that
they must be greater or equal to maxstack
, which is
normally 250. Note that maxstack
can be customized in a .header
directive. So, a 250
will normally point to constant
number 0. In theory, R250
can also be used, but this is
misleading; constant numbers should be written as bare numbers.
Alternatively, constants can be referenced using a name or a label.
Constants can also be used, e.g. #47
, nil
or "Foo"
are legal operands. Such constants are allocated a number
automatically, and identical values or strings share the same constant
number. This is to allow trivial constants to be used very easily,
without the hassle of declaring each and every constant. The scheme is
very flexible, it is up to the user to write clear and readable code.Kst(x)
: This operand denotes a constant only.
Constant numbers, names or labels, or immediate constants in the form
of immediate numbers (e.g. #123
), strings or the nil
keyword can be used.
Like RK(x)
, if constants are used directly, the assembler
handles allocation automatically.Disp
: Denotes a displacement. A number (e.g. 20
)
denotes an absolute target location. An immediate number (e.g. #1
,
#-10
) denotes a relative displacement which
corresponds directly to the value in the sBx
field of the
assembled instruction. A label name makes writing jump targets much
easier. The target label must be somewhere within the same function
body, and forward and backward reference is allowed.The assembler tracks register usage by noting the R(x)
and RK(x)
operands and sets the maxstacksize
field in a function header accordingly. So normally, the user does not
need to do anything about the maxstacksize
field. You can
explicitly set maxstacksize
too, but if your custom value
is too small, then the assembler will flag an error.
Most value operands accept expressions, as long as the result of the expression is valid.
Other operands are described within the descriptions of Lua 5
virtual machine instructions in the next section.
Like most classic assemblers, instruction statements are written on
a per-line basis. In the case of ChunkBake, 'per-line' refers to a
logical line, since a '\
' can be used as a line
continuation character. Also, long strings can occupy more than one
line.
Operands are optionally separated by commas. Thus, in the syntax
descriptions below, any comma can be omitted. This relaxed rule is to
allow for different writing styles. It is recommended that the user
stick to a single style in a program. In some cases, '...
'
can be optionally used to replace a comma to denote a range of
registers or values. Note: '...
'
was changed from '..
'
in version 0.7.0 because the latter need to be used for string
concatenation. This helps to make the intent of the instruction
clear, e.g. loadnil 0...2
.
Many illegal conditions are trapped by the assembler, however,
writing illegal or incorrect instructions is by no means impossible.
The assembler lacks a comprehensive warning system at the moment, so it
is largely up to the user to enforce some discipline in coding.
The following describes only how
instructions are written in the ChunkBake assembler, it does not describe what each
instruction does. For help on the latter, please see the beginning of
this document. Most of the examples in the following are copied from
the automatic test script, TestRig.lua
.
MOVE
R(A) , R(B)
Moves values between register locations. Examples:
MOVE 0 1
move R3 R4
move $5, $6
move $7, R8
move FOO, BAR
LOADK
R(A) , Kst(Bx)
Loads a constant into a register. Examples:
loadk $1, 0
LOADK R1, FOO
loadk 2, "foobar"
loadk $0 #1234
LOADBOOL
R(A) , B(0|1|true|false) , C(0|1)
Loads a boolean into a register. The boolean value can be specified
as a number (0
for false, 1
for true) or the
true
and false
keywords can be used. The
flag in field C for skipping the next instruction may be specified as a
0
(don't skip) or 1
(skip the next
instruction.) Examples:
loadbool R0, 0, 0
loadbool $1, 1, 1
loadbool 2 true 0
loadbool 3 false, 1
LOADNIL
R(A) (,|..) R(B)
Sets a range of registers to nil
. In order to make the
intent of the instruction clear, '..
' can be used instead
of a comma to separate the two operands, thus showing that the
instruction is operating on the given register range. The second
register must be greater than or equal to the first register. Examples:
loadnil
0,0 ; a
single register
loadnil R1..R3 ; makes it
clear R1,R2,R3 are set
loadnil 2 $4
loadnil FOO .. BAR
GETUPVAL
R(A) , Upvalue[Bx]
SETUPVAL R(A) , Upvalue[Bx]
Loads or saves an upvalue. Examples:
getupval $1, 0
setupval R2 FOO
GETGLOBAL
R(A) , Kst(Bx)
SETGLOBAL R(A) , Kst(Bx)
Loads or saves a global variable. Examples:
getglobal R0, FOO
setglobal $1, FOO
getglobal 1, 0
setglobal 2, "bar"
getglobal 3 #0xBEEF
GETTABLE
R(A) , R(B) , RK(C)
GETTABLE R(A) , R(B) "[" RK(C) "]"
Gets a value from a table. The table key can be enclosed in square brackets for increased readability. Examples:
gettable 1 0 2
gettable 1, R2, 250
gettable $2 3 #0xBEEF
gettable R4 R5 "foobar"
gettable R0, $1[251]
gettable 1 2[3]
gettable $3, 2["trinity"]
SETTABLE
R(A) , RK(B) , RK(C)
SETTABLE R(A) "[" RK(B) "]" , RK(C)
Sets the value of a given table key. The table key can be enclosed in square brackets for increased readability. Examples:
settable 1 2 3
settable R2, 250, $1
settable $1 #0xDEAD, 3
settable R4 "foobar" R5
settable R4 R2 "foobar"
settable $1[251], R0
settable 2["trinity"] $3
NEWTABLE
R(A) , (<number>|<immediate>) , (
<number>|<immediate>
)
Creates a new table object and assigns it to the given register. The assembler accepts either encoded size values (as integer numbers) or raw size values (as immediate numbers). Examples:
newtable 2 0
0
; table of: array=0, hash=0
newtable R1, 10,
4 ;
table
of:
array=4, hash=16
newtable 1 #200 #200 ;
table
of:
array=224, hash=256
SELF
R(A) , R(B) , RK(C)
SELF R(A) , R(B) "[" RK(C) "]"
Sets up an object call. Part of the operation includes a table
lookup, hence the syntax of self
is similar to the syntax
of gettable
. Examples:
self R1 R2 R3
self 1,2,3
self $2 3[R4]
self R1 R0[FOO]
self R2 R1,250
self $1 BAR[BAR]
self BAR $1[#0xBEEF]
self BAR $1["morpheus"]
ADD
R(A) , RK(B) , RK(C)
SUB R(A) , RK(B) , RK(C)
MUL R(A) , RK(B) , RK(C)
DIV R(A) , RK(B) , RK(C)
POW R(A) , RK(B) , RK(C)
Binary arithmetic operations. Examples:
add 0,0,0
add $0,R1,2
add FOO, 0, BAR
sub R1, #123, #456
mul $2, "foo", "bar"
div 3, FOO, 250
pow 0, R1, BAR
UNM
R(A) , R(B)
NOT R(A) , R(B)
Unary minus and logical not operations. Examples:
unm 1, $2
not R3 R4
unm FOO, BAR
not BAR FOO
CONCAT
R(A) , R(B) (,|...) R(C)
Concatenates a range of registers and assigns the result to a
register. A '...
' can be used instead of a comma to
separate the operands that make up the range of registers. Examples:
concat $0 $1 $3
concat 0,1,1
concat R1 R2...R4
concat 0, FOO...BAR
EQ
(0|1) , RK(B) , RK(C)
LT (0|1) , RK(B) , RK(C)
LE (0|1) , RK(B) , RK(C)
Relational operations. The A field can be 0 or 1, which signifies the result to be tested for. The next instruction is skipped if the comparison result is not the same as the expected result. Examples:
eq 0, 1, 2
eq 1 $2 $1
eq 0 R1 R2
eq 0 250 251
lt 0, FOO, BAR
lt 1 #123 #456
le 0 "foo" "bar"
le 1 #47 "47"
TEST
R(A) , R(B) , (0|1)
Text operation. The C field can be 0 or 1, which signifies the result to be tested for. The next instruction is skipped if the comparison result is not the same as the expected result. Examples:
test 1, 2, 0
test $2 $1, 1
test R1 R2 0
test FOO BAR 0
JMP
label|disp
Performs an unconditional jump. Jumps can be to absolute positions,
to relative positions or to a label. Absolute jumps must be a positive
integer, but bounds checking is not currently performed, so illegal
instructions can be coded. Likewise for relative jumps. Also, the
assembler does not currently check the case where the jmp
instruction jumps to itself, creating an infinite loop. In future
versions of ChunkBake, the user will be warned of such cases. Examples:
jmp
#0 ; relative jumps
jmp #-10
jmp #10
jmp 3 ; absolute
jump
jmp BAR ; labeled jump
CALL
R(A) , B , C
Calls a closure. Examples:
call $0, 0, 0
call R1, 2, 3
call 2 3 1
TAILCALL
R(A) , B
RETURN R(A) , B
Returns to calling function or performs a tail call. Examples:
return $1 2
return R2 20
return 3, 4
return FOO, 8
tailcall $1 2
tailcall R2 20
tailcall 3, 4
tailcall BAR, 66
FORLOOP
R(A) , label|disp
Performs an iteration of a numeric for
loop. See the jmp
instruction on the use and
specification of jumps. Examples:
forloop 1 FOO
forloop R1, BAR
TFORLOOP
R(A) , C
Performs an iteration of a generic for loop. Examples:
tforloop 0 0
tforloop $1, 4
tforloop R2, 10
TFORPREP
R(A) , label|disp
Performs the initialization for a generic for loop. See the jmp
instruction on the use and
specification of jumps. Examples:
tforprep 1 FOO
tforprep R1, BAR
SETLIST
R(A) , Bx
SETLIST R(A) , start [...] end
SETLISTO R(A) , Bx
SETLISTO R(A) , start [...] end
Sets a table with a list of values in a range of registers. You can
either encode the range and specify a single number, or specify a range
and let the assembler calculate the Bx
value. setlisto
is
used when the final element of the list is a function call, which has
an indeterminate number of return values. Examples:
setlist $1,
10
; index 1 to 11
setlist R2,
40
; index 33 to 41
setlist 0
1...10
; index 1 to 10
setlist 1,
33...42
; index 33 to 42
setlisto 1, 65...74 ;
index 65 to top
The encoded number and the range specification must be valid, or
else the assembler will report an error. Note that FIELDS_PER_FLUSH
(or fpf
in the .header
directive) can be
customized, so the index range calculations will be adjusted
accordingly.
CLOSE
R(A)
Closes local variables on the stack from the specified register onwards. Examples:
close 0
close FOO
CLOSURE
R(A) , Bx
Creates a closure or instantiation of a function, assigning the result to a register. If the function prototype has a label, the label can be used to refer to it. Examples:
closure 1, 0
closure R2 1
closure FOO 2
closure $0
PRINTF ;
see the printf.asm test example
A number of examples can be found in the test
directory of the ChunkBake distribution. The automatic test system, TestRig.lua
in the misc
directory, has a pretty complete coverage of
both legal and illegal forms of all instructions.