ChunkBake Reference


Introduction

ChunkBake is a line-oriented assembler for Lua 5 virtual machine instructions. It produces a Lua binary chunk from assembly code, which can then be loaded and run on a Lua 5 virtual machine. The project is hosted on LuaForge, at: http://luaforge.net/projects/chunkbake/. Currently, Lua 5.0.2 (instructions and binary chunk format) is supported. Lua 5.1 support is planned.

This reference guide assumes you know what Lua 5 virtual machine instructions do and how a binary chunk is put together.  If you want to learn about Lua 5 virtual machine instructions, one document you can read is "A No-Frills Introduction to Lua 5 VM Instructions."


Usage

ChunkBake currently sports a very simple interface:

usage: %s [options] [filenames]

options:
  -h, --help        prints usage information
  --quiet           do not display warnings
  -o <file>         specify file name to write output listing
  --                stop handling arguments

example:
  >%s myscript.asm -o myscript.out

If an output file is not explicitly specified using the -o option, the basename of the source file is appended with .out to form the output filename. The format of the binary chunk file that is generated should conform to that of the platform the script is running on. ChunkBake uses Lua to generate a small binary chunk and takes those parameters as defaults. If this somehow fails (and it should not), the defaults are set for the x86 platform. For other platforms, this is untested but should work. The binary chunk format can also be customized using assembler directives, overriding the platform defaults.


Overview

ChunkBake is a classic line-oriented assembler, meaning that a statement (which can either be an instruction or an assembler directive) must be in a single logical line. Usually each logical line occupies one physical line, but the line continuation character '\' can be used to split a long line into multiple lines. A statement can contain a directive or a mnemonic, plus associated operands or parameters.

ChunkBake's lexer is derived from Lua's lexer. Mnemonics (e.g. MOVE,) directives (e.g. .function) and key names for header properties are not case-sensitive, but other declared symbols are case-sensitive.


Lexical Analyzer

The lexer recognizes and handles the usual line-ending styles automatically: LF, CR, CRLF.

Only Lua 5.0.2 opcodes are supported at this time. The mnemonics follow the symbol names defined in the Lua 5 sources (their syntax will be described in detail later):

MOVE LOADK LOADBOOL LOADNIL
GETGLOBAL SETGLOBAL GETUPVAL SETUPVAL
GETTABLE SETTABLE NEWTABLE SELF
ADD SUB MUL DIV POW UNM NOT CONCAT
EQ LT LE TEST
JMP CALL TAILCALL RETURN
FORLOOP TFORLOOP TFORPREP
SETLIST SETLISTO CLOSE CLOSURE

Most numbers allowed by the assembler are integers. For constants, a number must be prefixed immediately by a '#' (pound or sharp) symbol, e.g. #12, #0.123 #-123 #1.2e34. When specifying constants, the number may have a fractional part and an exponent part. To differentiate numbers and numerical constants, this assembler calls numerical constants "immediates" (this is a pretty common term in assembly language.)

Expressions are supported, and the details are discussed in a later section. The '#' is still used to denote a constant, in order to unambiguously differentiate constants from local stack locations.

The lexer recognizes hexadecimal numbers with the '0x' prefix as well. Examples are 0x1234, 0XBEEF.

Strings are specified exactly like Lua 5 strings. Strings may be single-quoted or double-quoted, and the [[...]] form (where the delimiter pair can be nested) can be used as well. The Lua 5.1 extension to the long string form is not yet supported. Single- or double-quoted strings can contain the following escape sequences for control characters: \n, \a, \b, \f, \n, \r, \t, \v, the \ddd style works for 8-bit numbers in general. Single-quote and double quote characters can be escaped as well.

A '\' at the end of a line is a generic line continuation character, so the following are equivalent:

move 0 1
move \
    0 \
    1

A comment to the end of the line can use either '--' or ';':

-- a comment
; a comment

A block comment uses the Lua 5 '--[[...]]' style:

--[[ this is a block comment
     more comments here
--]]

Locals in the current stack frame, which works like registers in a register window, can use several styles. The following are equivalent:

1     ; local 1 on current stack frame
r1    ; use this if you like thinking in terms of registers
R1    ; a capital 'R' probably looks better
$1    ; the '$' prefix is used in some assemblers

Mnemonics cannot be used as user-declared symbols or labels. Any symbol with a '.' as the first character (e.g. .local) is considered to be an assembler directive. The directive names currently used by the assembler are:

.header
.function .func
.end
.param .local .upvalue .const

The assembler has five predefined keywords (case-sensitive) that are used as operands in certain instructions, directives or operators:

nil true false and or

Labels are explicitly identified when a symbol has a ':' suffix. Labels can also be placed at the beginning of most statements without the ':' suffix. All labels or symbols must be defined before they can be used, except for labels used in jumps, which can be forward-referenced. Examples:

FOO: move R0 R1
BAR move $2 $3
BAZ .function

As of version 0.7.0, the symbolic name table is unified, and the same name can no longer be used for different objects. This avoids ambiguity, for example, when a constant and a local have the same name.

Forward-referenced code labels cannot be used in expressions, because all expressions are currently evaluated immediately, except for the special case of a single forward-referenced code label.


Expressions

Expressions in ChunkBake are broadly similar to Lua 5 expressions. However, since ChunkBake exposes the more complex low-level internals of Lua 5, there are a few complications. In ChunkBake, a number of extensions to the expression parser has been made in order to make it possible for the user to flexibly write expression in a sane manner.

Some of the major issues that have cropped up in the implementation of the expression parser are:

The implementation of expression parsing in ChunkBake as of version 0.7.0 is still experimental; if there are better ways to do things, please do tell me your views. Some use cases may not work. The behaviour of the expression parser is expected to be firmed up as the assembler matures and test coverage is more complete.


Implementing Expressions

The ChunkBake expression parser is based on the normal Lua 5 expression parser, with important extensions. The set of operators are similar, except that (a) not in Lua is changed to ~, since not is already an instruction mnemonic, and (b) the # is now a unary operator that turns numbers into immediates, or numerical constants.

Forward References

The initial implementation of the expression parser performs immediate evaluation, therefore all symbols must be defined before they are used in an expression. A special exception is when the expression is a single jump label; this is the only case that can be forward-referenced as it does not involve any unary or binary operations. As such, jumping both forwards or backwards by specifying a single code label works. Example:

jmp FOO      -- single code label forward-referenced, WORKS
jmp FOO+1    -- forward-referenced code label in expression, FAILS
move R1, R0
FOO:
return 0 0

Immediate Operator and Constant Symbols

The immediate operator '#' turns a number into an immediate, or a numerical constant. The operator has no effect on immediates or strings, which are already constants, and is illegal for other values. '#' is the highest priority operator.

If '#' operates on a constant symbol, the constant value is returned. Normally, a symbol would return its integer index value and the '#' turns that into an immediate integer. The special behaviour of '#' for constant symbols allows the user to use either the constant's index or the constant's value in an unambiguous manner. Example:

.const MOO #0xBEEF   -- MOO's index is 0 and value is 0xBEEF
loadk R2, MOO        -- loads register 2 with constant at index 0
loadk R3, #MOO+1     -- loads register 3 with a new constant that
                     -- has the value of #0xBEEF+1

A number and an immediate, after an operation with a binary operator, returns an immediate, i.e. the number is automatically promoted into an immediate. When a string is used in an arithmetic operation, it is coerced into an immediate, not a number.

There is currently no mechanism to turn an immediate into a number, but the best solution, intrinsic functions, is a future to-do. Without intrinsic functions, the next best solution is to rewrite the expression, e.g. by factoring out a common part of the expression.

Symbol Types

A weak form of dynamic typing, by tracking the dominant symbol type of an expression, is the currently implemented solution to the problem of mixing different symbol types. Thus the mixing of different symbol types is allowed, as long as certain rules are followed. If symbols are mixed in an ambiguous manner, the assembler will warn the user of the situation. Also, different operands prefer different symbol types. If an expression returns a result with a symbol type that is not preferred by the assembler, a warning is generated. So, like dynamic typing in Lua, it is up to the user to avoid confusion, as a full-blown type-casting system is deemed too extreme.

The symbol types in ChunkBake are:

Symbol Type
Description
function integer index of function prototype
const integer index of constant in constant pool of a function
upvalue integer index of upvalue in upvalue list
local integer corresponding to a register number on the stack
code integer corresponding to the location of an instruction
(none)
(no symbol type checking is performed on operations and results)

The user does not actually see these symbol types, as they work in the background. If an expression attempts something potentially dangerous, the warning is generated. The register notation (r1, R2 or $3) produces an integer with the "local" symbol type, since the notation is an overt indicator of the intention of the user to use the number as a register.

There is also a neutral number type, which has no symbol type and can be used anywhere. To mix different symbol types, select a dominant symbol type and put the other different symbols in parentheses. Parentheses and equates (using the .equ directive) produces symbol-neutral numbers.

Here are examples of symbol mixing (the move instruction prefers a result of type "local"):

.local FOO       -- value of FOO = 0, symbol type = "local"
.const BAR, #47  -- value of BAR = 0, symbol type = "const"
                 -- (Note: value of #BAR = #47)
BAZ:             -- value of BAZ = 1, symbol type = "code"

move 0, BAR      -- warning, BAR is a constant, not a register/local
move 0, (BAR)    -- okay, (BAR) is now symbol-neutral, (BAR)=0

move 1, BAZ+2    -- warning, BAZ is a code label
move 1, (BAZ)+2  -- okay, (BAZ)+2 = 3, a neutral number

move 2, 0+1         -- okay, operand is a neutral number
move 2, FOO+BAR     -- warning, BAR is a constant
move 2, FOO+(BAR)   -- okay, dominant symbol type is "local"
move 2, R1+(BAZ)    -- okay, R1 has symbol type of "local" too
move 2, (BAR)+(BAZ) -- okay, expression is a neutral number

Although such expressions are not expected to be common, it is important that the assembler does not disallow them. Somewhere down the line, a user might require to write such tricky expressions. So instead of putting up unnecessary barriers, there is only a clearly specified behaviour with respect to symbol type mixing, plus a visual cue as to which symbol is dominant. The user has the choice of making expressions simple and clear, and the user also has the choice of making expressions very complex.

An important use of symbol type tracking is for RK(x)-type operands. The assembler needs to add MAXSTACK to constant indices (this is for Lua 5.0.2, for Lua 5.1, a 1-bit flag is used), so it has to know the dominant symbol type of the expression. If the dominant symbol type is "const", then MAXSTACK is added, since it is implied that the expression is meant to be used as a constant pool index. Example:

.const FOO, #24
.const BAR, #42
add R2, R0, #FOO      -- operand C is a constant, a lookup gives FOO
                      -- C calculated as 0+MAXSTACK
add R2, RO, BAR       -- operand C is a constant, symbol type = "const"
                      -- C calculated as 1+MAXSTACK
add R2, R0, (BAR)     -- operand C is a local, (BAR) is symbol-neutral
                      -- C calculated as 1

For certain instructions with operands used to specify a boolean or a skip condition (LT, EQ, LE, and TEST) the expression for that operand must be a neutral number. This forces the user to add parentheses so that there is a visual cue as to what kind of number is produced.


Operators

ChunkBake has the same set of operators as Lua 5, except for the change of not to ~, and the addition of the immediate operator, #. In addition, parentheses can be used. Parentheses always produces numbers that are symbol-neutral. The loss of symbol type information in this way may not be ideal, because a user may wish to preserve symbol type information for a very complex expression. Practically however, a user who writes such a complex expression had better know very well what he or she is doing.

Versions of ChunkBake prior to 0.7.0 uses the '..' operator as a range operator (it is really more of a lexical element, or token, as it does not result in any operation being performed) for certain instructions. In order to avoid confusion and to allow the expression parser to work, the range operator has been changed to '...' so that string concatenation can use '..' just like in Lua 5. This may break your older assembly programs.

The following is a list of operator classes and their behaviour:

Class
Operator
Operands
Result
Immediate
#
Number, immediate, string
Immediate, string
Arithmetic
* / + - ^
-
(unary)
Number, immediate, string
(warning if different symbol type)
(strings are coerced, error possible)
Number, immediate, string
(symbol type preserved)
String
..
Number, immediate, string
(non-strings are coerced)
String
Equality
== ~= Valid for any type
Boolean
Comparison
< <= > >= Number, immediate, string
(both operands must be of the same type)
Boolean
Unary logical
~ (unary)
Valid for any type Boolean
Binary logical
and or
Valid for any type Depends on operation
(symbol type preserved)

The and and or operators work like their counterparts in Lua 5, so the type of the result depends on whether the first operand or the second operand is chosen as the result. Immediate, arithmetic, string and comparison operators cannot use nil, true or false as operands.

The precedence of the operators is the same as that of Lua 5, and in addition, the new immediate operator # has the highest priority. Note that a boolean result cannot be used directly as a constant when the assembler is targeted to Lua 5.0.2 binary chunks. Instructions that has operands that specify a boolean accepts a boolean expression, though. Lua 5.1 will allow a boolean true or false to be indexed as a constant, so the assembler's behaviour may change in the future.

Tip for users: when using expressions, it is better to explicitly separate operands with commas. Although commas are optional, writing expressions without commas is quite dangerous. For example, two operands, "1 -1", will actually be interpreted as a single operand of value 0 (1-1=0!) so it is better to write it as "1,-1". In addition, "1--1" is interpreted as 1 plus a line comment.

Future planned extensions to expressions are: intrinsic constants and intrinsic functions, and possibly bitwise operators.


Assembler Directives

There are three kinds of directives:

As ChunkBake provide sane defaults, there is often little need to use header directives unless you are generating a non-standard binary chunk. Likewise, you can choose not to specify constants explicitly using resource directives and let the assembler handle it for you.

Most value operands accept expressions, as long as the result of the expression is valid.

In the following sections, each directive will be described in more detail.


.header

.header [key=<string>|<number>[,]]...

The .header directive is an optional directive; you only need to use it to enforce a value for a particular global header parameter. If you are generating a generic binary chunk targeted towards the platform the assembler is running on, you will not need to use it at all.

Apart from being able to customize the global header elements of a Lua 5 binary chunk, the .header directive is also able to set a number of other parameters that governs the generation of the binary chunk, e.g. the maximum number of allowed upvalues can be changed.

Parameters are specified as key-value pairs, optionally separated by a comma. If a statement is very long, it can be split into multiple .header directives, or the line can be split into several physical lines using the '\' line continuation character. If a key is specified a number of times, the last value is taken as the final valid value. The .header directive can only be used before the definition of the first function prototype.

Keys are case-insensitive names, while values can be numbers (usually integers) or strings. For example:

.header signature="\27Foo"

The directive changes the binary chunk signature from the usual "\27Lua" to "\27Foo". In most cases, illegal values are flagged as errors. The assembler allows binary chunk signatures that are not 4 bytes in length; a warning message is generated instead of an error.

The parameters that can be changed are as follows:

Key
Type/Unit
Example
Default (x86)
signature string
signature = "ABCD" "\27Lua"
version
number
version = 0x55
0x50
endianness
number, 0 or 1
endianness = 0 1
int
number, bytes
int = 8
4
size_t
number, bytes size_t = 6
4
instruction number, bytes instruction = 6 4
size_op number, bits size_op = 8 6
size_a number, bits size_a = 9 8
size_b number, bits size_b = 10 9
size_c number, bits size_c = 10 9
number_type1 string, data type
number_type = "single" "double"
maxstack number maxstack = 100 250
maxvars number maxvars = 50 200
maxupvalues number maxupvalues = 10 32
maxparams number maxparams = 50 100
fpf number fpf = 16 32
1 accepts "double", "single", "int", "long long"

The first 11 keys in the list are almost equivalent to their counterpart fields in a standard Lua 5 binary chunk header. The last 5 keys are constants you probably shouldn't touch unless you are running a Lua binary that has non-standard constraints and you know what to do and what the keys mean.

Note that the Lua binary chunk format may change with different versions of Lua. The upcoming Lua 5.1 has a slightly different binary chunk from that of Lua 5.0.2, so beware when changing version.

Errors may be generated if you run afoul of internal constraints, e.g. the instruction opcode field SIZE_OP must exceed 6 bits. Here is an example of a custom .header directive:

.header signature = "Dude", \
        version = 0x57, \
        endianness = 0, \
        int = 6, size_t = 7, \
        instruction = 5, \
        SIZE_OP=8, SIZE_A=10, \
        SIZE_B=11, SIZE_C = 11, \
        number_type = "single"

If you don't want to change the detected default parameters, you don't need to use .header at all.


.function and .end

.func|.function [key=<string>|<number>[,]]...
.end

Each function must be started by a single .function directive and ended by a .end directive. .func, an alias of .function can also be used. There can be only one top-level function and functions must have at least one valid instruction. Thus the simplest assembly listing that can be written is:

.func
return 0 0
.end

The key-value pairs for customizing a function are:

Key
Type/Unit
Example
source_name string
source_name="foo.lua"
line_defined number
line_defined=10
numparams number
numparams=2
is_vararg number, 0 or 1
is_vararg=1
maxstacksize number
maxstacksize=10

Unlike .header parameters, you usually don't need to customize a .function directive. Normally, the assembler will fill in the parameters for you. The source_name parameter is set by the assembler for the top-level function, as is line_defined. numparams is automatically calculated when you use the .param directive, and the assembler keeps track of stack locations used in order to set maxstacksize, so usually it is wiser to leave them be.

For functions that accept a variable number of arguments, is_vararg should be set to 1, and the implicit parameter arg will be in stack position 0. An example of a function that accepts a variable number of arguments is printf.asm in the /test directory.

Here is an example of function header customization:

.function source_name = "Morpheus", \
          line_defined = 1234, \
          numparams=6, is_vararg=1, \
          maxstacksize=47

So, for normal use, you won't need to specify anything for the .function directive, except for is_vararg when you need functions that accept a variable number of arguments.

Functions can be named using labels, so that names can be used instead of numbers when writing a closure instruction. The following are two functions named ONE and TWO:

ONE .function
return 0 0
.end

TWO: .function
return 0 0
.end

Next comes directives that are used to declare named resources for a function.


.param

.param <symbol>

.param declares a register location as a function parameter, or argument, and assigns a name to the parameter. Since parameters occupy the lowest register numbers, the assembler assigns the locations for you automatically, starting from location 0. You are not allowed to declare any local variables before parameter declarations. The numparams field in the function header is automatically updated by the assembler.

.function
.param Var
.param Pos
add 2, Var, Pos
; more instructions here
.end

In the example above, Var is assigned register location 0 while Pos is assigned register location 1. The add instruction adds parameters Var and Pos and saves the result in register 2. You can still use 0 and 1 as the parameter locations in an instruction; the usage of symbolic names and numbers is up to the user.

A parameter declaration need not be placed immediately after .function. You only need to ensure that all local variables are declared after parameter declarations, and that symbolic names names are declared before they are used.


.local

.local <symbol> [[,] <number>]

.local declares local variable names. You can either let the assembler assign the next lowest available location that does not have a symbolic name associated with it (either from a .param or .local declaration.) Alternatively, the register number can be explicitly specified.

.function
.param Foo
.local Bar
.local Baz, 4
; rest of function goes here
.end

Since location 0 is occupied by Foo, the first .local directive associates the name Bar to location 1. The next .local directive explicitly assigns the name Baz to location 4. The comma separator is optional. Now, if there is another .local directive in the function without a specified location number, location 2 will be used.

Locals must be declared after all parameters have been declared. Other than that, locals can be declared anywhere inside a function, as long as symbolic names are declared before they are used.


.upvalue

.upvalue <symbol> [[,] <number>]

The syntax of .upvalue is exactly like the syntax of .local. Assignment of upvalue numbers is always explicit, so each upvalue must be defined using this directive. The assembler sets the number of upvalues (which is a byte-sized field) in the function header for you.

.function
.upvalue Larry
.upvalue Moe 2
; rest of function goes here
.end

Upvalue number 0 is assigned the name Larry, while upvalue number 2 is assigned the name Moe. The operands of the directive can be optionally separated by a comma. When the upvalue list is written out, non-declared values in the upvalue list is assigned the name "(none)" so that loaders don't complain of zero-length names.


.const

.const [<symbol> [,]] <number>|<immediate>|<string>|nil [[,] <number>]

Constants are declared using the .const directive. For constants, both the constant number and the symbolic name of the constant is optional. Leaving out the constant number forces automatic constant number assignment. If no name is associated with a constant, it must be accessed using the constant number.

The constant may be a string, an immediate number (with a leading '#'), a number or nil. A number is interpreted as an immediate, but if you are not comfortable with the ambiguity, you can always declare numbers using the immediate syntax.

Here are some unnamed constant declarations with automatic constant number assignment:

.function
.const 7.8e-9         ; constant number 0, a number, 7.8e-9
.const #12345         ; constant number 1, a number, 12345
.const "Eastwood"     ; constant number 2, a string, "Eastwood"
.const nil            ; constant number 3, a nil
; rest of the function goes here

Constants can be declared anywhere within a function as long as symbolic names are declared before use. Here are more constant declarations:

.const FOO #12.34        ; FOO is a constant with value 12.34
.const BAR, "Eastwood"   ; BAR is a constant with value "Eastwood"
.const Beef, #0xBEEF, 3  ; constant number 3 is named Beef, value 0xBEEF
.const #0xDEAD 2         ; constant number 2 has value 0xDEAD

Commas between operands is optional. Constant declarations with identical values are not merged; as far as the assembler is concerned, it is dealing with separate constants.

In order to simplify the writing of Lua assembly code, it is possible to write code without a single .const declaration. The assembler allows constants to be used directly as operands wherever there is an RK(x) operand or a Kst(x) operand. This is also why numbers need to be disambiguated from numerical constants (immediates). When constants are used directly as instruction operands, the assembler will automatically assign constant numbers to the constants and duplicates constants will use the same constant number.

The user is given the ultimate control as to how constants are declared or used in functions in the assembly code. The intent of the handling system for constants is that important variables can be declared as named symbols, while trivial constants can be used directly as operands in instructions.


.equ

.equ <symbol> [,] <expression>

.equ is a generic mechanism for setting an expression's result to a symbol, much like the classic assembler equ directive. Equates and labels associated with them are the only symbols in the global namespace; all other symbols are local to a particular function prototype. Evaluation of expressions is immediate, so local symbols can be used, and in addition equate symbols can be redefined. Equates can appear anywhere in a program; they affect output only when used as symbols in expressions.

Equates that produces numbers do not have any associated symbol type information, so they are an alternative to using parentheses in expressions when mixing different symbol types. As with languages with dynamic data types, the onus is on the user to code responsibly.

.equ SIZE 800*600              ; a number
.equ BIG (SIZE > 32000)        ; a boolean result
.equ MODE "Large"
.equ Version, 4.7              ; a number
.equ ID, MODE..Version         ; a string
...
.function
.const MySize #SIZE/1000       ; MySize=#480
.const MyID ID                 ; MyID="Large4.7"
...
loadbool R1, BIG, 0            ; loads a boolean (true)
...
.end


Labels and Symbols

Labels can be used for .param, .local, .upvalue, .const, .function and .equ. For .function labels, the top-level function cannot have one, since the label is local to the parent function prototype and the top-level function doesn't have a parent. For .equ, the label is in the global namespace, just like the equate directive.

Since you can name a function's resources using the declarations only, labels are a kind of aliasing mechanism. For example:

Neo: .const Anderson, "Thomas Anderson"

In the example above, both Anderson and Neo points to the same constant number that holds the string "Thomas Anderson". Although mechanism is available to the user, it is probably a good idea to use it only when there are good reasons to do so.


Instruction Operands

The following are descriptions of common Lua 5 virtual machine instruction operands:

The assembler tracks register usage by noting the R(x) and RK(x) operands and sets the maxstacksize field in a function header accordingly. So normally, the user does not need to do anything about the maxstacksize field. You can explicitly set maxstacksize too, but if your custom value is too small, then the assembler will flag an error.

Most value operands accept expressions, as long as the result of the expression is valid.

Other operands are described within the descriptions of Lua 5 virtual machine instructions in the next section.


Lua 5 Instructions

Like most classic assemblers, instruction statements are written on a per-line basis. In the case of ChunkBake, 'per-line' refers to a logical line, since a '\' can be used as a line continuation character. Also, long strings can occupy more than one line.

Operands are optionally separated by commas. Thus, in the syntax descriptions below, any comma can be omitted. This relaxed rule is to allow for different writing styles. It is recommended that the user stick to a single style in a program. In some cases, '...' can be optionally used to replace a comma to denote a range of registers or values. Note: '...' was changed from '..'  in version 0.7.0 because the latter need to be used for string concatenation. This helps to make the intent of the instruction clear, e.g. loadnil 0...2.

Many illegal conditions are trapped by the assembler, however, writing illegal or incorrect instructions is by no means impossible. The assembler lacks a comprehensive warning system at the moment, so it is largely up to the user to enforce some discipline in coding.

The following describes only how instructions are written in the ChunkBake assembler, it does not describe what each instruction does. For help on the latter, please see the beginning of this document. Most of the examples in the following are copied from the automatic test script, TestRig.lua.


MOVE R(A) , R(B)

Moves values between register locations. Examples:

MOVE 0 1
move R3 R4
move $5, $6
move $7, R8
move FOO, BAR


LOADK R(A) , Kst(Bx)

Loads a constant into a register. Examples:

loadk $1, 0
LOADK R1, FOO
loadk 2, "foobar"
loadk $0 #1234


LOADBOOL R(A) , B(0|1|true|false) , C(0|1)

Loads a boolean into a register. The boolean value can be specified as a number (0 for false, 1 for true) or the true and false keywords can be used. The flag in field C for skipping the next instruction may be specified as a 0 (don't skip) or 1 (skip the next instruction.) Examples:

loadbool R0, 0, 0
loadbool $1, 1, 1
loadbool 2 true 0
loadbool 3 false, 1


LOADNIL R(A) (,|..) R(B)

Sets a range of registers to nil. In order to make the intent of the instruction clear, '..' can be used instead of a comma to separate the two operands, thus showing that the instruction is operating on the given register range. The second register must be greater than or equal to the first register. Examples:

loadnil 0,0           ; a single register
loadnil R1..R3        ; makes it clear R1,R2,R3 are set
loadnil 2 $4
loadnil FOO .. BAR


GETUPVAL R(A) , Upvalue[Bx]
SETUPVAL R(A) , Upvalue[Bx]

Loads or saves an upvalue. Examples:

getupval $1, 0
setupval R2 FOO


GETGLOBAL R(A) , Kst(Bx)
SETGLOBAL R(A) , Kst(Bx)

Loads or saves a global variable. Examples:

getglobal R0, FOO
setglobal $1, FOO
getglobal 1, 0
setglobal 2, "bar"
getglobal 3 #0xBEEF


GETTABLE R(A) , R(B) , RK(C)
GETTABLE R(A) , R(B) "[" RK(C) "]"

Gets a value from a table. The table key can be enclosed in square brackets for increased readability. Examples:

gettable 1 0 2
gettable 1, R2, 250
gettable $2 3 #0xBEEF
gettable R4 R5 "foobar"
gettable R0, $1[251]
gettable 1 2[3]
gettable $3, 2["trinity"]


SETTABLE R(A) , RK(B) , RK(C)
SETTABLE R(A) "[" RK(B) "]" , RK(C)

Sets the value of a given table key. The table key can be enclosed in square brackets for increased readability. Examples:

settable 1 2 3
settable R2, 250, $1
settable $1 #0xDEAD, 3
settable R4 "foobar" R5
settable R4 R2 "foobar"
settable $1[251], R0
settable 2["trinity"] $3


NEWTABLE R(A) , (<number>|<immediate>) , (<number>|<immediate>)

Creates a new table object and assigns it to the given register. The assembler accepts either encoded size values (as integer numbers) or raw size values (as immediate numbers). Examples:

newtable 2 0 0               ; table of: array=0, hash=0
newtable R1, 10, 4           ;
table of: array=4, hash=16
newtable 1 #200 #200         ;
table of: array=224, hash=256


SELF R(A) , R(B) , RK(C)
SELF R(A) , R(B) "[" RK(C) "]"

Sets up an object call. Part of the operation includes a table lookup, hence the syntax of self is similar to the syntax of gettable. Examples:

self R1 R2 R3
self 1,2,3
self $2 3[R4]
self R1 R0[FOO]
self R2 R1,250
self $1 BAR[BAR]
self BAR $1[#0xBEEF]
self BAR $1["morpheus"]


ADD R(A) , RK(B) , RK(C)
SUB R(A) , RK(B) , RK(C)
MUL R(A) , RK(B) , RK(C)
DIV R(A) , RK(B) , RK(C)
POW R(A) , RK(B) , RK(C)

Binary arithmetic operations. Examples:

add 0,0,0
add $0,R1,2
add FOO, 0, BAR
sub R1, #123, #456
mul $2, "foo", "bar"
div 3, FOO, 250
pow 0, R1, BAR


UNM R(A) , R(B)
NOT R(A) , R(B)

Unary minus and logical not operations. Examples:

unm 1, $2
not R3 R4
unm FOO, BAR
not BAR FOO


CONCAT R(A) , R(B) (,|...) R(C)

Concatenates a range of registers and assigns the result to a register. A '...' can be used instead of a comma to separate the operands that make up the range of registers. Examples:

concat $0 $1 $3
concat 0,1,1
concat R1 R2...R4
concat 0, FOO...BAR


EQ (0|1) , RK(B) , RK(C)
LT (0|1) , RK(B) , RK(C)
LE (0|1) , RK(B) , RK(C)

Relational operations. The A field can be 0 or 1, which signifies the result to be tested for. The next instruction is skipped if the comparison result is not the same as the expected result. Examples:

eq 0, 1, 2
eq 1 $2 $1
eq 0 R1 R2
eq 0 250 251
lt 0, FOO, BAR
lt 1 #123 #456
le 0 "foo" "bar"
le 1 #47 "47"


TEST R(A) , R(B) , (0|1)

Text operation. The C field can be 0 or 1, which signifies the result to be tested for. The next instruction is skipped if the comparison result is not the same as the expected result. Examples:

test 1, 2, 0
test $2 $1, 1
test R1 R2 0
test FOO BAR 0


JMP label|disp

Performs an unconditional jump. Jumps can be to absolute positions, to relative positions or to a label. Absolute jumps must be a positive integer, but bounds checking is not currently performed, so illegal instructions can be coded. Likewise for relative jumps. Also, the assembler does not currently check the case where the jmp instruction jumps to itself, creating an infinite loop. In future versions of ChunkBake, the user will be warned of such cases. Examples:

jmp #0         ; relative jumps
jmp #-10
jmp #10
jmp 3          ; absolute jump
jmp BAR        ; labeled jump


CALL R(A) , B , C

Calls a closure. Examples:

call $0, 0, 0
call R1, 2, 3
call 2 3 1


TAILCALL R(A) , B
RETURN R(A) , B

Returns to calling function or performs a tail call. Examples:

return $1 2
return R2 20
return 3, 4
return FOO, 8
tailcall $1 2
tailcall R2 20
tailcall 3, 4
tailcall BAR, 66


FORLOOP R(A) , label|disp

Performs an iteration of a numeric for loop. See the jmp instruction on the use and specification of jumps. Examples:

forloop 1 FOO
forloop R1, BAR


TFORLOOP R(A) , C

Performs an iteration of a generic for loop. Examples:

tforloop 0 0
tforloop $1, 4
tforloop R2, 10


TFORPREP R(A) , label|disp

Performs the initialization for a generic for loop. See the jmp instruction on the use and specification of jumps. Examples:

tforprep 1 FOO
tforprep R1, BAR


SETLIST R(A) , Bx
SETLIST R(A) , start [...] end
SETLISTO R(A) , Bx
SETLISTO R(A) , start [...] end

Sets a table with a list of values in a range of registers. You can either encode the range and specify a single number, or specify a range and let the assembler calculate the Bx value. setlisto is used when the final element of the list is a function call, which has an indeterminate number of return values. Examples:

setlist $1, 10             ; index 1 to 11
setlist R2, 40             ; index 33 to 41
setlist 0 1...10            ; index 1 to 10
setlist 1, 33...42          ; index 33 to 42
setlisto 1, 65...74         ; index 65 to top

The encoded number and the range specification must be valid, or else the assembler will report an error. Note that FIELDS_PER_FLUSH (or fpf in the .header directive) can be customized, so the index range calculations will be adjusted accordingly.


CLOSE R(A)

Closes local variables on the stack from the specified register onwards. Examples:

close 0
close FOO



CLOSURE R(A) , Bx

Creates a closure or instantiation of a function, assigning the result to a register. If the function prototype has a label, the label can be used to refer to it. Examples:

closure 1, 0
closure R2 1
closure FOO 2
closure $0 PRINTF           ; see the printf.asm test example


Examples

A number of examples can be found in the test directory of the ChunkBake distribution. The automatic test system, TestRig.lua in the misc directory, has a pretty complete coverage of both legal and illegal forms of all instructions.


This page Copyright © 2005 KHMan under the same terms as ChunkBake. See ChunkBake for license information (MIT-style). Last Revised: 2005-05-12. Personal: http://www.geocities.com/keinhong/ | Project URL: http://luaforge.net/projects/chunkbake/