The built-in assembler allows you to write Intel assembler code within Object Pascal programs. It implements a large subset of the syntax supported by Turbo Assembler and Microsoft's Macro Assembler, including all 8086/8087 and 80386/80387 opcodes and all but a few of Turbo Assembler's expression operators. Moreover, the built-in assembler allows you to use Object Pascal identifiers in assembler statements.
Except for DB, DW, and DD (define byte, word, and double word), none of Turbo Assembler's directives (such as EQU, PROC, STRUC, SEGMENT, and MACRO) are supported by the built-in assembler. Operations implemented through Turbo Assembler directives, however, are largely matched by corresponding Object Pascal constructions. For example, most EQU directives correspond to constant, variable, and type declarations; the PROC directive corresponds to procedure and function declarations; and the STRUC directive corresponds to record types.
As an alternative to the built-in assembler, you can link to .OBJ files that contain external procedures and functions. See "Linking to .OBJ files" for more information.
The built-in assembler is accessed through asm statements, which have the form
asm statementList end
where statementList is a sequence of assembler statements separated by semicolons, end-of-line characters, or Object Pascal comments.
Comments in an asm statement must be in Object Pascal style. A semicolon does not indicate that the rest of the line is a comment.
The reserved word inline and the directive assembler are maintained for backward compatibility only. They have no effect on the compiler.
In general, the rules of register use in an asm statement are the same as those of an external procedure or function. An asm statement must preserve the EDI, ESI, ESP, EBP, and EBX registers, but can freely modify the EAX, ECX, and EDX registers. On entry to an asm statement, EBP points to the current stack frame, ESP points to the top of the stack, SS contains the selector of the stack segment, and DS contains the selector of the data segment. Except for EDI, ESI, ESP, EBP, and EBX, an asm statement can assume nothing about register contents on entry to the statement.
The syntax of an assembler statement is
Label: Prefix Opcode Operand1, Operand2
where Label is a label, Prefix is an assembler prefix opcode (operation code), Opcode is an assembler instruction opcode or directive, and Operand1 and Operand2 are assembler expressions. Label and Prefix are optional. Some opcodes take only one operand, and some take none.
Comments are allowed between assembler statements, but not within them. For example,
MOV AX,1 {Initial value} { OK }
MOV CX,100 {Count} { OK }
MOV {Initial value} AX,1; { Error! }
MOV CX, {Count} 100 { Error! }
Labels are used in built-in assembler statements as they are in Object Pascal--by writing the label and a colon before a statement. There is no limit to a label's length, but only the first 32 characters are significant. As in Object Pascal, labels must be declared in a label declaration part in the block containing the asm statement. There is one exception to this rule: local labels.
Local labels are labels that start with an at-sign (@). They consist of an at-sign followed by one or more letters, digits, underscores, or at-signs. Use of local labels is restricted to asm statements, and the scope of a local label extends from the asm reserved word to the end of the asm statement that contains it. A local label doesn't have to be declared.
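The naming rule for local labels is simple enough to state as a pattern. Below is a minimal Python sketch, not part of Delphi, that checks whether a name has the form of a local label. The function name is hypothetical, and it assumes that "letters" means ASCII letters.

```python
import re

# An at-sign followed by one or more letters, digits, underscores, or at-signs.
# Assumption: "letters" means the ASCII letters A-Z and a-z.
LOCAL_LABEL = re.compile(r"@[A-Za-z0-9_@]+")


def is_local_label(name: str) -> bool:
    """Return True if name has the form of a built-in assembler local label."""
    return LOCAL_LABEL.fullmatch(name) is not None
```

For example, `@Loop`, `@@1`, and `@_done@2` qualify, while a bare `@` or an ordinary `Loop` (which would need a label declaration) does not.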
The built-in assembler supports the following opcodes.
For a complete description of each instruction, refer to your microprocessor documentation.
The RET instruction opcode always generates a near return.
Unless otherwise directed, the built-in assembler optimizes jump instructions by automatically selecting the shortest, and therefore most efficient, form of a jump instruction. This automatic jump sizing applies to the unconditional jump instruction (JMP), and to all conditional jump instructions when the target is a label (not a procedure or function).
For an unconditional jump instruction (JMP), the built-in assembler generates a short jump (one-byte opcode followed by a one-byte displacement) if the distance to the target label is -128 to 127 bytes. Otherwise it generates a near jump (one-byte opcode followed by a four-byte displacement, as required in 32-bit code).
For a conditional jump instruction, a short jump (one-byte opcode followed by a one-byte displacement) is generated if the distance to the target label is -128 to 127 bytes. Otherwise, the built-in assembler generates a short jump with the inverse condition, which jumps over a near jump to the target label (seven bytes in total: a two-byte short jump followed by a five-byte near jump). For example, the assembler statement
JC Stop
where Stop isn't within reach of a short jump, is converted to a machine code sequence that corresponds to this:
JNC Skip
JMP Stop
Skip:
Jumps to the entry points of procedures and functions are always near.
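The sizing decision for jumps to labels can be modeled in a few lines. The Python sketch below is illustrative only and is not the compiler's code. It takes the signed distance to the target as given and returns the instruction sequence the rule above describes, each paired with its form. Three details are assumptions for the example: the function name, the `Skip` label (borrowed from the example above), and the table of inverse conditions.

```python
from typing import Dict, List, Tuple

# Pairs of conditional jumps whose conditions are logical inverses.
_PAIRS = [("JC", "JNC"), ("JB", "JAE"), ("JZ", "JNZ"), ("JE", "JNE"),
          ("JS", "JNS"), ("JO", "JNO"), ("JP", "JNP"), ("JL", "JGE"),
          ("JG", "JLE"), ("JA", "JBE")]
INVERSE: Dict[str, str] = {}
for a, b in _PAIRS:
    INVERSE[a] = b
    INVERSE[b] = a


def expand_jump(opcode: str, target: str, displacement: int) -> List[Tuple[str, str]]:
    """Model automatic jump sizing for a jump to a label.

    displacement is the signed distance in bytes to the target label.
    Returns (instruction, form) pairs.
    """
    opcode = opcode.upper()
    if -128 <= displacement <= 127:
        # Target within reach: a single short jump of either kind.
        return [(f"{opcode} {target}", "short")]
    if opcode == "JMP":
        # Unconditional jump out of reach: a single near jump.
        return [(f"JMP {target}", "near")]
    if opcode not in INVERSE:
        raise ValueError(f"no inverse condition known for {opcode}")
    # Conditional jump out of reach: inverted short jump over a near JMP.
    return [(f"{INVERSE[opcode]} Skip", "short"),
            (f"JMP {target}", "near"),
            ("Skip:", "label")]
```

With a distance of 500 bytes, `expand_jump("JC", "Stop", 500)` reproduces the `JNC Skip` / `JMP Stop` / `Skip:` sequence shown above.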
The built-in assembler supports three assembler directives: DB (define byte), DW (define word), and DD (define double word). Each generates data corresponding to the comma-separated operands that follow the directive.
The DB directive generates a sequence of bytes. Each operand can be a constant expression with a value between -128 and 255, or a character string of any length. Constant expressions generate one byte of code, and strings generate a sequence of bytes with values corresponding to the ASCII code of each character.
The DW directive generates a sequence of words. Each operand can be a constant expression with a value between -32,768 and 65,535, or an address expression. For an address expression, the built-in assembler generates a near pointer--that is, a word that contains the offset part of the address.
The DD directive generates a sequence of double words. Each operand can be a constant expression with a value between -2,147,483,648 and 4,294,967,295, or an address expression. For an address expression, the built-in assembler generates a far pointer--that is, a word that contains the offset part of the address, followed by a word that contains the segment part of the address.
The data generated by the DB, DW, and DD directives is always stored in the code segment, just like the code generated by other built-in assembler statements. To generate uninitialized or initialized data in the data segment, you should use Object Pascal var or const declarations.
Some examples of DB, DW, and DD directives follow.
asm
DB 0FFH { One byte }
DB 0,99 { Two bytes }
DB 'A' { Ord('A') }
DB 'Hello world...',0DH,0AH { String followed by CR/LF }
DB 6,"Delphi" { Object Pascal style string }
DW 0FFFFH { One word }
DW 0,9999 { Two words }
DW 'A' { Same as DB 'A',0 }
DW 'BA' { Same as DB 'A','B' }
DW MyVar { Offset of MyVar }
DW MyProc { Offset of MyProc }
DD 0FFFFFFFFH { One double-word }
DD 0,999999999 { Two double-words }
DD 'A' { Same as DB 'A',0,0,0 }
DD 'DCBA' { Same as DB 'A','B','C','D' }
DD MyVar { Pointer to MyVar }
DD MyProc { Pointer to MyProc }
end;
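The byte layouts in the comments above follow from two rules. Numeric operands are stored least significant byte first, and a short string operand to DW or DD is first converted to its numeric value. That conversion is the string-constant formula described later in this section. The Python sketch below models only constant and string operands, not address expressions. The names `define_data` and `string_constant_value` are hypothetical.

```python
SIZES = {"DB": 1, "DW": 2, "DD": 4}
RANGES = {"DB": (-128, 255), "DW": (-32768, 65535), "DD": (-2**31, 2**32 - 1)}


def string_constant_value(s: str) -> int:
    """Numeric value of a string constant of at most four characters."""
    if len(s) > 4:
        raise ValueError("string constant longer than four characters")
    # The last (rightmost) character becomes the least significant byte.
    return sum(ord(ch) << (8 * i) for i, ch in enumerate(reversed(s)))


def define_data(directive: str, *operands) -> bytes:
    """Model the bytes emitted by DB, DW, or DD for constant/string operands."""
    size = SIZES[directive]
    lo, hi = RANGES[directive]
    out = bytearray()
    for op in operands:
        if isinstance(op, str):
            if directive == "DB":
                out += op.encode("ascii")  # one byte per character, any length
                continue
            op = string_constant_value(op)
        if not lo <= op <= hi:
            raise ValueError(f"{op} is out of range for {directive}")
        # Two's complement for negative values, least significant byte first.
        out += (op & ((1 << 8 * size) - 1)).to_bytes(size, "little")
    return bytes(out)
```

For example, `define_data("DW", "BA")` produces the same bytes as `define_data("DB", "A", "B")`, matching the comment on `DW 'BA'` above.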
In Turbo Assembler, when an identifier precedes a DB, DW, or DD directive, it causes the declaration of a byte-, word-, or double-word-sized variable at the location of the directive. For example, Turbo Assembler allows the following:
ByteVar DB ?
WordVar DW ?
IntVar DD ?
...
MOV AL,ByteVar
MOV BX,WordVar
MOV ECX,IntVar
The built-in assembler doesn't support such variable declarations. The only kind of symbol that can be defined in an inline assembler statement is a label. All variables must be declared using Object Pascal syntax; the preceding construction can be replaced by
var
ByteVar: Byte;
WordVar: Word;
IntVar: Integer;
...
asm
MOV AL,ByteVar
MOV BX,WordVar
MOV ECX,IntVar
end;
Built-in assembler operands are expressions that consist of constants, registers, symbols, and operators.
Within operands, the following reserved words have predefined meanings:
Reserved words always take precedence over user-defined identifiers. For example,
var
Ch: Char;
...
asm
MOV CH, 1
end;
loads 1 into the CH register, not into the Ch variable. To access a user-defined symbol with the same name as a reserved word, you must use the ampersand (&) override operator:
MOV &Ch, 1
It is best to avoid user-defined identifiers with the same names as built-in assembler reserved words.
The built-in assembler evaluates all expressions as 32-bit integer values. It doesn't support floating-point and string values, except string constants.
Expressions are built from expression elements and operators, and each expression has an associated expression class and expression type.
The most important difference between Object Pascal expressions and built-in assembler expressions is that assembler expressions must resolve to a constant value--a value that can be computed at compile time. For example, given the declarations
const
X = 10;
Y = 20;
var
Z: Integer;
the following is a valid statement.
asm
MOV Z,X+Y
end;
Because both X and Y are constants, the expression X + Y is a convenient way of writing the constant 30, and the resulting instruction simply moves the value 30 into the variable Z. But suppose X and Y are variables:
var
X, Y: Integer;
In this case the built-in assembler cannot compute the value of X + Y at compile time, so to move the sum of X and Y into Z you would use
asm
MOV EAX,X
ADD EAX,Y
MOV Z,EAX
end;
In an Object Pascal expression, a variable reference denotes the contents of the variable. But in an assembler expression, a variable reference denotes the address of the variable. In Object Pascal the expression X + 4 (where X is a variable) means the contents of X plus 4, while to the built-in assembler it means the contents of the double word at the address four bytes higher than the address of X. So, even though you're allowed to write
asm
MOV EAX,X+4
end;
this code doesn't load the value of X plus 4 into EAX; instead, it loads the value of a double word stored four bytes beyond X. The correct way to add 4 to the contents of X is
asm
MOV EAX,X
ADD EAX,4
end;
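The difference between "address plus 4" and "contents plus 4" can be made concrete with a toy memory image. This Python sketch is a simulation, not Delphi code. The layout is hypothetical: the Integer X sits at address 0 and another double word follows it at address 4.

```python
import struct

# Hypothetical memory image: X at address 0, another 32-bit value at address 4.
memory = bytearray(8)
ADDR_X = 0
struct.pack_into("<i", memory, ADDR_X, 10)      # X := 10
struct.pack_into("<i", memory, ADDR_X + 4, 99)  # whatever happens to follow X


def load_dword(address: int) -> int:
    """What MOV EAX,<memory operand> reads: the double word at that address."""
    return struct.unpack_from("<i", memory, address)[0]


# MOV EAX,X+4: the operand is the address of X plus 4, so EAX gets the next value.
eax_wrong = load_dword(ADDR_X + 4)
# MOV EAX,X then ADD EAX,4: load the contents of X, then add 4.
eax_right = load_dword(ADDR_X) + 4
```

Here `eax_wrong` picks up the unrelated value 99, while `eax_right` holds 14, the contents of X plus 4.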
The elements of an expression are constants, registers, and symbols.
The built-in assembler supports two types of constant: numeric constants and string constants.
Numeric constants must be integers, and their values must be between -2,147,483,648 and 4,294,967,295.
By default, numeric constants use decimal notation, but the built-in assembler also supports binary, octal, and hexadecimal. Binary notation is selected by writing a B after the number, octal notation by writing an O after the number, and hexadecimal notation by writing an H after the number or a $ before the number.
Numeric constants must start with one of the digits 0 through 9 or the $ character. When you write a hexadecimal constant using the H suffix, an extra zero is required in front of the number if the first significant digit is one of the digits A through F. For example, 0BAD4H and $BAD4 are hexadecimal constants, but BAD4H is an identifier because it starts with a letter.
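The notation rules for numeric constants can be restated as a small parser. The Python sketch below applies them in order: a `$` prefix, then the leading-digit requirement, then the H, B, or O suffix, with decimal as the default. It also checks the upper bound stated above. The function name is hypothetical, and treating suffixes as case-insensitive is an assumption of this sketch.

```python
DIGITS = "0123456789ABCDEF"


def parse_numeric_constant(text: str) -> int:
    """Parse a built-in assembler numeric constant (sketch)."""
    t = text.upper()  # assumption: suffixes are case-insensitive
    if t.startswith("$"):
        digits, base = t[1:], 16
    elif t and t[0].isdigit():
        suffix = t[-1]
        if suffix == "H":
            digits, base = t[:-1], 16
        elif suffix == "B":
            digits, base = t[:-1], 2
        elif suffix == "O":
            digits, base = t[:-1], 8
        else:
            digits, base = t, 10
    else:
        # Starts with a letter (e.g. BAD4H): an identifier, not a number.
        raise ValueError(f"{text!r} is an identifier, not a numeric constant")
    allowed = DIGITS[:base]
    if not digits or any(c not in allowed for c in digits):
        raise ValueError(f"{text!r} has invalid digits for base {base}")
    value = int(digits, base)
    if value > 4294967295:
        raise ValueError(f"{text!r} exceeds 4,294,967,295")
    return value
```

With this sketch, `0BAD4H` and `$BAD4` both yield 0xBAD4, while `BAD4H` is rejected because it begins with a letter.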
String constants must be enclosed in single or double quotation marks. Two consecutive quotation marks of the same type as the enclosing quotation marks count as only one character. Here are some examples of string constants:
'Z'
'Delphi'
"That's all folks"
'"That''s all folks," he said.'
'100'
'"'
"'"
String constants of any length are allowed in DB directives, and cause allocation of a sequence of bytes containing the ASCII values of the characters in the string. In all other cases, a string constant can be no longer than four characters and denotes a numeric value which can participate in an expression. The numeric value of a string constant is calculated as
Ord(Ch1) + Ord(Ch2) shl 8 + Ord(Ch3) shl 16 + Ord(Ch4) shl 24
where Ch1 is the rightmost (last) character and Ch4 is the leftmost (first) character. If the string is shorter than four characters, the leftmost characters are assumed to be zero. The following table shows string constants and their numeric values.
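The formula can be checked directly. This Python sketch takes the characters between the quotes, with doubled quotes already collapsed, and weights the rightmost character as Ch1. A shorter string therefore leaves the high-order bytes zero. The function name is hypothetical, and returning 0 for an empty string is an assumption, since the text does not cover that case.

```python
def string_constant_value(s: str) -> int:
    """Ord(Ch1) + Ord(Ch2) shl 8 + Ord(Ch3) shl 16 + Ord(Ch4) shl 24,
    where Ch1 is the rightmost character of s."""
    if len(s) > 4:
        raise ValueError("only DB accepts string constants longer than four characters")
    # Walk from the rightmost character (Ch1) to the leftmost, shifting by 8 each time.
    return sum(ord(ch) << (8 * i) for i, ch in enumerate(reversed(s)))
```

For example, `'ba'` evaluates to 00006261H and `'dcba'` to 64636261H.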
The following reserved symbols denote CPU registers:
When an operand consists solely of a register name, it is called a register operand. All registers can be used as register operands, and some registers can be used in other contexts.
The base registers (BX and BP) and the index registers (SI and DI) can be written within square brackets to indicate indexing. Valid base/index register combinations are [BX], [BP], [SI], [DI], [BX+SI], [BX+DI], [BP+SI], and [BP+DI]. You can also index with all the 32-bit registers--for example, [EAX+ECX], [ESP], and [ESP+EAX+5].
The segment registers (ES, CS, SS, DS, FS, and GS) are supported, but segments are normally not useful in 32-bit applications.
The symbol ST denotes the topmost register on the 8087 floating-point register stack. Each of the eight floating-point registers can be referred to using ST(X), where X is a constant between 0 and 7 indicating the distance from the top of the register stack.
The built-in assembler allows you to access almost all Object Pascal identifiers in assembler expressions, including constants, types, variables, procedures, and functions. In addition, the built-in assembler implements the special symbol @Result, which corresponds to the Result variable within the body of a function. For example, the function
function Sum(X, Y: Integer): Integer;
begin
Result := X + Y;
end;
could be written in assembler as
function Sum(X, Y: Integer): Integer; stdcall;
begin
asm
MOV EAX,X
ADD EAX,Y
MOV @Result,EAX
end;
end;
The following symbols cannot be used in asm statements:
The following table summarizes the kinds of symbol that can be used in asm statements.
With optimizations disabled, local variables (variables declared in procedures and functions) are always allocated on the stack and accessed relative to EBP, and the value of a local variable symbol is its signed offset from EBP. The assembler automatically adds [EBP] in references to local variables. For example, given the declaration
var Count: Integer;
within a function or procedure, the instruction
MOV EAX,Count
assembles into MOV EAX,[EBP-4].
The built-in assembler treats var parameters as 32-bit pointers, and the size of a var parameter is always 4. The syntax for accessing a var parameter is different from that for accessing a value parameter. To access the contents of a var parameter, you must first load the 32-bit pointer and then access the location it points to. For example,
function Sum(var X, Y: Integer): Integer; stdcall;
begin
asm
MOV EAX,X
MOV EAX,[EAX]
MOV EDX,Y
ADD EAX,[EDX]
MOV @Result,EAX
end;
end;
Identifiers can be qualified within asm statements. For example, given the declarations
type
TPoint = record
X, Y: Integer;
end;
TRect = record
A, B: TPoint;
end;
var
P: TPoint;
R: TRect;
the following constructions can be used in an asm statement to access fields.
MOV EAX,P.X
MOV EDX,P.Y
MOV ECX,R.A.X
MOV EBX,R.B.Y
A type identifier can be used to construct variables on the fly. Each of the following instructions generates the same machine code, which loads the double word at [EDX+8] into EAX (8 is the offset of the field B.X within a TRect).
MOV EAX,(TRect PTR [EDX]).B.X
MOV EAX,TRect([EDX]).B.X
MOV EAX,TRect[EDX].B.X
MOV EAX,[EDX].TRect.B.X
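A qualified name such as `TRect.B.X` resolves to a byte offset: the sum of the field offsets along the path. The Python sketch below reproduces that arithmetic with `ctypes` structures that mirror the `TPoint` and `TRect` declarations above. It assumes that Delphi lays out these all-Integer records without padding, as `ctypes` does. The helper `field_offset` is hypothetical.

```python
import ctypes


class TPoint(ctypes.Structure):
    _fields_ = [("X", ctypes.c_int32), ("Y", ctypes.c_int32)]


class TRect(ctypes.Structure):
    _fields_ = [("A", TPoint), ("B", TPoint)]


def field_offset(record, path: str) -> int:
    """Sum the field offsets along a dotted path such as 'B.X'."""
    offset = 0
    for name in path.split("."):
        offset += getattr(record, name).offset  # offset of this field in its record
        record = dict(record._fields_)[name]     # descend into the field's type
    return offset
```

Under these assumptions, `field_offset(TRect, "B.X")` is 8, which is why the instructions above address `[EDX+8]`.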
The built-in assembler divides expressions into three classes: registers, memory references, and immediate values.
An expression that consists solely of a register name is a register expression. Examples of register expressions are AX, CL, DI, and ES. Used as operands, register expressions direct the assembler to generate instructions that operate on the CPU registers.
Expressions that denote memory locations are memory references. Object Pascal's labels, variables, typed constants, procedures, and functions belong to this category.
Expressions that aren't registers and aren't associated with memory locations are immediate values. This group includes Object Pascal's untyped constants and type identifiers.
Immediate values and memory references cause different code to be generated when used as operands. For example,
const
Start = 10;
var
Count: Integer;
...
asm
MOV EAX,Start { MOV EAX,xxxx }
MOV EBX,Count { MOV EBX,[xxxx] }
MOV ECX,[Start] { MOV ECX,[xxxx] }
MOV EDX,OFFSET Count { MOV EDX,xxxx }
end;
Because Start is an immediate value, the first MOV is assembled into a move immediate instruction. The second MOV, however, is translated into a move memory instruction, as Count is a memory reference. In the third MOV, the brackets convert Start into a memory reference (in this case, the double word at offset 10 in the data segment). In the fourth MOV, the OFFSET operator converts Count into an immediate value (the offset of Count in the data segment).
The brackets and OFFSET operator complement each other. The following asm statement produces identical machine code to the first two lines of the previous asm statement.
asm
MOV EAX,OFFSET [Start]
MOV EBX,[OFFSET Count]
end;
Memory references and immediate values are further classified as either relocatable or absolute. Relocation is the process by which the linker assigns absolute addresses to symbols. A relocatable expression denotes a value that requires relocation at link time, while an absolute expression denotes a value that requires no such relocation. Typically, expressions that refer to labels, variables, procedures, or functions are relocatable, since the final address of these symbols is unknown at compile time. Expressions that operate solely on constants are absolute.
The built-in assembler allows you to carry out any operation on an absolute value, but it restricts operations on relocatable values to addition and subtraction of constants.
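The restriction on relocatable values can be expressed as a small value type that refuses forbidden operations. This Python sketch follows the rule above literally. Any operation is allowed between absolute values, but a relocatable value may only have a constant added to it or subtracted from it. The class name `AsmValue` is hypothetical, and `value` holds only a placeholder offset for a relocatable symbol, since its real address is unknown until link time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AsmValue:
    value: int
    relocatable: bool = False

    def __add__(self, other: "AsmValue") -> "AsmValue":
        # relocatable + constant (or constant + relocatable) stays relocatable
        if self.relocatable and other.relocatable:
            raise TypeError("only constants can be added to a relocatable value")
        return AsmValue(self.value + other.value, self.relocatable or other.relocatable)

    def __sub__(self, other: "AsmValue") -> "AsmValue":
        # only a constant may be subtracted, and only from the left operand
        if other.relocatable:
            raise TypeError("only constants can be subtracted from a relocatable value")
        return AsmValue(self.value - other.value, self.relocatable)

    def __mul__(self, other: "AsmValue") -> "AsmValue":
        if self.relocatable or other.relocatable:
            raise TypeError("relocatable values allow only + and - of constants")
        return AsmValue(self.value * other.value)
```

For instance, with a hypothetical relocatable symbol `count = AsmValue(0, relocatable=True)`, `count + AsmValue(4)` is accepted and stays relocatable, while `count * AsmValue(2)` is rejected.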
Every built-in assembler expression has a type--or, more correctly, a size, because the assembler regards the type of an expression simply as the size of its memory location. For example, the type of an Integer variable is four, because it occupies 4 bytes. The built-in assembler performs type checking whenever possible, so in the instructions
var
QuitFlag: Boolean;
OutBufPtr: Word;
...
asm
MOV AL,QuitFlag
MOV BX,OutBufPtr
end;
the assembler checks that the size of QuitFlag is one (a byte), and that the size of OutBufPtr is two (a word). The instruction
MOV DL,OutBufPtr
produces an error because DL is a byte-sized register and OutBufPtr is a word. The type of a memory reference can be changed through a typecast; these are correct ways of writing the previous instruction:
MOV DL,BYTE PTR OutBufPtr
MOV DL,Byte(OutBufPtr)
MOV DL,OutBufPtr.Byte
These MOV instructions all refer to the first (least significant) byte of the OutBufPtr variable.
In some cases, a memory reference is untyped. One example is an immediate value enclosed in square brackets:
MOV AL,[100H]
MOV BX,[100H]
The built-in assembler permits both of these instructions, because the expression [100H] has no type--it just means "the contents of address 100H in the data segment," and the type can be determined from the first operand (byte for AL, word for BX). In cases where the type can't be determined from another operand, the built-in assembler requires an explicit typecast:
INC BYTE PTR [100H]
IMUL WORD PTR [100H]
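The size checks described above can be modeled in two rules. In a register/memory pair, a typed memory reference must match the register's size, and an untyped reference takes the register's size. A lone memory operand, as with INC or IMUL, must carry an explicit type. The Python sketch below covers only a handful of registers, and its function names are hypothetical.

```python
from typing import Optional

REGISTER_SIZE = {"AL": 1, "BL": 1, "CL": 1, "DL": 1,
                 "AX": 2, "BX": 2, "CX": 2, "DX": 2,
                 "EAX": 4, "EBX": 4, "ECX": 4, "EDX": 4}


def check_mov(register: str, memory_size: Optional[int]) -> int:
    """Operand size for MOV register,memory; memory_size None means untyped."""
    reg_size = REGISTER_SIZE[register]
    if memory_size is None:
        return reg_size  # e.g. MOV AL,[100H]: size taken from AL
    if memory_size != reg_size:
        raise TypeError(f"{register} is {reg_size} byte(s) but memory is {memory_size}")
    return reg_size


def check_single_operand(memory_size: Optional[int]) -> int:
    """INC, IMUL, etc. with only a memory operand need a typed reference."""
    if memory_size is None:
        raise TypeError("explicit typecast such as BYTE PTR is required")
    return memory_size
```

Under this model, `check_mov("DL", 2)` fails like `MOV DL,OutBufPtr`, while `check_mov("DL", 1)` (after `BYTE PTR`) and `check_mov("DL", None)` (an untyped `[100H]`) both succeed.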
The following table summarizes the predefined type symbols that the built-in assembler provides in addition to any currently declared Object Pascal types.
The built-in assembler provides a variety of operators. Precedence rules are different from Object Pascal; for example, in an asm statement, AND has lower precedence than the addition and subtraction operators. The following table lists the built-in assembler's expression operators in decreasing order of precedence.
The following table defines the built-in assembler's expression operators.
You can write complete procedures and functions using inline assembler code, without including a begin...end statement. For example,
function LongMul(X, Y: Integer): Longint;
asm
MOV EAX,X
IMUL Y
end;
The compiler performs several optimizations on these routines:
PUSH EBP ;Present if Locals <> 0 or Params <> 0
MOV EBP,ESP ;Present if Locals <> 0 or Params <> 0
SUB ESP,Locals ;Present if Locals <> 0
...
MOV ESP,EBP ;Present if Locals <> 0
POP EBP ;Present if Locals <> 0 or Params <> 0
RET Params ;Always present
If locals include variants, long strings, or interfaces, they are initialized to zero but not finalized.
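The presence conditions in the prologue/epilogue listing can be restated as a function of Locals and Params, taken here as byte counts. The Python sketch below builds the listed frame code for given sizes, with `...` standing for the routine body. The function name is hypothetical. The sketch always emits `RET Params`, even as `RET 0`, which behaves like a plain RET.

```python
from typing import List


def frame_code(locals_size: int, params_size: int) -> List[str]:
    """Frame instructions per the listing's 'Present if' comments."""
    needs_frame = locals_size != 0 or params_size != 0
    code: List[str] = []
    if needs_frame:
        code += ["PUSH EBP", "MOV EBP,ESP"]
    if locals_size:
        code.append(f"SUB ESP,{locals_size}")
    code.append("...")  # routine body
    if locals_size:
        code.append("MOV ESP,EBP")
    if needs_frame:
        code.append("POP EBP")
    code.append(f"RET {params_size}")  # always present
    return code
```

For a routine with no locals and no parameters, only the body and `RET 0` remain. Eight bytes of parameters add the EBP frame but no `SUB ESP`.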
Assembler functions return their results as follows.
Copyright © 1999, Inprise Corporation. All rights reserved.