CS2650 Assembler Basics

 

In this course we'll be using the freeware assembler NASM to complete the labs.

The NASM assembler, along with a sample program and an .exe emulator macro, can be downloaded at http://rbhilton.com/wsucs/cs2650/CS2650NASM.ZIP

 

This file is a ZIP file. The files in it need to be extracted to a common folder.

 

There are four files that will be extracted:

  1. NASMW.EXE is the assembler program. It takes your source file and assembles it into an executable binary file.
    The syntax for assembling your programs is:
    NASMW MYPROG1.ASM -oMYPROG1.COM

  2. LICENSE is the freeware license file. It must accompany any downloaded or copied NASM assembler.

  3. EXEBIN.MAC is an assembler macro file. It is included in each program you assemble so that the assembler's binary output is created in an executable form.

  4. SAMPLE1.ASM is a sample program that shows the general format and syntax of the NASM assembler programs we will be writing this semester.

The sample program (SAMPLE1.ASM) is included to give students an example of a working assembler program. It can be assembled using the syntax in item 1 above and then executed to ensure that the assembler is working correctly.

You can use virtually any text editor to create and modify your assembler source programs. Notepad works well, except that it does not show line numbers, which the assembler's error messages refer to. Many students download and use editors such as Notepad++, which work well with assembler source files.

The following is an explanation of the sample program, including the basic assembler syntax used in it.

 

Each program you write should begin with the header:

;NASM Assembler Sample Program
;
[BITS 16] ;Set code generation to 16 bit mode
%include 'exebin.mac' ;include file to simulate .com header

EXE_Begin ;assembler directive to indicate executable begin

[ORG 100H] ;set addressing to begin at 100H

 

  • Comments begin with a semi-colon and may be placed anywhere on the line. Everything after a semi-colon is treated as a comment. You will be expected to comment each of your programs with at least your name, course and date.
  • The statement [BITS 16] is an assembler directive that instructs the assembler to assemble the source code into a 16 bit binary object file.
  • The statement %include 'exebin.mac' causes the file exebin.mac, which needs to be located in the same subdirectory as your source and the assembler, to be inserted into the source. It allows the assembled program to be created as an executable binary file, without the use of a linker.
  • The statements EXE_Begin at the beginning and EXE_End at the end define the boundaries of the assembly.

  • The assembler directive [ORG 100H] instructs the assembler to begin addressing at 100H (256 decimal). This allows for the 256 byte program prefix defined in the include file to be addressed correctly.

Between the EXE_Begin and EXE_End, you enter the actual assembler code to accomplish the lab requirements.
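
Putting the header and the closing statements together, a lab program might be laid out as in the sketch below. It assumes the exebin.mac supplied in the course ZIP defines EXE_Begin and EXE_End exactly as spelled in this handout. The label start and the nop placeholder are illustrative only and are not taken from SAMPLE1.ASM.

```nasm
;CS2650 lab skeleton (illustrative layout, not a copy of SAMPLE1.ASM)
;Name:   <your name>
;Course: CS2650
;Date:   <date>
;
[BITS 16]               ;Set code generation to 16 bit mode
%include 'exebin.mac'   ;macro file from the course ZIP

EXE_Begin               ;start of the assembly (defined in exebin.mac)

[ORG 100H]              ;set addressing to begin at 100H

start:  nop             ;hypothetical placeholder: your lab code goes here

stop:   int 20H         ;terminate the program

EXE_End                 ;end of the assembly (defined in exebin.mac)
```

Everything between EXE_Begin and EXE_End other than the nop line is the fixed framing described above, so a new lab can start from a copy of this layout.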

 

NASM Assembler Specifics

As in most assemblers, each NASM source line contains some combination of the four fields:

label:  instruction operands ; comment

As usual, most of these fields are optional; the presence or absence of any combination of a label, an instruction and a comment is allowed. Of course, the operand field is either required or forbidden by the presence and nature of the instruction field.

NASM places no restrictions on white space within a line: labels may have white space before them, or instructions may have no space before them, or anything. The colon after a label is also optional. (Note that this means that if you intend to code lodsb alone on a line, and type lodab by accident, then that's still a valid source line which does nothing but define a label. Running NASM with the command-line option -w+orphan-labels will cause it to warn you if you define a label alone on a line without a trailing colon.)
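
The fragment below is a small sketch of these rules, meant to sit in the body of a program between EXE_Begin and EXE_End. The label names start and again are made up for illustration.

```nasm
start:  mov al,0        ;label with a colon, followed by an instruction
again   inc al          ;label without a colon is also accepted
        cmp al,5        ;a line with no label at all
        jne again       ;jump back to the label until AL reaches 5
lodab                   ;typo for lodsb: defines a label and emits no code
```

Because the last line assembles without complaint unless the orphan-label warning is active, writing the colon after every label makes such typos easier to spot.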

Valid characters in labels are letters, numbers, _, $, #, @, ~, ., and ?. The only characters which may be used as the first character of an identifier are letters, _ and ?.

The instruction field may contain any machine instruction or a pseudo-instruction.

DB, DW, DD, DQ and DT are pseudo-instructions used to declare initialized data in the output file. They can be invoked in a wide range of ways:

db 0x55  ; just the byte 0x55 
db 0x55,0x56,0x57 ; three bytes in succession 
db 'a',0x55  ; character constants are OK 
db 'hello',13,10,'$'   ; so are string constants 
dw 0x1234  ; 0x34 0x12 
dw 'a'  ; 0x61 0x00 (it's just a number)
dw 'ab'  ; 0x61 0x62 (character constant)
dw 'abc' ; 0x61 0x62 0x63 0x00 (string)
dd 0x12345678  ; 0x78 0x56 0x34 0x12 
dd 1.234567e20  ; floating-point constant 
dq 1.234567e20  ; double-precision float 
dt 1.234567e20 ; extended-precision float
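
In practice a data declaration is usually given a label so that instructions can refer to it. The sketch below shows one way the '$'-terminated string from the list above might be used. It relies on DOS interrupt 21H function 09H (display string), which is not covered in this handout, and it assumes the program runs .COM style, with DS addressing the program's own segment. The label msg is hypothetical.

```nasm
        mov ah,09H      ;DOS function 09H: display a '$'-terminated string
        mov dx,msg      ;DX = offset of the string (DOS reads it at DS:DX)
        int 21H         ;call DOS
stop:   int 20H         ;terminate before execution reaches the data

msg:    db 'hello',13,10,'$'    ;text, carriage return, line feed, terminator
```

The data is placed after INT 20H so that the processor never tries to execute the bytes of the string as instructions.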

The TIMES prefix can be used to cause the instruction to be assembled multiple times. For example:
zerobuf:  times 64 db 0
This would allocate 64 bytes, each initialized to 00H, beginning at the address labeled zerobuf.
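
TIMES works with any of the data declarations, not only db 0. The lines below are a sketch with made-up label names (dashes, words) showing a few variations.

```nasm
zerobuf:  times 64 db 0         ;64 bytes of zero, as in the text
dashes:   times 20 db '-'       ;20 copies of the character '-'
          db 13,10,'$'          ;followed by CR, LF and a '$' terminator
words:    times 8 dw 0FFFFH     ;8 words (16 bytes), each 0xFFFF
```

Note the leading 0 in 0FFFFH: a hexadecimal literal written with a trailing H must begin with a digit, otherwise NASM reads it as a label name.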
 
Normal NASM instructions follow a format similar to the first section of code in the sample program:
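cls:    mov ah,06       ;BIOS video function 06H: scroll window up
        mov cx,0000     ;upper left corner: row 0, column 0
        mov dx,184fH    ;lower right corner: row 24 (18H), column 79 (4fH)
        mov al,00       ;0 lines to scroll means clear the whole window
        mov bh,1fH      ;attribute for blanked area: bright white on blue
        int 10H         ;call the BIOS video interrupt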
 

 

The label cls: is an address reference point. It can be used as a target in a jump or call instruction. In this sample program it is not used.

The mov instructions copy a value from the source (right side) operand to the destination (left side) operand; the source is preserved. A mov can copy from literal to register, literal to memory, register to register, register to memory or memory to register. All literals in NASM are taken to be decimal unless indicated otherwise, for example by a trailing H for hexadecimal. Note that the half register designators such as AH and BH are legal to move to and from.
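
The fragment below sketches each of these forms in NASM syntax. The labels count and flag are hypothetical storage locations added for the example. In NASM a memory operand is written in square brackets, and a literal stored to memory needs an explicit size such as byte or word.

```nasm
        mov ax,10           ;literal to register: AX = 10 decimal
        mov bx,10H          ;literal to register: BX = 10 hex (16 decimal)
        mov cl,al           ;register to register (AL is the low half of AX)
        mov [count],ax      ;register to memory: store AX at count
        mov dx,[count]      ;memory to register: load the word at count
        mov byte [flag],1   ;literal to memory: the size must be stated
stop:   int 20H             ;terminate before execution reaches the data

count:  dw 0                ;one word of storage
flag:   db 0                ;one byte of storage
```

Writing mov dx,count without the brackets would load the address of count rather than the value stored there.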

 

The int instruction causes a call to an interrupt vector referenced by the interrupt number. In this case int 10h makes a BIOS call to interrupt 10h, which is a video interrupt.

 

The final executable instruction is:
stop:   int 20H

 

The label stop: gives an address reference point. The INT instruction calls interrupt 20H, which terminates the program.
 

IMPORTANT NOTE: *Every* program you write MUST have INT 20H as the last executable instruction. INT 20H calls the terminate routine and stops program execution. Without it, your program will continue to run, executing whatever comes after your program in memory. Normally this will just cause your DOS window, or Windows in general, to crash or freeze up, but in the worst case it could damage data on your hard drive.
MAKE SURE INT 20H is the last executable instruction in every program.

 

Save your source file when you begin it, regularly while you enter and edit it, and again when you are finished. When you are finished, assemble it using the syntax:

NASMW CS2650P1.ASM -oCS2650P1.COM

 

If there are no errors, you will just be returned to the DOS prompt. Errors will reference a line number. You can then go to that line in your source, fix the problem, and re-assemble.