MASM vs. NASM

Notice that NASM directives are quite different from MASM ones. The NASM help files provide very detailed information about all of the NASM directives as well as the macro system, but here's a short overview of most of the ones used here:

%include This preprocessor command works just like #include in C.  It inserts the file you name into your code in its place.
BITS 32 This directive tells NASM to assemble the file as 32-bit code.
SECTION This directive tells NASM what section everything between it and the next SECTION directive will go into. There are three sections most commonly used:
  .bss: Uninitialized data (similar to "dw ?" in MASM)
  .data: Initialized data
  .text: Program code
resb, resw, resd The uninitialized versions of db, dw, and dd.
times NASM version of MASM's dup.
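
As a rough sketch of how these directives fit together (the label names and the commented-out include file are made up for illustration, not part of the lab files):

```nasm
; %include "mylib.inc"   ; hypothetical include file, commented out
                         ; so this sketch assembles on its own

BITS 32                  ; generate 32-bit code

SECTION .bss             ; uninitialized data
scratch  resb 64         ; 64 bytes, like MASM's "scratch db 64 dup (?)"
counts   resw 8          ; 8 words
total    resd 1          ; 1 doubleword

SECTION .data            ; initialized data
zeros    times 16 db 0   ; like MASM's "zeros db 16 dup (0)"

SECTION .text            ; program code
start:
    mov eax, [total]     ; load the contents of total
    ret
```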

NASM Is Case-Sensitive
One simple difference is that NASM is case-sensitive. It makes a difference whether you call your label foo, Foo or FOO.

Uninitialized (.bss) Section
Sure, MASM could have uninitialized variables with "dw ?" but this was totally stupid because everyone knew it started out with a value of zero.  Why have an uninitialized section?  Let me answer that question with another:  Where are the initial values for the initialized variables stored?  They must be assembled and linked into the executable, and so they must take up space on disk there.  In MASM, if you declare a bunch of data, it will devote part of the executable to that data.  All those BufferSegs and FontSegs and ScratchSegs and everything else were included in the .exe file.  This is pretty stupid because there's no reason 64k of zeros need to be stored in the .exe file.  The situation is even worse in protected mode where you'll want to declare 4MB buffers for high-resolution images.  Not only does this data take up space on disk, but it takes forever to load it all into memory.

So back to the uninitialized section.  When the protected-mode program gets loaded, the loader reads all the initialized data right from the file into memory.  It also reads one number telling it how much uninitialized data there is and allocates that much space for it in memory.  Much smarter, eh?  If a variable doesn't need an initial value, put it in the uninitialized section.
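
Here is a small sketch of the difference, using two made-up buffer names.  Assemble it and compare the size of the object file with and without the .data buffer:

```nasm
BITS 32

SECTION .data
    ; Every one of these 65536 zero bytes is written into the object
    ; file and ends up on disk in the executable.
    BigInit  times 65536 db 0

SECTION .bss
    ; Only the size is recorded here.  No bytes for this buffer are
    ; stored in the file.
    BigBuf   resb 65536
```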

Why is the code section called .text and the uninitialized section called .bss?  Standard UNIX / C names for things don't make a whole lot of sense:  A guy asks his wife, "Why do you always cut the ends off your pot roast?"  She responds, "Because that's what my mother always did."  He asks her mother who replies, "That's what my mother always did."  Luckily enough the grandma is still alive and when the guy asks her, she laughs and replies, "Because I never had a pot big enough for it."

Labels are Everything.
This was true in MASM too, but people for some reason believed that the "proc" stuff was magical.  NASM has no built-in proc, so everything is a label.  A subroutine is nothing but a label that you "call" instead of "jmp" to.  Of course the call will push the return address on the stack, and the "ret" at the end of your subroutine will pop it, so you can't just "jmp" to your subroutine or it will return to some random address that just happened to be on the stack.

Also, you may notice the widespread use of .label's. These are "local" labels that are local to the last label without a dot before it.  This is so you don't have to worry about having two labels with the same name. You also don't need a colon after the label name, but it has to appear on a line by itself.
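
A minimal sketch of both ideas, with made-up label names.  Two routines each have their own .done label, and nothing clashes:

```nasm
BITS 32
SECTION .text

ClearEax:               ; a "subroutine" is just a label you call
    xor eax, eax
.done:                  ; local to ClearEax
    ret                 ; pops the return address that call pushed

CountDown:              ; a new non-local label starts a new scope
    mov ecx, 10
.loop:                  ; local to CountDown
    dec ecx
    jnz .loop
.done:                  ; no clash with ClearEax's .done
    ret

Caller:
    call ClearEax       ; pushes the return address, then jumps
    call CountDown
    ret
```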

NASM Requires Square Brackets For ALL Memory References
NASM was designed with simplicity of syntax in mind. One of the design goals of NASM is that it should be possible, as far as is practical, for the user to look at a single line of NASM code and tell what opcode is generated by it. You can't do this in MASM.  If you declare, for example,

foo     equ 1
bar     dw 2

mov ax, foo   ; MASM spits out mov ax, 1
mov ax, bar   ; MASM spits out mov ax, word ds:[12]  
              ; if offset bar happens to be 12.

The bottom two lines generate completely different opcodes, despite having identical-looking syntaxes.

NASM avoids this undesirable situation by having a much simpler syntax for memory references. The rule is simply that any access to the contents of a memory location requires square brackets around the address, and any access to the address of a variable doesn't. So an instruction of the form mov ax, foo will always refer to a compile-time constant, whether it's an EQU or the address of a variable; and to access the contents of the variable bar, you must code mov ax, [bar]. This sounds nice here, and it is, but you will forget this when you code!

This also means that NASM has no need for MASM's OFFSET keyword, since the MASM code mov ax, offset bar means exactly the same thing as NASM's mov ax, bar.

MASM Syntax:

bob db  12h

mov dx, offset bob   ; 16-bit, so address is 16-bit
mov al, bob          ; MASM remembers variable types,
                     ; so you don't need "byte ptr"

NASM Syntax:

bob db  12h

mov edx, bob         ; 32-bit address.  No offset keyword
mov al, byte [bob]   ; Memory accesses require []'s.
                     ; Writing "byte" is a good habit; it is required
                     ; whenever no register gives the size.

NASM, in the interests of simplicity, also does not support the hybrid syntaxes supported by MASM and its clones, such as mov ax, table[bx], where a memory reference is denoted by one portion outside square brackets and another portion inside. The correct syntax for the above is "mov ax, [table+bx]".  This is what it looks like when you disassemble MASM-generated objects anyway, right?  Likewise, "mov ax, es:[di]" is wrong and "mov ax, [es:di]" is right.
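
The same rule in 32-bit form, as a short sketch (the table contents are made up, and 32-bit index registers are used to match the rest of this lab):

```nasm
BITS 32

SECTION .data
table   dw 10, 20, 30, 40

SECTION .text
    mov ebx, 4              ; byte offset of the third word
    mov ax, [table+ebx]     ; MASM would write: mov ax, table[ebx]
    mov ax, [es:edi]        ; MASM would write: mov ax, es:[edi]
```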

It's very important to understand how NASM works.  If nothing else shows up about protected mode on an exam, this will:

NASM Doesn't Store Variable Types
NASM, by design, chooses not to remember the types of variables you declare. Whereas MASM will remember, on seeing var dw 0, that you declared var as a word-size variable, and will then be able to fill in the ambiguity in the size of the instruction mov var, 2, NASM will deliberately remember nothing about the symbol var except where it begins, and so you must explicitly code mov word [var], 2.  Again, this has to do with the ability to look at any single line of NASM code and know exactly what is going on.
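
A short sketch of where the size keyword is and isn't needed, using the var from the paragraph above:

```nasm
BITS 32

SECTION .data
var     dw 0

SECTION .text
    mov word [var], 2      ; size must be stated: store a 16-bit 2
    mov byte [var], 2      ; also legal: NASM does not know var is a word
    mov ax, [var]          ; ax implies a word-size load
    inc word [var]         ; no register, so the size is required
    ; mov [var], 2         ; error: operation size not specified
```

Note the second line: since NASM remembers nothing about var, it will happily let you store a byte into a word variable.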

For this reason, NASM doesn't support the LODS, MOVS, STOS, SCAS, CMPS, INS, or OUTS instructions, but only supports the forms such as LODSB, MOVSW, and SCASD, which explicitly specify the size of the components of the strings being manipulated.
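
For example, a byte copy might look like the sketch below.  The labels are made up, and it assumes ds and es refer to the same flat data segment, as they do in the DJGPP programs in this lab:

```nasm
BITS 32

SECTION .data
src     db 'abcd'

SECTION .bss
dst     resb 4

SECTION .text
CopyFour:
    cld               ; direction flag clear: esi/edi count upward
    mov esi, src      ; address of the source
    mov edi, dst      ; address of the destination
    mov ecx, 4        ; number of bytes to copy
    rep movsb         ; byte-size copy; a plain "movs" is not accepted
    ret
```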

NASM Doesn't ASSUME
As part of NASM's drive for simplicity, it also does not support the ASSUME directive. NASM will not keep track of what values you choose to put in your segment registers, and will never automatically generate a segment override prefix.
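
In practice this means every override appears in the source, as in this short sketch:

```nasm
BITS 32
SECTION .text
    mov eax, [ebx]       ; no override: the CPU uses ds by default
    mov eax, [es:ebx]    ; es override: you must write it yourself
    mov eax, [ebp]       ; ebp-based addresses default to ss
```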

Floating-Point Differences
NASM uses different names to refer to floating point registers from MASM: where MASM would call them ST(0), ST(1) and so on, NASM chooses to call them st0, st1 etc.
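
A few instructions showing the register names side by side with their MASM equivalents (a sketch only; the values pushed are arbitrary):

```nasm
BITS 32
SECTION .text
    fld1               ; push 1.0
    fldz               ; push 0.0, so st0 = 0.0 and st1 = 1.0
    fadd st0, st1      ; MASM: fadd st(0), st(1)
    fxch st1           ; MASM: fxch st(1)
    fstp st0           ; pop the stack
    fstp st0           ; pop again, leaving the FPU stack as we found it
```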

NASM does not declare uninitialized storage in the same way as MASM: where a MASM programmer might use stack db 64 dup (?), NASM requires stack resb 64, intended to be read as `reserve 64 bytes'.

How This Affects 291
NASM syntax is much better than MASM syntax.  If we had started you out with NASM, variables and such would have made much more sense.  You aren't used to NASM, though.  You will have problems.  You will try to move the contents of a variable without []'s and it will move the variable's offset instead.  Read this section many times.  If something doesn't seem to work, read this section.  Before you call over a TA, read this section.  Remember both syntaxes for your exam coding questions -- we'll tell you which to use.

Task 3: A More Interesting Program

Type this into a "var.asm" file.  Actually type it in or you won't "feel" the different syntax.  I know this sounds lame, but it was either this or I make you write another Parse!  (Be sure to at least look at it before you build it!)

BITS 32 ; Tell NASM we're using 32-bit protected mode.

GLOBAL _main   ; Tells the linker about the label called _main
               ; This is because crt0.o references _main and must
               ; be linked to it.


SECTION .bss   ; Uninitialized Data Section

    Mike  resw 2 ; Reserve 2 words (2*2=4 bytes)
    Pete  resd 3 ; Reserve 3 doublewords (3*4=12 bytes)

SECTION .data  ; Initialized Data Section

    Bob_Constant equ 87654321h 
      ; Assembly-style hex constant.
      ; Remember that when this is assembled, NASM just does a
      ; "search and replace", replacing "Bob_Constant" with 
      ; 87654321h. This is not a variable. No memory is 
      ; allocated. It has no offset. You can't change its value.


    Jim_Double dd 0x12345678
      ; C-style hex constant put in variable.
      ; This is the first variable we've declared in this section,
      ; so it will have the first address. Since this is a variable
      ; instead of a constant, NASM will "search and replace" 
      ; Jim_Double by its offset.


      ; This is where things get a bit more tricky than with MASM.
      ; In MASM, this address would be an offset relative to some
      ; segment. Since this was the first variable declared in
      ; the .data section, in MASM it would have an offset 0
      ; and the "search and replace" would replace Jim_Double with
      ; the constant 0.

      ; Now, we're going all out and using C-style object files and linkers.
      ; The addresses of variables aren't determined at assembly time, but
      ; are actually set to a constant at link-time. You will see this later.


    _Viewable_Variable dd 0xBADBEEF 
      ; C compilers put an underscore before all the names
      ; when they "blindly output" their object files.
      ; If you want your routines to be callable by C, you 
      ; have to put an underscore before them yourself.


    Jason_String db 'Hi guys!', 0 
      ; C-style zero-terminated string.
      ; Remember that NASM handles variables a bit differently.
      ; mov eax, Jason_String        -- eax <= 0x8234 (some address)
      ; mov al, byte [Jason_String]  -- al <= 'H'


    Dollar equ $ 
      ; $ means the current offset into the output file
      ; that NASM "blindly types out." (It's the $th character
      ; NASM has "typed" into the object file.)


    Jason_String_len equ $-Jason_String 
      ; Dollar is really pretty useless by itself, but to 
      ; declare a constant for the length of the string (instead of
      ; just hard coding 9) you subtract the offset of Jason_String's
      ; output position from the offset of the current output position.


SECTION .text ; Says that this is the start of the code section.

_main: ; Code execution will start at the label called _main

    mov eax, 42              ; As simple as could be.
    mov ebx, Dollar          ; An address in .data (just past Jason_String), so the linker sets it too.
    mov ecx, Jason_String    ; Moves the address of Jason_String (a constant set by the linker)
    mov edx, Jason_String_len  ; Gets replaced by assemble-time constant.

    mov eax, Pete   ; The address of Pete
    mov ebx, [Pete] ; This is supposed to be uninitialized.

    mov esi, 0
  .Jason_Loop ; It's a local label with the "." and with no ending colon. It works. 
    mov al, byte [Jason_String + esi] ; Like MASM's Jason_String[esi]
    inc esi
    cmp esi, Jason_String_len
    jb .Jason_Loop

    ret ; Return to DJGPP's crt0 library startup code

Go back and do the "nasm", "ld", "objdump", "stubify", and "cv32" steps for "var.asm".  Look at where all the variables and labels are in the object dump.  Question 4:  How does NASM treat labels with a dot before them to keep those local names from interfering with each other?  It renames them in order to keep all the names unique.  How does it rename them?

Task 4: Setting up a Makefile

Typing in all those commands every time would be a pain, so smart people started putting all those commands in a script so you only had to type them in once and just run the script.  Even smarter people came up with this concept of Makefiles.  Makefiles have rules that tell how to make different types of files.  You want to make a .o file?  Well, you need to run NASM on the .asm file.  You want to make a .exe file?  Well, you have to link a bunch of .o files together and stubify the sucker.  The other big thing about Makefiles is that they do dependency checking.  You want to make the bob.o file?  That file requires the bob.asm file, so let's check to see if the bob.asm file is newer than our current bob.o file.  If it's not, we don't have to remake it.  This way you only remake the things that have changed.  The Makefiles we will use are standard UNIX GCC Makefiles, which is something that is just good to know about.  When you type "make," the make program will look in the "Makefile."  We want the Makefile to do what we did up there, only better.

The Makefile used by DJGPP is in the GNU (Unix) format, which among other things requires command lines to start with tabs, so if you edit the Makefile, be sure to use a text editor which preserves tabs.  WinEdit does not do this, so you have to use something like Notepad.  (Gasp.)

In your tutorial folder, make a file called "Makefile" (with no extension) and type this in it.  Notepad will automatically stick a .txt extension on your file, so be sure to take it off.  Remember, those have to be TABS, not spaces!  Also, the ld line is too long to fit on one line in the browser, so it probably wraps, but has to be all on one line in your Makefile.

all: var.exe

clean:
        rm -f *.o var.exe *.lst *.map

var.exe: var.o
        ld -o var --cref -Map var.map $(DJDIR)/lib/crt0.o var.o -L$(DJDIR)/lib -L$(DJDIR)/lib/gcc-lib/djgpp/2.81 -L$(DJDIR)/bin -Tdjgpp.djl -lgcc -lc -lgcc
        stubify var

%.o: %.asm
        nasm -f coff -iinclude/ $< -l $*.lst

Here is how a Makefile works: