I have recently been playing Shenzhen I/O, and it reminded me of a college course where I learned some basic MIPS. I figured it might be fun to learn some more assembly, specifically one that is more widely used. I decided to try to make something that sounded "simple": a program in x86-64 that lets a user input two numbers, adds them together, and displays the result.
This idea in a higher level language is extremely simple and pretty commonly used when first learning a language or learning programming. The following is an example of what we want to do, written in Rust.
use std::io::{stdin, Write};
fn main() {
// Define a buffer to hold our user input
let mut input = String::new();
// Get a number from our user
print!("Enter a number: ");
std::io::stdout().flush().unwrap(); // Flush the buffer to output as print! doesn't do that for performance
stdin().read_line(&mut input).expect("Didn't get a string");
let input_without_newline = &input[..input.len() - 1]; // We need to remove the newline at the end of the string
let first_num = input_without_newline.parse::<i32>().unwrap();
// Reset the buffer value and get a second number from our user
input.clear();
print!("Enter a second number: ");
std::io::stdout().flush().unwrap();
stdin().read_line(&mut input).expect("Didn't get a string");
let input_without_newline = &input[..input.len() - 1];
let second_num = input_without_newline.parse::<i32>().unwrap();
// Add the numbers together and display them
let result = first_num + second_num;
println!("Your added numbers: {}", result);
}
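This program prompts a user to enter two numbers and prints their sum. The components of this program which we care about are: a buffer to hold the user's input, prompts asking for each number, reading the input, converting the input text into a number, adding the two numbers, and converting the sum back into text to display it.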
We will be using the NASM assembler. Before writing our adding program, let's make a Hello World so we have a baseline program to build from.
; I am a comment
section .data
msg db "Hello world",13,10
section .text
global _main
_main:
mov rax, 1
mov rdi, 1
mov rsi, msg
mov rdx, 13
syscall
jmp .exit
.exit:
mov rax, 60
mov rdi, 0
syscall
This program is made to be run on Linux. Assembly is platform specific.
There are two "sections" we care about right now, .data and .text.
The data section is where we declare static variables. Here, we define a msg variable which will hold the string "Hello world". The 13,10 portion of our string corresponds with the ASCII characters for "carriage return" and "line feed". This is akin to writing "Hello world\r\n" in a language like Python. The db stands for "define bytes" and allocates bytes to hold our string value.
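To make the byte count concrete, here is a small Rust sketch of the same bytes. The name msg_bytes is made up for this illustration; it only mirrors what db lays out in memory.

```rust
/// The same bytes that `msg db "Hello world",13,10` places in the data section.
/// `msg_bytes` is a hypothetical helper used only for this illustration.
fn msg_bytes() -> Vec<u8> {
    let mut bytes = b"Hello world".to_vec(); // 11 ASCII characters
    bytes.push(13); // carriage return, '\r'
    bytes.push(10); // line feed, '\n'
    bytes
}
```

The length comes out to 13, which matches the 13 that the program above moves into rdx.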
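The text section is where we actually write our program. Our text section begins with global _main, which exports the _main label so the linker can use it as the entry-point for our program. Note that ld looks for a label named _start by default. With _main it prints a warning and falls back to the start of the text section, which here happens to be the same place.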
Comments in NASM begin with a semicolon (;).
Our _main function contains four mov instructions, a syscall, and then a jump to another area in the assembly. The mov instruction takes the format of mov <destination>, <source>, where we are moving a value from the source (which can be a static value, variable, or another register) into the destination. In the mov rax, 1 instruction we are moving the static value 1 into the rax register. A register is a tiny piece of memory your CPU can access very quickly. A list of registers can be found in many places on the internet, this Stanford resource being an example.
When working with assembly, we work in a low level space where privilege is an important concept and we need to care about "userspace" and "kernel-space". Our program runs in "userspace" and lacks permissions to do many of the operations we want to do, like writing to an output and getting user input. We need to ask the OS to do those things for us. A syscall suspends our program and hands control to the OS, which will then examine pre-determined registers for data which tells it what we want it to do.
The rax register tells the OS what function to run when we make a syscall, and registers rdi, rsi, and rdx provide three arguments to that function. Here we move 1 into rax, which tells the OS that we want to call sys_write. We can see in the Linux kernel code that sys_write expects three arguments: a file descriptor, a buffer, and a size. We move 1 into rdi to indicate our file descriptor (0 is std_input, 1 is std_output, 2 is std_error), the address of our string variable into rsi, and the size of our string into rdx. We then make a syscall, which suspends our program and causes the OS to read those four registers, print our string to the screen, and then resume our program.
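For comparison with the Rust program from earlier, the three arguments map onto a write call fairly directly. This is only a rough analogue, not what the kernel does: write_bytes is a made-up name, and the out parameter stands in for the file descriptor.

```rust
use std::io::Write;

/// Rough analogue of sys_write(fd, buffer, size): hand `size` bytes of `buffer`
/// to an output and report how many bytes were accepted.
/// `write_bytes` is a hypothetical name; `out` stands in for the file descriptor.
fn write_bytes<W: Write>(out: &mut W, buffer: &[u8], size: usize) -> usize {
    out.write(&buffer[..size]).expect("write failed")
}
```

Calling write_bytes(&mut std::io::stdout(), b"Hello world\r\n", 13) would be the closest match to our assembly, with stdout playing the role of file descriptor 1.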
After we print "Hello world" we jump to the .exit label and make another syscall. We move 60 into rax, which tells the OS that we want to exit our program. The exit syscall takes a single argument, the exit code. Our program has run successfully so we will move a 0 into rdi to indicate a successful status. Once we syscall here, our program ends.
We can run our program by assembling it into an object file and then linking that object file to create a binary. The below will create an elf64 binary file named hello from a hello.asm assembly program which you can run on Linux:
nasm -f elf64 -o hello.o hello.asm
ld -o hello hello.o
Now we want to translate the Rust code from earlier into an assembly program.
We are going to define a variable to use as a buffer for our user input, but we are going to use a new section to define it. In addition to the data and text sections, we can also define a .bss section (historically "block started by symbol") where we can declare a variable which has not been assigned a value. We are going to define a buffer which can hold 64 bytes, an arbitrary and large size.
section .bss
read_buffer resb 64
This is the easy bit, as we already figured out how to do this in our "Hello, World!" example. Here we can define two prompt strings in our data section and then print to the screen in our text section.
section .data
first_msg db "Enter first number: "
first_msg_len equ $ - first_msg ; This variable is going to store the length of our string, for ease of use
second_msg db "Enter second number: "
second_msg_len equ $ - second_msg
section .text
global _start
_start:
; Ask our user to enter a number
mov rax, 1
mov rdi, 1
mov rsi, first_msg
mov rdx, first_msg_len
syscall
; We will get user input here later
; Repeat the above with our second_message variables
; ...
We are going to make a new function to read the user's input, both to aid readability and to allow us to re-use it. We are also going to use a new syscall with code 0, sys_read. sys_read takes three arguments: a file descriptor (see above), a buffer, and a buffer size. sys_read also has a return value which we care about, the number of bytes read into our buffer. When a syscall returns a value, that value will be placed in the rax register by the OS before control is returned to our program.
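The contract of sys_read can be sketched in Rust like this. It is an analogue under my own naming, not the syscall itself: read_bytes is hypothetical, and the input parameter stands in for file descriptor 0.

```rust
use std::io::Read;

/// Rough analogue of sys_read(fd, buffer, size): copy at most `size` bytes from
/// `input` into `buffer` and return how many bytes were actually read.
/// `read_bytes` is a hypothetical name; `input` stands in for file descriptor 0.
fn read_bytes<R: Read>(input: &mut R, buffer: &mut [u8; 64], size: usize) -> usize {
    input.read(&mut buffer[..size]).expect("read failed")
}
```

If the user types 105 and presses enter, four bytes are read (three digits and a line feed), and the rest of the buffer is left untouched.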
;...
_start:
; ...
call _read_number
; ...
_read_number:
; Function prologue
push rbp
mov rbp, rsp
; call "sys_read"
mov rax, 0
mov rdi, 0 ; file_descriptor 0 is std_input
mov rsi, read_buffer
mov rdx, 20 ; Read at most 20 bytes, an arbitrary limit that fits in our 64-byte read_buffer
syscall
; Convert our text into a number and store it in r12
; This will be explained in the next section
call _read_buffer_to_number
; Function epilogue
mov rsp, rbp
pop rbp
ret
The call and ret instructions here manage a pointer which tells the CPU where it is executing in the program. When we call, the address of the next instruction is stored on the stack so we know where to come back to and resume execution when our function finishes. When we execute ret, the top value on the stack is read and replaces the instruction pointer. It is typical when writing a function in assembly to include a "prologue" and "epilogue" before and after your function's main code so, if your function uses the stack, you don't cause a problem when ret tries to grab the address left for it by call. This program is not using the stack, but since the purpose of this exercise is to learn, the prologue/epilogue are included anyway.
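A toy model helped me see why a stray value on the stack is a problem. The sketch below is not how a real CPU is built, and every name in it (ToyCpu, call, push, ret) is made up for illustration.

```rust
/// A toy model of the instruction pointer and stack; not how a real CPU is built.
/// All names here (`ToyCpu`, `call`, `push`, `ret`) are hypothetical.
struct ToyCpu {
    instruction_pointer: u64,
    stack: Vec<u64>,
}

impl ToyCpu {
    /// `call`: remember where to resume, then jump to the function.
    fn call(&mut self, function_address: u64, return_address: u64) {
        self.stack.push(return_address);
        self.instruction_pointer = function_address;
    }

    /// `push`: a function body storing a value on the stack.
    fn push(&mut self, value: u64) {
        self.stack.push(value);
    }

    /// `ret`: whatever is on top of the stack becomes the instruction pointer.
    fn ret(&mut self) {
        self.instruction_pointer = self.stack.pop().expect("stack is empty");
    }
}
```

If a function pushes a value and forgets to pop it, ret jumps to that value instead of the return address. The mov rsp, rbp in the epilogue discards any such leftovers before ret runs.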
After we syscall, the OS will fill our read_buffer with whatever the user inputs and store the number of bytes read into rax.
If we try to do any math with the number input by a user here, the output will not make sense. If our user enters 1 then the integer value stored in our read_buffer is going to be 2609 in decimal, or 0x0A31 in hex. On a little endian machine like mine this corresponds to the ASCII value for "1" (49 in decimal, 0x31 in hex) followed by a line feed (10 in decimal, 0xA in hex). In order to treat the input like a number so we can do math with it, we need to remove the line feed and convert the ASCII number into an integer number.
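The 2609 figure can be checked with a couple of lines of Rust. The helper name below is made up for this illustration.

```rust
/// Interpret the first two bytes of the read buffer as one little-endian integer.
/// `first_two_bytes_as_integer` is a hypothetical helper for this illustration.
fn first_two_bytes_as_integer(buffer: &[u8]) -> u16 {
    u16::from_le_bytes([buffer[0], buffer[1]])
}
```

Passing in the bytes for "1" followed by a line feed gives 0x0A31, because the second byte in memory becomes the more significant half of the integer.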
The process of converting our digits is two-fold. First, we must take an ASCII digit and subtract 48 from it. Since the ASCII value of "1" is 49, subtracting 48 will bring the value to what we expect it to be. Second, we need to accumulate our digits. If our user enters a number with more than one digit, like 105, then every time we read a digit we must multiply our accumulated number by 10 before adding the new digit.
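Before doing this in assembly, the two steps can be sketched in Rust. This is only an illustration of the idea: buffer_to_number is a made-up name, bytes_read plays the role of rax after sys_read, and input that is empty or contains non-digits is not handled.

```rust
/// Convert ASCII digits followed by a line feed into an integer.
/// `bytes_read` plays the role of rax after sys_read; the last byte is the line feed.
/// `buffer_to_number` is a hypothetical name, and non-digit input is not handled.
fn buffer_to_number(buffer: &[u8], bytes_read: usize) -> u64 {
    let mut accumulator: u64 = 0;
    // Skip the final byte, which is the line feed.
    for &byte in &buffer[..bytes_read - 1] {
        let digit = (byte - 48) as u64; // ASCII '0' is 48
        accumulator = accumulator * 10 + digit;
    }
    accumulator
}
```

For the input 105 the accumulator goes 1, then 10, then 105.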
The following was my first pass at this, but it contains a bug that prevents it from working as expected.
_read_buffer_to_number:
; rax here is the number of bytes that we just wrote to read_buffer
mov r12, 0 ; Our accumulator
mov r10, 0 ; Loop counter
.loop:
; The instruction below this comment causes a bug
mov r9, [read_buffer + r10] ; mov char from read_buffer into a register
sub r9, 48 ; subtract 48 (converting ASCII into decimal)
; fold the number into an accumulator
; acc = acc x 10
; acc = acc + new_char
imul r12, 10
add r12, r9
add r10, 1 ; Increment loop counter
mov r11, r10 ; Set a temporary register we will use for comparison
add r11, 1 ; Add 1 to tmp register, this is to account for the newline we want to ignore from our input
cmp r11, rax ; Compare how many digits we have read to rax, which holds how many characters were written
jl .loop ; Loop again if we have not read all digits
; r12, the accumulator, is our return value
ret
The snippet above contains a .loop label representing a loop, the r12 register representing the number we are accumulating (and our output), and a loop counter stored in the r10 register. In our loop we move data out of the read_buffer and into the r9 register, subtract 48 to convert the value from an ASCII digit to a number, and fold the number into our accumulator. We then increment our loop counter, add one to a temporary copy of it to account for the trailing line feed, and compare that copy to rax, which contains the number of bytes read in the previous syscall. If it is less than the number of bytes read, we jump to the .loop label to read another digit.
The _read_buffer_to_number function is very close to functional but contains a bug that stops it from doing what we expect it to. This is a good time to learn how to debug an assembly program. Higher level languages have very convenient debuggers built into IDEs and the classic print("I am here") if one does not want to use a debugger. Neither of these is a viable strategy in assembly. We are going to use gdb to debug. To start, we need to generate debug symbols. We're going to change how we were generating our binary file like so:
nasm -f elf64 -gdwarf -o add.o add.asm
ld -o add add.o
Next, run gdb with the binary as an argument:
gdb ./add
We want to debug the _read_buffer_to_number function, so add a breakpoint like:
(gdb) b _read_buffer_to_number
Then enter "run" to begin execution of the program. After we enter our number, the breakpoint will trigger. Entering "next" will cause the debugger to step forward by one line of assembly, which here is one instruction. We will step until we have run the mov r9, [read_buffer + r10] instruction. After that instruction is run we can enter "info registers" to display all of our registers and the values they currently contain. If we run that here we can see that r9 contains the full user input value (0xa31 in hex if the user entered "1"). This is not what we want and will cause unexpected behavior. We want one byte to be read at a time and we do not want the line feed byte to be included.
The mov instruction sizes its read to match the destination, so with a 64-bit register like r9 it loads eight bytes from memory starting at the address we give it. We can replace it with a more specific move instruction, movzx, and specify that we only want to move a byte. Our move instruction will now look like:
movzx r9, byte [read_buffer + r10]
The "zx" portion of the instruction stands for "zero extend". A single byte will be moved and the remaining bits of the 64-bit register will be filled with zeros to "extend" it to the register's full width.
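The difference between the two loads can be mimicked in Rust. Both function names are made up for this sketch, and the little-endian read assumes a machine like mine.

```rust
/// What `mov r9, [read_buffer + i]` loads: eight bytes starting at index `i`.
/// Both function names are hypothetical, chosen for this illustration.
fn load_eight_bytes(buffer: &[u8; 64], i: usize) -> u64 {
    let mut bytes = [0u8; 8];
    bytes.copy_from_slice(&buffer[i..i + 8]);
    u64::from_le_bytes(bytes)
}

/// What `movzx r9, byte [read_buffer + i]` loads: one byte, zero-extended.
fn load_one_byte_zero_extended(buffer: &[u8; 64], i: usize) -> u64 {
    buffer[i] as u64
}
```

With "1" and a line feed in an otherwise zeroed buffer, the eight-byte load gives 0xa31 while the one-byte load gives 0x31, which becomes 1 after subtracting 48.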
After we have two numbers, adding them together is a simple task. We will use a variable to store our sum.
section .data
; ...
sum: dq 0
section .text
; Enter _start, prompt the user to enter a number, call _read_number which itself calls _read_buffer_to_number
; _read_buffer_to_number leaves our user input number in r12
add [sum], r12
; Repeat for the second input number
Above, we are using a sum variable to store the sum of the user input numbers. It is initialized to 0 and then each time a number is entered, that number is added to sum. The brackets in the add instruction act to dereference the variable and add to the value it holds.
Once we have our sum, the number must be converted into a string so it can be displayed to the user.
_int_to_string:
; Function prologue
push rbp
mov rbp, rsp
; Input is in r12
mov r10, 0 ; Loop counter
.div_loop:
mov rax, r12 ; Move the lower 64 bits of the dividend into rax
mov rdx, 0 ; Clear rdx (zero out the upper 64 bits of the dividend)
mov rbx, 10 ; Move the divisor into rbx
div rbx ; Divide rdx:rax by rbx
mov r12, rax ; Replace r12 with the quotient
mov r11, rdx ; Place our remainder in r11
add r11, 48 ; Convert our int into an ASCII number
push r11 ; Push our digit onto the stack. We push/pop the digits so our string reads in the correct order
add r10, 1 ; increment counter
; If r12 is greater than 0 then we still have more digits to read
cmp r12, 0
jg .div_loop
; Pop the digits off of the stack
mov r11, 0 ; Second loop counter
.pop_loop:
pop r13
mov [read_buffer + r11], r13b ; Move the least significant byte from r13 into the read buffer
add r11, 1 ; increment counter
cmp r11, r10 ; compare the loop counter against the number of digits that were stored on the stack
jl .pop_loop ; If our loop counter is smaller, we have more digits to pop
; Add a line feed to read_buffer
mov [read_buffer + r11], byte 10
add r10, 1 ; Increment the number of characters we wrote by one to account for the line feed
; Set r12 to be our string output
mov r12, read_buffer
; Set r13 to be the number of characters we wrote to the string
mov r13, r10
; Function epilogue
mov rsp, rbp
pop rbp
ret
This function takes a number as an input in the r12 register. The first loop divides the value in r12 by 10 to pop off the rightmost digit, converts it to an ASCII representation, and then pushes it onto the stack. If our remaining value is greater than 0, then we still have digits to read and the loop repeats. The second loop pops our ASCII digits off of the stack and into our string buffer. A line feed is then added and we return our string in the r12 register and its length in the r13 register (the string is also "returned" into the buffer).
The process of pushing and then popping our digits to and from the stack ensures that the number is built in the correct order. The first loop, which gets each digit and converts them, reads the digits from back to front. This causes our number to appear backwards. Since the stack is LIFO, pushing our digits to the stack as we read them and then taking the digits out when we are building the string will reverse their order into being correct.
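The same push-then-pop idea looks like this in Rust. It is a sketch of the approach rather than a translation of the assembly: a Vec stands in for the CPU stack, and int_to_string_bytes is a made-up name.

```rust
/// Convert a non-negative integer into ASCII digits followed by a line feed.
/// A Vec stands in for the CPU stack; `int_to_string_bytes` is a hypothetical name.
fn int_to_string_bytes(mut value: u64) -> Vec<u8> {
    let mut stack: Vec<u8> = Vec::new();
    // First loop: peel off the rightmost digit until nothing is left.
    loop {
        let remainder = (value % 10) as u8;
        value /= 10;
        stack.push(remainder + 48); // convert the digit to ASCII
        if value == 0 {
            break;
        }
    }
    // Second loop: popping reverses the order, so the most significant digit comes first.
    let mut output = Vec::new();
    while let Some(digit) = stack.pop() {
        output.push(digit);
    }
    output.push(10); // line feed
    output
}
```

For 105 the digits are pushed as 5, 0, 1 and popped as 1, 0, 5.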
After running this function, the r12 and r13 registers contain our string and its length. Supplying these for a sys_write syscall (r12 as the buffer in rsi and r13 as the size in rdx) finishes our program and displays the result to the user.
The adding program now works, with some limitations. It cannot add negative numbers or non-whole numbers. To support those, we would need to check for "-" and "." ASCII characters and handle them accordingly.
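As a rough idea of what the "-" check could look like, here is a Rust sketch. I have not written this in assembly. The name signed_buffer_to_number is made up, and decimals and invalid input are still not handled.

```rust
/// Parse an optional leading '-' followed by ASCII digits and a line feed.
/// `signed_buffer_to_number` is a hypothetical name; decimals and invalid input are not handled.
fn signed_buffer_to_number(buffer: &[u8], bytes_read: usize) -> i64 {
    let text = &buffer[..bytes_read - 1]; // drop the trailing line feed
    let (negative, digits) = match text.first() {
        Some(&b'-') => (true, &text[1..]),
        _ => (false, text),
    };
    let mut accumulator: i64 = 0;
    for &byte in digits {
        accumulator = accumulator * 10 + (byte - 48) as i64;
    }
    if negative {
        -accumulator
    } else {
        accumulator
    }
}
```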
I began this program with the intent of just adding two numbers together but ended up spending most of my time on type conversion. Working without a standard library shows how convenient it is that most languages come with the batteries included and you don't need to worry about converting a string to a number beyond my_string.parse::<i32>().