Down the Rabbit Hole

Posted in programming - Tuesday, July 16, 2013

So here’s what I’ve spent the last couple days working on. Yes, it’s assembly code.

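; assemble with nasm (32-bit Linux):
;     nasm -f elf -g welcome.asm && ld -o welcome welcome.o
; on 64-bit Linux, ask ld for a 32-bit executable:
;     nasm -f elf -g welcome.asm && ld -m elf_i386 -o welcome welcome.o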

%macro print 2
    mov eax, 4  ; sys_write
    mov ebx, 1  ; stdout
    mov ecx, %1  ; address of message
    mov edx, %2  ; length of message
    int 0x80
%endmacro

section .text
    global _start

section .data
prompt db "Hi, what's your name? "
prompt_length equ $ - prompt
welcome_part_1 db "Hello, "
welcome_part_1_length equ $ - welcome_part_1
welcome_part_2 db "! Welcome to assembly.",0x0a
welcome_part_2_length equ $ - welcome_part_2

section .bss
name resb 40
name_max_length equ $ - name
name_length resb 4
extra resb 40
extra_max_length equ $ - extra  ; size of the extra buffer (40 bytes)

section .text
_start:
    print prompt, prompt_length

    ; read name
    mov eax, 3  ; sys_read
    mov ebx, 0  ; stdin
    mov ecx, name
    mov edx, name_max_length
    int 0x80

    ; eax is bytes read
    ; if 0 (ctrl-d/EOF), exit
    cmp eax, 0
    jz exit
    ; if max, there may be more input
    cmp eax, name_max_length
    jnz read_complete
    cmp byte [name + eax - 1], 0x0a
    jz read_complete

    ; clear out the rest of the input, or it will be read by the shell as the next command!
    push eax  ; save the name length once, before the loop
clear_input:
    ; read extra
    mov eax, 3  ; sys_read
    mov ebx, 0  ; stdin
    mov ecx, extra
    mov edx, extra_max_length
    int 0x80
    ; if max, there may be more input
    cmp eax, extra_max_length
    jnz input_cleared
    cmp byte [extra + eax - 1], 0x0a
    jz input_cleared
    jmp clear_input
input_cleared:
    pop eax

read_complete:
    ; if last is \n, change it to \0 and decrement length
    cmp byte [name + eax - 1], 0x0a
    jne length_ok
    dec eax
    mov byte [name + eax], 0x00

length_ok:
    cmp eax, 0  ; only if input was \n
    jz _start
    mov [name_length], eax  ; save the length; the print macro overwrites eax

    print welcome_part_1, welcome_part_1_length
    print name, [name_length]  ; print only the bytes that were typed
    print welcome_part_2, welcome_part_2_length
    print welcome_part_2, welcome_part_2_length

exit:
    mov eax, 1  ; sys_exit
    mov ebx, 0  ; exit code
    int 0x80

In case you don’t know assembly, and since what this does probably isn’t obvious even if you do, here’s an equivalent shell script.

while test -z "$name" ; do
    read -r -p "Hi, what's your name? " name || exit  # stop on Ctrl-D/EOF, like the assembly version
done
echo "Hello, $name! Welcome to bash."

And of course, the assembly program will only work on an x86 (Intel-compatible) processor running Linux, while the bash script will work on anything that runs bash.

So why on earth, you might rightly ask, am I doing this?

Partly, it’s just curiosity. Or perhaps something stronger and more negative: an anxiety aroused by awareness of ignorance. It makes me deeply uncomfortable to have to say, “Oh, that’s a black box; I have no idea how it works.” For the work I’m doing now, that discomfort is all it amounts to, but if I want to move into security work, this knowledge matters much more, because a lot of exploits operate at this level.

I’ve compared learning new programming languages to foreign travel before, and this has been a similar experience. It’s really weird and jarring at first, but I acclimated more quickly than I expected. I’m still very much an assembly newbie, but I’ve crossed some basic threshold of wrapping my head around it.

Assembly is different. You’re dealing with the hardware, shuffling data between registers and memory. You have to pay attention to each byte. And you’re keenly aware when you’re handing off control to the operating system (another black box I’m digging into; more on that at a later date). It stops you from taking a lot for granted. It’s also strangely appealing to be working with such simple tools, at such a fundamental level.

As you can see, it’s a lot of work to do something very basic, but this is what any program ultimately boils down to. Under the hood, running the bash script comes down to the processor executing instructions like these (undoubtedly many more of them, and more complicated ones). All the Python code I write day-in, day-out, blithely schlepping objects around between databases and the web, is carried out in the end by an unimaginable spew of machine instructions. Working at this level gives you an appreciation for the amazing structure of code we’ve built on top of it all.