Down the Rabbit Hole
So here’s what I’ve spent the last couple days working on. Yes, it’s assembly code.
; assemble with nasm:
; nasm -f elf -g welcome.asm && ld -o welcome welcome.o
; (on a 64-bit host, link with: ld -m elf_i386 -o welcome welcome.o)
%macro print 2
mov eax, 4 ; sys_write
mov ebx, 1 ; stdout
mov ecx, %1 ; address of message
mov edx, %2 ; length of message
int 0x80
%endmacro
section .text
global _start
section .data
prompt db "Hi, what's your name? "
prompt_length equ $ - prompt
welcome_part_1 db "Hello, "
welcome_part_1_length equ $ - welcome_part_1
welcome_part_2 db "! Welcome to assembly.",0x0a
welcome_part_2_length equ $ - welcome_part_2
section .bss
name resb 40
name_max_length equ $ - name
name_length resb 4
extra resb 40
extra_max_length equ $ - extra ; measure from extra, not name, or reads overrun the buffer
section .text
_start:
print prompt, prompt_length
; read name
mov eax, 3 ; sys_read
mov ebx, 0 ; stdin
mov ecx, name
mov edx, name_max_length
int 0x80
; eax is bytes read
; if 0 (ctrl-d/EOF), exit
cmp eax, 0
jz exit
; if max, there may be more input
cmp eax, name_max_length
jnz read_complete
cmp byte [name + eax - 1], 0x0a
jz read_complete
; clear out the rest of the input, or it will be read by the shell as the next command!
push eax ; save the name length once, before the loop, so the pop below stays balanced
clear_input:
; read extra
mov eax, 3 ; sys_read
mov ebx, 0 ; stdin
mov ecx, extra
mov edx, extra_max_length
int 0x80
; if max, there may be more input
cmp eax, extra_max_length
jnz input_cleared
cmp byte [extra + eax - 1], 0x0a
jz input_cleared
jmp clear_input
input_cleared:
pop eax
read_complete:
; if last is \n, change it to \0 and decrement length
cmp byte [name + eax - 1], 0x0a
jne length_ok
dec eax
mov byte [name + eax], 0x00
length_ok:
cmp eax, 0 ; only if input was \n
jz _start
mov [name_length], eax ; save the length; the print macro clobbers eax
print welcome_part_1, welcome_part_1_length
print name, [name_length] ; print only the bytes actually read, not the whole buffer
print welcome_part_2, welcome_part_2_length
exit:
mov eax, 1 ; sys_exit
mov ebx, 0 ; exit code
int 0x80
In case you don’t know assembly, and since what this does probably isn’t obvious even if you do, here’s an equivalent shell script.
while test -z "$name" ; do
read -p "Hi! What's your name? " name
done
echo "Hello, $name! Welcome to bash."
And of course, the assembly program will only work on an x86 chip running Linux (it relies on the 32-bit int 0x80 system call interface), while the bash script will work on anything that runs bash.
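The script above is only a loose equivalent: the assembly exits quietly on ctrl-d/EOF, while the loop above just asks again. Here is a rough sketch of a closer match, assuming a hypothetical helper function named greet (my name for illustration, not part of the original script). It sticks to plain printf and read so it should behave the same under sh or bash, and it sends the prompt to stderr so that stdout carries only the greeting.

```shell
# Sketch only: mirrors the assembly's control flow a bit more closely.
# "greet" is a hypothetical name chosen for this example.
greet() {
    name=""
    while [ -z "$name" ]; do
        # Prompt on stderr so stdout holds nothing but the greeting.
        printf "Hi, what's your name? " >&2
        # read fails at end of input; if nothing at all was read, give up,
        # the way the assembly jumps to exit when sys_read returns 0.
        # A final line without a trailing newline still counts as a name.
        IFS= read -r name || [ -n "$name" ] || return 0
    done
    printf 'Hello, %s! Welcome to bash.\n' "$name"
}
```

Calling greet with no arguments at the end of the script runs it interactively; an empty line prompts again, and ctrl-d at the prompt ends it without a greeting.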
So why on earth, you might rightly ask, am I doing this?
Partly, it’s just curiosity. Or perhaps something stronger and more negative, an anxiety aroused by awareness of ignorance. It makes me deeply uncomfortable to have to say, “Oh, that’s a black box; I have no idea how it works.” For the work I’m doing now, it’s no more than that, but if I want to move into security work, it’s much more important because a lot of exploits operate at this level.
I’ve compared learning new programming languages to foreign travel before, and this has been a similar experience. It’s really weird and jarring at first, but I acclimated more quickly than I expected. I’m still very much an assembly newbie, but I’ve crossed some basic threshold of wrapping my head around it.
Assembly is different. You’re dealing with the hardware, shuffling data between registers and memory. You have to pay attention to each byte. And you’re keenly aware when you’re handing off control to the operating system (which is another black box that I’m digging into, on which more at a later date). It stops you from taking a lot for granted. It’s also strangely appealing to be working with such simple tools, at such a fundamental level.
As you can see, it’s a lot of work to do something very basic, but this is what any program ultimately boils down to. The bash script ends up, under the hood, executing a stream of machine instructions much like these (but undoubtedly more complicated). All the Python code I write day in, day out, blithely schlepping objects around between databases and the web, boils down to an unimaginable spew of machine instructions. Working at this level gives you an appreciation for what an amazing structure of code we’ve built on top of this.