It’s Simpler Than You Think. It’s More Complex Than You Think

I’ve been doing web development for a pretty long time, but just over the last few years I’ve come to really appreciate the fundamentals of HTTP and what’s going on under the hood when I’m building web applications. There are two sides to this. The first is that HTTP is in one sense a very simple protocol. It’s just little text messages going back and forth between your browser and the web server. Whether I’m using Node or Django or some huge WSDL-driven Java XMLBeans monstrosity, what it’s doing isn’t rocket science; it’s just taking care of a bunch of tedious, nit-picky bookkeeping that I don’t want to be bothered with. If I really wanted to, I could just type the messages myself (and we’ll get to that in a minute).

The practical upside of that is that you can use really simple tools to debug big, hairy, complex web applications. A few years ago, I was working in one of those Big Web Services systems with WSDL files and auto-generated Java code and layers and layers of middleware. We’d get some kind of error at the front end, and it’d be really hard to tell which piece had broken. So I ended up writing a bunch of really simple shell scripts to test the web services in isolation. I’d spackle together something using curl, grep, and sed that built up and picked apart the messages as text, without dragging in all that mess of Java code.
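
To give a flavor of what those scripts looked like, here’s a from-memory sketch rather than the originals; the URL, the request.xml file, and the <status> element are all made-up placeholders:

#!/bin/bash
# Hypothetical smoke test for one back-end service, in the curl/grep/sed spirit.
url="http://backend.example.com/orders/service"

# Send a canned request and capture the raw response body
response=$(curl -s -H 'Content-Type: text/xml' --data @request.xml "$url")

# Pick the status element out of the XML as plain text
status=$(echo "$response" | sed -n 's/.*<status>\(.*\)<\/status>.*/\1/p')

if [ "$status" != "OK" ] ; then
    echo "FAIL: got status '$status'" >&2
    exit 1
fi
echo "PASS"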

The flip side is that HTTP is actually a richer protocol than I’d realized. There’s a lot I didn’t know about it until I started building RESTful web services and trying to understand the “right” way to do it. There’s all this stuff you can do with status codes and headers that I’d been re-implementing at the application level.

To take a recent example, I’ve been working on a web service that talks to other web services. Someone would make a call to us, we’d call the back-end services, they’d time out or barf up some sort of error, and we’d pass back a 500 error to our client. They’d see it and email us asking what was wrong with our service. It’d be nice to let them know it’s not our fault and that they should pester the back-end systems people instead. We could send back a message body that says something like, “Back-end systems failure. Original error message follows,” but it turns out we can say that just by returning a different status code. Not only is there a 502 status code, which means that a back-end system failed, but there’s also a 504, which means that we timed out trying to contact it. That tells our client that they can try again in a little while and the request might go through.
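
Sketched as the kind of bash CGI script we’ll be writing below (with a placeholder back-end URL; this isn’t our actual service), the gateway decision is only a few lines:

#!/bin/bash
# Pass-through sketch: ask the back end, and map its failures to 502/504.
# curl exit code 28 means the request timed out.
body=$(curl -s --max-time 5 http://backend.example.com/thing)
rc=$?

if [ $rc -eq 28 ] ; then
    echo "Status: 504"    # Gateway Timeout: the back end didn't answer in time
elif [ $rc -ne 0 ] ; then
    echo "Status: 502"    # Bad Gateway: the back end failed, not us
fi
echo "Content-type: text/plain"
echo
echo "$body"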

Ok, enough talking. Now code.

Goin’ all Mechanical Turk on this

To illustrate the first point, that this is all just text, I’m going to play human web server, using netcat. If you’re not familiar with it, it’s a standard unix utility that just opens a network connection. Anything you type gets sent along it; anything that comes back gets printed out on your screen. I open up a terminal and type:

nc -l 3333

That starts up netcat listening on port 3333. Then I switch to my browser and tell it to go to http://localhost:3333/. The “page loading” indicator starts spinning. In the netcat terminal, I see:

GET / HTTP/1.1
Host: localhost:3333
Connection: keep-alive
Cache-Control: max-age=0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
User-Agent: Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8

That’s Chrome telling me it wants the root resource (/, which Apache or whatever would normally interpret as index.html). It’s also telling me a lot about what kind of response it can handle. I’m going to ignore all that for now and just type:

HTTP/1.1 200 OK 
Content-type: text/plain
Content-length: 7

Hello!

Pretty straightforward. The content length is 7 because it includes the return character after “Hello!” Here’s what we see in Chrome:

Switch back to the browser and go to http://localhost:3333/index.html. In the netcat terminal, we get a request that’s much the same as before, except the first line is:

GET /index.html HTTP/1.1

Since they asked for HTML, I’ll give them HTML. I type:

HTTP/1.1 200 OK
Content-type: text/html
Content-length: 16

<h1>Hello!</h1>

And in Chrome we see:

So at some fundamental level, that’s all a web application is. It’s a program that listens for a connection, gets little text messages, interprets them, and sends back responses. How simple can we make that?

RESTful Web Services in Bash

How about this?

#!/bin/bash

echo "Content-type:text/plain"
echo 
uptime

uptime is a standard unix utility that reports how long the computer has been running and what the 1, 5, and 15 minute system load averages are. That’s marginally useful - I’ve actually used a script much like this for basic server monitoring. Put it in a file, make it executable, run it from the command line, and it’ll spit out something like:

Content-type:text/plain

 21:29:32 up 9 days, 15:17,  5 users,  load average: 0.05, 0.10, 0.18

From here, if you want to follow along, you’ll need to have Apache set up and configured to let you run CGI scripts in the directory you’re working in. (That’s a whole tutorial on its own, but here are some instructions for Mac OS X. Otherwise, Google for “apache enable cgi” and your operating system.)
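
For reference, the Apache side boils down to something like this; module names and paths vary by distribution and version, so treat it as a rough sketch rather than a recipe:

# Load the CGI module first (on Debian/Ubuntu: a2enmod cgid), then allow
# CGI in the userdir and let index.cgi stand in for index.html.
<Directory /home/colin/public_html>
    Options +ExecCGI
    AddHandler cgi-script .cgi
    DirectoryIndex index.cgi index.html
</Directory>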

On my machine, this script is saved as public_html/api/v1/load/index.cgi. That lets me access it as http://localhost/~colin/api/v1/load/, as we can see in Chrome:

We can also use netcat in place of Chrome. Instead of listening on a port, we open a connection to the web server’s port:

$ netcat localhost 80

Then I type:

GET /~colin/api/v1/load/ HTTP/1.1
Host: localhost

And I get this back from Apache:

HTTP/1.1 200 OK
Date: Wed, 27 Aug 2014 01:14:11 GMT
Server: Apache/2.4.7 (Ubuntu)
Vary: Accept-Encoding
Transfer-Encoding: chunked
Content-Type: text/plain

46
 21:14:19 up 9 days, 15:01,  5 users,  load average: 0.08, 0.18, 0.28

0

You can see that Apache includes a bunch of header fields that I didn’t bother to when I was playing web server. (I’ll trim most of these out of later examples to cut down on the clutter.) The more interesting thing is that it doesn’t have a Content-length header. What it has instead is Transfer-Encoding: chunked. That says that its content will be in chunks, prefixed by their size (in hexadecimal). 46 hex is 70, which is the length of the next line (again, counting the return character at the end). The ‘0’ for the next chunk says, “that’s all, folks!”
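
If you want to double-check that arithmetic, bash will happily do the conversions for you:

printf '%d\n' 0x46    # 70: the chunk size in decimal
uptime | wc -c        # the byte count of your own uptime line, newline included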

We can make this a little easier on ourselves by using curl instead of netcat. It’s a tool built specifically for making HTTP requests. We can just run curl -si http://localhost/~colin/api/v1/load/ from the command line, and get back:

HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: text/plain

 22:05:42 up 9 days, 15:53,  5 users,  load average: 0.14, 0.13, 0.19

That’s the same as what netcat gave us (minus the header clutter), but notice that it combined the chunked response for us. Even at this level, some of the details are being hidden.

Status Seeking

Let’s take this a step further. The status script gets a status message (“GREEN”, “YELLOW”, or “RED”) from a file, and prints it out like so:

$ curl -si http://localhost/~colin/api/v1/status/
HTTP/1.1 200 OK
Content-Length: 4
Content-Type: text/plain

RED

It also lets us set a new status like so:

$ curl -si -X PUT -d GREEN http://localhost/~colin/api/v1/status/
HTTP/1.1 204 No Content
Content-Length: 0
Content-Type: text/plain

Note that we used the same URL, but changed the HTTP method to PUT (instead of the default GET - don’t ask me why that’s the -X option) and specified “GREEN” as the data (-d) to be sent along with the request. We get back an exciting new response code: 204! Since we’re telling, not asking, it doesn’t make much sense for the server to send anything back. The 204 status just says, “That thing you were doing? It worked.” No reason to have a message body saying “Success!” when the code already tells you that. I’ve definitely been guilty of reinventing that wheel before I ran across this.

What if we try to send a bad status, like ‘BLUE’?

$ curl -si -X PUT -d BLUE http://localhost/~colin/api/v1/status/
HTTP/1.1 400 Bad Request
Content-Type: text/plain

Invalid status code

400 is the “your mistake” error code, which is pretty generic, so we include a descriptive message in the response body. Since it’s a user error, it’s reasonable to just have a human-readable message.

If you look at the script, you’ll see references to environment variables like $REQUEST_METHOD. That’s how Apache makes information about the request available to the script (as part of the CGI standard). In case you want to see all of them, I’ve added an env script, which dumps them all out, plus the content. You can hit it with Chrome or curl, or even netcat. See what’s different between them.
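
To make that concrete, here’s roughly the shape of the status script - a simplified sketch, not the exact file. It switches on $REQUEST_METHOD and reads the PUT body from standard input:

#!/bin/bash
# Simplified sketch of the status service described above.
statusfile=status.txt

if [ "$REQUEST_METHOD" == "GET" ] ; then
    echo "Content-type: text/plain"
    echo
    cat $statusfile
elif [ "$REQUEST_METHOD" == "PUT" ] ; then
    read status    # CGI hands us the request body on stdin
    case $status in
        GREEN | YELLOW | RED )
            echo $status > $statusfile
            echo "Status: 204"    # "it worked" - no body needed
            echo "Content-type: text/plain"
            echo
            ;;
        * )
            echo "Status: 400"    # the client's mistake, so explain it
            echo "Content-type: text/plain"
            echo
            echo "Invalid status code"
            ;;
    esac
fi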

API Documentation

Ok, great! Now we have two simple yet useful web services. But they’re not so simple that they don’t need any documentation, so let’s add some. We could have some sort of parallel hierarchy for documentation, like /api/docs/v1/load/, etc., but that’s kinda clunky. Instead, let’s rework our services so they give you data when you ask for data, and text when you ask for text. For that, we take advantage of the Accept header. Take a look at the script to see all the details, but it’s basically a bunch of if-then-else clauses checking $HTTP_ACCEPT.
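
The shape of it is something like this - a trimmed-down sketch of the idea rather than the script verbatim (the sed line assumes Linux’s uptime output format):

#!/bin/bash
# Content negotiation sketch for the 'load' resource: JSON for JSON, docs otherwise.
if [[ "$HTTP_ACCEPT" == *application/json* ]] ; then
    echo "Content-type: application/json"
    echo
    # pull the three load averages out of uptime's output
    uptime | sed 's/.*load average: \(.*\), \(.*\), \(.*\)/{"load": {"1": \1, "5": \2, "15": \3}}/'
else
    echo "Content-type: text/plain"
    echo
    echo "LOAD"
    echo "The 'load' resource contains the unix system load information for this server."
    echo "GET is the only valid method."
    echo "Data is returned as application/json."
fi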

Now when we point curl at the new version of the ‘load’ service, we normally get:

$ curl -si http://localhost/~colin/api/v2/load/
HTTP/1.1 200 OK
Content-Type: text/plain

LOAD
The 'load' resource contains the unix system load information for this server.
GET is the only valid method.
Data is returned as application/json.

But if we add the Accept header saying we want JSON data, we get:

$ curl -si -H 'Accept: application/json' http://localhost/~colin/api/v2/load/
HTTP/1.1 200 OK
Date: Wed, 27 Aug 2014 03:48:48 GMT
Server: Apache/2.4.7 (Ubuntu)
Transfer-Encoding: chunked
Content-Type: application/json

{"load": {"1": 0.09, "5": 0.13, "15": 0.20}}

Sweet! And when we use Chrome, which asks for text/html, we get:

Auf Deutsch!

If you hit the env script with Chrome, one of the environment variables you see is HTTP_ACCEPT_LANGUAGE, which corresponds to the Accept-language header. So by setting a header field, we can tell the server what language we want the response in. Again, something that I probably would have re-implemented in the message body in a totally ad-hoc way, instead of using the standard language codes. For my browser, the language code is en-US - American English. Just for kicks (since Google Translate makes it easy and my girlfriend knows enough German to sanity-check it), let’s translate the load script documentation into German, and display that if the Accept-language header begins with ‘de’. Take a look at that version of the load script to see the details.

When I hit v3/load/ in the browser, it looks the same. But then I (temporarily) switch my language preference to German (Google to see how to do that) and reload the page, and I get:

If I go back and hit the env script, I see HTTP_ACCEPT_LANGUAGE is now de,en-US;q=0.8,en;q=0.6. (q is strength of preference, so de at 100%, en-US at 80%, plain en at 60%. I cheat and just look at the first two letters.)
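
In code, the cheat is just a substring check on the first two characters of the header, something like this (print_docs_de and print_docs_en are hypothetical helpers standing in for the bits that echo the two versions of the text):

lang=${HTTP_ACCEPT_LANGUAGE:0:2}    # "de,en-US;q=0.8,en;q=0.6" -> "de"
if [ "$lang" == "de" ] ; then
    print_docs_de    # hypothetical helper that echoes the German documentation
else
    print_docs_en    # and the English version
fi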

Caching

This is one that I recently avoided re-implementing. We were developing an iPhone app that used a data file that wouldn’t change often, but did need to be kept up to date. It was a couple megabytes - more than we’d want to have to fetch every time the app starts up. We were talking about all sorts of ways of doing that before I thought, “Hey, isn’t this essentially a content caching problem? I bet there’s some sort of mechanism built into HTTP for that.” It turns out that’s an understatement - there’s all kinds of caching schemes built into HTTP.

The simplest and most generally useful involves the ETag header. This is a unique identifier generated by the server and included in the response headers. Apache does it automatically for static files. For example, if you fetch this file (the one you’re reading) with curl, you’ll see a header block like this:

$ curl -si http://localhost/~colin/api/narrative.md
HTTP/1.1 200 OK
Date: Thu, 28 Aug 2014 11:23:58 GMT
Server: Apache/2.4.7 (Ubuntu)
Last-Modified: Thu, 28 Aug 2014 11:23:43 GMT
ETag: "3796-501aec4f3f942"
Accept-Ranges: bytes
Content-Length: 14230

Then the next time you request that page, include an If-None-Match header with that ETag value. That tells the server to only return the contents if the ETag value it calculates doesn’t match. (Note that the quotes are important!) If it’s still the same, it returns a 304 status code, like so:

$ curl -si -H 'If-None-Match: "3796-501aec4f3f942"' http://localhost/~colin/api/narrative.md
HTTP/1.1 304 Not Modified
Date: Thu, 28 Aug 2014 11:25:57 GMT
Server: Apache/2.4.7 (Ubuntu)
ETag: "3796-501aec4f3f942"

Apache doesn’t set an ETag header for script responses (because they’re expected to be different every time), so we’ll have to implement this ourselves. Fortunately, that’s pretty easy. I just run the status contents through the sha1sum utility, which calculates a unique number based on them. (How it does that is a whole ‘nother article.) Amazingly, this turns out to be slightly faster than getting the modification time for the file. That section of the code looks like:

    # hash the current status value; that hash is our ETag
    etag=`echo $status | sha1sum | cut -d ' ' -f 1`
    if [ "$HTTP_IF_NONE_MATCH" == "$etag" ] ; then
        # the client already has this version, so send headers only
        echo "Content-type: text/plain"
        echo "Status: 304"
        echo
    else
        echo "Content-type: application/json"
        echo "ETag: $etag"
        echo
        echo '{"status": "'$status'"}'
    fi

Pretty simple, huh?
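
From the client’s side, the dance is: fetch once, hang on to the ETag you got back, and send it with the next request. One quirk of this homemade version: the script above emits and compares the bare hash, without the double quotes Apache puts around its own ETags.

# First request: note the ETag header in the response
curl -si http://localhost/~colin/api/vN/status/    # vN = whichever version has the ETag logic

# Later requests: hand that hash back; a 304 means "what you have is still good"
curl -si -H 'If-None-Match: <hash from the first response>' http://localhost/~colin/api/vN/status/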

Onward!

If you want to learn more about this, Wikipedia is a good place to start, as always. They’ve pulled together info on all the status codes and header fields, as well as an in-depth overview of HTTP. Yahoo! also has a nicely concise summary of what the status codes mean and when to use them.

So if you’re new to web development and web services, hopefully they’re a little less intimidating now. If you’re an old hand, maybe you’ve picked up some new tools and techniques for debugging. If you haven’t already, play around with netcat, be a Human Web Server, get it into your fingers. Make curl part of your toolbox. For both, take some time to explore what else they can do. Next time you’re designing web applications or services, consider whether you’re reinventing the wheel or making things more complicated than they need to be.

Have the courage to build simple things.

Tail Recursion

One of the cool things in Erlang is efficient tail recursion. It means that when a function calls itself as its last instruction, Erlang re-uses that stack frame. That probably doesn’t clarify it much, so let’s look at an example.

This Java code

public class Recursor {
    public static void main(String[] args) {
        System.out.println("Total: " + countdown(10));
    }

    private static int countdown(int x) {
        return countdown(x, 0);
    }

    private static int countdown(int x, int total) {
        if (0 == x) {
            return 1/0;  // GENERATE EXCEPTION
        }
        else {
            // recurse with decreased countdown and increased total
            return countdown(x - 1, total + 1);
        }
    }
}

generates this stacktrace

$ java Recursor
Exception in thread "main" java.lang.ArithmeticException: / by zero
 at Recursor.countdown(Recursor.java:12)
 at Recursor.countdown(Recursor.java:15)
 at Recursor.countdown(Recursor.java:15)
 at Recursor.countdown(Recursor.java:15)
 at Recursor.countdown(Recursor.java:15)
 at Recursor.countdown(Recursor.java:15)
 at Recursor.countdown(Recursor.java:15)
 at Recursor.countdown(Recursor.java:15)
 at Recursor.countdown(Recursor.java:15)
 at Recursor.countdown(Recursor.java:15)
 at Recursor.countdown(Recursor.java:15)
 at Recursor.countdown(Recursor.java:7)
 at Recursor.main(Recursor.java:3)

If the countdown had started at 100, you’d have a hundred lines in the stacktrace.

The equivalent code in Erlang

-module(recurse).

-export([countdown/0]).

countdown() ->
    countdown(10, 0).

countdown(0, Total) ->
    Total/0;  %% generate error
countdown(Val, Total) ->
    %% recurse with decreased countdown and increased total
    countdown(Val - 1, Total + 1).

generates this stacktrace

> catch recurse:countdown().     
{'EXIT',{badarith,[{recurse,countdown,2,
                            [{file,"recurse.erl"},{line,9}]},
                   {erl_eval,do_apply,6,[{file,"erl_eval.erl"},{line,573}]},
                   {erl_eval,expr,5,[{file,"erl_eval.erl"},{line,357}]},
                   {shell,exprs,7,[{file,"shell.erl"},{line,674}]},
                   {shell,eval_exprs,7,[{file,"shell.erl"},{line,629}]},
                   {shell,eval_loop,3,[{file,"shell.erl"},{line,614}]}]}}

Even though the function has called itself recursively ten times, it’s only one line of the stacktrace. (I’ve tested it up to 100 million.)

In fact, this is true of any tail call, even to different functions. This code generates the same stacktrace

countdown() ->
    countdown(100, 0).

countdown(0, Total) ->
    Total/0;
countdown(Val, Total) ->
    bounce_one(Val - 1, Total + 1).

bounce_one(Val, Total) ->
    bounce_two(Val, Total + 1).

bounce_two(Val, Total) ->
    countdown(Val, Total + 1).

Surprisingly, even non-tail recursion in Erlang is done fairly efficiently. This version of the code, where the last instruction is addition

countdown() ->
    countdown(10).

countdown(0) ->
    1/0;
countdown(Val) ->
    1 + countdown(Val - 1).

generates this stacktrace

> catch recurse:countdown().
{'EXIT',{badarith,[{recurse,countdown,1,
                            [{file,"recurse.erl"},{line,9}]},
                   {recurse,countdown,1,[{file,"recurse.erl"},{line,11}]},
                   {recurse,countdown,1,[{file,"recurse.erl"},{line,11}]},
                   {erl_eval,do_apply,6,[{file,"erl_eval.erl"},{line,573}]},
                   {erl_eval,expr,5,[{file,"erl_eval.erl"},{line,357}]},
                   {shell,exprs,7,[{file,"shell.erl"},{line,674}]},
                   {shell,eval_exprs,7,[{file,"shell.erl"},{line,629}]},
                   {shell,eval_loop,3,[{file,"shell.erl"},{line,614}]}]}}

Even there, there are only three stack frames for that recursive function. (It’s the same even with a recursion depth of 100.) I’ll have to find a real Erlang expert to explain that.

The reason this is important is that it’s the basis for the Actor model in Erlang. Actors are long-running processes that fundamentally look something like

loop(State) ->
    receive Message ->
        NewState = handle_message(Message, State),
        loop(NewState)
    end.

They wait for incoming messages, do something with them, and recurse with their updated state. Yes, you could do that in Java with a while loop, but doing it with a recursive function has two benefits. The first is just cleanliness: When you call a function, it just gets the values that are passed to it. In a while loop, you have to worry about the state of any variables that are visible from inside it.

The big win is that this is what lets Erlang do hot code loading. All of the process state gets passed as a parameter, so its data is separate from its code. When loop calls itself (with a fully qualified call like ?MODULE:loop(NewState)), it can pick up the new version of its code. Because Java classes combine code and data, you can’t update the processing logic without wiping out the application state.

Down the Rabbit Hole

So here’s what I’ve spent the last couple days working on. Yes, it’s assembly code.

; assemble with nasm:
;     nasm -f elf -g welcome.asm && ld -o welcome welcome.o

%macro print 2
    mov eax, 4  ; sys_write
    mov ebx, 1  ; stdout
    mov ecx, %1  ; address of message
    mov edx, %2  ; length of message
    int 0x80
%endmacro

section .text
    global _start

section .data
prompt db "Hi, what's your name? "
prompt_length equ $ - prompt
welcome_part_1 db "Hello, "
welcome_part_1_length equ $ - welcome_part_1
welcome_part_2 db "! Welcome to assembly.",0x0a
welcome_part_2_length equ $ - welcome_part_2

section .bss
name resb 40
name_max_length equ $ - name
name_length resb 4
extra resb 40
extra_max_length equ $ - extra

section .text
_start:
    print prompt, prompt_length

    ; read name
    mov eax, 3  ; sys_read
    mov ebx, 0  ; stdin
    mov ecx, name
    mov edx, name_max_length
    int 0x80

    ; eax is bytes read
    ; if 0 (ctrl-d/EOF), exit
    cmp eax, 0
    jz exit
    ; if max, there may be more input
    cmp eax, name_max_length
    jnz read_complete
    cmp byte [name + eax - 1], 0x0a
    jz read_complete

    ; clear out the rest of the input, or it will be read by the shell as the next command!
clear_input:
    push eax  ; save the name length
    ; read extra
    mov eax, 3  ; sys_read
    mov ebx, 0  ; stdin
    mov ecx, extra
    mov edx, extra_max_length
    int 0x80
    ; if max, there may be more input
    cmp eax, extra_max_length
    jnz input_cleared
    cmp byte [extra + eax - 1], 0x0a
    jz input_cleared
    jmp clear_input
input_cleared:
    pop eax

read_complete:
    ; if last is \n, change it to \0 and decrement length
    cmp byte [name + eax - 1], 0x0a
    jne length_ok
    dec eax
    mov byte [name + eax], 0x00

length_ok:
    cmp eax, 0  ; only if input was \n
    jz _start
    or eax, 0x30  ; convert to ascii
    mov [name_length], eax

    print welcome_part_1, welcome_part_1_length
    print name, name_max_length
    print welcome_part_2, welcome_part_2_length

exit:
    mov eax, 1  ; sys_exit
    mov ebx, 0  ; exit code
    int 0x80

In case you don’t know assembly, and since what this does probably isn’t obvious even if you do, here’s an equivalent shell script.

while test -z "$name" ; do
    read -p "Hi! What's your name? " name
done
echo "Hello, $name! Welcome to bash."

And of course, the assembly program will only work on an Intel chip running Linux, and the bash script will work on anything that runs bash.

So why on earth, you might rightly ask, am I doing this?

Partly, it’s just curiosity. Or perhaps something stronger and more negative, an anxiety aroused by awareness of ignorance. It makes me deeply uncomfortable to have to say, “Oh, that’s a black box; I have no idea how it works.” For the work I’m doing now, it’s no more than that, but if I want to move into security work, it’s much more important because a lot of exploits operate at this level.

I’ve compared learning new programming languages to foreign travel before, and this has been a similar experience. It’s really weird and jarring at first, but I acclimated more quickly than I expected. I’m still very much an assembly newbie, but I’ve crossed some basic threshold of wrapping my head around it.

Assembly is different. You’re dealing with the hardware, shuffling data between registers and memory. You have to pay attention to each byte. And you’re keenly aware when you’re handing off control to the operating system (which is another black box that I’m digging into, on which more at a later date). It stops you from taking a lot for granted. It’s also strangely appealing to be working with such simple tools, at such a fundamental level.

As you can see, it’s a lot of work to do something very basic, but this is what any program ultimately boils down to. The bash script is, under the hood, generating a set of instructions like this (but undoubtedly more complicated). All the python code I write day-in, day-out, blithely schlepping objects around between databases and the web, generates an unimaginable spew of assembly instructions. Working at this level gives you an appreciation for what an amazing structure of code we’ve built on top of this.

World’s Dumbest TCP Service

I feel like a bash scripting Dr. Frankenstein. Last week, I wrote about the world’s dumbest instant messaging tool. This week I’ve moved on to the world’s dumbest TCP server. In my quest for a deep understanding of networking, it’s a bit of a detour, a roadside America attraction. I’m not so much learning new stuff here as figuring out how to explain what I know. And having fun pushing the limits of bash in the process.

Part of this is just the challenge of the absurd: Could I make this particular bear dance? But the serious point is two-fold: to make the idea of network services more accessible, and to show that bash is more powerful than you might think. People don’t think of bash as a programming language: A bash script usually starts life as a bunch of commands you executed one at a time on the command line, then copied into a file so you didn’t have to re-type them. That’s not real programming, is it? But bash also has variables, data structures, functions, and even process forking. That’s enough to get a fair amount done.

Network servers have the opposite problem. They’re big bits of infrastructure that Other People write. And infrastructure-grade servers - like the Apache web server - are complicated. Apache has to implement the full HTTP protocol, not just the small subset of it that most people use. It’s got all this logic for authenticating users, negotiating content types, redirecting to pages that have moved, and so on. On top of all that, it’s got developer-decades worth of edge case handling, performance optimizations, and feature creep.

But the core of what it does is straightforward: It listens on a socket, you connect to it and send it a message in a particular format, it does some processing on that, and it sends you a message in response. That’s fundamentally what all network servers do. The ones we’re familiar with - web, mail, and chat servers - have rich and complex message protocols, but a quick skim through the /etc/services file turns up sedimentary layers of oddly specific services, like ntp and biff. They do small, specific, useful things.

So what’s the simplest server I can write that does something even minimally useful? And can I write it in bash?

I figured netcat is a good place to start. Last week, we used it to send messages back and forth between two people. All we want to do now is replace one of those people with a very small shell script. netcat -l port starts up a server that listens on a port and dumps anything it gets to standard output (stdout). It also sends anything it gets on standard input (stdin) back to the client. We just need to redirect netcat’s stdout to a program, and then redirect that program’s output to netcat’s stdin. Doing either of those alone would be trivial; doing both is tricky.

Figuring that out took a fair amount of digging through the bash man page, and experimenting to get the syntax right, but in the end it was a trivial amount of code. Let’s take it as read for now that we’ve got a program called wtf_server, which reads from stdin and writes to stdout. What we’re going to do is use bash’s built-in coproc command, which will start it up as a background process, and set up new file handles for its stdin and stdout.

coproc WTF { wtf_server; }

The WTF tells coproc to create an array named WTF and save the file handles in it. ${WTF[0]} will be wtf_server’s stdout, ${WTF[1]} will be its stdin. So now we can start up the netcat server with its stdout and stdin jumper-cabled to wtf_server as desired.

port=2345
nc -l $port <&${WTF[0]} >&${WTF[1]}

Really, that’s the hard bit. Now we just need a program to read stdin and write stdout. In fact, we don’t even need a real program; our wtf_server is actually just a bash function. In its simplest incarnation, it just echoes back what was sent to it:

function wtf_server () {
    while true ; do
        read msg
        echo "You said: '$msg'"
    done
}

With the coproc and netcat server running, you can switch to another terminal, open a client connection with netcat, and have an exchange like this:

$ nc localhost 2345
hello world!
You said: 'hello world!'

Ok, so that’s the proof of concept. We’re definitely falling short of the “minimally useful” criterion, but we can replace our echo with any bash commands we want. The only constraint is what it’s safe to do - this is still a toy service, anonymous and going over an unencrypted connection. Don’t run the input as shell commands, fer chrissakes. Within those limits, there are still plenty of useful things we can do: reporting on system info or serving up static content. Here’s a sketch with a few ideas:

function wtf_server () {
    while true ; do
        read msg
        case $msg in
            i | index )
                ls $docs ;;
            get\ * )
                f=${msg#get }
                cat $docs/$f ;;
            t | time )
                date ;;
            u | uptime )
                uptime ;;
            * )
                echo "Commands: t, time; u, uptime; i, index; get <file>"
                echo "    ctrl-c to exit"
        esac
        echo -n "> "
    done
}

This gives us a limited interactive shell. Each branch of the case statement handles a different request format. We can get the machine’s current time and uptime stats. It also has a docs directory; we can list the files in it and cat them out individually. A session looks like this:

$ nc localhost 2345

Commands: t, time; u, uptime; i, index; get <file>
    ctrl-c to exit
> t
Sat May  4 11:52:52 EDT 2013
> u
 11:52:53 up 42 days,  5:48, 26 users,  load average: 0.67, 0.37, 0.35
> i
about.txt
status.txt
> get status.txt
Up late on a Friday, hacking bash scripts.
> ^C
$

(After connecting, I just hit return to send a blank line, and the server responded with the help text and the ‘>’ prompt. Every server response ends with a prompt.)

That’s it. No real protocol, certainly nothing formal like HTTP, just a set of ad-hoc request handlers, made up as we went along. The beauty of this is that it doesn’t depend on anything else. It’s not running behind Apache or anything. There’s no development environment to set up, no gems to install; just one standard unix utility - netcat - and bash handles all the rest.


Here’s the full wtf_server.sh script that starts this up.

#!/bin/bash

# Weird little TCP server
# Tells time and uptime; can list and dump files in a "docs" subdir

# Takes a port parameter, just so you know which one you're running on.
test -n "$1" || { echo "$0 <port>"; exit 1; }
port=$1
dir=`dirname $0`
docs=$dir/docs

function wtf_server () {
    while true ; do
        read msg
        case $msg in
            i | index )
                ls $docs ;;
            get\ * )
                f=${msg#get }
                cat $docs/$f ;;
            t | time )
                date ;;
            u | uptime )
                uptime ;;
            * )
                echo "Commands: t, time; u, uptime; i, index; get <file>"
                echo "    ctrl-c to exit"
        esac
        echo -n "> "
    done
}

# Start wtf_server as a background coprocess named WTF
# Its stdin filehandle is ${WTF[1]}, and its stdout is ${WTF[0]}
coproc WTF { wtf_server; }

# Start a netcat server, with its stdin redirected from WTF's stdout,
# and its stdout redirected to WTF's stdin
nc -l $port -k <&${WTF[0]} >&${WTF[1]}

Unpacking Packets

So, I sort of understand this whole TCP thing: You open a connection, you send packets, you close the connection. TCP provides a reliable delivery protocol layered on top of the unreliable IP protocol. So your data gets wrapped in a TCP segment, which gets wrapped in an IP datagram.

But what does that actually look like?

Web requests, email, and all of that add another layer of protocol overhead on top of TCP, so let’s start out with something really simple: the world’s dumbest instant messaging service. We’re going to use netcat, the Swiss army knife of TCP/IP utilities. All we’re going to do is have one netcat process (the server) listen on a TCP port, and have another netcat process (the client) open a connection to it. Both will send any messages typed on the command line, and print any messages they get. We start up the server like so:

$ nc -l 43981

That’s just telling netcat to start up and listen on port 43981. Why 43981? We’ll get to that in a bit.

Then we switch to another terminal, and start up the client like so:

$ nc localhost 43981

Here, we need to tell it which server to connect to, and give it the same port number. Then we type stuff into the client:

$ nc localhost 43981
hello world!
how's it going?

Each time we hit return, the line shows up in the server:

$ nc -l 43981
hello world!
how's it going?

A key thing about TCP is that it’s a two-way connection. Part of what the client does when it opens the connection is tell the server how to send messages back to it. So here we can also type something into the server:

$ nc -l 43981
hello world!
how's it going?
pretty good!

And it will show up in the client:

$ nc localhost 43981
hello world!
how's it going?
pretty good!

When we get bored, we ctrl-c to quit either the server or the client, and the other shuts down automatically.

Pop the Hood

Ok, so that’s it. Messages going across a TCP connection a line at a time. Totally bare-bones. So what’s going on under the hood? To answer that, we’re going to re-run this little exercise, and this time we’re going to use tcpdump to listen in on the conversation. As the name implies, tcpdump listens in on TCP traffic and dumps it out to the screen. So, open a third terminal and fire up tcpdump:

$ sudo tcpdump -i lo -X port 43981
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo, link-type EN10MB (Ethernet), capture size 65535 bytes

“-i lo” tells it to listen on the loopback interface, since our machine is just sending messages to itself, and “-X” will dump out the TCP segments in a couple of useful formats. “port 43981” tells it to only report traffic to and from our netcat server port.

We don’t see anything when we start up our netcat server, but as soon as we start up the client, we get this in the tcpdump terminal:

12:50:53.227362 IP localhost.59356 > localhost.43981: Flags [S], seq 586457076, win 32792, options [mss 16396,sackOK,TS val 48581958 ecr 0,nop,wscale 7], length 0
  0x0000:  4500 003c d2b2 4000 4006 6a07 7f00 0001  E..<..@.@.j.....
  0x0010:  7f00 0001 e7dc abcd 22f4 9ff4 0000 0000  ........".......
  0x0020:  a002 8018 fe30 0000 0204 400c 0402 080a  .....0....@.....
  0x0030:  02e5 4d46 0000 0000 0103 0307            ..MF........
12:50:53.227404 IP localhost.43981 > localhost.59356: Flags [S.], seq 2685804629, ack 586457077, win 32768, options [mss 16396,sackOK,TS val 48581958 ecr 48581958,nop,wscale 7], length 0
  0x0000:  4500 003c 0000 4000 4006 3cba 7f00 0001  E..<..@.@.<.....
  0x0010:  7f00 0001 abcd e7dc a016 2055 22f4 9ff5  ...........U"...
  0x0020:  a012 8000 fe30 0000 0204 400c 0402 080a  .....0....@.....
  0x0030:  02e5 4d46 02e5 4d46 0103 0307            ..MF..MF....
12:50:53.227439 IP localhost.59356 > localhost.43981: Flags [.], ack 1, win 257, options [nop,nop,TS val 48581958 ecr 48581958], length 0
  0x0000:  4500 0034 d2b3 4000 4006 6a0e 7f00 0001  E..4..@.@.j.....
  0x0010:  7f00 0001 e7dc abcd 22f4 9ff5 a016 2056  ........"......V
  0x0020:  8010 0101 fe28 0000 0101 080a 02e5 4d46  .....(........MF
  0x0030:  02e5 4d46                                ..MF

What we see here is the client and server negotiating a TCP connection in what’s known as a three-way handshake. Our client sends a packet saying that it wants to start a connection, the server sends back an acknowledgement, and the client responds with a confirmation. For each of these we get a summary line describing the packet and then a dump of the actual contents - verbatim, byte-by-byte. The “0x0000” and such on the left are the byte index in hexadecimal for the start of each row; so zero, 10 (16 in decimal), 20 (32), 30 (48). The big chunk in the center is the data in hex characters. Each hex character is 4 bits (half a byte, and thus referred to as a “nibble” - ah, nerd humor), so each set of 4 is two bytes. The block on the right is the same data, rendered as ASCII characters (with all the non-printing characters shown as periods). Since what we’re dealing with here is all binary data, that’s not useful yet.
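
If you want to get comfortable reading that hex-plus-ASCII layout, you can make the same kind of dump out of any bytes you like with a standard hex dump tool:

printf 'hello world!\n' | xxd              # hex on the left, ASCII rendering on the right
printf 'hello world!\n' | od -A x -t x1z   # same idea, different column formatting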

So what is all this crap?

IP Header

Well, like I said, we’ve got TCP segments wrapped in IP datagrams, so the IP data is going to be what we see first. As always, Wikipedia is an awesome resource, and its page on IPv4 lays out the datagram structure for us bit-by-bit. (You may want to open that up in another tab for reference while you’re reading this.) Let’s look at our dump of the first packet:

 0x0000:  4500 003c d2b2 4000 4006 6a07 7f00 0001  E..<..@.@.j.....
  0x0010:  7f00 0001 e7dc abcd 22f4 9ff4 0000 0000  ........".......
  0x0020:  a002 8018 fe30 0000 0204 400c 0402 080a  .....0....@.....
  0x0030:  02e5 4d46 0000 0000 0103 0307            ..MF........

The first thing we see is the “4” telling us that this is an IPv4 datagram (not IPv6). Then a “5” for the header length. That’s in 32-bit words, so each of those will be two blocks of hex characters. So we already know that the IP header part of this packet is just:

 0x0000:  4500 003c d2b2 4000 4006 6a07 7f00 0001  E..<..@.@.j.....
  0x0010:  7f00 0001                                ....

The next byte holds the DSCP and ECN fields. They’re all zeros, so we can ignore them here, but essentially they tell routers how important or urgent this packet is. In principle, all packets are the same, but in practice we might want some packets - like for Voice Over IP - to have a higher priority if there’s a lot of traffic. This one byte opens a rabbit hole of technical and policy issues.

The next two bytes - 003c or 60 in decimal - tell us the total length of the packet. Sure enough, the packet ends after 12 bytes of the “0x0030” row. Two bytes here means that the total length of the packet can’t be more than 65535 bytes (2^16 - 1).

IP doesn’t guarantee that a datagram will get across the network in one piece, so each one carries an Identification field: if it gets broken into fragments along the way (more on that in a moment), the receiver uses this value to tell which fragments belong together. The starting value is arbitrary - d2b2 for this one - but you’ll see that it’s incremented normally for later packets. Why not just start with 0001 or 0000? I suspect there’s another rabbit hole there.

Each packet incurs a certain amount of overhead in transmission and processing, so it makes sense to put as much data in each packet as possible. But while the IPv4 protocol sets a maximum of 65535 bytes, it doesn’t require that every router support that. Remember that the protocol was developed back when 64K bytes was more than most machines had, and even now, that’s a lot for one message on a router that’s handling large volumes of traffic.

So our next two bytes - 4000 - deal with packet fragmentation. When a router has to forward a packet that’s bigger than the next router can deal with, it will break it into fragments. The two bytes are divided up a little oddly: 3 bits for flags and 13 bits for the Fragment Offset - its index number. That means that the flags are the 8, 4, and 2 bits of the first nibble. It’s 4, so that’s the Don’t Fragment bit. Even if we were fragmented, this is the first packet, so the Fragment Offset is zero.

Next is TTL - Time To Live. When I bring up this page in a browser on my laptop, it sends packets skipping across the network to my hosting provider in California. They’ll pass through a few routers at my ISP and several more at internet backbone providers across the country before they get to the hosting server. There isn’t a pre-ordained route that they’ll follow. Each router looks at each packet and tries to figure out where to send it to get it closer to its destination. This is what makes the internet robust: If one of those connections goes down, the router will figure out the next best way to get the message through. (And yes, the mechanics of how that works are another whole essay.)

The downside of this is that if one or more routers are mis-configured, they could send the packets back to a previous router, and they’d end up going in loops. To keep packets from circling endlessly, they have a limited lifespan, measured in “hops”. Each router along the way decrements the TTL field. If the packet hasn’t gotten where it’s going by the time the TTL hits zero, the router knows something’s gone wrong, and drops it. Our packet starts off with a TTL byte of 40 hex, so it’s got 64 hops to live.

It may not look like it, but we’re almost down to the end of the IP header here. The next byte tells us the IP Protocol number for the contents of this IP datagram. 06 means it’s TCP.

The next two bytes - 6a07 - are the header checksum. It’s a number calculated from all the bytes in the header. It’s a way to check that the header wasn’t garbled in transit. When a router gets a packet, it calculates a checksum based on the header it received; if any bits got randomly flipped, the checksums won’t match. (This doesn’t protect against intentional tampering because someone could also update the checksum.)

The last two fields are the source and destination IP addresses. Again, this is a two-way connection we’re setting up here, so the client needs to tell the server where to send packets back to. Since we’re just talking to ourselves over the loopback interface, they’re both 127.0.0.1 - 7f00 0001 in hex.

TCP Header

Ok, that’s the IP header. Now on to the TCP header. Let’s strip the IP header out of our packet and see what’s left.

 0x0000:
  0x0010:            e7dc abcd 22f4 9ff4 0000 0000      ....".......
  0x0020:  a002 8018 fe30 0000 0204 400c 0402 080a  .....0....@.....
  0x0030:  02e5 4d46 0000 0000 0103 0307            ..MF........

The first two fields - two bytes each - are the source and destination port numbers: e7dc and abcd. That’s why I picked the weird port to run this on: 43981 in hex is abcd, so it’s easy to spot in the output. e7dc is 59356, which isn’t significant - it’s just what was automatically chosen when the client opened the connection. Perhaps the most significant thing about the ports is that they’re not part of the IP header. Ports are a TCP-level concept; the IP layer only cares about getting the packets to the right machine.
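
A quick sanity check on those port numbers, if you want one:

printf '%x\n' 43981    # abcd - the server port we picked
printf '%d\n' 0xe7dc   # 59356 - the ephemeral port the client happened to get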

The next four bytes - 22f4 9ff4 - are the Sequence Number (586,457,076). As with the Fragment Offset in the IP layer, this is to keep track of what order the segments belong in and which have been received. The big difference is that here it’s the index of the starting byte in the segment, so it will increase from segment to segment by the number of bytes in the TCP data. It also starts at an arbitrary value, and loops around to zero when it hits the maximum Sequence Number (4 Gigabytes). More on this later.

The next number is the Acknowledgement Number. It’s essentially the Sequence Number for the data received. It’s zero for now, so we’ll talk about it later when it’s got something to say for itself.

The next nibble is the Data Offset, which is the TCP header length in 32-bit words. It’s “a” (10), for a total of 40 bytes, which matches what we can see.

The rest of the a002 block is reserved bits and flags. They’re all zero except the 2 bit, which is the SYN flag (for synchronize), which means that this segment is the start of a connection.

The next two bytes - 8018 (32792 in decimal) - are the Window Size. This is the sender putting a cap on how much data can be sent back to it, in case it has limited resources. I don’t know the reason for that exact number, but there’s a surprise here: We’ll see in a minute that there’s an optional field that multiplies this value.

Next is the TCP checksum. Unlike the IP checksum, this one is summed across both the TCP header and data. Why doesn’t the IP checksum just do both? I’m not sure, but I’d guess there’s both a design principle and a practical reason. TCP shouldn’t really depend on IP for that. Even though they were designed to work together, they have separate responsibilities. In theory, you could run TCP on top of other protocols than IP, though I don’t know of anyone doing that. So if TCP has to calculate its own checksum, there’s no point making IP do it as well. The practical concern is that the TCP data can be huge compared to the 20 bytes of IP header, and the IP checksum has to be checked at every hop; the TCP checksum is only checked when it reaches its destination.

The last standard field is the Urgent Pointer. The URG flag wasn’t set, so this is 0000. As to when that flag is set and how the urgent pointer is used when it is, that’s probably yet another rabbit hole.

Beyond that, we have a number of optional fields. They’re odd in that they’re not in a specific order, they’re different sizes, and they may have multiple sub-fields. The first byte of each tells us what type of field it is. I’ll run through them quickly, putting pipes between the sub-fields so you can see how they’re broken up.

* 02|04|400c: Maximum segment size = 400c (16,396 bytes). This is used by the TCP layer to limit the segment size and save it from getting fragmented at the IP layer.
* 04|02: Selective acknowledgement permitted. Allows the receiver to request re-transmission of only missing segments, rather than the whole message. More on this later.
* 08|0a|02e5 4d46|0000 0000: Timestamp = 02e5 4d46 (48,581,958), previous timestamp = 0000 0000. Used to help determine the order of the TCP segments when the amount of data being sent is more than the maximum Sequence Number.
* 01: No operation - padding to align options on word boundaries for performance.
* 03|03|07: Window scale = 7; multiplies the Window Size by 2^7 (128), bringing it to 4,197,376 bytes.

We can check our homework here by looking at the packet summary. (Come on, it wouldn’t have been any fun if we did that first!)

12:50:53.227362 IP localhost.59356 > localhost.43981: Flags [S], seq 586457076, win 32792, options [mss 16396,sackOK,TS val 48581958 ecr 0,nop,wscale 7], length 0

Now that we know what we’re looking at, it’s pretty easy to read: time; host.port for source and destination; SYN flag; Sequence Number; window size; options with max segment, selective ack, timestamp, no-op, and window scale; and data length of zero.

Awesome, done! That’s the first packet.

The Rest of the Handshake

Now that we have the structure down, we just need to look at what’s different about the rest of the packets, and we can get most of what we need from the summary lines.

So, packet two is the response from our server.

12:50:53.227404 IP localhost.43981 > localhost.59356: Flags [S.], seq 2685804629, ack 586457077, win 32768, options [mss 16396,sackOK,TS val 48581958 ecr 48581958,nop,wscale 7], length 0
  0x0000:  4500 003c 0000 4000 4006 3cba 7f00 0001  E..<..@.@.<.....
  0x0010:  7f00 0001 abcd e7dc a016 2055 22f4 9ff5  ...........U"...
  0x0020:  a012 8000 fe30 0000 0204 400c 0402 080a  .....0....@.....
  0x0030:  02e5 4d46 02e5 4d46 0103 0307            ..MF..MF....

What’s different? The IP identification is 0000, which strikes me as odd. Don’t know what’s going on there. And the checksum is different because of that. If we were connecting two different machines, we’d have seen the source and destination addresses switch. In the TCP header, the ports switched, which is the tip-off that this packet is going from the server to the client. We have a new Sequence Number, since the client and server keep separate counts of the data bytes they send. We now have an Acknowledgement Number, which is the client’s Sequence Number from the last packet, plus one. Both the SYN and ACK flags are set, marking this as the server acknowledgement. There’s a slightly different Window Size, but with the same scaling factor. The previous timestamp is set. And of course a different TCP checksum because of all that.

The third packet is the client’s confirmation of the connection. The server knows that the client asked for a connection, and the server knows that it sent an acknowledgement, but it needs to know that the client got the acknowledgment.

12:50:53.227439 IP localhost.59356 > localhost.43981: Flags [.], ack 1, win 257, options [nop,nop,TS val 48581958 ecr 48581958], length 0
  0x0000:  4500 0034 d2b3 4000 4006 6a0e 7f00 0001  E..4..@.@.j.....
  0x0010:  7f00 0001 e7dc abcd 22f4 9ff5 a016 2056  ........"......V
  0x0020:  8010 0101 fe28 0000 0101 080a 02e5 4d46  .....(........MF
  0x0030:  02e5 4d46                                ..MF

The IP Identification has been incremented, which changes the checksum. The TCP Sequence Number has been incremented and matches the Acknowledgement Number from the previous server packet. Likewise, the Acknowledgement Number is now the Sequence Number from the server packet, incremented. We have fewer options - just the timestamps and a couple of no-ops - so the Data Offset is only 8 (32 bytes). Only the ACK flag is set, which says that the connection is solid now. The Window Size is 257; there’s no window scale option here, but the scaling factor from the handshake still applies for the rest of the connection, so this is the client advertising how much more data it’s ready to receive (257 × 128 = 32,896 bytes).

Anyway, hey, TCP connection established! So this is the first thing we’d see whether we’re sending email, hitting a web page, or whatever.

Getting Down to Work

After that, we send a message from the client to the server, and get a response back. This time, we’re actually sending data!

12:51:04.418321 IP localhost.59356 > localhost.43981: Flags [P.], seq 1:14, ack 1, win 257, options [nop,nop,TS val 48584755 ecr 48581958], length 13
  0x0000:  4500 0041 d2b4 4000 4006 6a00 7f00 0001  E..A..@.@.j.....
  0x0010:  7f00 0001 e7dc abcd 22f4 9ff5 a016 2056  ........"......V
  0x0020:  8018 0101 fe35 0000 0101 080a 02e5 5833  .....5........X3
  0x0030:  02e5 4d46 6865 6c6c 6f20 776f 726c 6421  ..MFhello.world!
  0x0040:  0a                                       .
12:51:04.418446 IP localhost.43981 > localhost.59356: Flags [.], ack 14, win 256, options [nop,nop,TS val 48584755 ecr 48584755], length 0
  0x0000:  4500 0034 6d10 4000 4006 cfb1 7f00 0001  E..4m.@.@.......
  0x0010:  7f00 0001 abcd e7dc a016 2056 22f4 a002  ...........V"...
  0x0020:  8010 0100 fe28 0000 0101 080a 02e5 5833  .....(........X3
  0x0030:  02e5 5833                                ..X3

Here’s where the ASCII output finally becomes useful. You can spot the “hello world!” content right away, which makes it a lot easier to keep track of which packets are which as we’re digging through this.

In the first packet, the Total Length is bigger by 13 (“hello world!” plus the return character). The client set the PSH flag, to indicate that there’s data to push to the application (netcat). The Acknowledgement and Sequence numbers are the same as last time because no data was sent; but then in the server’s response, its Acknowledgement Number is 13 (no coincidence) more than the client’s Sequence Number.

Ok, so now that the Acknowledgement number is starting to move for real, let’s talk about what all the futzing around with it and the Sequence Number is about. This is really the core of TCP, what makes it special. This is how it guarantees that the data gets through even when IP delivery fails and packets get dropped. To do that, the client needs to keep track of each chunk of data it sends out, and it needs to get a response from the server saying that piece has been received. It’s like registered mail but better, because the response tells the client not only that the server got a packet, but how much data it got and where it is in the client’s data set. (The Acknowledgement Number is actually the number of the next byte the server expects to get from the client).

TCP isn’t normally a one-for-one exchange like this. Often, the client would send out a whole mess of packets at once. Rather than acknowledging each individually, which would generate a whole lot of traffic, the server just sends back the Acknowledgement Number for the highest packet received, assuming it gets them all. If it doesn’t, if there are packets missing, it could send an acknowledgement for the highest contiguous packet it gets, and have the client re-send everything later. But it could be smarter than that, and this is where the Selective Acknowledgement option comes in. That lets the server acknowledge several discontinuous blocks (as start and end bytes), so the client only has to re-send the missing pieces.

Also remember that this is a two-way conversation. When the server is sending an acknowledgement to the client, it’s also sending its own Sequence Number, so the client can keep track of what it’s received from the server. In this case, all the content is in the outbound message, and what’s coming back is an empty acknowledgement. But in an HTTP request, we’d see content in the outbound message - request headers, the type of request (GET, POST, etc.), the path to the web page we’re requesting, and any form data - and the response would have the HTML content of the web page.

Shutting Down

The one thing left to show you is what happens when we close the connection. If you hit ctrl-c in either terminal, you’ll see both the client and server exit immediately, but you’ll also see a bunch of traffic in tcpdump.

12:51:38.214610 IP localhost.59356 > localhost.43981: Flags [F.], seq 30, ack 14, win 257, options [nop,nop,TS val 48593204 ecr 48590359], length 0
  0x0000:  4500 0034 d2b7 4000 4006 6a0a 7f00 0001  E..4..@.@.j.....
  0x0010:  7f00 0001 e7dc abcd 22f4 a012 a016 2063  ........"......c
  0x0020:  8011 0101 fe28 0000 0101 080a 02e5 7934  .....(........y4
  0x0030:  02e5 6e17                                ..n.
12:51:38.215934 IP localhost.43981 > localhost.59356: Flags [F.], seq 14, ack 31, win 256, options [nop,nop,TS val 48593205 ecr 48593204], length 0
  0x0000:  4500 0034 6d13 4000 4006 cfae 7f00 0001  E..4m.@.@.......
  0x0010:  7f00 0001 abcd e7dc a016 2063 22f4 a013  ...........c"...
  0x0020:  8011 0100 fe28 0000 0101 080a 02e5 7935  .....(........y5
  0x0030:  02e5 7934                                ..y4
12:51:38.215994 IP localhost.59356 > localhost.43981: Flags [.], ack 15, win 257, options [nop,nop,TS val 48593205 ecr 48593205], length 0
  0x0000:  4500 0034 d2b8 4000 4006 6a09 7f00 0001  E..4..@.@.j.....
  0x0010:  7f00 0001 e7dc abcd 22f4 a013 a016 2064  ........"......d
  0x0020:  8010 0101 fe28 0000 0101 080a 02e5 7935  .....(........y5
  0x0030:  02e5 7935                                ..y5

We’ve sent a couple more messages back and forth (“how’s it going?”, “pretty good!”), so the Sequence and Acknowledgement numbers have jumped ahead a bit, as have the timestamps.

The real action here is in the TCP flags, the 2nd byte in the 0x0020 row. The ACK bit is still set, but now the FIN bit is too. That’s the client telling the server to close the connection. The server sends back a response with the FIN bit set, and the client sends a simple acknowledgement. It’s the same send-acknowledge-confirm exchange that we saw in the opening handshake.

Wrapping Up, Moving On

Ok, so that’s been a lot to absorb, but what I hope you’ve gotten out of this is that all these internet protocol details are interestingly complex, but totally comprehensible. You’ve got the tools to look under the hood, and with a bit of patience you can figure out what all the parts are doing.

If you haven’t actually run through this little netcat/tcpdump exercise on your own terminal, give it a try. You don’t need to pick through it byte-by-byte like I have. Just take a couple minutes to watch the packets go back and forth, and skim the summary lines. That gave me a sort of visceral sense of what’s going on.

If you do want to dig into this more, try pointing tcpdump at a real service. Set up a minimal web page on your local web server, point tcpdump at port 80, and hit the page with your browser. I just tried that myself, and I think it’s going to keep me busy for a while.

A New Line of Inquiry

I’m starting on a new line of inquiry here.

I’ve been doing web applications and unix systems programming for about 15 years now. Computer and network security has always been an aspect of what I do, but it’s never been my focus. I’m thinking it’s time to change that. I’m looking for a challenge, something that will keep me busy and learning for the rest of my life. Security is an unending struggle on an ever-changing field. I want to be doing something useful, and security is becoming more and more of an issue in daily life.

I’ve always been a generalist, and security seems to be a field where that’s valuable. It’s not about using one tool to do a specific job; it’s about understanding systems at multiple levels, how things interact and how they fail. It’s about how people interact with technology. It’s creative: there’s a lot of design that goes into making software both secure and usable.

I have a lot to learn; like I said, this has never been my focus. I need to understand unix systems and networking protocols at a much deeper level than I have before. I’ve said for years that you don’t learn anything from a working system. It’s when something fails that you have to go in under the hood and learn how it actually works. A corollary to that is that you need to really understand a system to figure out how it can break, or how it can be broken intentionally.

“Under the hood” means that I need to dust off my C programming chops and set aside the layers of abstraction that I’m used to. There’s also a lot of lore and literature specific to computer security that I need to absorb. There are tools for both attack and defense that I need to play with.

The best way to learn something is to try to explain it, so that’s what I’m going to do here. Let me know if it makes sense, if it’s useful. Let me know where there are gaps or unanswered questions, or where I’m just plain wrong.

Amateur Erlang

This is based on the talk I gave at ErlangDC.

I don’t actually make my living programming Erlang, so I’m still a beginner in a lot of ways. I’ve been tinkering with it for the last year and a half or so, and in short, it’s been awesome. I’ve had a lot of fun; I’ve learned a ton, and what I’ve learned has been more broadly useful than I might have expected; and overall it’s definitely made me a better programmer.

So I’m going to talk about that experience: what you learn when you learn Erlang; some of the “ah-ha!” moments I’ve had - things that will give you a running start at the Erlang learning curve; and how to get some practical experience with Erlang before you dive into writing distributed, high-availability systems.

Foreign Travel

Learning a new programming language is like going to a foreign country. It’s not just the language, it’s the culture that goes with it. They do things differently over there. If you just drop in for a day trip, it’s going to be all weird and awkward; but if you stick around a bit, you start getting used to it; and when you go home, you find that there are things you miss and things that you’ve brought home with you.

There’s also a sort of meta-learning, because then when you go to a different country, it’s not as jarring; you adapt more quickly. I found that once I’d gotten used to Erlang’s syntax, other languages - Coffeescript and Scala - didn’t look so weird. At work the other day, someone was doing a demo of iPhone development, and some of my co-workers were really thrown by Objective-C’s syntax. I was just like, “Oh yeah, now that you mention it, it does have an odd mix of Lisp-style bracket grouping and C-style dot notation. Whatever. It’s code.”

Working with Erlang also teaches you a fundamentally different way of solving problems, especially if, like me, you’re coming from an object-oriented (OO) background like Java or Python. It has functional language features like recursion and closures. It focuses on simple data structures, and gives you powerful tools for working with them. And it’s all about concurrency. Those all add up to something more than the sum of their parts. They’re also things that translate to other languages: You’ll see Erlang-style concurrency in Scala, and functional programming is showing up all over the place these days.

Bowling

A good example of this is the bowling game program. I’ve written about this before, so let me just recap it quickly. It’s a standard programming challenge: Calculate the score for a game in bowling. It’s fairly straightforward, but there are a bunch of tricky edge cases. The first time I did it was in Python as a pair programming exercise, and at the end I was pretty happy with the results. It came out to 53 lines of code. Then about a year later, we did the same thing at one of the Erlang meetups, and the solution that one of the experienced Erlang programmers turned in was about ten lines of code. Ten lines of clean, elegant code, not like a Perl one-liner. That blew my mind.

I went back and looked at the Python code and realized how much of it was OO modeling that doesn’t actually help solve the problem. In fact, it creates a bunch of its own problems. Obviously, you need a Game class and a Frame class, and the Game keeps a list of Frames. Then you very quickly get into all these metaphysical questions around whether a Frame should just be a dumb data holder, or whether it should be a fully self-actualized and empowered being, capable of accessing other frames to calculate its score and detect bad data. Putting all the smarts in the Game may be the easiest thing, but that brings up historical echoes of failed Soviet central planning, and just doesn’t feel very OO. And once you’ve got these classes, you start speculating about possible features: What if you want to be able to query the Game for a list of all the rolls - does that change how you store that info? In short, you can get really wrapped around the axle with all these design issues.

The Erlang solution sidesteps that whole mess. It just maps input to output. The input is a list of numbers, the output is a single number. That sounds like some kind of fold function. With pattern matching, you write that as one function with four clauses: End of game, strike frame, spare frame, normal frame.

score(Rolls) -> frame(Rolls, 1, 0).

 %% Game complete.
frame(_BonusRolls, 11, Score) -> Score;

 %% Strike.
frame([10|Rest], Frame, Score) ->
    frame(Rest, Frame + 1, Score + 10 + strike_bonus(Rest));

 %% Spare.
frame([First,Second|Rest], Frame, Score) when (First + Second == 10) ->
    frame(Rest, Frame + 1, Score + 10 + spare_bonus(Rest));

 %% Normal.
frame([First,Second|Rest], Frame, Score) ->
    frame(Rest, Frame + 1, Score + First + Second).

 %% spare & strike bonus calculations.
spare_bonus([First|_Rest]) -> First.
strike_bonus([First,Second|_Rest]) -> First + Second.

Bringing it Home

The thing is, once I’d seen the solution in Erlang, I was able to go back and implement it in Python; it came out to roughly the same number of lines of code, and was about as readable. That transfers, that way of solving problems. Instead of thinking, “What are the classes I need to model this problem domain?” start with, “What are my inputs and outputs? What’s the end result I want, and what am I starting from? Can I do that with simple data structures?”

So now when I write Python code, I use list comprehensions a lot more; for loops feel kinda sketchy - clumsy and error-prone. Modifying a variable instead of creating a new one sets off this tiny warning bell. I use annotations and lambda functions more often, and wish I had tail recursion and pattern matching. I do more with lists and dictionaries; defining classes for everything feels like boilerplate.

In the last year, I’ve also done a bunch of rich browser client Javascript programming with jQuery and Backbone.js. That’s a very functional style of programming. It’s all widget callbacks and event handling - lots of closures. (I don’t know who originally said it, but Javascript has been described as “Lisp with C syntax”.) Actually, I was coding in Coffeescript and debugging in Javascript. Coffeescript is essentially a very concise and strongly functional macro language for generating Javascript. So it was a really good thing to have the experience with Erlang going into that.

Community

The other thing about foreign travel is the people you meet. I’d like to make a little plug for the Erlang community. It’s still small enough to be awesome. Just lurk on the erlang-questions mailing list, and you can learn a ton. There are some really sharp people on it, and the discussions are a fascinating mix of academic and practical. You see threads that wander from theoretical computer science to implementation details to performance issues.

Ah-ha! Moments

Like I said, Erlang has a different way of doing things. It’s not that it’s all that much more complicated than other languages, but it’s definitely different. So I’m going to talk about some of the ah-ha! moments - the conceptual breakthroughs - that made learning it easier.

Syntax

I’ll start with the syntax, which is probably the least important difference, but it’s the first thing that people tend to get hung up on. They look at Erlang code, and they’re all like, “Where are the semicolons? What are all these commas doing here? Where are the curly braces?” It all seems bizarre and arbitrary. It’s not. It’s just not like C.

What helped me get used to Erlang’s syntax was realizing that what it looks like is English. Erlang functions are like sentences: You have commas between lists of things, semicolons between clauses, and a period at the end. Header statements like -module and -define all express a complete thought, so they end with a period. A function definition is one big multi-line sentence. Each line within it ends with a comma, function clauses end with a semicolon, and there’s a period at the end. case and if blocks are like mini function definitions: They separate their conditions with semicolons and end with end. end is the punctuation; you don’t put another semicolon before it. After end, you put whatever punctuation would normally go there.

You also have to realize that all the things you think of as control structures - case, if, receive, etc. - are really expressions: they return a value, and they sit inside the surrounding code the same way a function call would.

Here’s a cheat sheet:

-module(my_module).

my_func([]) ->
    Value = get_default_value(),          % comma
    Response = case other_func(Value) of
        ok -> "We're good!";              % semicolon
        _ -> "Oh noes!"                   % nothing!
    end,                                  % comma
    {Response, Value};                    % semicolon
my_func([Value]) ->
    {"We're good!", Value};               % semicolon
my_func(Values) ->
    IncDbl = fun (X) ->
        Inc = X + 1,                      % comma
        Inc * 2                           % nothing!
    end,                                  % comma
    Value = lists:map(IncDbl, Values),    % comma
    {"We're good!", Value}.               % period

Even with that, it’s still pretty idiosyncratic. You’ll find yourself making a bunch of syntax mistakes at first, and that’ll be frustrating. Let me just say that you’ll get used to it faster than you expect. After a couple weekends hacking on Erlang code, it’ll start to look normal.

Recursion

Recursion is not something you use much in OO languages because (a) you rarely need to, and (b) it’s scary - you have to be careful about how you modify your data structures. Recursive methods tend to have big warning comments, and nobody dares touch them. And this is self-reinforcing: Since it’s not used much, it remains this scary, poorly-understood concept.

In Erlang, recursion takes the place of all of the looping constructs and iterators that you would use in an OO language. Because it’s used for everything, there are well-established patterns for writing recursive functions. Since you use it all the time, you get used to it, and it stops being scary.

This is also where Erlang’s weirdnesses start working together. Immutable variables actually simplify recursion, because they force you to be clear about how you’re changing your data at each step of the recursion. Pattern matching and guard expressions make recursion more powerful and expressive, because they let you break out the stages of a recursion in a very declarative way. Let’s look at the basics of recursion with a very simple example: munging a list of data.

Like a story, a recursive function has a beginning, a middle, and an end. The beginning and end are usually the easiest parts, so let’s tackle those first. The beginning of a recursion is just a function that takes the input, sets up any initial state, output accumulators, etc., and recurses. In this case, we take an input list and set up an empty output list.

%% beginning
func(Input) ->
    Output = [],
    func(Input, Output).

The end stage is also easy to define. When the input is an empty list, just return the output list.

%% end
func([], Output) -> lists:reverse(Output).

The middle stage defines what we do with any single element in the list, and how we move on to the next one. Here, we just pop the first element off the input list, munge it to create a new element, push that onto the output list, and recurse with the newly-diminished input and newly-extended output. (And note that we add our new element at the beginning of the output list, rather than the end - prepending to a list is cheap, while appending means copying the whole list. That’s also why the end stage calls lists:reverse.)

%% middle
func([First | Rest], Output) ->
    NewFirst = munge(First),
    func(Rest, [NewFirst | Output]);

That’s all there is to the basics of recursion. You may have multiple inputs and outputs, and there could be multiple middle and end functions to handle different cases (and we’ll see a more interesting example in a minute), but the basic pattern is the same.
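
Put together as a complete module (the module name and the trivial munge/1 are made up here, just so it compiles), the whole thing looks like this:

-module(munger).
-export([func/1]).

%% beginning: set up the empty output accumulator
func(Input) ->
    Output = [],
    func(Input, Output).

%% middle: munge one element and recurse on the rest
func([First | Rest], Output) ->
    NewFirst = munge(First),
    func(Rest, [NewFirst | Output]);
%% end: input exhausted, un-reverse the accumulator and return it
func([], Output) ->
    lists:reverse(Output).

%% stand-in munging: just double each element
munge(X) -> X * 2.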

As a coda to this, it’s worth mentioning that this is essentially what Erlang’s lists:map/2 function does, so you could replace all the foregoing with something like:

lists:map(fun my_module:munge/1, Input)

The lists module has a number of other functions for doing simple list munging like this.
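
To give a flavor, here are a couple of one-liners you can type at the shell: lists:filter/2 keeps just the elements you pick out, and lists:foldl/3 boils a list down to a single value.

%% keep the even numbers from 1..10
Evens = lists:filter(fun (X) -> X rem 2 == 0 end, lists:seq(1, 10)).
%% -> [2,4,6,8,10]

%% sum a list by folding an accumulator over it
Total = lists:foldl(fun (X, Acc) -> Acc + X end, 0, lists:seq(1, 10)).
%% -> 55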

More OO than OO

The next thing is Erlang process spawning and inter-process communication. Again, this is one of those things that in normal languages is rarely used and fraught with peril. In Java, multithreaded applications involve a lot of painstaking synchronization, and you still often get bit by either concurrent modification errors or performance issues from overly aggressive locking. In Erlang of course, you do it all the time. Understanding why requires a bit of background.

The original concept of object oriented programming was that objects would be autonomous actors rather than just data structures. They would interact with each other by sending messages back and forth. You see artifacts of this like Ruby’s send method. (Rather than invoking a method directly, you call send on the object with the method name as the first argument.) In practice, objects in OO languages are little more than data structures with function definitions bolted on. They’re not active agents; they’re passive, waiting around for a thread of execution to come through and do something to them.

In a sense, Erlang is more truly object oriented than OO languages, but you come to it by a roundabout way. Since even complex data structures are immutable, “updating” your data really creates a new version of it. If you pass a data structure to a function, as soon as that function modifies it, it’s dealing with a different data structure; your copy is untouched. So the only way to have something like global, mutable data is to have the current version owned by a single process and managed like so:

loop(State) ->
    receive Message ->
        NewState = handle_message(Message, State),
        loop(NewState)
    end.

(You wouldn’t literally have code like this, but it’s conceptually what you’re doing.) State is any data structure, from an integer to a nested tuple/list/dictionary structure. You’d spawn this loop function as a new process with its initial state data. From then on it would receive messages from other processes, update its state, maybe send a response, and then recurse with the new state. The key here is that State is a local variable of this function; there’s no way for any other process to mess with it directly. If you spawn another process with this function, it will have a separate copy of the State, and any updates it makes will be completely independent of the first. The simplest example I can think of would be an auto-incrementing id generator:

loop(Id) ->
    receive
        {Pid, next} ->
            NewId = Id + 1,
            Pid ! NewId,
            loop(NewId)
    end.

You could start it up and get new ids like so:

Pid = spawn(fun() -> loop(0) end),
Pid ! {self(), next},
Id = receive Resp -> Resp end.

So anything that would be an object in an OO language is a process in Erlang. I hadn’t realized quite how true that was until I was messing around in the Erlang shell, and opened a file. file:open/2 says it returns {ok, IoDevice} on success. Let’s take a look at that:

1> file:open("test.txt", [write]).
{ok,<0.35.0>}

Hey, wait! That’s a process id. See?

2> self().
<0.32.0>

So when you open a file, you don’t actually access it directly; you’re spawning off a process to manage access to it.
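
You can push on that a little (a quick sketch - the file name is just an example): the IoDevice is an ordinary pid, so you can hand it to another process and let that process write through it.

%% file:open/2 (without the raw option) hands back the pid of a process
%% that manages the file
{ok, F} = file:open("test.txt", [write]),
true = is_pid(F),
%% any process holding that pid can write through it
spawn(fun () -> file:write(F, "written from another process\n") end).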

As with recursion and the lists module, Erlang has modules like gen_server and gen_event which give you a more formal and standard way to do this sort of thing. They add a lot of process management on top of this basic communication, so I won’t get into the details here, but check them out.
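
Just to give a taste, here’s a rough sketch of the id generator above redone as a gen_server (my own toy version - treat the details as illustrative):

-module(id_server).
-behaviour(gen_server).

-export([start_link/0, next/0]).
-export([init/1, handle_call/3, handle_cast/2]).

%% client API
start_link() -> gen_server:start_link({local, ?MODULE}, ?MODULE, 0, []).
next()       -> gen_server:call(?MODULE, next).

%% callbacks - the State threaded through these plays the same role as
%% the Id variable in the hand-rolled loop above
init(StartId) -> {ok, StartId}.

handle_call(next, _From, Id) ->
    NewId = Id + 1,
    {reply, NewId, NewId}.

handle_cast(_Msg, State) -> {noreply, State}.

All the send/receive plumbing is hidden behind gen_server:call/2, and you pick up things like call timeouts and supervision hooks for free.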

Getting Practice

Ok, so once you’ve gotten past the language concepts, how can you actually get some practice with it? Something a little more low-key than massively distributed high-availability systems?

Scripting

Probably the easiest way to start, if you just want to get comfortable with the language, is shell scripting. escript lets you use Erlang as a scripting language.

#!/usr/local/bin/escript

main(Args) ->
    io:format("Hello world!~n\t~p~n", [Args]).

That’s pretty cool. You have the ease of scripting, with full access to Erlang’s libraries. Furthermore, you can set a node name or sname in your script, and then it can connect to other Erlang nodes. (The special %%! comment says to pass the rest of the line through as parameters to erl, the Erlang emulator.)

#!/usr/local/bin/escript
%%! -sname my_script

For example, here’s a simple way to grab a web page:

#!/usr/local/bin/escript

main([Url]) ->
    inets:start(),
    {ok, { { _Proto, Code, _Desc }, _Hdr, Content } } = httpc:request(Url),
    io:format("Response (~p):~n~s~n", [Code, Content]).

That’s actually pretty handy because you can fetch data from web services that way. I started with this and built out a really simple automated testing tool for a web service I was writing, in about 20 lines of code. You can do all sorts of useful little things like this. They’re a good way to get used to Erlang’s idioms, and you can gradually build in more complexity as you go.
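
Something in that spirit - a made-up sketch, not my actual script - might look like this: pass it a URL and the status code you expect, and it complains if they don’t match.

#!/usr/local/bin/escript
%% a made-up sketch of a tiny web service check: fetch a URL and complain
%% if the status code isn't what we expected

main([Url, Expected]) ->
    inets:start(),
    {ok, {{_Proto, Code, _Desc}, _Hdr, Body}} = httpc:request(Url),
    case integer_to_list(Code) of
        Expected -> io:format("ok   ~s~n", [Url]);
        _        -> io:format("FAIL ~s: expected ~s, got ~p~n~s~n",
                              [Url, Expected, Code, Body]),
                    halt(1)
    end.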

Testing Tools

In fact, testing tools are another way to get in some real experience with Erlang. You could do something simple to test web service functionality, or something more complicated and concurrent for load testing.
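
The concurrent version falls out of the process model pretty naturally. Here’s a rough sketch (not a real load tester - no ramp-up, no error handling) of a function you could drop into a module: it spawns N workers that each fetch the page and report back how long it took.

load_test(Url, N) ->
    inets:start(),
    Parent = self(),
    %% fire off N workers, each fetching the page and timing it
    _Pids = [spawn(fun () ->
                       Start = os:timestamp(),
                       {ok, {{_, Code, _}, _, _}} = httpc:request(Url),
                       Millis = timer:now_diff(os:timestamp(), Start) div 1000,
                       Parent ! {done, Code, Millis}
                   end) || _ <- lists:seq(1, N)],
    %% collect one {StatusCode, Milliseconds} result per worker
    [receive {done, Code, Millis} -> {Code, Millis} end || _ <- lists:seq(1, N)].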

You could also mock out back-end web services for testing. I was doing some browser-side Javascript development last summer, and didn’t have access to the server I’d be talking to. (It was running on an embedded device.) So I faked it up in Erlang with Spooky, which is a simple Sinatra-style framework. It went something like this:

-module(my_web_service).
-behaviour(spooky).
-export([init/1, get/2]).

init([])-> [{port, 8000}].

get(Req, [])->
    Req:ok("Default response");
%% http://localhost:8000/path/to/resource
get(Req, ["path", "to", "resource"])->
    Req:ok("Canned response for resource");
get(Req, ["path", "to", "other-resource"])->
    Req:ok("Canned response for other resource").

Web Apps

If you’re coming from a web background, that’s another good place to start tinkering with Erlang. Instead of trying to think up an Erlang project, just do your next personal web app in Erlang. Erlang has a range of web application frameworks, so you can decide how much of the heavy lifting you want to do. As you saw, Spooky lets you do simple stuff easily, but it’s fairly low-level.

ChicagoBoss is a richer, Django-like framework. It has an ORM, URL dispatching, and page templates (with Django syntax, no less). Wait, Object-Relational Mapper? What’s that doing in a functional language? Yeah, ok, really they’re proplists with a parameterized module and a bunch of auto-generated helper functions wrapped around them. They’re still immutable; don’t freak out. More experienced developers may argue about whether that’s the right way to do things, but it certainly makes ChicagoBoss more beginner-friendly. It also gives you some enticing extras like a built-in message queue and email server. The ChicagoBoss tutorial is really concise and well-written, so I’ll leave it at that.

If you want to get into the nuts and bolts of proper HTTP request handling, take a look at WebMachine. Most web frameworks leave out or gloss over a lot of the richness of the HTTP protocol. WebMachine not only gives you a lot of control over every step of the request handling, but actually forces you to think through it. It’s not the most intuitive for beginners, but it’s an education.

Those are the ones I’ve played with a bit, but there are lots more.

Contributing

One of the things I’ve run across with these, as with most open-source tools, is that there are “opportunities to contribute.” We’d love it if all of our software tools worked perfectly all the time, but the next best thing is if the source is on GitHub. Working with Spooky, I tripped over an odd little edge case. It turned out to be a simple fix - half a dozen lines of code. I forked it, fixed it, and put in a pull request. Had a similar experience with the ChicagoBoss templating code. They were both tiny contributions, but you still get a warm fuzzy feeling doing that. Throw in a few extra unit tests if you really want to make the owners happy.

Even if you’re unlucky, and the code works perfectly, almost every piece of software out there could benefit from better documentation. Take advantage of your newbie status; write a tutorial. The people who wrote the software know it inside and out; it helps to have beginners writing for beginners. I can vouch that a great way to learn something is to try to explain it to someone else.

Adventure Awaits!

What I hope I’ve left you with is a sense that Erlang is worth learning in its own right, that it’ll teach you new things about programming that you can apply in any language; that while it’ll be strange at first, it’s totally learnable; and that there are any number of low-intensity ways to get started using it. Most importantly, though, I want to leave you with the sense that this is fun. Learning a new language, new problem-solving tools, new ways of expressing ideas, that’s all fun. You’ve got an adventure ahead of you.

Remedial Javascript

My background here is that I’ve worked with Javascript on and off for years, but I never actually wrapped my head around how its inheritance works until just recently. I started out doing little bits of UI bling, then moved onto dynamic forms and simple ajax requests (pre-jQuery), then fancier stuff with jQuery and friends. So I’ve been able to get a lot done. It’s only when reading something like Javascript: the Good Parts that I’d get this nagging sense that I was missing something fundamental. Lately though, I’ve started working with backbone.js, doing serious model-view-controller programming in the browser, and that nagging has become loud and persistent.

Part of the problem is that I’m coming from a Java background, and Javascript looks a lot like it. It looks like it has the Java-style class inheritance that I’m familiar with. It’s got syntax like:

var o = new Object;

So I instinctively think, “Great, Object is a class, and o is an instance of that class.” You can think that, and Javascript will mostly work the way you expect. You can write a fair amount of code believing that.

But it’s wrong.

Javascript has prototypal inheritance, not class inheritance. I knew that, and seeing this class-y syntax gave me the feeling of being lied to. Not a malicious lie, but a little white “I’m glossing over the details here” lie. And once I got into trying to create my own class hierarchy, or extending someone else’s, those details started to matter. Things just didn’t work quite the way I expected. Mostly, I’d be missing values that I thought I’d inherited from somewhere. Even then, I could figure out what had gone wrong and fix it on a case-by-case basis, but that made it clear that there was something important I really just didn’t understand.

I’ve read a number of books and articles that talk about Javascript’s prototypal nature, and how you construct and extend objects, but my sense of what was going on under the hood never quite clicked. So I finally did what I always end up having to do to make sense of some bit of programming weirdness: step away from the big program I’m working on, and sit down at a shell to run some little experiments. In this case, it’s Chrome’s Javascript console. (Lines with a > are what I type. Lines without are the console’s response.) Starting off with the previous example:

> o = new Object;
Object

Great, I’ve created a new Object. But what is this “Object” thing really?

> Object
function Object() { [native code] }

Wait, so Object is a function. Huh?

The deal is that new is what’s actually doing the heavy lifting here, creating a new object. The Object function is just filling in the details. There’s nothing magic about it. You could call new on any function, and you’d get a new object. If that function sets any properties on this, they’ll show up in it. If that function has a property named prototype, the new object will inherit properties from it. prototype should be an object, but since functions are also objects, you won’t get an error if you mess up.

For example:

> A = function () { this.kingdom = "Animalia"; }
function () { this.kingdom = "Animalia"; }
> B = function () { this.phylum = "Chordata"; }
function () { this.phylum = "Chordata"; }
> B.prototype = A  // wrong!
function () { this.kingdom = "Animalia"; }
> b = new B
B
> b.kingdom
undefined

Here, everything looks fine until you try to get b.kingdom. The trouble is that kingdom is not a property of A; it’s just a property that A sets on this.

The right thing would be:

> a = new A
A
> B.prototype = a
A
> b = new B
B
> b.kingdom
"Animalia"

Now, any properties you add to a will be inherited by b:

> a.class = "Mammalia"
"Mammalia"
> b.class
"Mammalia"

But the properties that B set on b override a’s properties:

> a.phylum = "whatever"
"whatever"
> b.phylum
"Chordata"

b doesn’t inherit properties from B:

> B.order = "Carnivora"
"Carnivora"
> b.order
undefined

And b keeps its relation to a even if B changes its prototype:

> B.prototype = {}
Object
> b.kingdom
"Animalia"

So that’s what it does, but what are the relationships between all these objects and functions?

> b instanceof B
true
> b instanceof A
true
> b instanceof a
TypeError: Expecting a function in instanceof check, but got #<error>
> b.__proto__ === a
true
> B instanceof A
false

So we have this odd sort of dual inheritance going on. b is an instance of B, and has a prototype of a. The instanceof relationship is purely historical: Changes to B don’t affect b after its construction. The prototype relation is ongoing and dynamic. Changes to a’s properties show up in b (unless b overrides them). Perhaps even more oddly, b is an instance of a’s constructor, but there’s no direct connection from B to A, only through a. It looks like this in my head:

  B   A
 / \ /
b---a

In short, an object inherits a type from its constructor and behavior from its prototype. In a class-based language, an object gets both of these from its class, but in Javascript, “What is it?” and “What can it do?” are different questions with different answers.

If you got all the way through this article and this stuff still doesn’t make sense, grab a Javascript console and try it out yourself. Work through the examples. Type them in by hand; don’t copy-paste them. (Seriously, that makes a huge difference.) Ask your own questions, come up with your own experiments. Tinker.

Remedial CSS

In my defense, let me point out that I’ve always been a back-end developer, and when I have had to do web front end stuff, it’s usually been something deliberately simple, for the lowest common denominator of browsers. So it’s only been in the last couple weeks at $new_gig that I’ve had to sit down and really understand all the clever stuff you can do with CSS layout.

It seems like it should be pretty straightforward: you basically have block, inline, float, and relative and absolute positioning; they can have fixed or percentage widths. But these sometimes interact in surprising ways, and I’d find that a common-sense design goal would be difficult-to-impossible to implement. The official W3 formatting model doc is thorough but it’s a lot to wade through. It also documents each feature in isolation, without much explanation of how they interact.

So I worked up a kind of cheat sheet, that shows what happens when you throw all this crap together on the same page. To really get the most out of it, you should bring up Firebug or the Chrome inspector, and play around with the width, position and other style settings.

Let me know if it’s useful, or what I’m missing.

How Steve Jobs Got Me to Buy an Android

Back around Christmas, I saw an old video of Steve Jobs at the ‘97 Apple World-Wide Developer Conference. This was just as he was coming back to Apple, so he’s talking a lot about their new direction. Part of that came out of his experience at Next. Next was all unix under the hood (or unix-y), so their systems were networked in a way that Macs and PCs just weren’t at that point. Jobs is talking about how all of his stuff just lives on the network: His machine at work, his machine at home, and any corporate machine he logs into all have equally easy access to the same files. He doesn’t have to worry about backup and recovery; that’s all taken care of by sysadmins. This is back when you’d usually just copy a few critical files onto a floppy disk and hope your hard drive didn’t crash. He was living in the future, and his vision was to simply make that available to everyone. As William Gibson pointed out, “The future is already here; it’s just not evenly distributed.”

It struck me while I was watching this that back then, I was also living in the future. I was working at an ISP, so I had network access that was years ahead of its time. Back when 28.8K dialup was the norm, my work computer had a 10Mbps connection straight into the Internet backbone. Even more importantly, it was always on; it was just there. Individual machines were expendable; it was the network and the data that mattered. That changed the whole way I worked with computers.

It also changed the way we played and socialized. We played net-Quake with imperceptible lag. We had a private mp3 server with thousands of tracks before most people knew what mp3s were. We chatted on IRC all day with our co-workers and similarly wired sysadmin buddies around the country (or world in a few cases). We could drag in a scratch-built Linux box and set it up as a public server: Give our friends email accounts and web sites and bring the future a little closer for them.

To be honest, I was never one of the ringleaders, the early adopters. I didn’t rush out to buy the latest gadget. I spent a fair amount of time tinkering with my home Linux machine, but I wasn’t really pushing the envelope. But I spent all my time around folks who were, and I was conscious that I was getting a sneak preview of the future. They were living and working the way that everyone would once all this tech got cheaper and easier to use.

In the years since, I haven’t really been in that sort of environment, and without it for balance, my skeptical tendencies took over. Or maybe I just got tired of having to rebuild my kernel to get sound working. In any case, I pretty much stopped tinkering and focused on The Simplest Thing that Works. I started buying Macs and their attendant accessories: The iPod that auto-synched my music, and the Time Capsule that did automated backups.

I didn’t get an iPhone, though. I had a pre-paid cell phone that you could buy at 7-11 for $20, and cost me $80 a year. The iPhone was nearly that much a month. It had a lot of nice-to-haves, but nothing that justified the cost. It’s actually exciting to me that perfectly serviceable technology is that cheap. Ditto for computers: As long as you’re not gaming, a $500 machine is plenty. (I’m writing this on a sub-$300 netbook. It’s text. How hard is that?)

Listening to Steve Jobs reminded me of that feeling of living in the future. It also struck me that that’s supposed to be part of my job - maybe not my day job, but some bigger social role as someone who understands machines and isn’t afraid to tinker with them. I shouldn’t be just a technology consumer. I should be bushwhacking my way into the future, cobbling together half-working prototypes to see what it’s like to live with them; figuring out how the tech works and how to polish it up and make it usable for everyone else. I’m no visionary, but there’s a lot of work to be done out on the frontier, just making things a little more civilized.

The catch is that that’s not what Apple is about. They build sleek, elegant, easy to use gifts from the Future. It all Just Works. That’s great, and I seriously applaud them, but you don’t learn much from a working machine. If you want to do some exploring on your own, and maybe figure out something useful that hasn’t already been productized - or to just pop the hood and get a better understanding of what makes this thing tick - you need something a little more open. You even kinda want something that doesn’t work quite right, something that’ll bug you to go in and fix it yourself. That’s just not how Apple wants you to relate to their technology.

So in the end, I got an Android phone. I can’t justify it as a phone, but I was able to rationalize it as a development machine. It’s got its own restrictions, and I’ve been too nervous to root it, but I’m still more comfortable with it than I would be with an iPhone. It plays well with Linux. I can develop in Eclipse on any platform; I’m not locked into the Mac/Xcode tools. Maybe what it comes down to is that I trust developer communities more than any single corporation.

I went through the standard Android programming tutorial, where you build a little notepad app. Then I hacked around on it a bit: added tagging, tweaked the page flow. It’s still rough around the edges, but it works. There are a few features I’d like to add, but I can do that. That’s the important thing. It’s not awesome, but it’s mine. I can keep sanding away at the things that bug me, and it may eventually become pretty awesome. It’ll be tailored to the way I use it. It’ll do the things I need it to, and it won’t be cluttered with features I don’t want. Nobody will be trying to get me to upgrade to the pay version.

In a very real way, I also need to do this to survive as a programmer. I need to keep that love of tinkering alive. If it’s just a day job, I don’t see how I can keep doing it for another twenty or thirty years. It needs to be more than that. I have to find that passion and the sense of something bigger. I need to care about it.