Erlang processes are human too

Via Ludo I came to read why processes scale better than threads (a topic that comes around every now and then).

But the case is really made in Joe Armstrong’s recent Concurrency is Easy, where he does a great job connecting the philosophy behind Erlang to common human experiences.

Each human is a process, you see: we live, we die, we heal, we communicate. When we are in a group, others notice if, what and maybe why something happened to us; sometimes they even pick up the slack so we can recover, and sometimes they put us out of our misery.

I’ve been a fan of Erlang since the end of the last millennium (we have a couple of infrastructure pieces written in it) and always flip through Dr. Armstrong’s slides when they come up. In fact, even the last slide in this pdf is quite timely and great advice (the slide is from 2001).

Evaluating proxy engines and load balancers for mongrel-driven ruby on rails applications: an introduction and an open call

Zed Shaw’s mongrel “is a fast HTTP library and server for Ruby that is intended for hosting Ruby web applications of any kind using plain HTTP rather than FastCGI or SCGI.”

And saying that it’s “fast” is true. The performance you get from a single mongrel process listening on a port is quite good. You can see how such a benchmark relates to your network traffic in an older post of mine.

For example, take a Sun Fire X4100 with dual Opteron 285s (one of the standard container servers; the 285s are dual-core Opterons), running Solaris with 16 GB of RAM.

$ uname -a
SunOS 69-12-222-41 5.11 snv_45 i86pc i386 i86pc
$ prtconf
System Configuration:  Sun Microsystems  i86pc
Memory size: 16256 Megabytes

A simple “Hello World” rails app will serve at 250 req/sec just fine over a gigabit network (I wasn’t trying to push it and involving a database isn’t the point yet).

[benchmark-client1:/] root# httperf --hog --server 69.12.222.41 --uri /hello --port 8000 --num-conn 10000 --rate 250 --timeout 5
httperf --hog --timeout=5 --client=0/1 --server=69.12.222.41 --port=8000 --uri=/hello --rate=250 --send-buffer=4096 --recv-buffer=16384 --num-conns=10000 --num-calls=1
	

Total: connections 10000 requests 10000 replies 10000 test-duration 40.041 s

Connection rate: 249.7 conn/s (4.0 ms/conn, <=26 concurrent connections)
Connection time [ms]: min 3.4 avg 20.7 max 114.2 median 14.5 stddev 18.0
Connection time [ms]: connect 0.7
Connection length [replies/conn]: 1.000

Request rate: 249.7 req/s (4.0 ms/req)
Request size [B]: 68.0

Reply rate [replies/s]: min 247.2 avg 249.7 max 250.4 stddev 1.0 (8 samples)
Reply time [ms]: response 19.9 transfer 0.1
Reply size [B]: header 251.0 content 21.0 footer 0.0 (total 272.0)
Reply status: 1xx=0 2xx=10000 3xx=0 4xx=0 5xx=0
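As a rough sanity check on output like the above, the summary can be parsed and the reported rate re-derived from the totals. This is only a sketch: the `parse_httperf` helper and its dictionary keys are made up for illustration, and the regular expressions assume the summary layout shown here (other httperf versions may format it differently).

```python
import re

# Two lines copied from the httperf summary above.
SUMMARY = """\
Total: connections 10000 requests 10000 replies 10000 test-duration 40.041 s
Reply rate [replies/s]: min 247.2 avg 249.7 max 250.4 stddev 1.0 (8 samples)
"""


def parse_httperf(text):
    """Pull the totals and reply-rate figures out of httperf's summary text.

    Hypothetical helper: it only understands the two lines used here.
    """
    total = re.search(
        r"Total: connections (\d+) requests (\d+) replies (\d+) "
        r"test-duration ([\d.]+) s",
        text,
    )
    rate = re.search(
        r"Reply rate \[replies/s\]: min ([\d.]+) avg ([\d.]+) "
        r"max ([\d.]+) stddev ([\d.]+)",
        text,
    )
    if total is None or rate is None:
        raise ValueError("text does not look like an httperf summary")
    return {
        "connections": int(total.group(1)),
        "requests": int(total.group(2)),
        "replies": int(total.group(3)),
        "duration_s": float(total.group(4)),
        "reply_min": float(rate.group(1)),
        "reply_avg": float(rate.group(2)),
        "reply_max": float(rate.group(3)),
        "reply_stddev": float(rate.group(4)),
    }


stats = parse_httperf(SUMMARY)

# Connections divided by wall-clock duration should match the reported rate.
derived_rate = stats["connections"] / stats["duration_s"]
print(f"derived {derived_rate:.1f} conn/s, "
      f"reported avg {stats['reply_avg']} replies/s")
```

Here the derived figure (10000 connections over 40.041 s) agrees with the 249.7 that httperf reports, and replies equal requests, so the server kept up with the offered rate.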

And you can see it working away at it (this is also doing sessions the slow /tmp way):

PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
 25833 jason       26M   22M cpu0     0    0   0:00:46  11% mongrel_rails/1

But.

Now, with mongrel, it all comes down to two things:

1) How fast is a load-balancing proxy engine?
2) How scaleable is a load-balancing proxy engine?

There’s a difference between the two but I haven’t seen a proper treatment of the second one and let me tell you what I mean.

It’s the same story with FCGI by the way.

Did I ever tell you how I ran Alistapart.com as 15 lighttpd processes with 4 rails-FCGIs each, and that’s how we got 2000 requests/second on a single server when most of the world showed up and read Jeffrey’s Web 3.0 article? Seriously, the article was slashdotted, dugg, reddit’ed, blogged about, all at about the same time. And the interface between lighttpd and that app’s rails-fcgi seemed to max out at 200 req/second. So the solution? The box was fine, so run 15 of them! Worked swimmingly.

It’s the same way with anything where you’re connecting one tier to the next: the speed of each tier, how far you can spread it, and the connections between the tiers are all important.

Recently I was reading Ezra’s article about nginx proxying to mongrel and yes, while it’s fast proxying from benchmarker -> web server -> mongrels on one’s laptop, the question of how scaleable the proxy is, and how to match it with hardware (so you waste nothing), is not addressed (and usually isn’t).

For example, in another article we see individual mongrels performing at 581 req/sec and then when five of these are put behind a single nginx it outputs at 956.99 requests/second (let’s round that up to 957 req/sec).

If I can hit all five mongrels at the same time from five benchmarking clients and get ~500 req/second each on a single application server (I’ve done this sort of thing, it works), then when I put five of these behind a proxy/load balancer, hit it 5x harder from the benchmarking clients, shouldn’t I get 2500 requests/second?

You typically do not.

I should if it’s predictably scaleable and I have enough power to generate that traffic (and my database isn’t limiting me yet).
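To put a number on that gap, here is a small sketch of the arithmetic. The `scaling_efficiency` function is a made-up name for illustration; the 581 and 956.99 req/sec figures are the ones quoted above from the other article, and the 500 req/sec case is the hypothetical from the previous paragraph.

```python
def scaling_efficiency(per_backend_rate, backends, observed_rate):
    """Compare what a proxy delivers with perfectly linear scaling.

    Returns (ideal_rate, efficiency), where efficiency is the fraction
    of the ideal rate that was actually observed through the proxy.
    """
    ideal_rate = per_backend_rate * backends
    return ideal_rate, observed_rate / ideal_rate


# Figures quoted above: 581 req/sec per mongrel, five mongrels,
# 956.99 req/sec measured through a single nginx.
ideal, efficiency = scaling_efficiency(581, 5, 956.99)
print(f"ideal {ideal} req/s, observed 956.99 req/s, "
      f"efficiency {efficiency:.1%}")

# The hypothetical case: five mongrels at ~500 req/sec each.
ideal_hypothetical, _ = scaling_efficiency(500, 5, 0)
print(f"ideal for 5 x 500: {ideal_hypothetical} req/s")
```

Five mongrels at 581 req/sec each would ideally give 2905 req/sec, so 957 req/sec through the proxy is roughly a third of what the backends could do on their own.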

[It’s always fun to see what web server/proxy engine someone will dig up next. If you look at one of the first articles talking about using alternative web servers for Rails-FCGI (with a focus on lighttpd), the list of light web servers and FCGI-supporting web servers is still relevant.]

We’re fortunate in having a decent amount of standardized testing hardware (it’s from my former life as a “real” scientist) and this is a question that we’ve addressed internally a few times, but it’s still been a bit too ad hoc.

So I thought: how about a comprehensive, tightly controlled, well done series of experiments under realish conditions? Yes, that means statistics will be involved and the data set will have power, and realish conditions means multiple benchmarking client servers (about 20), a stack of real app and web servers, and gigabit interconnects. I’ll use tsung and httperf for the benchmarking.
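As an illustration of the kind of statistics meant here, repeated runs of the same benchmark can be summarized as a mean with a confidence interval rather than a single number. This is a sketch only: the `mean_with_ci` helper is hypothetical, the sample values are invented placeholders (not measurements), and the interval uses a normal approximation, which is somewhat too narrow for a handful of runs (a t-distribution would be wider).

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_with_ci(samples, confidence=0.95):
    """Return (mean, low, high) for repeated benchmark runs.

    Uses a normal approximation to the sampling distribution of the
    mean; treat the interval as optimistic when there are few runs.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two runs to estimate spread")
    centre = mean(samples)
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev(samples) / sqrt(n)
    return centre, centre - half_width, centre + half_width


# Invented req/sec figures from five imaginary runs, for illustration only.
hypothetical_runs = [948.2, 961.5, 955.0, 957.3, 952.9]

centre, low, high = mean_with_ci(hypothetical_runs)
print(f"mean {centre:.1f} req/s, 95% CI [{low:.1f}, {high:.1f}]")
```

Two proxies whose intervals overlap heavily cannot really be ranked from that data, which is the point of running each configuration more than once.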

We also have BIG-IP 6400s and Zeus’s software-based load balancer and traffic manager.

Is it fair to compare open source, free load-balancing proxy engines to hardware load balancers that are $50,000 each?

You bet (there’s relevant material at the end).

So the list I have (and would really appreciate any other suggestions) is

  • Perlbal
  • Pound
  • Pen
  • HA-Proxy
  • Nginx
  • TCPBalance
  • Balance
  • Balance NG
  • TCPFork
  • Apache 2.2
  • Litespeed
  • Lighttpd
  • Zeus load balancer
  • Zeus High-performance traffic manager
  • BIG-IP 6400
  • IPF
  • Squid
  • Varnish

These are all software-based, except for the BIG-IPs, which are their own piece of hardware. The software ones will initially be run on Sun Fire X4100s and X2100s (same chip, 1 vs. 2). All of them will be talking to backend Rails application servers, which will also be X4100s. I’ll also be profiling these to see which would be appropriate for running on the T1000s. Web servers like yaws are excluded because they don’t load balance reverse proxy requests.

But realize that when you look at something like Brad Fitzpatrick’s presentation, “LiveJournal’s Backend: a History of Scaling” (you can get the pdf), you’ll see that the flow goes Dual BIG-IPs -> perlbal -> mod_perl and that’s for a reason.

A quick example of what a hardware load balancer does when in front of a single Sun Fire application server

So it was easy to do 3000-4000 new connections/second, to sustain ~8000 active connections, and to output 50 -> 180 Mbps of traffic.

We’ll be in London in September for RailsConf Europe 2006

The talks schedule is out for the first Rails Conference in Europe and both Koz and I are giving talks.

I’ll be talking about Ruby On Rails Applications: A Systems View, and Koz’s talk is called Playing Nice with Others.

My talk is related to the Scale with rails work, and God knows what Koz’s is about.

I’ll be adding a few days on both sides of the trip and want to make sure I’m able to buy each textdrive/joyent customer a pint or two or three.

I’m staying at the Radisson Edwardian Kenilworth Hotel and will be in London from Wednesday the 13th through Sunday the 17th.

If anyone has some ideas let me know.