Zed Shaw’s mongrel “is a fast HTTP library and server for Ruby that is intended for hosting Ruby web applications of any kind using plain HTTP rather than FastCGI or SCGI.”
And saying that it’s “fast” is true. The performance you get from a single mongrel process listening on a port is quite good. You see how such a benchmark relates to your network traffic in an older post of mine.
For example, on a SunFire x4100, with dual Opteron 285s (one of the standard container servers; the 285s are dual core opterons) running Solaris and with 16GBs of RAM.
$ uname -a
SunOS 69-12-222-41 5.11 snv_45 i86pc i386 i86pc
System Configuration: Sun Microsystems i86pc
Memory size: 16256 Megabytes
A simple “Hello World” rails app will serve at 250 req/sec just fine over a gigabit network (I wasn’t trying to push it and involving a database isn’t the point yet).
[benchmark-client1:/] root# httperf --hog --server 184.108.40.206 --uri /hello --port 8000 --num-conn 10000 --rate 250 --timeout 5 httperf --hog --timeout=5 --client=0/1 --server=220.127.116.11 --port=8000 --uri=/hello --rate=250 --send-buffer=4096 --recv-buffer=16384 --num-conns=10000 --num-calls=1
Total: connections 10000 requests 10000 replies 10000 test-duration 40.041 s
Connection rate: 249.7 conn/s (4.0 ms/conn, <=26 concurrent connections)
Connection time [ms]: min 3.4 avg 20.7 max 114.2 median 14.5 stddev 18.0
Connection time [ms]: connect 0.7
Connection length [replies/conn]: 1.000
Request rate: 249.7 req/s (4.0 ms/req)
Request size [B]: 68.0
Reply rate [replies/s]: min 247.2 avg 249.7 max 250.4 stddev 1.0 (8 samples)
Reply time [ms]: response 19.9 transfer 0.1
Reply size [B]: header 251.0 content 21.0 footer 0.0 (total 272.0)
Reply status: 1xx=0 2xx=10000 3xx=0 4xx=0 5xx=0
And you can see it work away at it (this is also doing sessions the slow /tmp way)
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
25833 jason 26M 22M cpu0 0 0 0:00:46 11% mongrel_rails/1
Now with mongrel. It’s all about two things then:
1) How fast is a load-balancing proxy engine?
2) How scaleable is a load-balancing proxy engine?
There’s a difference between the two but I haven’t seen a proper treatment of the second one and let me tell you what I mean.
It’s the same story with FCGI by the way.
Did I ever tell how I ran Alistapart.com as 15 lighttpd processes with 4 rails-FCGIs each and that’s how we got 2000 requests/second on a single server when most of the world showed up and read Jeffrey’s Web 3.0 article? Seriously the article was slashdotted, dugg, reddit’ed, blogged about, all at about the same time. And the interface between lighttpd and that app’s rails-fcgi seemed to max out at 200 req/second. So the solution? The box was fine, so run 15 of them! Worked swimmingly.
It’s the same way with anything where you’re connecting one tier to the next: the speeds of, how far you can spread each tier and the connections between them are important.
Recently I was reading Ezra’s article about nginx proxying to mongrel and yes while it’s fast proxying from benchmarker -> web server -> mongrels on one’s laptop, the question of how scaleable the proxy is and how to match it with hardware (so you waste nothing), is not addressed (and usually isn’t).
For example, in another article we see individual mongrels performing at 581 req/sec and then when five of these are put behind a single nginx it outputs at 956.99 requests/second (let’s round that up to 957 req/sec).
If I can hit all five mongrels at the same time from five benchmarking clients and get ~500 req/second each on a single application server (I’ve done this sort of thing, it works), then when I put five of these behind a proxy/load balancer, hit it 5x harder from the benchmarking clients, shouldn’t I get 2500 requests/second?
You typically do not.
I should if it’s predictably scaleable and I have enough power to generate that traffic (and my database isn’t limiting me yet).
[It’s always fun to see what web server/proxy engine someone will dig up next, if you look at one of the first articles talking about using alternative web servers for Rails-FCGI (with a focus on lighttpd) the list of light web servers and FCGI supporting web servers is still relevant.]
We’re fortunate in having a decent amount of standardized testing hardware (it’s from my former life as a “real” scientist) and this is a question that we’ve addressed internally a few times, but it’s still been a bit too ad hoc.
So I thought how about a comprehensive, tightly controlled, well done series of experiments under realish conditions. Yes that means statistics will be involved and the data set will have power, and realish conditions means multiple benchmarking client servers (about 20), a stack of real app and web servers, and gigabit interconnects. I’ll use tsung and httperf for the benchmarking.
We also have BIG-IP 6400s
and Zeus’s software based load balancer and traffic manager.
Is it fair to compare open source, free load-balancing proxy engines to hardware load balancers that are $50,000/each?
You bet (there’s relevant material at the end).
So the list I have (and would really appreciate any other suggestions) is
- Balance NG
- Apache 2.2
- Zeus load balancer
- Zeus High-performance traffic manager
- BIG-IP 6400
These are all software-based and will initially be run on Sun Fire X4100 and X2100s (same chip, 1 vs. 2), except for the BIG-IPs, which are their own piece of hardware, and will be talking to backend Rails application servers, which will also be X4100s. I’ll also be profiling these to see which would be appropriate for running on the T1000s. Web servers like yaws are excluded because they don’t load balance reverse proxy requests.
But realize that when you look at something like Brad Fitzpatrick’s presentation, “LiveJournal’s Backend: a History of Scaling” (you can get the pdf), you’ll see that the flow goes Dual BIG-IPs -> perlbal -> mod_perl and that’s for a reason.
A quick example of what a hardware load balancer does when in front of a single Sun Fire application server
So it was easy to do 3000-4000 new connections/second, to sustain ~8000 active connections, and to output 50 -> 180 Mbps of traffic.