Evaluating proxy engines and load balancers for mongrel-driven ruby on rails applications: an introduction and an open call

Zed Shaw’s mongrel “is a fast HTTP library and server for Ruby that is intended for hosting Ruby web applications of any kind using plain HTTP rather than FastCGI or SCGI.”

And saying that it’s “fast” is true. The performance you get from a single mongrel process listening on a port is quite good. You can see how such a benchmark relates to your network traffic in an older post of mine.

For example, take a SunFire X4100 with dual Opteron 285s (one of the standard container servers; the 285s are dual-core Opterons), running Solaris with 16 GB of RAM.

$ uname -a
SunOS 69-12-222-41 5.11 snv_45 i86pc i386 i86pc
$ prtconf
System Configuration:  Sun Microsystems  i86pc
Memory size: 16256 Megabytes

A simple “Hello World” Rails app will serve at 250 req/sec just fine over a gigabit network (I wasn’t trying to push it, and involving a database isn’t the point yet).

[benchmark-client1:/] root# httperf --hog --server 69.12.222.41 --uri /hello --port 8000 --num-conn 10000 --rate 250 --timeout 5
httperf --hog --timeout=5 --client=0/1 --server=69.12.222.41 --port=8000 --uri=/hello --rate=250 --send-buffer=4096 --recv-buffer=16384 --num-conns=10000 --num-calls=1

Total: connections 10000 requests 10000 replies 10000 test-duration 40.041 s

Connection rate: 249.7 conn/s (4.0 ms/conn, <=26 concurrent connections)
Connection time [ms]: min 3.4 avg 20.7 max 114.2 median 14.5 stddev 18.0
Connection time [ms]: connect 0.7
Connection length [replies/conn]: 1.000

Request rate: 249.7 req/s (4.0 ms/req)
Request size [B]: 68.0

Reply rate [replies/s]: min 247.2 avg 249.7 max 250.4 stddev 1.0 (8 samples)
Reply time [ms]: response 19.9 transfer 0.1
Reply size [B]: header 251.0 content 21.0 footer 0.0 (total 272.0)
Reply status: 1xx=0 2xx=10000 3xx=0 4xx=0 5xx=0
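As a sanity check on those httperf numbers, the reply rate and reply size multiply out to a tiny fraction of a gigabit, so the network is nowhere near the bottleneck at this rate (a quick sketch in plain Ruby, using the figures from the output above):

```ruby
# Rough sanity check: ~250 replies/sec at ~272 bytes per reply is well
# under a megabit of response traffic on a gigabit link.
reply_rate  = 249.7   # replies/sec, from the httperf output
reply_bytes = 272.0   # header + content + footer, from the httperf output

mbps = (reply_rate * reply_bytes * 8) / 1_000_000
puts format("%.2f Mbps of reply traffic", mbps)  # prints "0.54 Mbps of reply traffic"
```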

And you can see it work away at it (this is also doing sessions the slow /tmp way):

PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
 25833 jason       26M   22M cpu0     0    0   0:00:46  11% mongrel_rails/1

But.

Now, with mongrel, it all comes down to two things:

1) How fast is a load-balancing proxy engine?
2) How scalable is a load-balancing proxy engine?

There’s a difference between the two, but I haven’t seen a proper treatment of the second one, so let me tell you what I mean.

It’s the same story with FCGI by the way.

Did I ever tell you how I ran Alistapart.com as 15 lighttpd processes with 4 Rails FCGIs each, and that’s how we got 2000 requests/second out of a single server when most of the world showed up to read Jeffrey’s Web 3.0 article? Seriously, the article was slashdotted, dugg, reddit’ed and blogged about, all at about the same time. The interface between lighttpd and that app’s Rails FCGIs seemed to max out at 200 req/second. So the solution? The box itself was fine, so run 15 of them. Worked swimmingly.
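That lighttpd-to-FCGI fan-out looked roughly like the following config fragment (a sketch from memory; the dispatch path and socket paths are illustrative, not the exact production config):

```
# lighttpd.conf fragment -- one lighttpd fronting 4 Rails FCGI listeners.
# Run 15 such lighttpd instances (on separate ports) to get the full fan-out.
server.modules += ( "mod_fastcgi" )
fastcgi.server = ( "/dispatch.fcgi" => (
  ( "socket" => "/tmp/ala.0.socket" ),
  ( "socket" => "/tmp/ala.1.socket" ),
  ( "socket" => "/tmp/ala.2.socket" ),
  ( "socket" => "/tmp/ala.3.socket" )
) )
```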

It’s the same way with anything where you’re connecting one tier to the next: the speed of each tier, how far you can spread each tier, and the connections between them all matter.

Recently I was reading Ezra’s article about nginx proxying to mongrel, and yes, it’s fast when proxying benchmarker -> web server -> mongrels on one’s laptop, but the question of how scalable the proxy is and how to match it to hardware (so you waste nothing) is not addressed (and usually isn’t).
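For concreteness, the nginx-to-mongrel wiring in that kind of test is just an upstream block and a proxy_pass, along these lines (a minimal sketch; the ports and the upstream name are assumptions):

```
# nginx.conf fragment -- round-robin proxying to a pack of mongrels.
upstream mongrels {
    server 127.0.0.1:8000;
    server 127.0.0.1:8001;
    server 127.0.0.1:8002;
}

server {
    listen 80;
    location / {
        proxy_pass http://mongrels;
    }
}
```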

For example, in another article we see individual mongrels performing at 581 req/sec, and when five of these are put behind a single nginx it outputs 956.99 requests/second (let’s call that 957 req/sec).

If I can hit all five mongrels at the same time from five benchmarking clients and get ~500 req/second from each on a single application server (I’ve done this sort of thing; it works), then when I put those five behind a proxy/load balancer and hit it 5x harder from the benchmarking clients, shouldn’t I get 2500 requests/second?

You typically do not.

I should, if it’s predictably scalable and I have enough power to generate that traffic (and my database isn’t limiting me yet).
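The arithmetic behind that expectation is simple, and the gap between ideal and measured is the interesting number (a plain Ruby sketch; the helper name is mine, the figures are the ones quoted above):

```ruby
# If N backends each do R req/sec in isolation, a perfectly scalable
# proxy in front of them would deliver N * R. Measured throughput over
# that ideal is the proxy's scaling efficiency.
def scaling_efficiency(per_backend_rate, backends, measured_rate)
  measured_rate.to_f / (per_backend_rate * backends)
end

# The nginx figures quoted earlier: 5 mongrels at 581 req/sec each,
# 957 req/sec through the proxy -- about a third of the ideal 2905.
efficiency = scaling_efficiency(581, 5, 957)
puts format("%.0f%% of ideal", efficiency * 100)  # prints "33% of ideal"
```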

[It’s always fun to see what web server/proxy engine someone will dig up next; if you look at one of the first articles talking about using alternative web servers for Rails FCGI (with a focus on lighttpd), the list of light web servers and FCGI-supporting web servers is still relevant.]

We’re fortunate in having a decent amount of standardized testing hardware (it’s from my former life as a “real” scientist) and this is a question that we’ve addressed internally a few times, but it’s still been a bit too ad hoc.

So I thought: how about a comprehensive, tightly controlled, well-done series of experiments under realish conditions? Yes, that means statistics will be involved and the data set will have power; and realish conditions means multiple benchmarking client servers (about 20), a stack of real app and web servers, and gigabit interconnects. I’ll use tsung and httperf for the benchmarking.

We also have BIG-IP 6400s and Zeus’s software-based load balancer and traffic manager.

Is it fair to compare open source, free load-balancing proxy engines to hardware load balancers that run $50,000 each?

You bet (there’s relevant material at the end).

So the list I have (and I would really appreciate any other suggestions) is:

  • Perlbal
  • Pound
  • Pen
  • HA-Proxy
  • Nginx
  • TCPBalance
  • Balance
  • Balance NG
  • TCPFork
  • Apache 2.2
  • Litespeed
  • Lighttpd
  • Zeus load balancer
  • Zeus High-performance traffic manager
  • BIG-IP 6400
  • IPF
  • Squid
  • Varnish

These are all software-based and will initially be run on Sun Fire X4100s and X2100s (the same chips; two sockets vs. one), except for the BIG-IPs, which are their own piece of hardware. They will be talking to backend Rails application servers, which will also be X4100s. I’ll also be profiling these to see which would be appropriate for running on the T1000s. Web servers like yaws are excluded because they don’t load-balance reverse-proxy requests.

But realize that when you look at something like Brad Fitzpatrick’s presentation, “LiveJournal’s Backend: A History of Scaling” (you can get the PDF), you’ll see that the flow goes dual BIG-IPs -> Perlbal -> mod_perl, and that’s for a reason.

A quick example of what a hardware load balancer does when put in front of a single Sun Fire application server:

So it was easy to do 3000-4000 new connections/second, to sustain ~8000 active connections, and to output 50-180 Mbps of traffic.

27 Comments on “Evaluating proxy engines and load balancers for mongrel-driven ruby on rails applications: an introduction and an open call”

  1. Can’t wait. 🙂 We’ve been having some luck with Pound.

    I think you should also address the ease of configuration and management for each of these. For most of the world, the difference between 2500 req/sec and 2600 req/sec isn’t going to matter—beyond a fairly low threshold the bottleneck shifts. They just need something that is “fast enough.” Your experience there would be a great complement to the objective, numbers-based portion of the upcoming article.

    (Also, thanks for continuing to geek out about this stuff. It makes for awesome reading.)

  2. Can’t wait also. We’re running the site we’re building on Apache 2.2 proxying to a mongrel cluster on one of the TextDrive containers now. I’m interested in seeing some other options.

  3. Justin, I’m doing the same setup. Would another proxy be better? I’ve got multiple domains and I need to do SSL as well.

  4. Yes, please do this. It is much needed, as there are many options that will work as a front end to a proxied cluster of mongrels. But we do need to separate the wheat from the chaff and see which ones are the real deal. Let me know if you would like any help along the way, or if there is anything else I can do to help.

  5. Yes, good suggestion Wes, so something like ipfilter. The problem there is that (from what I recall) it requires a separate heartbeat function to monitor the backends. I’d like the engine itself to take them offline, but I’ll add ipfilter.

  6. Looking forward to this, and I second the LVS idea. Jason, I don’t believe that you need a separate heartbeat for the backends. The heartbeat is used only for high availability on the load balancer, not the application/service cluster.

  7. While we’re at Linux/NAT/LVS/ipfilter… why not have a look at OpenBSD + pf + CARP? =) Just for the sake of it, don’t flame me =)

    http://www.countersiege.com/doc/pfsync-carp/
    http://www.openbsd.org/faq/pf/carp.html

  8. My GOD! I can only hope to have at least one such person on the QMD team. I’ve no idea what was being said here, but I love it all the same. I am thankful you’re there at Textdrive and glad to call it home now.

    Going back to design now knowing the hard issues are in good hands…

  9. OK, so what about ipf + ucarp under Solaris? =) I wondered about this configuration for a while but found nothing online about doing it… maybe a solution based on something more stable/tested is better. :/

  10. James: thanks for the squid suggestion.

    And I’ve run some (like the Erlang-based one) on Niagara. But because the hardware load balancers in the mix (the BIG-IPs) are Intel-based x86 machines, I’m going all x86 for the first round.

  11. I’m already 98% sure that our BIG-IPs and Zeus will “win” this, but winning isn’t what this is about (because the “entry” level there is $35K-$100K).

    Serially planning and scaling, and figuring out what goes well in front of the application is what it’s about.

  12. The latest commercial proxy caches will almost certainly crap all over the open source solutions, primarily because no one seems willing to spend ~$50K on a risky development project (and open-source it!) over buying an existing product. Not that I blame them!

    Check out “Varnish”, it might work out for your particular load.

  13. Did you try HAProxy? It’s a single-process, event-based high-availability load balancer developed by the current maintainer of the 2.4 branch of the Linux kernel. I’m pretty sure you’ll get good results with it.

  14. A little off topic, but we use two Dell servers with Ubuntu Server, LVS and keepalived for our mail cluster.

    keepalived does HA for the load balancer and monitors the cluster nodes. It works nicely, and the configuration is done in only one file.

    For the Postfix nodes we use FreeBSD, and the two MySQL servers again run Ubuntu Server.

    The storage is an IBM N3700 A20 (the same as a NetApp FAS270), dual head.

    LVS and keepalived are a really nice combination; you can even sync your connections to the hot-standby load balancer.

  15. Being the author of HAProxy, I’m interested in this test. I can provide help with tuning if needed, and I suggest playing with the congestion-control mechanism, which can in fact improve performance even on a single server and reduce overall response time.

  16. Varnish is a brand new open source reverse proxy funded by Norway’s largest newspaper, VG. It let them replace 8 servers running Squid with one server running Varnish. Actually, the hardware running Varnish is second-hand, cost them ~US$1500, and is serving 10,000 req/sec at half load.

    It is using kernel features from either FreeBSD or Linux 2.6, so you might be out of luck on SunOS.

  17. Good luck with the benchmark. We’d be glad to support you in this – please get in touch. There are a number of performance and app acceleration papers published at http://www.zeus.com/news/white_papers.

    For the record, the ZXTM software starts at less than $6K (for the ZXTM LB version, 4 IPs), so the entry level is a lot less than $35K.

    Strictly speaking, publishing benchmark figures is against our EULA, but we readily negotiate exemptions. The clause is just to avoid bad information from incorrectly run benchmarks (yes, they do happen!).

  18. We’d be happy to loan you one of our LVS-based load balancer appliances (RRP $2,795) for the test. We’d love to rough up Zeus’s ZXTM in public, and our EULA definitely allows benchmarking :-). We also recommend Direct Routing (DSR) for the best performance. Or is this a proxy-only test? Product details here: http://www.loadbalancer.org/

  19. Just out of curiosity, do you do anything to monitor the end-user response time? How would you measure the impact of the acceleration features on the BIG-IPs? I’d really like to know.
