On Accelerators

Fundamental Philosophy and Origin

Accelerators grew out of two needs: a standardized stack capable of serving our own growing applications, and the arrival last year of large companies and startups looking for “enterprise rails”.

Our applications, like Strongspace and the Joyent Connector, are over two years old now; they were some of the earliest revenue-generating Rails applications and have grown to cover a significant build-out. Along the way we learned some things that are applicable to anyone else. Like any infrastructure, there are a few boundaries (>10 servers, >100 servers, >1000 servers, >100 TB of storage) where, as you cross them, it’s critical to rigorously standardize. We found ourselves moving to one operating system (Solaris; we used BSD and Linux before), one type of server, one type of switch, one type of router, one kind of ethernet cable, one kind of storage, one kind of hard drive, one kind of DRAM and one type of hardware-based load-balancer. This introduces needed predictability, makes automated management possible, makes virtualization easier, allows for systems-level software development and makes all the parts interchangeable.

We were a little ahead of the curve in our community, and with a significant number of contracts coming our way, we productized the Accelerators last summer with two goals in mind: minimize the capital expenditure and worry around operations and facilities, and let development teams and businesses focus on and grow their applications.

We haven’t always done that perfectly, and we’ve made mistakes here and there, but every mistake has been noted, learned from and won’t be repeated. What we’ve found is an increasingly common position that Allan Leinwand covered well in a recent article at GigaOM: the perception that infrastructure is already a full utility (it isn’t, but we’re trying to make it so) and that one “can deploy a wildly successful Web 2.0 application that serves millions of users and never know how a router, switch or load-balancer works.”

Beyond simply servers and network drops, there are then a couple of key things needed: load-balancers capable of handling not just significant traffic but also a wide horizontal spread, and storage that is both resilient to all failures (it doesn’t lose data, ever) and scalable to hundreds of terabytes for a single customer (again, the first customer of that size was Joyent).

Load Balancers

We’ve been long-time users and fans of F5’s BIG-IP as a carry-over from when some of us were in “big enterprises”.

But there are real technical reasons for using them. To give you an example, a pair of the big BIG-IPs load-balances all of adobe.com, which is the place where everyone and their mother downloads Acrobat (a popular download during tax season), and they constantly run in their spec’ed range of 2-10 Gbps. That’s not a trivial amount of traffic, and there’s not a trivial number of backend servers behind it either.

What makes them quite relevant to us is that it’s possible to put 300-400 mongrels behind a single floating IP address and watch as 20-30 connections get evenly distributed across them.
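To make that concrete, here’s a minimal sketch of the idea in Python (this is illustrative only, not BIG-IP configuration; the addresses and ports are made up): a big pool of mongrel backends behind one floating IP, with a simple round-robin policy handing each incoming connection to the next backend.

```python
from itertools import cycle

# Hypothetical backend pool: 300 mongrel instances spread across app
# servers, each listening on its own port behind the single floating IP.
mongrels = [("10.0.0.%d" % (1 + i // 50), 8000 + (i % 50)) for i in range(300)]

# Round-robin is the simplest distribution policy; a real BIG-IP also
# offers least-connections, ratio weighting and so on.
pool = cycle(mongrels)

def pick_backend():
    """Return the next (host, port) pair to receive a connection."""
    return next(pool)

# 30 incoming connections land on 30 distinct mongrels.
chosen = [pick_backend() for _ in range(30)]
assert len(set(chosen)) == 30
```

The point is simply that with enough backends, even a naive policy spreads a modest number of simultaneous connections across entirely distinct processes.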

It’s possible to have the BIG-IPs force caching headers, force pipelining by making it appear that you’re using distributed assets on different hostnames, direct traffic at layer 7 so you can do things like separate out application servers for different controllers and different routes and differentially handle parts of your site (separating page views from API calls), and even load-balance rings of MySQL multi-master servers behind a single IP address.
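The layer-7 direction mentioned above can be sketched in plain Python (this is illustrative pseudologic, not actual BIG-IP iRule syntax, and the pool names are hypothetical): inspect the request path, then choose a backend pool accordingly.

```python
# Hypothetical pools: one set of app servers dedicated to API traffic,
# another to ordinary page views.
POOLS = {
    "api": ["api-app-1", "api-app-2"],
    "pages": ["web-app-1", "web-app-2", "web-app-3"],
}

def route(path):
    """Pick a pool by URL path, the way a layer-7 rule would."""
    if path.startswith("/api/"):
        return POOLS["api"]
    return POOLS["pages"]

assert route("/api/v1/users") == POOLS["api"]
assert route("/blog/2007/accelerators") == POOLS["pages"]
```

The design benefit is that API calls and page views can be tuned, scaled and degraded independently, without the application itself having to know.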

Basically, a lot of logic can be coded into them so they can accommodate applications that weren’t quite built to scale.

In my opinion, these capabilities are among the most discriminating features, and all of this is a key part of a scalable stack.

Storage

The other key piece is storage: there should be no worry, loss of data or loss in performance when any component of the storage fails or is unavailable for a time. A development team shouldn’t have to worry about catastrophic hardware failures. This is achieved with a fully redundant storage infrastructure: RAID6 across 9-14 drives, with network volumes coming up from dual trays and dual controllers via physically separate switches, cables and network interface cards. The only concern one should have is backups to protect against accidental file deletion (like a migration gone wrong), and for this there are point-in-time snapshots (from ZFS).
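As a back-of-the-envelope check on arrays like those described above: RAID6 spends two drives’ worth of capacity on parity, which is what lets it survive any two simultaneous drive failures. The drive sizes below are hypothetical, purely for illustration.

```python
def raid6_usable(drives, drive_tb):
    """Usable capacity of a RAID6 array in TB.

    RAID6 dedicates two drives' worth of capacity to dual parity,
    so the array survives any two simultaneous drive failures.
    """
    if drives < 4:
        raise ValueError("RAID6 needs at least 4 drives")
    return (drives - 2) * drive_tb

# For the 9- and 14-drive arrays, assuming hypothetical 1 TB drives:
assert raid6_usable(9, 1) == 7    # 9 drives -> 7 TB usable
assert raid6_usable(14, 1) == 12  # 14 drives -> 12 TB usable
```

Note that snapshots protect against a different failure mode than RAID does: RAID keeps the volume alive through hardware loss, while a ZFS snapshot lets you roll back a file you deleted yourself.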

Pay For Idle or Fair Share?

An Accelerator is not a “pay for idle” product (though the common tools come from a pay-for-idle world), and the Solaris userland has to be BSDized or Linuxized so that there are no barriers to adoption.

In a “pay for idle” system, you’re paying for CPUs to sit there doing nothing. What we do instead is allow people to burst up and use the CPUs on a node as long as no one else needs them. When someone does need CPU time and you’re the “non-bursting” party, allocation falls under a Fair Share Scheduling (FSS) algorithm, and each application receives a guaranteed minimum.

The wrinkle is that under on-top-of-the-OS virtualization such as Solaris zones, tools like “top” can still see kernel statistics, so they report total CPU use for the whole node. In some cases this is disconcerting and a common point of discussion, but when an application needs CPU, it gets it; in fact, with FSS, an application that hasn’t been cranking away on a CPU gets an even greater priority.

When you add the work Joyent is contributing, such as automated migrations based on CPU loads in combination with load-balancing (with quality of service managed there) and observability tools that correctly report what one is using and has available, you’re on a wonderful middle ground between pay-for-idle and an intelligent CPU scheduler that’s aware of application loads. Joyent can begin to treat a farm of processors like a single processor, each operating system like a raft of processes on a unified operating system, each user process like an event in a grander process. It all balances.
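The bursting-plus-guaranteed-minimum behavior can be sketched as a toy allocation pass. This is a simplification of what the Solaris FSS scheduler actually does (it works on decayed per-thread usage, not a single snapshot); the zone names, share counts and demands below are made up.

```python
def fss_allocate(shares, demand, total_cpu=1.0):
    """Toy fair-share pass over one scheduling interval.

    Each zone is guaranteed CPU in proportion to its shares; anything a
    zone doesn't demand is redistributed, share-weighted, to bursting zones.
    """
    total_shares = sum(shares.values())
    alloc = {z: total_cpu * s / total_shares for z, s in shares.items()}

    # Zones using less than their guarantee free up CPU...
    surplus = sum(alloc[z] - demand[z] for z in alloc if demand[z] < alloc[z])
    for z in alloc:
        if demand[z] < alloc[z]:
            alloc[z] = demand[z]

    # ...which bursting zones may consume, again weighted by shares.
    burst = {z: s for z, s in shares.items() if demand[z] > alloc[z]}
    burst_shares = sum(burst.values())
    if burst_shares:
        for z in burst:
            alloc[z] += surplus * burst[z] / burst_shares
    return alloc

shares = {"a": 1, "b": 1, "c": 2}          # hypothetical zones and shares
demand = {"a": 0.1, "b": 1.0, "c": 1.0}    # "a" is nearly idle this interval
alloc = fss_allocate(shares, demand)
```

Here zone “a” gets exactly what it asked for, while “b” and “c” burst past their guaranteed minimums into the slack “a” left behind; the moment “a” demands its full share again, the surplus shrinks and the guarantees reassert themselves.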

BSD Userland

We’ve been working to move to a userland that’s very familiar to people coming from FreeBSD, NetBSD and Mac OS X. Very little is taken away from Solaris, a lot of new tools are added, and in a lot of ways we offer optimized binaries alongside the normal paths to them that one would expect.

The first place the new userland is going to show up will be on the “new” shared hosts.

“But never forget that you can only stumble if you are moving.”

-Richard P. Carlton, Former CEO, 3M Corporation

6 Comments

  1. Just a heads up: the “Pay For Idle or Fair Share?” section starts off kinda confusing. It doesn’t read as a complete sentence nor an idea, if you understand what I’m saying.

    Also the “BSD Userland” section seems to be incomplete. Jason starts talking about Solaris, but then the paragraph ends rather abruptly.

    These two sections just don’t seem to have been edited rather well.

  2. Like any infrastructure there’s a few boundaries (>10 servers, >100 servers, >1000 servers, >100 TB of storage)…

    Out of curiosity, does Joyent/TextDrive have over 1000 servers?

  3. What are the trade offs of fitting the BSD userland into the Solaris world? I’ll be one of those who eventually gets moved over to the “new” shared. Will there be things that are best done the Solaris way (or simply won’t work the BSD way), or are you confident that the FreeBSD -> Solaris migration will just work?

  4. Mark: Until the native Solaris userland itself gets a facelift, we’ll be using a “hybrid” approach. Which means we’re taking our BSD /usr/local from shared and porting it to run on Solaris. The end result should feel very, very similar to our FreeBSD shared hosts right now. Our goal is definitely to make the FreeBSD -> Solaris migration “just work”, yes. We’ll see how close we get when we open up for beta migrations soon. 🙂
