Triple-Parity RAID

In an effort to catch up on links.

Adam Leventhal talks about triple-parity RAID (raidz3) in an ACM Queue article.

When RAID systems were developed in the 1980s and 1990s, reconstruction times were measured in minutes. The trend for the past 10 years is quite clear regardless of the drive speed or its market segment: the time to perform a RAID reconstruction is increasing exponentially as capacity far outstrips throughput. At the extreme, rebuilding a fully populated 2-TB 7200-RPM SATA disk—today’s capacity champ—after a failure would take four hours operating at the theoretical optimal throughput. It is rare to achieve those data rates in practice; in the context of a heavily used system the full bandwidth can’t be dedicated exclusively to RAID repair without adversely affecting performance.

Fifteen years ago, RAID-5 reached a threshold at which it no longer provided adequate protection. The answer then was RAID-6. Today RAID-6 is quickly approaching that same threshold. In about 10 years, RAID-6 will provide only the level of protection that we get from RAID-5 today. It is again time to create a new RAID level to accommodate the realities of disk reliability, capacity, and throughput merely to maintain that same level of data protection.
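A quick back-of-the-envelope check on that four-hour figure (my numbers, not Adam’s; the ~140 MB/s is an assumed best-case sequential rate for a 7200-RPM SATA drive):

```python
# Back-of-the-envelope rebuild time: the whole disk has to be read or written
# at a sustained sequential rate (an optimistic assumption for a busy array).
capacity_bytes = 2 * 10**12        # 2-TB SATA drive
throughput_bps = 140 * 10**6       # assumed ~140 MB/s sustained throughput

hours = capacity_bytes / throughput_bps / 3600
print(f"best-case rebuild: {hours:.1f} hours")   # ~4.0 hours
```

And that is the floor; on a loaded system you can’t actually dedicate all of that bandwidth to the rebuild, which is exactly the point of the article.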

You’re Doing It Wrong by PHK

PHK’s You’re Doing It Wrong

Think you’ve mastered the art of server performance? Think again.
Would you believe me if I claimed that an algorithm that has been on the books as “optimal” for 46 years, which has been analyzed in excruciating detail by geniuses like Knuth and taught in all computer science courses in the world, can be optimized to run 10 times faster?

Later in the article, PHK’s key example is Varnish’s memory management. In short, it doesn’t do any: it takes advantage of the fact that it’s running on a great kernel that already does this for it. Far too often, we’re writing userland software that behaves as if there is nothing underneath and instead strives to do everything itself. The result? Massive inefficiencies and performance far below what you should be getting.
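A minimal sketch of the idea (my own toy example, not Varnish’s code): instead of copying objects into application-managed buffers and writing our own eviction logic, map the backing file and let the kernel’s virtual memory system decide which pages stay resident.

```python
# Toy sketch: rely on the kernel's VM system instead of an application-level cache.
# "cache.bin" is a made-up file name; it just needs to exist and be non-empty.
import mmap

with open("cache.bin", "r+b") as f:
    view = mmap.mmap(f.fileno(), 0)   # map the whole file into our address space
    header = view[:4096]              # touching a range faults in only those pages
    # The kernel, not our code, decides what stays in RAM and what gets paged out.
    view.close()
```

The kernel already has better information (and better code) for that job than any cache we would write in userland, which is exactly PHK’s point.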

On Cloud Standards, Transparency and Data Mobility

I was on a panel last week talking about the role of infrastructure and “The Cloud” in online gaming (and I’m talking “fun” games, like Farmville, not online gambling).

One of the questions was “What do you think about cloud interoperability and standards?”.

To which I asked, “What do you mean?”

“Well, what do you think about API standards and the like?”

To which I replied, “Completely uninteresting.”

Now I know that at first read it sounds like I’m saying to forget “standards” and forget “interoperability”, but I’m not. It’s just that most of the current conversations about them are uninteresting: I’m not convinced there is even customer pain, and I’m not convinced that having to tool around different APIs that currently only accomplish provisioning is that difficult (remember, the great thing about them is that it generally takes less than 30 minutes to understand how to do things and get going). In the case of virtualization, many use libvirt, and that’s how interoperability generally arrives in programming: as a library or middleware generated by producers and real users, not designed by committee. I expect to see more of these types of projects emerge.
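To make the libvirt point concrete, here’s a rough sketch (using libvirt’s Python bindings; the connection URIs are only examples) of one piece of code talking to more than one hypervisor, with the library rather than a committee standard doing the interoperability work:

```python
# Sketch: the same code driving different hypervisors through libvirt's
# Python bindings; the library is the interoperability layer.
import libvirt

for uri in ("qemu:///system", "xen:///"):     # example connection URIs
    try:
        conn = libvirt.open(uri)
    except libvirt.libvirtError:
        continue                              # that hypervisor isn't on this host
    print(uri, "->", conn.getType(), "with", conn.numOfDomains(), "running domains")
    conn.close()
```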

Besides the fact that one’s application shouldn’t have to be aware that it’s no longer in your datacenter and is now “in the cloud”, I’m not even sure that most of the current standardization discussions (many seem focused on provisioning APIs or things like “trust” and “integrity”) would do much to enable start-ups, tool-vendor adoption, ISV adoption and an “ecosystem” to emerge in the grand scheme of things. I don’t think these are the main problems limiting adoption.

And what are the real problems where interoperability and standardization are important? I think data mobility and transparency.

Data mobility?

Let’s only talk about mobility at the VM level. If I create an AMI at Amazon Web Services and push it into S3, I can use that AMI to provision new systems on EC2, but for the life of me, I can’t find the ability to export that AMI as a complete bootable image so that I can run it on a local system capable of booting other Xen images. If you have a reference for this, please send it my way.

The same goes for Joyent Accelerators. We don’t make this easy to do. We should.

Transparency?

Now this is where I think things get good, and where standard data exchanges matter: exchanges about what our “cloud” is doing and whether it has the capacity to accomplish what a customer needs it to. In a previous post, I said:

The hallmark of this “Cloud Computing” needs to be complete transparency and instrumentability. While making it certain that applications just work, the interesting aspects of future APIs aren’t provisioning and self-management of machine images; they’re about stating policies and being able to make decisions that matter to my business.

The power of this is that it would actually enable customers to get the best price at the best times and to know that they’re moving an application workload somewhere that will actually accomplish it. It’s also a prerequisite for the computing equivalent of the energy spot market.

I’d like to hear from our readers, please: what are the current “standardization” efforts that you think are going well and might be interesting? Any realistic ones? Which ones are boiling the ocean?

The “Cloud” is supposed to be better than the “Real”

In my weekly reading of posts around this mighty collection of tubes, pipes and cans connected by shoestrings, the thing most call The Internets™, I came across “Why we moved away from ‘the cloud’ to a ‘real’ server” from the fellows at Boxed Ice. They have a server metrics and monitoring service named Server Density. Their exodus from a small VPS to a collection of VPSs (what is the plural anyway? If Virtual Private Server is VPS, then Virtual Private Servers is VPSs?) is typical of a service that’s starting out and doesn’t yet have the in-house expertise and capital to go out and build everything itself (which is fine, and not a value statement; I’m simply saying it’s a common path to go from shared hosting to VPS to managed hosting to entirely DIY).

What I don’t like is the title. They moved from “the cloud” to the “real”.

To be exact, it was easier for them to get a “box” from a managed hosting provider with an NFS or iSCSI mount than it was to take on the additional effort of configuring and managing EC2 images and the Elastic Block Store (EBS). They “would have had to build our own infrastructure management system” in order to get Amazon Web Services to do exactly what they wanted.

That’s entirely correct. A precise title would therefore be “Why we moved from two VPSs to some servers at Rackspace instead of getting into the morass of managing EBS plus EC2 and probably spending more money while we’re at it”.

I admit, it’s a longer, more awkward title but it doesn’t use a generic term for what is a pretty specific complaint.

I want to make it really clear at this point that I think they’re entirely right, and that everyone in the “cloud industry” should be asking themselves, “Am I writing the correct software that would make such commentary a thing of the past?”

What then is the cloud? And really why did the “cloud” fail them at this stage in the game?

In my opinion, “Cloud computing” is a software movement.

Software requires a hardware platform, and the hardware platform must be scalable, robust and able either to handle a workload directly or to be a valid part of partitioning that workload. At Joyent, we’re pushing our physical nodes to the point where they’re simply not normal for most people: base systems with 0.25 to 0.5 TB of RAM, more spindles than a textile factory and multiple low-latency 10-Gbps links out of the back (looking back six years, it’s amazing what you can get nowadays for about the same price). These then become the transistors on the Datacenter Motherboard, and just as many of the components of a current “real” server are abstracted away from us (we deal with a “server” and an “operating system”), those of us writing “Cloud Software” need to abstract away all the components of a datacenter and have everyone deal with telling the “cloud” what they want to accomplish. Taking desired performance and security into account, the ultimate API or customer interaction is “I want you to do X for me”.

What is X? Some examples:

“I want this site to be available 99.99% of the time to 99.99% of my end users. How much is that?”

“I have this type of data, it needs to be N+1 redundant, writes have to happen within 400 ms and reads within 100 ms 99.99% of the time. How much is that?”
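Nothing like this exists today as far as I know, but purely as a hypothetical sketch (every name and field here is invented for illustration), the interaction I’m describing looks less like a pile of provisioning calls and more like this:

```python
# Hypothetical sketch only: this API does not exist. The customer states a
# policy and gets back a price, instead of assembling primitives by hand.
policy = {
    "workload": "web-site",
    "availability": 0.9999,       # to 99.99% of end users
    "write_latency_ms": 400,
    "read_latency_ms": 100,
    "redundancy": "N+1",
}

def how_much_is_that(policy):
    """Imaginary cloud call: return a monthly dollar quote for meeting the policy."""
    raise NotImplementedError("this is the API I want, not one that exists")

# quote = how_much_is_that(policy)   # e.g. {"usd_per_month": 1234.56}
```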

I could go on with a series of functional use cases like this, where the user of a “cloud” is asking to do exactly what we typically want an entire infrastructure to do (not defining the primitives, etc.), where the user is telling the cloud how important something is to them, and where the user is asking for an economic value/cost to be associated with it.

The “How much is that?” should always return a value that is cheaper than “real” and cheaper than DIY.

That’s something that’s different.

That’s something that’s an improvement on the current state of computing. It’s not a horizontal re-tooling of the current state of affairs.

That’s something that would take us closer and closer to providing products to growing companies like Boxed Ice, where they start out really small on the “cloud”, seamlessly grow to a global scale when they need to, and the “cloud” becomes “real”.

100,000 Joyent Accelerators

We just delivered the 100,000th Joyent Accelerator to a customer. That’s a big milestone. Congratulations to the Joyent team. And congratulations to our customers who are doing such interesting things with Joyent Accelerators, everyone from Prince (the artist known as), to all the Facebook developers, to the many enterprise shops removing the barriers of IT from the innovations of smart developers. Onwards to 1,000,000.

What I would want a “cloud” to do for me: a functional view

I was on a panel at Enterprise 2.0 yesterday about “Cloud Computing” providers and wanted to approach the entire question of “what is cloud computing?” from a different perspective.

As a consumer, I fundamentally want the entire technology stack for an application to

  1. Just work
  2. Just scale
  3. Just tell me everything

Just works means that I tell an infrastructure that I want my application to have user experience X 99.99% of the time to 99.99% of my users, and what I want back from that query is a dollar amount. I’ll then make a decision about what I can afford.

Just scales means that I can go from zero to millions of users, one geographical site to twenty geographical sites, and one continent to four continents without a rewrite/rearchitecting.

Just tells me everything means that an API and data source isn’t only about provisioning machine images or the like; it can be queried for real-time and time-series data about everything. It lets me make decisions around latencies, and nothing ever just happens for no reason.
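Again purely hypothetical (the names are invented), but it means being able to ask questions like this of the platform itself rather than bolting monitoring on after the fact:

```python
# Hypothetical sketch: a cloud API that answers real-time and time-series
# questions about my application, not just provisioning requests.
def query_metrics(metric, percentile, window):
    """Imaginary call, e.g. 99th-percentile request latency over the last hour."""
    raise NotImplementedError("illustrative only")

# p99_ms = query_metrics("request_latency_ms", percentile=99, window="1h")
# if p99_ms > 100:
#     ...and then ask the cloud *why*: which tier, which node, which disk?
```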

When you think about what it would take for these things to come to fruition, it starts to feel like a real step forward in the industry. It’s the collection of these things (in existence or development) that I don’t mind giving a new term to.