Part 2, On Joyent and Cloud Computing “Primitives”

In the first part of this series I listed some of the key underlying ideas at Joyent: we believe that a company, or even a small development team, should be able to:

  1. Participate in a multi-tenant service
  2. Have your own instantiations of this service
  3. Install (and “buy”) the software to run on your own infrastructure
  4. Get software and APIs from Joyent that allow for the integration of all of these based on business desires and policies.

And I said:

The successful future “clouds” have to be more accessible, easier to use and operate, and every single part of the infrastructure has to be addressable via software, capable of being introspected into and instrumented by software; this addressability means that one can write policies around access, performance, privacy, security and integrity. For example, most of our customers really don’t care about the details; they care about knowing that the platform is capable of giving 99.99% of their end users a great experience 99.99% of the time. These concepts have to be baked in.

I continue to think that from a developer’s perspective the future is closer to the SMART platform. Ramin’s comment on an older Joyeur article about EC2 versus Accelerators is relevant here; let me quote him:

Whoever has the fewest number of steps and the fastest build/deploy time is likely to attract the most developers. Whoever can show that the operating cost scales linearly with use will have developers casting flower petals in their path 🙂

As an app developer, I don’t care that it runs on Solaris, FreeBSD, or Mac-OS. I want it to work. I want an optimized deployment workflow and a simple way to monitor and keep things running.

That all said.

In this second part of the series I wanted to start talking about “primitives”. I say “start” because we’re going to be covering primitives over the next couple of posts.

I’m going to loosely define “Primitives” (now with a capital P) as all of the stuff underneath your application, your language and the specific software you’re using to store your data. So yes, we’re talking about hardware and the software that runs that hardware. Even though most Primitives are supposed to eventually be hidden from a developer, they’re generally important to the business people and to those who have to evaluate a technology platform. They are important parts of the architecture when one is talking about “access, performance, privacy, security and integrity”.

Previously, I’ve talked a bit about Accelerators (On Accelerators) and about how, fundamentally, we deal with six utilities in cloud computing.

The fermions are the utilities where things take up space:

1) CPU space
2) Memory space
3) Disc space

The bosons are the utilities where things are moving through space and time:

4) Memory bus IO
5) Disc IO
6) Network IO

All of these utilities have physical maximums dictated by the hardware, and they have a limit I’d like to call How-Likely-Are-You-To-Do-This-From-One-Machine-Or-Even-At-All.

I’ll admit at this point to a particular way of thinking. I ask “what is the thing?”, “how is it going to behave?”, “what are the minimums and maximums of this behavior?” and finally “why?”.

The minimum for us is easy. It’s zero. Software using 0% of the CPUs, 0 GB of memory, doing 0 MB/sec of disc IO and 0 Gbps of network traffic.

The maximums:

  1. Commercially available CPUs typically top out in the 3s of GHz
  2. “Normal” servers typically have <128 GB of memory in them, and the ratio of 4 GB of memory per CPU core is a common one from HPC (we use this, and it would mean that a 128 GB system has 32 cores)
  3. Drives are available up to a terabyte in size, but as they get larger you’re making performance trade-offs. And while you can get single namespaces into the petabyte range, ones >100 TB are still irritating to manage (because of either the increased fragility of a larger and larger “space”, or the variation in latencies between a lot of independent “storage nodes”).
  4. CPUs and memory talk at speeds set by the chip and hardware manufacturers. Numbers like 24 Gbps are common.
  5. Disc IO can be in the Gbps without much of an issue
  6. For a 125 KB page with 20 objects on it, 1 Gbps of traffic will give you 122,400,000 unique page views per day, which in a 30-day month is 3,672,000,000 page views. Depending on how much else you have going on, this basically puts you in as a top-100 web property. With the number of public websites at ~200 million (source), being in the top 100 is what … 0.00005% of the sites?
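The per-gigabit arithmetic above can be sketched as a quick back-of-envelope calculation. The assumptions here are mine, not figures from the post: decimal kilobytes, the full 125 KB page transferred per view, and 100% link utilization around the clock. Different byte/bit and overhead conventions shift the exact numbers, so treat the output as order-of-magnitude.

```python
# Back-of-envelope: page views a saturated network link can serve.
# Assumptions (mine): decimal units, the full 125 KB page (all 20
# objects) transferred per view, and 100% utilization, 24 hours a day.

def monthly_page_views(link_gbps: float, page_kb: float, days: int = 30) -> float:
    bits_per_page = page_kb * 1000 * 8            # decimal KB -> bits
    pages_per_sec = link_gbps * 1e9 / bits_per_page
    return pages_per_sec * 86_400 * days          # seconds per day * days

print(f"{monthly_page_views(1, 125):,.0f}")       # ~2.6 billion views/month
```

Under these assumptions a single saturated 1 Gbps link works out to roughly 2.6 billion page views in a 30-day month; looser conventions move the figure up or down, but either way 1 Gbps of sustained traffic is top-100-web-property territory.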

As something to think about, and as an anchor: I remember seeing a benchmark of a “Thumper” JBOD attached to a system capable of saturating the 4×10 Gbps NICs in the back of it. Yes, the software was special; yes, it was in C; and yes, it was written with the explicit purpose of pushing that much data off of discs. However, think about that for a minute.

Imagine having a web property doing 120 billion monthly page views coming off of a single “system” that you can buy for a reasonable price. Starting from there, expand that architecture, and I wonder, with the “right software” and “primitives”, where you would end up. If we change it from a web property to a gaming or a trading application, where would you end up? What is the taxonomy of applications out there (common and uncommon), and can we come up with the best architectures for each branch and leaf on that tree?
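To put a rough number on that thought experiment, the single-gigabit case can be scaled up to the 4×10 Gbps of that Thumper setup. Again, the arithmetic and its assumptions (decimal units, 125 KB per page view, full utilization) are mine, not figures from the benchmark.

```python
# Rough sanity check on the "120 billion monthly page views" anchor:
# scale the 1 Gbps case up to 4 x 10 Gbps. Assumptions are mine:
# decimal units, 125 KB transferred per page view, 100% utilization.

GBPS = 1e9                        # bits per second in one Gbps
PAGE_BITS = 125 * 1000 * 8        # one 125 KB page, decimal kilobytes

pages_per_month = (40 * GBPS / PAGE_BITS) * 86_400 * 30
print(f"{pages_per_month:,.0f}")  # on the order of 10^11 views/month
```

That lands at roughly 10^11 monthly page views from one saturated box, the same order of magnitude as the anchor above.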

Please think about that anchor and that taxonomy for a few days, and then I’ll get into some of the key differentiators of our Primitives and answer some of the “Why?”.

2 responses to “Part 2, On Joyent and Cloud Computing “Primitives””

  1. I think there is a lot of money to be made in the SMART/Google Apps/Aptana space by the people who supply such services, because it completely removes the ability of the developer to fine-tune, optimize, and make his application efficient. Lazy developers with more money than brains or experience flock to these kinds of services. But no developer/architect worth his salt will touch them; the cost is too high, the control too little. They are great for the newbie that is learning and bad for everybody else. This in turn makes the service provider quite rich, as he’s hosting tons of badly written applications that consume far more resources than they need, and is solving the problem by automatically throwing more hardware at bad code. More hardware = more profit, and if you’re smart and use a Thumper, that means less hardware can handle more bad code.

    Thanks guys, but how about bringing pkg-src to within 6 months of current and making sure that if packages are in it they actually work? Making it so I can get an Accelerator up in under 24 hours (ALL your competitors can do it in under 2 minutes)? Writing some documentation on the knowledge base?

    Yes Yes I know cloud computing will make it so you don’t have to do any of that, “Just push a button and your application will magically run. No need to worry about whether the right libraries are installed, or whether software will compile, it’ll all just run, like magic.” If there is anything you should already know about developers, it’s that we HATE magic.

    Don’t get me wrong, you’ve got some amazingly talented staff. You just need more of them and less of this prattling on about how your hardware infrastructure is well suited to build a “Pure Cloud”. I know your hardware rocks; that’s why I spend money with you. But frankly, most everything above the hardware layer is a mess. Focus on fixing that, then you can go back to daydreaming.

    Man I guess I’m feeling pretty frustrated. Oh well it’s all true even if it’s not the right place to post, so I’ll post anyway.

  2. Hi Vance. First, this is a fine venue for venting, and thank you for doing it.

    1) The SMART space doesn’t have to be high priced, and the same argument about control could be made for why we should code everything in assembly.

    2) Completely agree on getting our pkg-src builds from their 6-month cycles to quarterly builds, and that’s going to happen. The same goes for spinning up new accelerators and for documentation; we’re investing a lot in these.

    3) By magic, you likely mean something that happens where you can’t instrument it, can’t debug it, can’t introspect into it. We completely agree on this. Something can still be practical, great and innovative, and once you can introspect into it and know everything that’s going on, the idea that it’s “magic” should disappear. Are standard UNIX APIs magic?

    4) This isn’t about day dreaming or prattling. It’s a blog, where I’m blogging, it’s not roadmaps or press releases or anything like that.