There’s a lot of talk lately about “grids”.
The word “grid” has reappeared in marketing materials and we’ve seen it brought up during the emergence of companies offering utility computing and storage products (or at least they want you to think that’s what they’re really offering).
There’s also definitely been a PR push from Amazon about its Amazon Web Services product line. It really started with a keynote at MIT Tech Review’s Emerging Technologies Conference (eweek coverage), the Web 2.0 thing and the pre-expo BusinessWeek article, then here, here, here and here, and it looks set to continue with Werner Vogels speaking at the Future of Web Apps conference, an active evangelism group and more and more press.
I like all of this because it validates our own business model: we have our own applications, we provide infrastructure to others, and those others tend to be like-minded developers. And as that all gels with time, effort and development what emerges is a platform.
But let me take the opportunity to discuss, clarify and challenge a few things.
What is a grid? What is “The Grid”?
(Note the a versus the and grid versus Grid.)
In 1998, Ian Foster and Carl Kesselman said:
“A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities.”
Then four years later in 2002, Ian Foster generalized the definition (see What is The Grid) and said that a “grid” would have to meet three criteria:
- Coordinate between different organizations (implying that there would be economic, social and security models and policies),
- Adhere to open standards and protocols, and
- Deliver a “non-trivial QoS”.
This provided for less of a functional and compositional definition (a grid is servers that do xyz) and more of a social and business definition. Notice that this still defines a grid, meaning that as an organization, we can say we’re providing a grid service when these criteria are met. The Grid is a larger utopian umbrella of world-encompassing computational might!
I’d also suggest reading the ideas of a grid’s structure and function as laid out in The Physiology of the Grid.
The analogous concept of a grid and The Grid in our electrical utility industry (see a good paragraph by Nick Carr) is easy to see and conceptualize, and even it still faces issues in deregulation and commoditization (perhaps you remember Enron and $1000/month electricity bills for a one bedroom apartment in San Diego, California?).
But where one comes across “grids” in computer science and IT most often is the concept of a networking grid (and beyond that a networking “mesh”). And network peering between providers is of course a reality and common.
The network and the concept of peering is a good place to bring a couple of things together (and bring it back into the conversation when we talk about Amazon).
Because the “network” is still the real bottleneck both technically (latency and speed of ethernet) and economically in The Grid.
There is also currently no such thing as Grid Peering (the automatic failover and re-distribution of stuff from one provider to another with the goal of providing that “non-trivial QoS”) but that’s why there is an “open standards and protocols” criterion in Foster’s list.
Networks and interconnects aren’t free, they are laid down by private companies that seek to make money from them, and the speed, latency and capacity of the networks are important.
The fastest connections on the internet now are really OC-192, which in ethernetland we’ll say is essentially a 10 Gbps connection (GigaOm recently reported on Infinera’s 100 Gbps ethernet connection that could carry data over a 4000 kilometer fiber network). In datacenters, the common limit is 1 Gbps, with 10 Gbps ethernet and the slightly faster, lower-latency Infiniband making some appearances. Within a computer, though, the interconnects are significantly faster and the components are very close together. For example, a Sun X4100 has “three 8.0 GB/sec HyperTransport links with 6.0 GB/sec access between processor and memory”. 6 GBps is 48 Gbps (1 byte = 8 bits, and GB versus Gb is Byte versus bit).
The processor and memory talk to each other with a cap of 48 Gbps, and because they’re in close proximity the latency is very low. The typical connectivity out of the back of a server would be 1 Gbps with standard copper (and anywhere from 10 to 40 Gbps as you go from 10 Gbps ethernet up to Infiniband), and filling up a 1 Gbps uplink to the outside world (“The Internet”) would run you approximately $50,000/month for a single (decent Tier 1) provider.
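The arithmetic above is easy to check. What follows is only a rough sketch: it assumes decimal units (1 GB = 10^9 bytes), a 30-day month and a link that is kept completely full, none of which comes from a provider's actual billing terms. The function names are made up for illustration.

```python
# Back-of-the-envelope link arithmetic (illustrative assumptions only).
BITS_PER_BYTE = 8


def gigabytes_to_gigabits(gb_per_s: float) -> float:
    """Convert a rate in GB/s (bytes) to Gbps (bits)."""
    return gb_per_s * BITS_PER_BYTE


def monthly_transfer_gb(link_gbps: float, days: int = 30) -> float:
    """GB moved in a month if the link is saturated the whole time (assumption)."""
    seconds = days * 24 * 60 * 60
    return link_gbps / BITS_PER_BYTE * seconds


# Processor-to-memory rate quoted for the Sun X4100: 6.0 GB/s.
memory_link_gbps = gigabytes_to_gigabits(6.0)

# A typical 1 Gbps uplink out of the back of a server.
uplink_gbps = 1.0
ratio = memory_link_gbps / uplink_gbps

# What the quoted ~$50,000/month works out to per GB on a full 1 Gbps uplink.
full_month_gb = monthly_transfer_gb(uplink_gbps)
cost_per_gb = 50_000 / full_month_gb

print(f"memory link: {memory_link_gbps:.0f} Gbps, {ratio:.0f}x the uplink")
print(f"saturated uplink: {full_month_gb:,.0f} GB/month, ${cost_per_gb:.3f}/GB")
```

Under those assumptions the processor-to-memory path is 48 times the 1 Gbps uplink, and a saturated uplink moves 324,000 GB in a month, which puts the quoted price at roughly 15 cents per GB. Real links are rarely kept full, so the effective per-GB cost would be higher.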
So the limit to completely blurring out physical and geographical distinctions between computers is the network (still). It really serves then as the backplane for The Grid (from Dave Hitz of Netapp).
What if everything was normalized out? Could one have processors in one chassis and RAM in another? Could an application in one datacenter context switch to servers in another datacenter?
In the last year, Amazon has launched a series of interesting services: a simple messaging queue, S3 storage (utility storage) and EC2 (utility computation).
They’re relatively easy for an experienced programmer to use, and you get near instant access to them when you sign up. They’re not typical though, and there is only API access to storage.
The instant access and scalability must have some limits as well, and I don’t mean physical or computational limits. I mean that EC2 and S3 would be a great target for spammers and their various cousins to hit. Imagine being able to ramp up 5000 EC2 images with the same list of emails and setup, and start cranking out emails to billions of people. Massively in parallel, you could get a lot done in a short period of time. What stops this behavior? Where are the audits? How real-time is the billing? What kinds of protections are in place against people who could use EC2 to launch a DDoS attack on another provider or company?
I’m sure that it’s covered in an acceptable use policy, but a policy doesn’t proactively stop such behavior. And as someone who runs mail systems where 90% of the traffic is spam, I get concerned.
Bezos’s rationale for getting into this space comes through in some of the initial reports:
“‘The reason we’re doing this is because we think we can empower developers with a new kind of Web-scale technology. And we can make a profitable business for ourselves.’”
“‘The idea of using infrastructure Web services to remove costs for other businesses is something that’s already being accomplished by efforts like S3,’ said Bezos. ‘Our goal is to build services that are incredibly easy for developers to use and also very reliable, that can return results rapidly at a low cost, and allow users to pay by the drink.’” (from eweek)
In the weeks since the MIT Emerging Technology conference, there were challenges about how this is really a distraction for Amazon, who runs one of the largest online e-commerce sites in existence and is just now barely profitable. Amazon Web Services is a separate LLC though, so it’s already off as another business. I don’t think that Amazon gave all of its infrastructure to AWS, though, and then buys it back itself as a utility.
So we saw the appearance of the rationale that Amazon is simply reselling non-utilized space, memory and CPU cycles, and it begins to show up in others’ writing, with the rationale propagated by Nick Carr and Isabel Wang,
“But Bezos argues that Amazon is a natural in the emerging utility computing world. It has been honing its skills in large-scale technology operations for 11 years, and it has invested billions of dollars in its setup. Why not offer that infrastructure to others while it’s idle?” (Isabel Wang)
I think the first half of that is fine but I don’t agree with the non-utilization argument.
There is no concept of “idle” in storage: while the disk might not be accessed at a given moment, the files are definitely taking up space.
This is an important distinction, because if Amazon needed space in a crunch, they couldn’t reclaim it. Not if other people are paying to be there.
If they needed computational power in a crunch, they could reclaim it. It would just mean turning everyone else off, assuming that they’re actually selling excess. The store’s infrastructure of course has limits: an XBOX promotion brought the site down for a bit during this last Thanksgiving holiday.
Even though the SmugMug story of saving lots of money has been making the rounds, I don’t buy it. Purely because we do our own storage infrastructure and I know that we can offer storage at the same price as Amazon S3 and still make money … so?
Instead the story of webmail.us is the most compelling and best use I’ve seen so far. By “best”, I mean most appropriate for the infrastructure, and applications. They intelligently use the combination of the message bus, compute and storage, to keep the correct things near each other. I think it’s also a great example of “The Grid” having the ability to get functionality off another provider, and to have that provider give you enough compute and storage to handle what should be best handled “locally”.
Because where is the main infrastructure for webmail.us itself? It’s at Rackspace.
So is webmail.us’s use of Amazon’s web services a success for Amazon or a failure of Rackspace? Or both?
Will Amazon’s product offering mature to the point where it would make sense to run everything there? Will Rackspace and similar managed hosting companies wise up and begin to offer comparable services?
Will we begin to do Grid Peering relationships where say our users or Rackspace’s users could have network access from our servers to Amazon’s without incurring a bandwidth charge?
Who is hurt most by Amazon’s moves then? Nicholas Carr has this one correct, “Most of the big tech vendors—including IBM, HP and Sun—offer basic computing infrastructure as a pay-as-you go utility. But I think at the moment, they’re being outmaneuvered by Amazon Web Services,” (quote from news.com.com).
Amazon is really the biggest threat for large hardware manufacturers and could very well take away the long-tail of server customers and small businesses.
As a growing startup, remember you can’t “out google google” (as suggested in the BusinessWeek article) by using someone else’s infrastructure. Doesn’t anyone realize that it’s not a coincidence that Google releases practically no information, data or software about its infrastructure and the iterations that it’s gone through over the years?
The only argument is that you don’t need to worry about it until it’s a nice problem to have (how many googles or walmarts are there?).
Joyent happens to be in a similar space with our “Grid Accelerators”
Joyent is an infrastructure and development company that has put together a multi-site, multi-million dollar hosting setup for our own applications’ use and for the use of others. Our applications are predominantly in the Ruby on Rails framework, which we’ve been involved in since its inception via our TextDrive hosting product, and we also host a large number of sites and software written in Perl, Python, Java and even Erlang.
We’ve been selling the infrastructure pieces since the summer of 2006, and I think they have some nice Key Features that a lot of the competition does not.
Key features of what Joyent offers:
- AMD and T1 SPARC Sun Fires
- Sun Storage
- Solaris Nevada
- One and Ten gigabit ethernet networking throughout.
- Physically separate public, private and storage networking
- iSCSI and NAS
- Level3’s telco grade facilities
- High-end edge-of-network F5 load balancers
- On-demand RAM at $50/GB/month
- On-demand CPU at $200/CPU/month
- On-demand Storage at $0.50/GB/month or $1/GB/month
- On-demand Bandwidth at $0.20/GB
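To get a feel for how the on-demand prices in that list add up, here is a small sketch of a monthly estimate. The rates come from the list; the function, its parameter names and the example configuration are invented for illustration, and since the list does not say what separates the $0.50 and $1 storage tiers, the rate is left as a parameter with $0.50 as an assumed default.

```python
# Hypothetical monthly cost estimate built from the listed on-demand prices.
RAM_PER_GB = 50.00        # $/GB/month
CPU_EACH = 200.00         # $/CPU/month
STORAGE_PER_GB = 0.50     # $/GB/month (the list also gives a $1/GB/month tier)
BANDWIDTH_PER_GB = 0.20   # $/GB transferred


def monthly_cost(ram_gb: float, cpus: int, storage_gb: float,
                 bandwidth_gb: float,
                 storage_rate: float = STORAGE_PER_GB) -> float:
    """Sum the four on-demand line items for one month."""
    return (ram_gb * RAM_PER_GB
            + cpus * CPU_EACH
            + storage_gb * storage_rate
            + bandwidth_gb * BANDWIDTH_PER_GB)


# Example configuration (made up): 4 GB RAM, 2 CPUs, 100 GB storage,
# 500 GB of transfer in the month.
example = monthly_cost(ram_gb=4, cpus=2, storage_gb=100, bandwidth_gb=500)
example_premium_storage = monthly_cost(4, 2, 100, 500, storage_rate=1.00)

print(f"${example:,.2f} per month")
print(f"${example_premium_storage:,.2f} per month at the $1/GB storage rate")
```

For that made-up configuration, RAM and CPU dominate the bill ($600 of $750), and moving to the higher storage rate adds only $50.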
You can see the NICs separated from each other, with a public interface (188.8.131.52), a private interface and a trunked 2 Gbps connection to storage.
[z09578AA:~] admin$ ifconfig -a
e1000g0:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
        inet 184.108.40.206 netmask ffffff80 broadcast 220.127.116.11
e1000g1:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 4
        inet 10.71.165.93 netmask ffffff00 broadcast 10.71.165.255
aggr1:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 172.16.165.93 netmask ffff0000 broadcast 172.16.255.255
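Solaris prints netmasks in hexadecimal, which makes the size of each network less obvious than it could be. As a small sketch, Python's standard `ipaddress` module can decode the masks shown above and confirm the broadcast addresses for the private and storage interfaces. The helper names are made up for illustration, and the public address is left out of the example.

```python
import ipaddress


def prefix_length(hex_netmask: str) -> int:
    """Count the set bits in a hex netmask such as 'ffffff80'."""
    return bin(int(hex_netmask, 16)).count("1")


def describe(address: str, hex_netmask: str) -> tuple[int, str]:
    """Return (prefix length, broadcast address) for an interface."""
    # Turn the hex mask into dotted-quad form, e.g. ffffff00 -> 255.255.255.0
    mask = ipaddress.IPv4Address(int(hex_netmask, 16))
    iface = ipaddress.IPv4Interface(f"{address}/{mask}")
    return iface.network.prefixlen, str(iface.network.broadcast_address)


private_net = describe("10.71.165.93", "ffffff00")
storage_net = describe("172.16.165.93", "ffff0000")
public_prefix = prefix_length("ffffff80")

print("private:", private_net)
print("storage:", storage_net)
print("public prefix length:", public_prefix)
```

The private interface sits on a /24 and the storage aggregate on a /16, and both computed broadcast addresses match the `ifconfig` output. The public mask `ffffff80` is a /25, a block of 128 addresses.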
We have recently wrapped up a few PDF documents for the new website and I thought this article would be a great time to go ahead and release them early.
There is a solid Datasheet, a more detailed Whitepaper and a couple of use cases: a Hosting cluster, a MySQL cluster setup and an example of deploying out a Ruby on Rails application.
Let me know if you have any questions, comments or concerns about our own things, and you can buy these pieces right now on the main textdrive site.
16 responses to “On Grids, the Ambitions of Amazon and Joyent”
Interesting analysis. What are the implications for those of us in the longest part of the long-tail (or, rather, the shallow end of the pool), those of us getting by on a shared account running small web apps?
The concept of ‘excess capacity’ is very common in the managerial accounting discipline for a production environment. I commend any tech company which is savvy enough to sell parts of their infrastructure.
Many credit Amazon’s supply chain enhancements as the true reason for the company’s continued existence. I thought it interesting that Amazon is offering their warehouse shelf space & distribution in addition to their web services.
Many reviewers focus on the storage and cpu offerings. It is a lot easier and faster to add disc or cpu capacity than it is to add additional warehouses. The concept is the same throughout the company however.
If there is empty space that the business itself is not using, then sell it to the market.
The company is attempting to build an infrastructure in both the tangible and online worlds. It is a brilliant move.
Great read Jason! I’m new to Joyent but I’m a solid believer you folks are leading the way to what online providers should strive towards in their business models. Cheers and congrats, I’m looking forward to reading more of the company’s plans, insights, and future.
A large portion of SmugMug’s concern when hosting their own static content was managing a large bit of hardware. With their infrastructure at the time it became expensive. Expanding their client base while reducing their server infrastructure probably looked fabulous to the CEO, as it would any number cruncher. It’s about time (read: money), not about any significant technological breakthrough.
“Even though the SmugMug story of saving lots of money has been making the rounds, I don’t buy it. Purely because we do our own storage infrastructure and I know that we can offer storage at the same price as Amazon S3 and still make money … so?”
What does this mean? Are you saying you don’t buy that Smugmug is saving money by using S3? I’m not really sure what the purpose of this post is. Are you asking your TextDrive customers (myself included) not to switch to Amazon’s services? I’m really confused after this post.
Hey this is a great write-up, but I think you guys should proof read those Grid PDFs 😉 There are a few run-on sentences and typos.
@Nate: the way I read it is that if Joyent can match S3’s pricing and still have a profit margin, it’s clearly possible for another company to roll its own storage infrastructure cheaper than buying S3.
Jason, the non-utilization argument does work, as there is a strong seasonality to some businesses in which you can predict what excess capacity you will have when and where. This can work on an hourly/daily/monthly basis, e.g. Amazon needs to be scaled for its peak load during the year, which is a relatively short period during certain days. You are right that for storage this is slightly different, although there are some seasonal needs, and you can easily imagine that there are many nodes completely under-utilizing the storage capacity they come with, so there is some reuse you could do there. I am not saying that that is how S3 is designed, but there are ways of cutting and slicing through a large operation that would allow you to reclaim resources. I agree that the time scales are very different (EC2 hours versus S3 months) and at the larger timescales it becomes more complex to use them for adaptive management.
Great article Jason, I enjoy the leaking of little snippets of data about people’s infrastructure, but you just tell the whole story! Ace.
Why the name change from Container to Accelerator? Is it purely to distinguish your product from its implementation technology (Sun Solaris Containers)?
I liked the ‘Container’ name, as it gave a good mental image as to what you were getting. ‘Accelerator’ sounds a bit like those expensive appliances late 90s companies would have on their world-facing network connections.
Looks like Jonathan Schwartz gave a nice shout out to Joyent.
> So is Webmail.us’s use of Amazon’s web services a success for Amazon or a failure of Rackspace? Or both?
Here are my thoughts:
@Andrew: Isn’t the issue scale? Amazon and Joyent have cheaper costs per GB transferred & per GB stored. They pass this saving on to little folks like me or bigger folks like Smugmug, and make a profit. It’s just not such an interesting point to make for me, as a consumer. I know I can’t achieve those prices, so I want to know why this post seems to be about Amazon. What is Joyeur saying about their service compared with Amazon’s?
Hey everyone, it’s Thanksgiving Day! I’m enjoying my extra day off, and I am planning on doing something fun that will probably involve a moto trip and seeing something new in Lakeside I haven’t seen yet.
You write new post at Thanksgiving?