Why EC2 isn’t yet a platform for “normal” web applications

In a previous article, On Grids, the Ambitions of Amazon and Joyent, I laid out a few premises:

  • Autoprovisioning and account management (and therefore accessibility) are the real improvements over a service like Sun’s Grid
  • There’s no way that Amazon.com itself is literally running off of S3 and EC2 (one datacenter and no CDN abilities? I don’t think so).
  • That it’s a PR case study in flipping from the real technical reason (the world could use this) to a bogus one (selling “excess capacity”, of which there is no such thing).
  • The webmail.us case study is a perfect use case and people should recognize it as a moment in our industry when someone started using a “true grid service” to augment their own servers (and did so with a credit card).
  • That evangelists beating the same drum again and again is a powerful thing

Let’s be clear: EC2 is fine when you’re doing batch, parallel work on data that’s sitting in S3. That’s in line with the economics favoring putting compute next to the data (Jim Gray’s Distributed Computing Economics), and it’s definitely an improvement over other publicly available batch systems. I can see why it would be attractive to those working on science grids; one just has to solve the problem of getting large data sets into proximity with the compute.
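To make the batch-on-S3 use case concrete, here’s a small sketch of the pattern EC2 actually suits: embarrassingly parallel work over objects that already live in S3, with nothing on the node that needs to survive. This is only an illustration — a plain dict stands in for the S3 bucket, and a local thread pool stands in for fanning out across instances:

```python
from multiprocessing.dummy import Pool  # thread pool; real EC2 use fans out across instances

# Toy stand-in for an S3 bucket: object key -> bytes already sitting in S3.
bucket = {f"logs/part-{i}": b"x" * (i + 1) for i in range(8)}

def process(key):
    # On EC2 each instance would fetch its object from S3, crunch it,
    # and write the result back to S3. Here we just count bytes.
    return key, len(bucket[key])

# Embarrassingly parallel: no shared state, no persistence needed on the node,
# which is exactly why instance churn doesn't hurt this workload.
results = dict(Pool(4).map(process, sorted(bucket)))
print(results["logs/part-7"])  # 8
```

Notice that nothing here cares which node runs which key, or whether a node keeps its IP or its disk — which is the property “normal” web applications don’t have.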

However, the promise of unlimited scalability (at least in the “scale up” direction) for normal web applications has no basis and is not technically possible beyond normal limits with EC2. A “normal” web application is one that always has to be up and persistent.

And I get a bit irritated when I come across sentences like Jinesh’s at RailsConf: “infinity auto-scalable on-demand computing resource” (here).

I know it has no basis because EC2 lacks at least one critical thing: real application switches.

Yes, I’m now calling load balancers “application switches” because I think one has to distinguish software on a general-purpose server from dedicated, high-end switching hardware. For example, OpenBSD is a great operating system and has OpenBGPD, and while I could slap it onto a couple of 1U servers to function as my routers, I wouldn’t do that above a certain traffic level.

There is a limit to the horizontal scalability: how many Rails processes you can hit in the backend (see my previous Joyeur post). With software load balancing that limit is typically under 1,000 req/sec and not that many Mongrels, so it’s pretty easy to know that any Rails application on EC2 is not pushing a lot of traffic.
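To make that ceiling concrete, here’s back-of-the-envelope capacity math. The per-Mongrel throughput and the software-balancer ceiling below are assumed illustrative figures, not measurements — plug in your own:

```python
# Back-of-the-envelope Rails capacity math (assumed figures, not measurements).

MONGRELS = 20              # Rails processes behind the balancer
REQ_PER_MONGREL = 40       # req/sec one Mongrel can serve (app-dependent guess)
BALANCER_CEILING = 1000    # req/sec a software balancer on EC2 tops out at

backend_capacity = MONGRELS * REQ_PER_MONGREL          # 800 req/sec
effective = min(backend_capacity, BALANCER_CEILING)

print(effective)  # 800: here the backend, not the balancer, is the limit

# Double the Mongrels and the software balancer becomes the bottleneck,
# which is where hardware application switches come in:
print(min(2 * MONGRELS * REQ_PER_MONGREL, BALANCER_CEILING))  # 1000
```

The point of the exercise: past a certain process count, adding instances stops helping, because the software balancer in front of them is the fixed ceiling.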

While you might think I’m biased because we do have the Accelerator product line, let me make it clear that we have that product line because we’ve also had to scale some of the oldest Rails applications around, and that constantly feeds back into the design.

I’ve also said what I think is valid about EC2, but let’s be clear about the list of deficiencies for multi-tiered applications:

  • No IP address persistence (they all function as DHCP clients and are assigned an IP). One has to use dynamic DNS services for a given domain.
  • No block storage persistence. When the instance is gone, the data is gone. Yes I know you can send this back regularly to S3, but isn’t that actually a “hack”?
  • No opportunity for hardware-based load balancing (which happens to be the key to scaling a process based framework like Rails and mentioned above).
  • No vertical scaling (you get a 1.7 GHz CPU and 1 GB of RAM, and that’s it). Like the block storage problem, this hits databases hard; we run about 32 GB of ours in memory.
  • Can’t run your own kernel or make kernel modifications so there’s no ability for kernel and OS optimizations, and no guarantee that they’ve been done.
  • Images have to be uploaded and then moved around their network to find a launching point. This can take several minutes, if not more. Move 100 GBs around a busy gigabit network sometime and see.
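The S3 “hack” from the block-storage bullet usually ends up looking something like the crontab entry below. This is only a sketch: `s3-put` is a placeholder for whatever S3 upload tool you actually use, and the database, paths, and bucket name are made up.

```shell
# Hypothetical crontab entry: nightly snapshot of the database pushed to S3,
# because the instance's local disk vanishes with the instance.
# "s3-put" is a stand-in for your S3 upload tool; bucket/paths are examples.
0 3 * * * mysqldump --all-databases | gzip > /tmp/db-snapshot.sql.gz && s3-put /tmp/db-snapshot.sql.gz s3://example-backups/db-snapshot.sql.gz
```

Note what this doesn’t buy you: anything written between snapshots is lost when the instance dies, which is exactly why it’s a hack rather than persistence.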

(I’ll leave out ones like having to learn and program against a proprietary API and commands like “ec2-run-instances ami-5da964c3 -k websvr-key”; that’s usually what’s called vendor lock-in.)

In conclusion, EC2 is fine for batch on S3 data and for interacting with the Simple Queue Service (see the webmail.us example), but I wouldn’t put a multi-tiered web application on it.

18 Comments

  1. Jason, I meant to ask previously … from what I have read, it seems that lack of RAM and slow I/O are the major bottlenecks for any application – how does someone scale I/O and RAM?

    It seems that 32GB of RAM is all that someone can get into a box at a fair price and fast local disk is nice but doesn’t scale.

    My concern is that, especially for I/O, if someone starts using something like a SAN (which is crazy expensive), now you have to worry about network latency just to read data.

  2. Hey Jason,

    I agree — traditional web apps definitely don’t scale “out of the box” on Amazon services … but I think the reason has more to do with engineering expectations than it has to do with hard resource limitations.

    When most people design web applications, they don’t think about resource constraints, because they usually don’t have to. There really aren’t that many apps that need dedicated cache servers, database “shards”, or distributed storage. So, we typically don’t design our apps with those situations in mind, and we choose the frameworks and tools that let us get our apps out the door faster.

    I’m not saying that we should reinvent the wheel when building a blog or something — I just think that Amazon services require a different approach to building reliable, high traffic, multi-tiered web applications.

    Regardless — you’ve hit the nail on the head in describing Amazon’s shortcomings, and it’s certainly something to put some good thought into when embarking on a project.

  3. I would agree that conventional web applications that involve databases are not a good fit for sole use with EC2. However I disagree that EC2 can’t be used for “normal” web applications. The issue is that most people are accustomed to using sledgehammers (Sun servers) and Sawzalls (load balancers). What needs to change is how applications are constructed.

    I think everything in your list of deficiencies can be resolved or for the most part ignored. Things like kernel tuning are just a crutch, and you can tweak the OS all day long. The only thing EC2 is missing right now is a way to guarantee that instances are not allocated to the same rack/datacenter.

  4. @Greg

    I/O is a huge bottleneck as you mentioned, so as a result – people attempt to load as much as possible into main memory (RAM).

    Which brings up a very interesting point: one thing VPSes provide is plenty of computing power but hardly any RAM and/or slow disk.

    I’m curious how Joyent / Amazon / Web host company X all address this since most people don’t need a lot of CPU but need more RAM and faster I/O.

  5. I agree. EC2 is supposed to address the hassle of hardware, but instead it just creates new problems. I think they are on the right track though. If they can remove the BS with handling their instances and persistence they will have the next killer product at their disposal.

  6. This is a really good write up.

    Some of these things I really agree on. We moved our entire web site over to EC2/S3/SQS in March and though we have saved a lot of money it is way more of a hands on setup.

    DNS and database issues are my biggest concerns right now. We haven’t gone the way of dynamic DNS yet but we are also toying with the idea of putting a hardware load balancer in a co-lo and just paying for extra bandwidth. We would still save money.

    We actually had our database instance melt down, about a month ago, due to hardware failure on the server it was running on.

    The key to running on EC2, IMHO, is to be more proactive and have a lot more monitoring.

    We had lunch with Jeff Barr and he talked about how static IPs are “in the pipe”. I enjoy using EC2 but I have lost a lot of faith in it being the end-all, be-all of hosting solutions.

    Feel free to email if you have any questions.

  7. I’m new to a lot of this, so I’m interested in hearing how a company does properly scale their application.

    I think Greg was asking something along the same lines as this as well.

    Thx

  8. 1. No IP address persistence (they all function as DHCP clients and are assigned an IP). One has to use dynamic DNS services for a given domain.

    This is how any reasonable data center would work with node clusters: a node signs on to the master after it has its IP, then work goes to that IP.

    2. No block storage persistence. When the instance is gone, the data is gone. Yes I know you can send this back regularly to S3, but isn’t that actually a “hack”?

    If you are using EC2 like an image of a computer you have it all wrong: S3 is storage, EC2 is a computing cluster. Nodes are smart, but without storage — use S3 for that. Running a DB off of EC2 seems, well, silly right now, unless you are using a Google BigTable implementation.

    3. No opportunity for hardware-based load balancing (which happens to be the key to scaling a process based framework like Rails and mentioned above).

    EC2 is a single processor thread running at 1.7 GHz with 1 GB of RAM; use it like so and you will be happy.

    4. No vertical scaling (you get a 1.7Ghz CPU and 1 GB of RAM, that’s it). So like the block storage problem, this hits databases, we run about 32GB of ours in memory.

    Think nodes, not SERVERS; your servers should be somewhere else.

    5. Can’t run your own kernel or make kernel modifications so there’s no ability for kernel and OS optimizations, and no guarantee that they’ve been done.

    Can’t you load your own images of Ubuntu, Red Hat, SUSE, etc.?

    6. Images have to be uploaded and then moved around their network to find a launching point. This can take several minutes, if not more. Move 100 GBs around a busy gigabit network sometime and see.

    A good node image in my opinion is about 1-2 GB, maybe 4 GB if you have a really good reason. Remember you are not running a DB here or serving files from it; it’s a computational node only. In our case we are going to have EC2 manage security for S3 data, and maybe some spidering, that we will collect later from S3 and transfer to our computers.

  9. The RoR AMI is a good thing, but the number of instances is hard-coded…

    What would be very great to see is a modded Pound load balancer (using HTTP redirects) redirecting on load criteria (not the URL), integrated with launching/stopping new instances of a dedicated web server appliance (that would join the server pool), all webservers sharing a distributed FS (OpenAFS?).

    I’m beginning to think of trying http://www.cloudstack.com on top of EC2… P2P is maybe the answer to EC2 (“simple” nodes, auto-join, integrated load balancing & replication…).

  10. I realise this post is a couple of months old. Amazon have just released two other instance types with 4-core and 8-core configurations, which helps to alleviate one of the concerns raised.

  11. […] As for it being cheap, well definitely there are some strong opinions on this. One of them is found at HostingFu – Amazon Web Service is Expensive. Shane Conder’s “How Did I Miss Amazon EC2?” is a good read about the comparison of EC2 to other types of hosting. Finally Jason Hoffman of Joyent has an excellent opinion called Why EC2 isn’t yet a platform for “normal” web applications? […]

Comments are closed.