Yahoo Post: “Multi-Core HTTP Server with NodeJS”

The Yahoo! Developer Blog has a nice post on how they’re running node.js.

A good comment on Hacker News:

Node.js lets you write server applications in a server container that can handle tens of thousands of concurrent connections, using a loosely typed language like JavaScript that lets you code faster. It uses the same event-driven design as Nginx, which is why it can handle so many connections without a huge amount of memory or CPU usage.

If you were to do this with Nginx, you’d have to write the module in C.

You can’t do it on Apache because of Apache’s multi-process/thread model.

The fact that you can write a web server in a few lines of easy-to-understand, maintainable JavaScript that can handle over 10,000 concurrent connections without breaking a sweat is a breakthrough.

Node.js may do for server applications what Perl did for the Web in the ’90s.
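
For reference, here is roughly what “a few lines” looks like: a minimal sketch using Node’s built-in http and cluster modules (the cluster module may postdate the Yahoo post, which described its own multi-core approach; the port number is my arbitrary choice):

    // Minimal sketch, not code from the Yahoo post: one event loop per core.
    const cluster = require('cluster');
    const http = require('http');
    const os = require('os');

    if (cluster.isMaster) {
      // Fork one worker per CPU core; each worker runs its own event loop.
      for (let i = 0; i < os.cpus().length; i++) {
        cluster.fork();
      }
    } else {
      // Workers share the listening socket; incoming connections are
      // distributed among them and handled asynchronously, so no thread
      // or process sits parked waiting on a single slow client.
      http.createServer((req, res) => {
        res.writeHead(200, { 'Content-Type': 'text/plain' });
        res.end('hello from worker ' + process.pid + '\n');
      }).listen(8080);
    }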

EMC Bought Greenplum

EMC said today that it will acquire private data warehousing company Greenplum in an all-cash transaction, though the terms of the deal were not released. It said that Greenplum will “form the foundation of a new data computing product division within EMC’s Information Infrastructure business.”

It’s no secret that digital data is on the rise, at both the business and consumer levels. EMC called Greenplum a visionary leader with an architecture built from the ground up for analytical processing. In a statement, Pat Gelsinger, President and Chief Operating Officer of EMC’s Information Infrastructure Products, said:

The data warehousing world is about to change. Greenplum’s massively-parallel, scale-out architecture, along with its self-service consumption model, has enabled it to separate itself from the incumbent players and emerge as the leader in this industry shift toward ‘big data’ analytics. Greenplum’s market-leading technology combined with EMC’s virtualized Private Cloud infrastructure provides customers, today, with a best-of-breed solution for tomorrow’s ‘big-data’ challenges.

The company said it expects the deal to be completed in the third quarter, following regulatory approval. It is not expected to have a material impact on EMC’s fiscal 2010 GAAP and non-GAAP earnings.

From this ZDNet article.

I actually think that in 7-10 years, this acquisition by EMC could be as important as their VMware acquisition. Remember: the past was “cloud networking,” the present is “cloud computing,” and the future is “cloud data.” Virtualization is not the be-all and end-all of “cloud computing,” but it is a component. Think of these types of data stores as an important component in the future of distributed, pervasive data.

Triple-Parity RAID

In an effort to catch up on links:

Adam Leventhal talks about triple-parity RAID (raidz3) in an ACM Queue article.

When RAID systems were developed in the 1980s and 1990s, reconstruction times were measured in minutes. The trend for the past 10 years is quite clear regardless of the drive speed or its market segment: the time to perform a RAID reconstruction is increasing exponentially as capacity far outstrips throughput. At the extreme, rebuilding a fully populated 2-TB 7200-RPM SATA disk—today’s capacity champ—after a failure would take four hours operating at the theoretical optimal throughput. It is rare to achieve those data rates in practice; in the context of a heavily used system the full bandwidth can’t be dedicated exclusively to RAID repair without adversely affecting performance.

Fifteen years ago, RAID-5 reached a threshold at which it no longer provided adequate protection. The answer then was RAID-6. Today RAID-6 is quickly approaching that same threshold. In about 10 years, RAID-6 will provide only the level of protection that we get from RAID-5 today. It is again time to create a new RAID level to accommodate the realities of disk reliability, capacity, and throughput merely to maintain that same level of data protection.
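
To make the four-hour figure concrete: a 7200-RPM SATA drive streams sequentially at roughly 140 MB/s (a ballpark assumption on my part, not a number from the article), so

    rebuild time ≈ capacity / throughput
                 = 2 × 10^12 bytes / 1.4 × 10^8 bytes/s
                 ≈ 14,300 s ≈ 4 hours

and that is the best case, with the whole drive streaming at full speed and no competing I/O.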
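For intuition about what parity buys you, here is a sketch of single-parity (RAID-5-style) reconstruction in JavaScript. It is illustrative only; raidz2 and raidz3 extend the idea with additional Reed-Solomon syndromes so they can survive two or three simultaneous disk failures:

    // XOR parity over equal-length stripes, one Buffer per data disk.
    function xorParity(stripes) {
      const parity = Buffer.alloc(stripes[0].length); // zero-filled
      for (const stripe of stripes) {
        for (let i = 0; i < stripe.length; i++) parity[i] ^= stripe[i];
      }
      return parity;
    }

    // Any single lost stripe is the XOR of the survivors and the parity.
    function rebuild(survivors, parity) {
      return xorParity([...survivors, parity]);
    }

    const d0 = Buffer.from('abcd'), d1 = Buffer.from('efgh'), d2 = Buffer.from('ijkl');
    const p = xorParity([d0, d1, d2]);
    console.log(rebuild([d0, d2], p).toString()); // prints 'efgh': d1 recovered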

You’re Doing It Wrong by PHK

PHK’s You’re Doing It Wrong

Think you’ve mastered the art of server performance? Think again.
Would you believe me if I claimed that an algorithm that has been on the books as “optimal” for 46 years, which has been analyzed in excruciating detail by geniuses like Knuth and taught in all computer science courses in the world, can be optimized to run 10 times faster?

Later in the article, PHK gives a key example: Varnish’s memory management. In short, Varnish doesn’t manage memory itself; it takes advantage of the fact that it runs on a capable kernel whose virtual memory system already does this for it. Far too often, we write userland software that behaves as if there were nothing underneath it and strives to do everything itself. The result? Massive inefficiency and performance far below what you should be getting.
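
A minimal illustration of that principle in Node.js terms (my sketch, not Varnish’s code; the file path and port are made up): serve files straight off disk and let the kernel’s page cache keep the hot ones in RAM, instead of building a second cache in userland that duplicates pages the kernel is already caching:

    const http = require('http');
    const fs = require('fs');

    http.createServer((req, res) => {
      // Re-read the file on every request. After the first read, the
      // kernel's page cache turns this into a memory copy rather than a
      // disk access; a userland cache here would hold a second copy of
      // the same bytes and compete with the VM system for RAM.
      fs.createReadStream('/var/www/index.html').pipe(res);
    }).listen(8080);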