Solaris, DTrace and Rails

We committed ourselves to Solaris as our base operating system two years ago, as Solaris was becoming OpenSolaris. We needed a solid operating system that was 32/64-bit, could manage lots of CPUs and RAM, and that we could contribute to. We also realized that three features would be a competitive advantage if we became experts in using them in production: ZFS, Zones and DTrace (the pre-existing observability tools in Solaris are quite excellent).

With time, it’s been surprising to me how that’s not clear to a lot more people, and we at Joyent have found ourselves spending a considerable amount of time simply being Solaris advocates.

So this last Sunday we (Ben and I) had a DTrace jam session at the Obvious offices with Bryan Cantrill and Jeremy over at Twitter, running through an expanded way of looking at what Ruby processes are doing in production (the luxury of the processes being an issue is what great load balancers give you).

(Jeremy, Bryan, Ian; the MacBook Pro in front of Bryan happens to be Ben’s, Bryan only uses Solaris on an Acer Ferrari 😉 )

Bryan happens to be a great Solaris advocate, and his biggest hammer is DTrace. So we try and get him in front of as many customers as we can, so that he can make an experiential case for why everyone with applications in production should be using Solaris. James Governor seems to agree.

We use DTrace all the time in identifying performance issues in our customers’ and in our own applications (remember we have some of the largest and oldest Rails apps around), but it helps when its creator makes an appearance and a case for it as well. It was a bonus that Ian Murdock was there as well. Ben covered his impressions (or as he said, his “verdict”) of Ian after being able to spend a Sunday afternoon with him talking about Solaris.

After just a little bit of time, and out of the box, it was clear that a lot of CPU was being spent in an odd place: raised exceptions would generate backtraces that were going through hundreds of frames.
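To make the cost concrete, here is a minimal Ruby sketch (not the code we traced, and not the Rails fix) showing that the backtrace attached to an exception grows with the depth of the call stack at the point of the raise. The helper names `raise_at_depth` and `backtrace_frames` are made up for illustration, and the recursion simply stands in for a deep framework call stack.

```ruby
# Hypothetical helper: recurse `depth` frames deep, then raise.
# The recursion stands in for a deep framework call stack.
def raise_at_depth(depth)
  raise ArgumentError, "boom" if depth.zero?
  raise_at_depth(depth - 1)
end

# Hypothetical helper: trigger the raise and report how many
# frames Ruby recorded in the exception's backtrace.
def backtrace_frames(depth)
  raise_at_depth(depth)
rescue ArgumentError => e
  e.backtrace.length
end

shallow = backtrace_frames(5)
deep    = backtrace_frames(300)
puts "shallow raise: #{shallow} frames, deep raise: #{deep} frames"
```

Every one of those frames has to be collected each time the exception is raised, so an exception that is raised routinely from deep inside a request can add up to real CPU time.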

The result was a ticket filed 16 hours ago, and David committed the changes to Rails itself 5 hours ago. Everyone benefits from something pointed to by DTrace on Solaris.

And I think this is much in line with David’s main points about communities around open source projects, and a nice example of an open source operating system and tool giving you specific insights into an open source language and web framework, and then submitting and applying fixes to the right places. Blaine, the guys at Twitter, and DHH should be proud of this example.

But this introspection into what the Ruby processes are doing is still not deep enough. DTrace tells us from the outside of an interpreted language what it’s doing in regard to tens of thousands of different “probes” throughout the operating system.

You can see the number of places DTrace is hooked in with

$ dtrace -l | wc -l
$ dtrace -n fbt:::entry'{@[execname] = count()}'
dtrace: description 'fbt:::entry' matched 24798 probes

That’s more probes than I often know what to do with, but it’s great that they’re there.

Nearly a year ago, we worked with Bryan and the DTrace team to get as complete a set as possible of is-enabled probes into Ruby and to maintain that going forward (there have only been patches against 1.8.2 up to this point). We’ve done that internally, and now with the increased use of Solaris by our own customers, and with DTrace showing up in what is likely the most common development platform for Rails, Mac OS X, we renewed our collaboration today.

So we’re polishing up the patches for Ruby 1.8.5 (the version we and our Accelerator customers still run in production; we’ll do 1.8.6 as well once that’s done in QA), and we’ll be sending those off to Bryan and his crew by Thursday for some vetting. We’re aiming to release them next week.

The result is going to be a tremendous amount of insight into production Rails (and other Ruby) processes, and we hope that lots of improvements (even application-specific) can come from that.

4 responses to “Solaris, DTrace and Rails”

  1. Damn. We need somebody to do this for Python/mod_python.

  2. So as a result, how much quicker is Twitter now with the applied exception change?

  3. Ping from the Unofficial Twitter Community and Forums

  4. Thanks, Jason. Great write-up.
