Perpendiculous Programming, Personal Finance, and Personal musings

2009.09.05

alloc, allocWithZone showdown

Filed under: Uncategorized — cwright @ 8:14 am

This brief post will explore some more mundane (but still measurable!) aspects of optimizing Objective-C software.  This time around, we’ll talk about object allocation.

When creating objects in Objective-C, there are roughly five ways to go about it.  Each one has a slightly different execution-time cost, executable-size cost, and lines-of-code cost.  Here, I’m mostly going to explore the time and lines-of-code costs.

These methods look like the following (one method per line):

id anObject = [NSObject new]; // 3 total messages
id anObject = [[NSObject alloc] init]; // 3 total messages
id anObject = [[NSObject allocWithZone:NULL] init]; // 2 total messages
id anObject = [anotherObject copy]; // 3 total messages
id anObject = [anotherObject copyWithZone:NULL]; // 2 total messages

(There’s also a lower-level objc-runtime-only set of function calls to accomplish the above, but that’s more verbose than most people are willing to go — if you have to drop to that level for acceptable performance, your object model is fundamentally wrong, and should be revised.)

These all accomplish the same thing, with varying levels of performance.  -copy and -copyWithZone: won’t actually work in the example above, because NSObject doesn’t implement -copyWithZone: (it’s declared by the NSCopying protocol, which NSObject doesn’t adopt).  But the top three can illustrate some interesting characteristics.

Here’s our breakdown (I did a few runs and averaged them — I ran the microbench on a 1.83GHz Core2 Duo MacBook running Mac OS X 10.5.8, as well as on a 2.4GHz Core2 Duo MacBook Pro running Mac OS X 10.6.0):

10.5.8/1.83GHz (32-bit):
new time: 8.413659 (420.682949ns/iteration)
alloc/init time: 8.282779 (414.138952ns/iteration)
allocWithZone/init time: 8.030380 (401.519001ns/iteration)

10.6.0/2.4GHz (32-bit):

new time: 6.439551 (321.977550ns/iteration)
alloc/init time: 6.351412 (317.570600ns/iteration)
allocWithZone/init time: 5.972713 (298.635650ns/iteration)

In 64-bit mode, we have much better performance all around:

10.5.8/1.83GHz (64-bit):

new time: 5.590315 (279.515749ns/iteration)
alloc/init time: 5.339382 (266.969100ns/iteration)
allocWithZone/init time: 5.173148 (258.657402ns/iteration)

10.6.0/2.4GHz (64-bit):

new time: 4.058811 (202.940547ns/iteration)
alloc/init time: 3.873028 (193.651399ns/iteration)
allocWithZone/init time: 3.846562 (192.328098ns/iteration)

(Yes, that’s right — object allocation in 64-bit mode on Leopard at 1.83GHz is faster than 32-bit allocation on Snow Leopard at 2.4GHz.  That’s actually quite compelling evidence for switching to 64-bit sooner, rather than later. 😉)

Since we have different underlying hardware and operating systems, we aren’t going to generalize between the two sets, except to say that 2.4GHz/Snow Leopard is faster than 1.83GHz/Leopard (no big surprise — 2.4GHz is simply faster all around, even without Snow Leopard).

In both 32-bit cases, doing the alloc/init dance gives a tiny ~1.5% performance improvement over new.  More interestingly, allocWithZone/init gives a 4.5%–7% improvement.  In cases where lots of objects are allocated frequently, switching to allocWithZone instead of new can provide slight but measurable benefits.   (Note that we didn’t profile other object classes in this test — we just wanted to measure the raw allocation overhead.  In real life this overhead is still present, but it’s generally a smaller fraction of the total work; the optimization above will still reduce it.)

The 64-bit results aren’t quite as dramatic, but the effect is still present — the -WithZone: variant is still faster by a tiny margin.

-copy and -copyWithZone: have a relationship similar to alloc/allocWithZone:, in that -copy simply calls copyWithZone:NULL, just as +alloc simply calls allocWithZone:NULL.  The benefit of using the -WithZone: variant up front is that you skip the extra message send (saving a few cache lines, some stack, and a potential Objective-C method lookup).

It may seem like 10–20ns isn’t a very long time (and indeed, 20 billionths of a second is pretty darn short in human terms), but in allocation-heavy contexts it adds up quickly: across thousands or millions of allocations, this shaves off real time.  For zero additional lines of code, that’s a pleasant pick-me-up.

Note that, as with all optimization, this should only take place after working code has been written and profiled to locate hotspots, and after optimal (or at least somewhat tuned) algorithms are in place.  Optimizing one-time init code generally doesn’t help anything, optimizing cold code doesn’t help much, and a fast implementation of a slow algorithm is worse than a slow implementation of a fast algorithm almost all of the time.
