<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Perpendiculous</title>
	<atom:link href="http://perpendiculo.us/?feed=rss2&#038;p=104" rel="self" type="application/rss+xml" />
	<link>http://perpendiculo.us</link>
	<description>Programming, Personal Finance, and Personal musings</description>
	<lastBuildDate>Sun, 20 May 2012 05:50:48 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>hand holding</title>
		<link>http://perpendiculo.us/?p=266</link>
		<comments>http://perpendiculo.us/?p=266#comments</comments>
		<pubDate>Sun, 20 May 2012 05:50:48 +0000</pubDate>
		<dc:creator>cwright</dc:creator>
				<category><![CDATA[Meta]]></category>
		<category><![CDATA[Personal]]></category>

		<guid isPermaLink="false">http://perpendiculo.us/?p=266</guid>
		<description><![CDATA[Lately, Alisa has been teaching our daughter how to hold our hand when we walk places. This is a relatively new experience for her, since she&#8217;s accustomed to us carrying her, putting her in a stroller, or leaving her to her own self-directed devices. Nevertheless, she&#8217;s picked the skill up well, and basically performs admirably. [...]]]></description>
				<content:encoded><![CDATA[<p>Lately, Alisa has been teaching our daughter how to hold our hand when we walk places.  This is a relatively new experience for her, since she&#8217;s accustomed to us carrying her, putting her in a stroller, or leaving her to her own self-directed devices.  Nevertheless, she&#8217;s picked the skill up well, and basically performs admirably.<span id="more-266"></span><br />
To put the skills to the test, this morning I decided to take her to the park about half a mile away.  She did great, but she was pretty tired by the time we got there (this was to be expected, and was also part of the plan &#8211; the hope was that if we sufficiently exhausted her, she&#8217;d sleep better throughout the night, thereby letting us sleep more soundly as well).<br />
Along the way she briefly took breaks to look at rocks or flowers, but quickly resumed holding my hand when the aforementioned inspection was completed.  The one exception to this was when a small, curious letter &#8216;d&#8217; (or was it &#8216;p&#8217;, and we were looking at it upside down?) was marked on a passing driveway.  Learning letters has been a recent interest of hers, so she&#8217;s always on the lookout for them.  Prior to departing on this sojourn she was happily poking at our license plate for similar reasons.<br />
Perhaps more interestingly, the resident at the alphabetted driveway house was home (a 20-30 something? woman), along with a 4-6 year old girl (presumably her daughter).  Her daughter quickly took an interest in my daughter, but mine was stricken with shyness, such that she didn&#8217;t really speak (her normal mode of operation is to make noise a lot, babbling about whatever she&#8217;s thinking I suppose).  However she did approach them (the mother-esque person was holding a magazine of some sort, and those are full of letters to look at!) without much hesitation.<br />
I think this (the walks to the park, at least) will become a routine.  Hopefully it&#8217;ll become our bonding time as she grows and learns to both explore the world and express herself more.  Right now, at 17 months, she&#8217;s not much of a conversationalist, but even so it reinvigorates my imagination, looking at the world as though it was through new, discovering eyes.</p>
]]></content:encoded>
			<wfw:commentRss>http://perpendiculo.us/?feed=rss2&#038;p=266</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>page tables and you</title>
		<link>http://perpendiculo.us/?p=249</link>
		<comments>http://perpendiculo.us/?p=249#comments</comments>
		<pubDate>Thu, 22 Dec 2011 07:44:30 +0000</pubDate>
		<dc:creator>cwright</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://perpendiculo.us/?p=249</guid>
		<description><![CDATA[Any comp-sci grad worth his or her student loan debt can tell you about virtual memory. And many can tell you the intricacies of dealing with unix-style VM. But for the developers working with Mac OS X (and iOS), there&#8217;s a deeper layer hidden underneath that is seldom expressed (and often for good reason) &#8211; [...]]]></description>
				<content:encoded><![CDATA[<p style="font-size: 19.5px; line-height: 28.5px;">Any comp-sci grad worth his or her student loan debt can tell you about virtual memory. And many can tell you the intricacies of dealing with unix-style VM. But for the developers working with Mac OS X (and iOS), there&#8217;s a deeper layer hidden underneath that is seldom expressed (and often for good reason) &#8211; Mach. <span id="more-249"></span>Mach, not Unix, is the true undercarriage of OS X and its ilk.  And since it&#8217;s basically the only widespread Mach-based OS on the planet, there&#8217;s a relatively large void when it comes to dealing with this layer.  There aren&#8217;t many reasons to drop down to this layer, but sometimes, for fun or for profit, the need may arise.</p>
<p style="font-size: 19.5px; line-height: 28.5px;">For this first post, let&#8217;s explore allocating, deallocating, and protecting memory pages.</p>
<p style="font-size: 19.5px; line-height: 28.5px;">The functions we&#8217;re interested in here are vm_allocate, vm_deallocate, and vm_protect.  You can find them in /usr/include/mach/vm_map.h.</p>
<p style="font-size: 19.5px; line-height: 28.5px;">vm_allocate, unsurprisingly, allocates memory pages into a task (process).  It takes a target task (usually mach_task_self()), an address pointer, a size, and some flags.</p>
<p style="font-size: 19.5px; line-height: 28.5px;">The address pointer is often best left unspecified.  This allows the kernel to map in pages wherever there&#8217;s room.  On 64-bit systems this isn&#8217;t particularly interesting, but for 32-bit processes address space can be fairly constrained.</p>
<p style="font-size: 19.5px; line-height: 28.5px;">Size is often a multiple of a page size (4k, or 4096 bytes).  It defines how large of an area you want to map.</p>
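<p style="font-size: 19.5px; line-height: 28.5px;">As a rough illustration of the size arithmetic, here is a minimal sketch that rounds a byte count up to a whole number of pages.  It assumes the 4096-byte page size mentioned above and that the page size is a power of two; round_up_to_page and pages_needed are hypothetical helper names, not part of the Mach API.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round `size` up to the next multiple of `page_size`.
 * Assumes page_size is a non-zero power of two (4096 in this post) and that
 * size + page_size - 1 does not overflow size_t. */
size_t round_up_to_page(size_t size, size_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (size + page_size - 1) & ~(page_size - 1);
}

/* Hypothetical helper: number of whole pages needed to hold `size` bytes. */
size_t pages_needed(size_t size, size_t page_size)
{
    return round_up_to_page(size, page_size) / page_size;
}
```

<p style="font-size: 19.5px; line-height: 28.5px;">With a 4096-byte page, a 1MB request (1024*1024 bytes) works out to 256 pages, and a 1-byte request still occupies a full page.</p>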
<p style="font-size: 19.5px; line-height: 28.5px;">Flags is where things get interesting.  There are several to choose from, tersely documented in vm_statistics.h.  A couple notes on them:</p>
<p style="font-size: 19.5px; line-height: 28.5px;">VM_FLAGS_FIXED &#8211; This controls whether or not it should try to map the pages at the address you specified. It&#8217;s actually implied by the absence of VM_FLAGS_ANYWHERE.<br style="font-size: 19.5px; line-height: 28.5px;" /> VM_FLAGS_ANYWHERE &#8211; This indicates that it&#8217;s ok to ignore the input address, and simply allocate the pages wherever they&#8217;d fit (and return the base address in address).<br style="font-size: 19.5px; line-height: 28.5px;" /> VM_FLAGS_PURGABLE &#8211; This indicates that it&#8217;s ok for the pages to get purged (that is, discarded without first being written to disk).  This can be useful for caches and other data that can be recomputed.  I&#8217;ll discuss purging pages in a future update.<br style="font-size: 19.5px; line-height: 28.5px;" /> VM_FLAGS_NO_CACHE &#8211; This controls how the pager deals with prioritizing pages in low-memory situations.</p>
<p style="font-size: 19.5px; line-height: 28.5px;">vm_deallocate is used to undo an allocation.  Or in other words, free the memory vm_allocate makes use of.  It takes a task (again, often mach_task_self()), a base pointer, and a size.  It behaves largely how you might expect.</p>
<p style="font-size: 19.5px; line-height: 28.5px;">vm_protect is interesting (and potentially useful).  With it you can mark pages as readable (VM_PROT_READ), writable (VM_PROT_WRITE), and executable (VM_PROT_EXECUTE).  These are defined in /usr/include/mach/vm_prot.h.  Newly created pages (via vm_allocate) are readable and writable, but not executable.  There&#8217;s also a copy protection used for copy-on-write behavior, but that&#8217;s also for a future post.</p>
<p style="font-size: 19.5px; line-height: 28.5px;">Now for some sample code!</p>
<p>&nbsp;</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="c" style="font-family:monospace;"><span style="color: #339933;">#include &lt;mach/mach_interface.h&gt;</span>
<span style="color: #339933;">#include &lt;stdlib.h&gt;</span>
<span style="color: #339933;">#include &lt;stdio.h&gt;</span>
<span style="color: #339933;">#include &lt;stdbool.h&gt;</span>
&nbsp;
<span style="color: #993333;">int</span> main<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
	<span style="color: #993333;">unsigned</span> <span style="color: #993333;">char</span> <span style="color: #339933;">*</span>base<span style="color: #339933;">;</span>
	kern_return_t ret<span style="color: #339933;">;</span>
&nbsp;
	ret <span style="color: #339933;">=</span> vm_allocate<span style="color: #009900;">&#40;</span>mach_task_self<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span> <span style="color: #009900;">&#40;</span>vm_address_t<span style="color: #339933;">*</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">&amp;</span>base<span style="color: #339933;">,</span> <span style="color: #0000dd;">1024</span><span style="color: #339933;">*</span><span style="color: #0000dd;">1024</span><span style="color: #339933;">,</span>
					  VM_FLAGS_ANYWHERE<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>ret <span style="color: #339933;">==</span> KERN_SUCCESS<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
		<span style="color: #000066;">printf</span><span style="color: #009900;">&#40;</span><span style="color: #ff0000;">&quot;got allocation: %p<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #339933;">,</span> base<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
		base<span style="color: #009900;">&#91;</span><span style="color: #0000dd;">0</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #208080;">0x42</span><span style="color: #339933;">;</span>
	<span style="color: #009900;">&#125;</span> <span style="color: #b1b100;">else</span>
		<span style="color: #b1b100;">return</span> <span style="color: #339933;">-</span><span style="color: #0000dd;">1</span><span style="color: #339933;">;</span></pre></td></tr></table></div>

<p style="font-size: 19.5px; line-height: 28.5px;">This will allocate 1MB (256 pages) anywhere they&#8217;ll fit in the process. They&#8217;ll be read/write by default.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="c" style="font-family:monospace;">	ret <span style="color: #339933;">=</span> vm_protect<span style="color: #009900;">&#40;</span>mach_task_self<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span> <span style="color: #009900;">&#40;</span>vm_address_t<span style="color: #009900;">&#41;</span>base<span style="color: #339933;">,</span> <span style="color: #0000dd;">4096</span><span style="color: #339933;">,</span> <span style="color: #000000; font-weight: bold;">true</span><span style="color: #339933;">,</span> VM_PROT_NONE<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>ret <span style="color: #339933;">==</span> KERN_SUCCESS<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
		<span style="color: #000066;">printf</span><span style="color: #009900;">&#40;</span><span style="color: #ff0000;">&quot;access is now None!<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #009900;">&#125;</span> <span style="color: #b1b100;">else</span> <span style="color: #009900;">&#123;</span>
		<span style="color: #000066;">printf</span><span style="color: #009900;">&#40;</span><span style="color: #ff0000;">&quot;setting permission to none failed!<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

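<p style="font-size: 19.5px; line-height: 28.5px;">This will set the first page to no access &#8211; no reads, no writes, no execution. Because the fourth argument (set_maximum) is true, it also lowers the maximum protection the page can ever have. This is pretty worthless on its own, but it demonstrates something coming up.</p>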

<div class="wp_syntax"><table><tr><td class="code"><pre class="c" style="font-family:monospace;">	ret <span style="color: #339933;">=</span> vm_protect<span style="color: #009900;">&#40;</span>mach_task_self<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span> <span style="color: #009900;">&#40;</span>vm_address_t<span style="color: #009900;">&#41;</span>base<span style="color: #339933;">,</span> <span style="color: #0000dd;">4096</span><span style="color: #339933;">,</span> <span style="color: #000000; font-weight: bold;">true</span><span style="color: #339933;">,</span> VM_PROT_READ<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>ret <span style="color: #339933;">==</span> KERN_SUCCESS<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
		<span style="color: #000066;">printf</span><span style="color: #009900;">&#40;</span><span style="color: #ff0000;">&quot;access is now Read Only!<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #009900;">&#125;</span> <span style="color: #b1b100;">else</span> <span style="color: #009900;">&#123;</span>
		<span style="color: #000066;">printf</span><span style="color: #009900;">&#40;</span><span style="color: #ff0000;">&quot;resetting protections failed<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p style="font-size: 19.5px; line-height: 28.5px;">This would restore read permission to the pages, but the above call set the maximum protections to none. That will make this call fail (the pages will remain unreadable).</p>
<p style="font-size: 19.5px; line-height: 28.5px;">And finally, cleaning up:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="c" style="font-family:monospace;">	ret <span style="color: #339933;">=</span> vm_deallocate<span style="color: #009900;">&#40;</span>mach_task_self<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span> <span style="color: #009900;">&#40;</span>vm_address_t<span style="color: #009900;">&#41;</span>base<span style="color: #339933;">,</span> <span style="color: #0000dd;">1024</span><span style="color: #339933;">*</span><span style="color: #0000dd;">1024</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>ret <span style="color: #339933;">==</span> KERN_SUCCESS<span style="color: #009900;">&#41;</span>
		<span style="color: #000066;">printf</span><span style="color: #009900;">&#40;</span><span style="color: #ff0000;">&quot;deallocated!<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #b1b100;">else</span>
		<span style="color: #000066;">printf</span><span style="color: #009900;">&#40;</span><span style="color: #ff0000;">&quot;error deallocating!<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
	<span style="color: #b1b100;">return</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p style="font-size: 19.5px; line-height: 28.5px;">This will deallocate the pages.</p>
<p style="font-size: 19.5px; line-height: 28.5px;">And there you have it. Basic kernel-level memory allocation for your process. Now you can get page-aligned allocations of large slabs of memory (sometimes useful for storing GL textures with GL_APPLE_client_storage, among other things). Note that vm_allocate is _not_ faster than malloc/free. It is not wise to use these functions as a replacement for general allocation functions. Standard debugging tools will also not deal with these allocations (guard malloc won&#8217;t be able to help you, leaks won&#8217;t be able to help you, Instruments probably won&#8217;t be able to help you &#8211; vmmap will show them though, and you can tag them so that they stand out, using VM_MAKE_TAG as one of the vm_allocate flags).</p>
<p style="font-size: 19.5px; line-height: 28.5px;">It can also be useful for initializing some large set of data, and then marking it as read-only to ensure you don&#8217;t accidentally mutate it.</p>
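<p style="font-size: 19.5px; line-height: 28.5px;">To make the initialize-then-freeze pattern concrete, here is a minimal sketch using the POSIX analogs mmap and mprotect rather than the Mach calls, so that it runs on other unix-style systems too.  The helper names make_read_only_table and free_table are hypothetical.  On Mach the corresponding step would presumably be a vm_protect call with set_maximum false and VM_PROT_READ, but that variant is not shown or tested here.</p>

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc in strict modes */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON /* older BSD-style spelling */
#endif

/* Hypothetical helper: map `size` bytes of fresh read/write memory, fill it
 * with `fill`, then drop write permission so later stores fault instead of
 * silently mutating the data.  Returns NULL on failure. */
unsigned char *make_read_only_table(size_t size, unsigned char fill)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    memset(p, fill, size);              /* initialize while still writable */

    if (mprotect(p, size, PROT_READ) != 0) {
        munmap(p, size);                /* could not freeze it; give it back */
        return NULL;
    }
    return p;
}

/* Hypothetical helper: release a table created above. */
void free_table(unsigned char *table, size_t size)
{
    munmap(table, size);
}
```

<p style="font-size: 19.5px; line-height: 28.5px;">Reads from the returned table keep working; a stray write to it should crash immediately, which is usually easier to debug than silent corruption.</p>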
]]></content:encoded>
			<wfw:commentRss>http://perpendiculo.us/?feed=rss2&#038;p=249</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>MMX (and not the ISA)</title>
		<link>http://perpendiculo.us/?p=227</link>
		<comments>http://perpendiculo.us/?p=227#comments</comments>
		<pubDate>Fri, 07 Jan 2011 07:10:44 +0000</pubDate>
		<dc:creator>cwright</dc:creator>
				<category><![CDATA[Meta]]></category>
		<category><![CDATA[Personal]]></category>

		<guid isPermaLink="false">http://perpendiculo.us/?p=227</guid>
		<description><![CDATA[MMX was an instruction set added by Intel to its processor lineup in the mid-1990s.  It was marketed as a parallelization technology (it wasn&#8217;t) that would accelerate your games (it didn&#8217;t).  Offering an integer-only instruction set to a world that had largely moved on to floating point wasn&#8217;t particularly interesting or useful (though it [...]]]></description>
				<content:encoded><![CDATA[<p>MMX was an instruction set added by Intel to its processor lineup in the mid-1990s.  It was marketed as a parallelization technology (it wasn&#8217;t) that would accelerate your games (it didn&#8217;t).  Offering an integer-only instruction set to a world that had largely moved on to floating point wasn&#8217;t particularly interesting or useful (though it did have a few fun applications), but that didn&#8217;t stop me from writing about it a long long time ago <a href="http://softpixel.com/~cwright/programming/simd/mmx.php" target="_blank">here</a>.  I won&#8217;t be elaborating further on that.  Instead, I&#8217;d much rather go over the <em>other</em> MMX, also known as 2010.</p>
<p><span id="more-227"></span>2010 shaped up to be one of the most tumultuous years of my life. Quite unexpectedly too.  A few things were in-flight from 2009 that we were expecting to watch unfold, but we had no way to predict just how volatile it would ultimately be.</p>
<p>First, there was the relocation.  That ball was set in motion back in 2009, but we weren&#8217;t entirely sure how it would play out (at one point, there were 3 destinations on the table, including Munich, Boston, and Cupertino;  the last ultimately won out after several bizarre twists and turns).  There are only a couple times in my life where I&#8217;ve actually relocated.  In fact, pretty much just two:  going out west, and getting married.  In both of those cases, while the venue was different, as were the people, I had a pretty good idea of what was going on.  This time, not so much.  Moving with a family (of 2, thankfully) was a much larger logistical nightmare than I had anticipated, and thankfully I was able to defer that to Alisa, who graciously liquidated 90% of our belongings, and coordinated shipping of the rest.  Further chronicles of this operation can be had elsewhere in previous posts.</p>
<p>Second was a new car.  Never in my life would I have predicted that.  Not because it was impossible, but because it&#8217;s financially untenable.  Nevertheless, it worked out rather well, and long-term we haven&#8217;t suffered severe financial hits.  It&#8217;ll always be a loss, but such things are &#8212; cars fundamentally aren&#8217;t investments (unless you keep them 50 years).  The total maintenance costs have been excitingly negligible ($57 for the entire year, for an overpriced oil change that looks to be due about every 9 months at our driving rate), which offsets the monthly payments (which aren&#8217;t so negligible).</p>
<p>Shortly after the car came news that Alisa was expecting.  And shortly after that, she started dying from hyperemesis.  A trip to LA and a week-long hospital stay got her all patched up, and we hit  our insurance deductible in about 6 hours.  We&#8217;ll be making payments on that for months to come.</p>
<p>A free ticket to a WWDC I wouldn&#8217;t participate in was a bit unexpected;  I was glad to attend, but felt a bit useless being a desktop developer amid an army of iOS folks.  It will be interesting to watch the balance of powers shift (as they already have) over the next few years as things continue moving.  I had a hand in making a <a href="http://touchreviews.net/apple-app-store-hyperwall-wwdc/" target="_blank">pretty cool wall</a> though, which was an interesting experience to say the least.</p>
<p>Afterward, things calmed down for a bit.  Then they didn&#8217;t.</p>
<p>I had performed at least as well as expected, and I got moved to full time before my contract was up.  That provided all kinds of perks (insurance!) and bonuses (stock!), but also meant more responsibility (accountability!).  Handling <a href="http://store.apple.com/us/browse/home/shop_mac/family/macbook_air?mco=MTM3NjY1OTU">prototypes</a> was interesting at first, but the novelty has worn off.  I remember carefully <a href="http://fdiv.net/2007/04/05/malus-sylvestris-migration-part-1" target="_blank">unboxing my MacBook back in 2007</a>, keeping all the pieces nice and dainty.  I&#8217;ve now unboxed an untold number of systems, and find the packaging to be wearying.  &#8220;You mean I have to slide open <em>another</em> full-sized keyboard box again?&#8221; *sigh*.  Yes, everyone, I have truly become George Jetson.</p>
<p>Shortly after that there was a reorganization internally.  I&#8217;m still not sure what I can say about it, but suffice it to say that things changed in a quite unexpected manner (only one guy changed buildings though, and no one was fired, so perhaps it wasn&#8217;t that radical;  it sure felt like it at the time though).</p>
<p>A few more weeks of silence, and then <em>somebody</em> has the gall to show <em>something</em> to Steve Jobs.  I don&#8217;t know who it was, but I know what they showed.  Luckily, Steve liked it.  Unluckily, that meant our little team had to <em>build</em> it.  For a demo.  Aired live.  Nationally.  In 4 weeks.  We scrambled, scratched our heads, argued, pointed fingers, argued some more, called an intern back in for a week, and got carte blanche to tell other teams they weren&#8217;t working on <em>Steve&#8217;s</em> demo and thus re-prioritize their work for them.  To be fair, several other teams did a huge amount of work too, and in the end it went <a href="http://events.apple.com.edgesuite.net/1010qwoeiuryfg/event/index.html" target="_blank">quite well</a>.</p>
<p>The remainder was a blur.  Thanksgiving, Christmas, and New Year&#8217;s scattered time off in between periods of work.  Somewhere in there there was a holiday party at the Half Moon Bay Ritz Carlton;  Alisa and I had a lot of fun, and look forward to more cool parties like that (hopefully with better appetizers though).  Then back to work for one last week before everything shut down.  18 hours into vacation, Alisa went into labor, and 18 hours after that Penelope was born on December 25th.  Alisa&#8217;s parents were on hand to cook up a storm and otherwise hold down the fort as we ferried to and from the hospital for a few days until they were able to come home in good spirits.  Those quickly faded, and Penny&#8217;s insatiable appetite prevented us from getting more than a couple hours of sleep at a time for the first several nights.</p>
<p>After all was said and done, by the time December 31st rolled around I was tired.  I rang in the new year in an unconscious heap upon my bed, only to be woken shortly thereafter by the dinner bell that tolls not for me.</p>
<p>MMXI is set to look even crazier&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://perpendiculo.us/?feed=rss2&#038;p=227</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Ghost in the shell</title>
		<link>http://perpendiculo.us/?p=218</link>
		<comments>http://perpendiculo.us/?p=218#comments</comments>
		<pubDate>Thu, 12 Aug 2010 10:47:37 +0000</pubDate>
		<dc:creator>cwright</dc:creator>
				<category><![CDATA[Meta]]></category>
		<category><![CDATA[Personal]]></category>

		<guid isPermaLink="false">http://perpendiculo.us/?p=218</guid>
		<description><![CDATA[I was planning on some really cool technical junk for tonight&#8217;s post, but midway through the day I caught word of a friend&#8217;s death.Part of maintaining a non-anonymous public online presence means continually maintaining (trying to maintain?) implied privacy &#8212; people don&#8217;t like to see critiques of decisions they make, even when the situation is [...]]]></description>
				<content:encoded><![CDATA[<p>I was planning on some really cool technical junk for tonight&#8217;s post, but midway through the day I caught word of a friend&#8217;s death.<span id="more-218"></span>Part of maintaining a non-anonymous public online presence means continually maintaining (trying to maintain?) implied privacy &#8212; people don&#8217;t like to see critiques of decisions they make, even when the situation is sanitized as much as possible so as to be anonymous;  I don&#8217;t like throwing people under the bus, but I do get a kick out of scrutinizing bad choices so I can try to make better ones (I routinely burn down my own ideas and decisions;  I&#8217;m impartial as to where they come from).  Because of this strict anonymity, many written things here become self-centric (because I can deal with me complaining) or so ambiguous that they lose all meaning.</p>
<p>Another difficulty is that of honoring.  Not for the person being honored, but for those that person may have wronged, or for those that feel they&#8217;ve been wronged.  It&#8217;s easy to deify someone you&#8217;ve not known your whole life, and it&#8217;s easy to censor memories when reality makes heroes less than heroic at all times.</p>
<p>Today I&#8217;m going to break that second rule and pay my respects to Dayle Jellings.  While he was almost certainly not perfect, I am ignorant of his misdeeds and have no desire to become privy to them.  I know that he rendered me service and friendship unlike that of any other, all while never asking for anything in return.  Transportation, ice cream, laundry money, hair cuts, fellowship, those are the things I remember him for.  Even after I moved out he still kept in touch, traveling far out of his way just to say hello.  After we left he remembered each of us years later.  When one of us made a poor decision, he didn&#8217;t abandon or condemn.</p>
<p>A Laurie Anderson lyric describes losing a father as a whole library burning down.  For me, Jellings was more than a library;  he was a friend.  He will be missed.</p>
<p>His Facebook account has him immortalized (at least updated with a eulogy of sorts), complete with a friend wishing him a speedy recovery.  Such hollow wishes always choke me up.  How powerless we really are, how fragile life is.</p>
]]></content:encoded>
			<wfw:commentRss>http://perpendiculo.us/?feed=rss2&#038;p=218</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Cache is King -or- Things are about to get MESI</title>
		<link>http://perpendiculo.us/?p=213</link>
		<comments>http://perpendiculo.us/?p=213#comments</comments>
		<pubDate>Sun, 01 Aug 2010 06:24:51 +0000</pubDate>
		<dc:creator>cwright</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://perpendiculo.us/?p=213</guid>
		<description><![CDATA[A few days ago I was chatting with some friends, and the topic of caching came up. I mentioned MESI, which is the basis for modern multicore cache coherence (There are variants like MOESI and MERSI, but the general idea is the same).  It then occurred to me that I&#8217;ve never actually made a test [...]]]></description>
				<content:encoded><![CDATA[<p>A few days ago I was chatting with some friends, and the topic of caching came up.  I mentioned <a href="http://en.wikipedia.org/wiki/MESI_protocol">MESI</a>, which is the basis for modern multicore cache coherence (there are variants like MOESI and MERSI, but the general idea is the same).  It then occurred to me that I&#8217;ve never actually made a test to see the effects of MESI in action.<span id="more-213"></span>To begin, here&#8217;s an easy-as-pie pathological performance demonstration:</p>
<pre>cwright@phendrana:~/projects/test/cache&gt;gcc -framework Foundation source.m -o source -Os</pre>
<pre>cwright@phendrana:~/projects/test/cache&gt;./source</pre>
<pre>  &amp;value1: 0x2014</pre>
<pre>  &amp;value2: 0x2018</pre>
<pre>  same cache line!</pre>
<pre>  elapsed: 12639.155030ms</pre>
<pre>cwright@phendrana:~/projects/test/cache&gt;gcc -framework Foundation source.m -o source</pre>
<pre>cwright@phendrana:~/projects/test/cache&gt;./source</pre>
<pre>  &amp;value1: 0x2020</pre>
<pre>  &amp;value2: 0x2080</pre>
<pre>  different cache line!</pre>
<pre>  elapsed: 5614.167988ms</pre>
<p>That above nonsense is me compiling the same piece of code (analyzed later) with optimization (taking ~12.6 seconds to complete), and without optimization (taking ~5.6 seconds to complete).  Wait a minute.  Optimized takes over 2x as long?  If that&#8217;s not a teaser, I don&#8217;t know what is.</p>
<p>Modern CPUs have a few levels of cache between their registers (the fastest memory around) and the hard drive (the slowest, unless you count the network).  You&#8217;ve probably heard of one of the most important caches, RAM &#8212; it&#8217;s filled with stuff from the hard drive (and temporary working data) because accessing the hard drive is so laughably slow that if you understood it you&#8217;d never refer to your computer making that grinding noise as &#8220;thinking&#8221; &#8212; it&#8217;s actually doing quite the opposite:  not thinking, but waiting for the slow disk to feed it more data.</p>
<p>Compared to the CPU, the hard drive is a glacier &#8212; it takes hundreds of thousands of clock cycles to get a response.  It&#8217;s like sending a letter across the ocean on a ship.  Compared to the CPU, even RAM, which is orders of magnitude faster than the hard drive, is really slow &#8212; it takes hundreds of cycles to get a response.  So to work around the slowness of RAM, a few more caches are in place:  L1, L2, and sometimes L3.  These are made of very fast, very expensive memory, and they&#8217;re often very small.  L1 and L2 live on the CPU, with L1 being per-core, and L2 usually being shared among all cores on the CPU.  On some newer processors L3 is also on the CPU and also shared (but sometimes it&#8217;s off the CPU).  The L1 cache is further divided into L1-I and L1-D, for Instructions and Data, but that&#8217;s a topic for a different day.  Generally L1 will be the smallest and the fastest, L2 will be slower but larger, and L3 will be slower but larger still.</p>
<p>RAM is divided into bytes.  In the old days, you&#8217;d even read and write it as individual bytes (some fancy-pants microcontrollers even offered bit addressing, but they typically worked on bytes and just hid that detail from you).  These days, it&#8217;s read and written in larger chunks (there are some exceptions, but generally reads and writes work on much, much larger units).  These chunks are typically Cache Lines.  A cache line on a current X86 processor is 64 bytes.  So when your program reads the byte at location 0, behind your back the cache will also fetch bytes 1-63 for you.  This may seem totally bizarre and wasteful, but the reason for it is that programs that ask for a value at a certain location will often ask for nearby locations shortly after (a phenomenon referred to as &#8220;spatial locality&#8221; or &#8220;locality of reference&#8221;).  The cache will hold on to that entire line until it seems like the program&#8217;s not interested in it any longer, at which point it evicts it, and asks for a new line.  You probably perform a similar optimization when you&#8217;re putting on your socks and shoes:  you grab them in pairs, and put them on one at a time.  You could put one on, and then go and grab the second and put it on, but it&#8217;s easier to just get them all at once.</p>
<p>When a CPU core holds a cache line, it&#8217;s free to modify it as it needs to &#8212; this is known as the M state (for Modified).  When a line is &#8220;M&#8221;, it no longer matches what&#8217;s in other caches and RAM &#8212; this is ok, as long as it makes it back there eventually.  If a line in cache holds no valid data, it&#8217;s Invalid, or in the I state.  If a CPU holds a line unmodified, and no one else has it, it&#8217;s &#8220;Exclusive&#8221;, or E state.  And finally, when a line is held by 2 or more cores, it&#8217;s in the Shared, or S, state &#8212; this is where it begins to get complicated.</p>
<p>Let&#8217;s say that 2 cores are working on the same cache line.  Core 1 changes one of the bytes.  This means that Core 2&#8217;s version is Invalid (I state!).  So when Core 2 needs to read a byte from that line, it notes that it needs to get the latest version &#8212; this involves asking Core 1 for its latest copy.  Core 2 then changes a byte, invalidating Core 1&#8217;s line, so Core 1 then asks for Core 2&#8217;s copy, and the cycle repeats.  This, as you can imagine, gets really slow &#8212; the cache line is bouncing around between cores, continually invalidating each other, causing them to stall.</p>
<p>For memory that doesn&#8217;t get written to (the instructions that make up the program, for example), this isn&#8217;t a problem &#8212; the lines are Shared (S state), and never modified, so both cores can happily operate simultaneously, each using their own identical copy with no stalls or invalidations.</p>
<p>So, how did we get the situation above?  Here&#8217;s the source code:</p>
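<pre>#import &lt;Foundation/Foundation.h&gt;
#import &lt;libkern/OSAtomic.h&gt;
#import &lt;pthread.h&gt;

// Each counter is followed by 64 bytes of padding, so that (when the padding
// survives compilation) value1 and value2 end up on different cache lines.
static volatile uint32_t value1 = 1;
static volatile uint32_t padding1[16] = {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1};
static volatile uint32_t value2 = 1;
static volatile uint32_t padding2[16] = {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1};
static volatile uint32_t count = 2;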

#define ROUNDS (0x8000000)

void *counterThread(void *arg)
{
 unsigned int i;
 for(i=0;i&lt;ROUNDS;++i)
    OSAtomicIncrement32(arg);
 OSAtomicDecrement32(&amp;count);
 return 0;
}

int main()
{
 pthread_t pth;
 
 printf("&amp;value1: %p\n", &amp;value1);
 printf("&amp;value2: %p\n", &amp;value2);
 if((ptrdiff_t)&amp;value2 - (ptrdiff_t)&amp;value1 &lt; 64)
    printf("same cache line!\n");
 else
    printf("different cache line!\n");
 
 double now, then = CFAbsoluteTimeGetCurrent();
 
 pthread_create(&amp;pth, NULL, counterThread, &amp;value1);
 pthread_create(&amp;pth, NULL, counterThread, &amp;value2);
 
 while(count)
    usleep(1);
 now = CFAbsoluteTimeGetCurrent() - then;
 printf("elapsed: %fms\n", now*1000);
 
 return 0;
}</pre>
<p>What we&#8217;re doing is spawning 2 threads, and having each of them count at a certain location. Note that our timing is not particularly accurate &#8212; it&#8217;s intended to give us the general magnitude of the duration, not the fine-grained timing information we have in other profiling posts.</p>
<p>When optimization is disabled, the compiler doesn&#8217;t do any cleanup, and just uses the variables as they&#8217;re described at the top (value1, value2, padding1, padding2).  Because of the padding variables (64 bytes of them, to be exact), we guarantee that our value1 and value2 variables are at least 64 bytes apart (thus, on different cache lines) in memory &#8212; this allows each CPU core to operate on a different cache line, never interrupting the other cores.</p>
<p>When we enable optimization, the compiler reasons about the code some.  It notices that we never actually use padding1 and padding2, and so it removes them (normally, fewer variables will improve performance, as the remaining live variables will be together in cache).  In this case, it puts these two variables adjacent (on the same cache line), unaware that they&#8217;re about to be hammered by 2 different Cores with 2 different Caches.  So even though the threads are operating on distinctly separate regions of memory, they&#8217;re the same logical cache line, and thus performance issues emerge.</p>
<p>One simple way to fix this is to tell the compiler to align each of the variables to its own cache line using the &#8220;aligned&#8221; attribute.  This will ensure that we have plenty of space around our values.</p>
<p>Variables aren&#8217;t typically the source of cache line contention like this;  the more frequent culprits are spinlocks and mutexes &#8212; thankfully, on OS X mutexes are big enough to be cache-line aligned already, but spinlocks are not.  So if you have a large block of spinlock definitions at the top of some source file, it might be useful to separate them some if contention actually rears its head.</p>
]]></content:encoded>
			<wfw:commentRss>http://perpendiculo.us/?feed=rss2&#038;p=213</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>NX in action</title>
		<link>http://perpendiculo.us/?p=158</link>
		<comments>http://perpendiculo.us/?p=158#comments</comments>
		<pubDate>Mon, 28 Jun 2010 05:49:46 +0000</pubDate>
		<dc:creator>cwright</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://perpendiculo.us/?p=158</guid>
		<description><![CDATA[NX, or the No eXecute bit, is an interesting technology that prevents instructions on the stack from getting executed.  The reason for this is security (stack smashing becomes a bit more difficult for a would-be attacker), and the implications are typically few and far between. The way it works is by marking stack memory pages [...]]]></description>
				<content:encoded><![CDATA[<p>NX, or the No eXecute bit, is an interesting technology that prevents instructions on the stack from getting executed.  The reason for this is security (stack smashing becomes a bit more difficult for a would-be attacker), and the implications are typically few and far between.<span id="more-158"></span>  The way it works is by marking stack memory pages as non-executable (but still read/writable; historically, x86 only had read/write, and read implied execute).  Pretty simple stuff.</p>
<p>One of the more interesting aspects (on OS X at least) is that in 32bit mode, the heap is automatically executable (no need for <code>mmap()</code> and friends), while in 64bit mode, even the heap is NX&#8217;d.  This makes heap overflows much more difficult to exploit;  you&#8217;re pretty much stuck needing to do a return-to-libc attack (which is still possible, mind you &#8212; NX does nothing to prevent that sort of attack).</p>
<p>Here&#8217;s some example code:</p>
<pre>
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;   // malloc
#include &lt;string.h&gt;   // memset

void (*f)();

int main()
{
        void *ptr = malloc(16);
        memset(ptr, 0xc3, 16);    // 0xc3 = RET
        f = (void (*)())ptr;      // treat the heap block as a function

        printf("Executing from heap\n");
        f();
        printf("we're still alive.\n");

        char buffer[16];
        memset(buffer, 0xc3, 16);
        f = (void (*)())buffer;   // treat the stack buffer as a function
        printf("next up, from the stack\n");
        f();
        printf("we're still alive.\n");

        return 0;
}</pre>
<p>Compile using <code>gcc nx.c -o nx -m32</code>, and you&#8217;ll see the program crash on the &#8220;next up, from the stack&#8221; step.  Swap in <code>-m64</code> instead of <code>-m32</code>, and you&#8217;ll see it crash immediately after &#8220;Executing from heap&#8221;.</p>
<p>None of this is particularly new or earth-shattering, but it&#8217;s a neat little concept to play with.  For self-modifying code paths, or <acronym title="Just In Time">JIT</acronym> compilers, this can require a slight detour (though JITs should be using <code>mmap()</code> by now anyway).</p>
]]></content:encoded>
			<wfw:commentRss>http://perpendiculo.us/?feed=rss2&#038;p=158</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>past trends</title>
		<link>http://perpendiculo.us/?p=175</link>
		<comments>http://perpendiculo.us/?p=175#comments</comments>
		<pubDate>Tue, 11 May 2010 04:09:53 +0000</pubDate>
		<dc:creator>cwright</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://perpendiculo.us/?p=175</guid>
		<description><![CDATA[Visualizing the badness of purchasing a new car. The graph of our net worth over time from wedding day to current.  Note the general positive trend, up until March 2010.  That would be the car purchase&#8230;  (even relocating across the country, losing one job, and purchasing a bunch of new stuff didn&#8217;t swing February down&#8230; crazy.)]]></description>
				<content:encoded><![CDATA[<p>Visualizing the badness of purchasing a new car.<span id="more-175"></span> The graph of our net worth over time from wedding day to current.  Note the general positive trend, up until March 2010.  That would be the car purchase&#8230;  (even relocating across the country, losing one job, and purchasing a bunch of new stuff didn&#8217;t swing February down&#8230; crazy.)</p>
<div id="attachment_176" class="wp-caption alignright" style="width: 635px"><a href="http://perpendiculo.us/wp-content/uploads/2010/05/cars.png"><img class="size-full wp-image-176" title="cars" src="http://perpendiculo.us/wp-content/uploads/2010/05/cars.png" alt="" width="625" height="343" /></a><p class="wp-caption-text">Purchasing a new car starts an aggressive downward trend.. let&#39;s hope we can hold out</p></div>
]]></content:encoded>
			<wfw:commentRss>http://perpendiculo.us/?feed=rss2&#038;p=175</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>First Week -or- The Craziest Story Ever Told</title>
		<link>http://perpendiculo.us/?p=165</link>
		<comments>http://perpendiculo.us/?p=165#comments</comments>
		<pubDate>Mon, 08 Feb 2010 05:12:28 +0000</pubDate>
		<dc:creator>cwright</dc:creator>
				<category><![CDATA[Meta]]></category>
		<category><![CDATA[Personal]]></category>

		<guid isPermaLink="false">http://perpendiculo.us/?p=165</guid>
		<description><![CDATA[I&#8217;ve been chilling in my new Cupertino apartment for about 3 days now.  Jet Lag still makes me wake up between 5 and 6am local time, but strangely allows me to stay up till midnight.  When I need to go somewhere, I walk, and until Alisa gets here next week, I likely won&#8217;t have much [...]]]></description>
				<content:encoded><![CDATA[<p>I&#8217;ve been chilling in my new Cupertino apartment for about 3 days now.  Jet Lag still makes me wake up between 5 and 6am local time, but strangely allows me to stay up till midnight.  When I need to go somewhere, I walk, and until Alisa gets here next week, I likely won&#8217;t have much in the way of amenities.  Not that I&#8217;m in dire need, mind you &#8212; it&#8217;s just not a priority at the moment.<span id="more-165"></span>So anyway, what happened?  Dialing the Way Back machine back to October 2009, you may recall a certain <a title="Reality Distortion Field Deflector" href="http://perpendiculo.us/?p=155" target="_blank">Reality Distortion Field Deflection</a>.  At the time, it looked like I was still Columbus bound for the time being.  However, some phone calls started coming in again in December, and by just before Christmas the process had restarted, albeit with a few changes.  In January some paperwork came my way, I wrote my name and social security number a bunch of times and sent them back, and then I waited.</p>
<p>My start date was set at February First.  This was established on January 29th, which didn&#8217;t leave a lot of time.  Being full of reckless bravado, I welcomed the challenge.</p>
<p>Packing was a mad dash &#8212; stuffing in lots of my clothes, but also some of Alisa&#8217;s less-frequently-used stuff.  The bulk of packing was unfortunately left to Alisa after my departure.</p>
<p>The flight had a layover in Las Vegas.  As always, the people flying to Las Vegas are looking for a good time, and this time was no exception.  Except that one of the passengers behind me had a little bit too much to drink, and decided that he was &#8220;done with the plane&#8221; half an hour before we landed.  He started unloading the overhead bins, and being obnoxious in general (all the while his boss was trying to get him to calm down).  Being seated in the next-to-last row on the plane, it took no less than 79 hours to get off the plane because people still haven&#8217;t mastered the art of fetching stuff from overhead _Before_ they need to move (so they don&#8217;t hold up everyone else) or simply having light (or no!) carry-on at all.  Due to the delay, our inebriated friend started talking about blowing up the plane, figuring that it&#8217;d encourage people to get off faster (and possibly for mild comic effect).  Instead all it did was make some passengers look horrified (in the &#8220;I pity what&#8217;s going to happen to you&#8221; sense, not the &#8220;I&#8217;m scared for my life!&#8221; sense) and make the flight attendants scramble to deal with him.</p>
<p>Estimating a week-long apartment search, I booked a hotel through Thursday morning, and a car through Thursday evening &#8212; I figured the shorter duration would force me to take action, or at least force me to sleep on a park bench for a few nights if I failed.  The 55-60 degree weather of San Jose was a welcome change from the 8 degree winter that Ohio was offering at the time, but even so park benches weren&#8217;t overly inviting.</p>
<p>Alisa pulled through and arranged some apartments using elite time-zone tricks like calling after-hours in Ohio (which is still business hours in Cali).  Additionally, a co-worker drove me through the nearby area on the way to dinner one evening, so I got to check out some more (from the safety of Troy&#8217;s Audi).  When a suitable place was determined (a 25 minute walk to/from the office, a 20 minute walk to/from Target and the bank, a 15 minute walk to/from church, a 10 minute walk to Whole Foods, and most important, a 5 minute walk to/from Panda Express), I pulled the necessary $2175 for rent, deposit, and application fees out of savings, and landed some living quarters Thursday morning (actually, there were 2 transactions, one of which was deposit + application fees;  that happened on Wednesday.  Rent was due when I actually got approved and signed the lease), and moved in my two suitcases.   (A fabulous reason to have an emergency fund &#8212; also, expenses piled up really quickly with plane tickets and car rentals and hotel stays and buying food, so having credit cards handy makes a huge difference despite what <a title="Dave Ramsey" href="http://www.daveramsey.com/article/the-truth-about-credit-card-debt/" target="_blank">Dave Ramsey unwisely counsels</a>.)</p>
<p>The next tricky bit was getting from the Airport (amusingly, San Jose is abbreviated &#8220;SJ&#8221; all over the freeway signs.  SJ is also an abbreviation for <a title="Steve Jobs as SJ" href="http://americanhistory.si.edu/collections/comphist/sj1.html#tools" target="_blank">Steve Jobs</a>, my new boss (no, I haven&#8217;t met him)) back to Cupertino.  Returning the car went ok (as long as you throw way too much money at them, and don&#8217;t kill the car, they don&#8217;t care all that much), but finding a Taxi for less than $60 to cross town was difficult.  To further complicate matters, it was monsoon season, so every minute I was outside looking for transit I was getting wetter and wetter.  Eventually someone overheard a quote, and offered a ride instead (citing $60 to cross town as highway robbery) &#8212; he was driving to Mountain View, which is just north of Cupertino, so it wasn&#8217;t too far out of his way.  We chatted some, and he&#8217;s in marketing, but was formerly a software engineer as well, so we had some interesting discussions along the way.  I offered him the cash I had upon safe arrival ($40), but he declined, saying &#8220;Welcome to California&#8221; &#8212; despite my usual low impression of people generally, California (or at least the Bay Area) offers some extremely hospitable residents &#8212; a few years back at WWDC smokris and I got some pleasant breaks while ironing out transit (except when it came to the Taxi guy at the end).</p>
<p>Saturday was spent wandering around getting acclimated some more.  I picked up some basics (shower curtain, razors, trash can, food+juice for Sunday, ordered internet (only Comcast is available where our apartment is, that sucks)), and also discovered that my key card doesn&#8217;t grant me access to my office building on weekends.  That&#8217;s a shame (as I&#8217;m accustomed to dumping 60+ hours a week into what I do because it&#8217;s a work of passion for me), but maybe I&#8217;ll work out some agreement and get access or something later.</p>
<p>Sunday was a nice relaxing day.  I got to catch a nap, and meet some new people.</p>
<p>Work-wise, it&#8217;s been stellar.  I&#8217;m surprised how effective an office is for me (I share an office with Troy at the moment, so he&#8217;s probably much less effective with my constant barrage of questions/thoughts), and the people are great.  The paranoia and secrecy you hear about is entirely true &#8212; the only other organizations that maintain this many secrets are probably the US Government and the KGB.  At first this was a bit saddening (coming from Kosada, where I knew basically everything that was going on), but after a day or two I realized how much more focused I could be.  There are rough edges of course (like trying to work with something &#8220;secret&#8221;), but those are exceptional cases.  This tends to make each team an expert on their particular project, which in turn allows for wildly fast turnaround times and equally impressive wide-scale refactoring/redesign that goes almost completely unnoticed on the outside.</p>
<p>The coworkers are great.  I&#8217;ve learned a lot already, and the group seems really close.  There&#8217;s a NeXTie who dates back to before OpenGL from what I can tell, and several other very talented, very sharp people.  And they all do things besides pressing buttons all day!  That&#8217;s amazing to me (computer nerds get a bad rap for living in their mom&#8217;s basement until they&#8217;re 35) for some reason (not that previously anyone was that bad, it&#8217;s just a stereotype that I expect for some reason).  After spending so many years reverse engineering the project I&#8217;m now working on, I felt pretty at home almost instantly, and immediately dug in and started learning how to deal with bugs, features, and the development environment in general in an environment where the schedule is much more rigid than my previous experience.  There are all kinds of cool little details I&#8217;d love to chat about, but I&#8217;ve probably said more than I should have already.</p>
<p>And the lunches are fantastic!  It&#8217;s nice to be able to eat such a wide variety of food, and socialize with fellow engineers.  It&#8217;s almost impossible to walk through the courtyard area of Infinite Loop and not have a hugely goofy grin on my face <img src='http://perpendiculo.us/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://perpendiculo.us/?feed=rss2&#038;p=165</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Reality Distortion Field Deflector (RDFD)</title>
		<link>http://perpendiculo.us/?p=155</link>
		<comments>http://perpendiculo.us/?p=155#comments</comments>
		<pubDate>Wed, 14 Oct 2009 22:17:20 +0000</pubDate>
		<dc:creator>cwright</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://perpendiculo.us/?p=155</guid>
		<description><![CDATA[Mid last month (September, for those keeping score at home), a peculiar email arrived in my inbox.  Therein, I was referred by an Apple employee with an opportunity to potentially work there.  To say the least, my interest was piqued.  After all, after spending the past 2 years up to my elbows in some of [...]]]></description>
				<content:encoded><![CDATA[<p>Mid last month (September, for those keeping score at home), a peculiar email arrived in my inbox.  Therein, I was referred by an Apple employee with an opportunity to potentially work there.  To say the least, my interest was piqued.  After all, after spending the past 2 years up to my elbows in some of their software&#8217;s guts, reverse engineering, patching, and exploring, I&#8217;d like to think I had some authority on the subject.</p>
<p><span id="more-155"></span></p>
<p>Being an employee referral, some cool perks came my way.  Specifically, I got to skip the phone interview entirely, and got flown out there, put up in a nice hotel (Cypress Hotel), and had a fun dark blue Mazda 5 waiting for me.  Very smooth, to say the least.  I&#8217;m pretty sure that everyone who gets interviewed receives such treatment.</p>
<p>From first contact to flight out there, it was 2 weeks.  Pretty darn fast.</p>
<p>After resting up that evening in San Jose, I awoke the next morning for the interview.  I was told that ongoings in said interview are confidential.  In general, things went pretty smoothly.  Getting to meet the QC team was quite exciting, and meeting several talented engineers is also always great.  Finally, the recruiter came in, we threw around some numbers, and then I left for my flight home.  My total time in interviews was around 4-4.5 hours, and my total time in San Jose was about 16 hours.  Very brief.</p>
<p>Within a week, the background check company had contacted me to verify my employment.  Paystubs, Articles of Incorporation, and all that went in.  Then, a turn for the worse:  a week of absolute silence.  2 weeks without hearing anything is never a good sign.</p>
<p>Today, the rejection letter finally came.  Total time from first contact to rejection:  28 days.  Citing budgetary issues etc.  Whether or not that&#8217;s valid is of no concern to me.  At least it frees me up to continue working on things here.  Though I am a bit worried about the future of the technology I&#8217;ve so lovingly dealt with&#8230;</p>
<p>So, there you have it:  a brief trip into the heart of the Reality Distortion Field itself, and then a successful deflection.</p>
]]></content:encoded>
			<wfw:commentRss>http://perpendiculo.us/?feed=rss2&#038;p=155</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>@synchronized, NSLock, pthread, OSSpinLock showdown, done right</title>
		<link>http://perpendiculo.us/?p=133</link>
		<comments>http://perpendiculo.us/?p=133#comments</comments>
		<pubDate>Wed, 23 Sep 2009 05:19:50 +0000</pubDate>
		<dc:creator>cwright</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://perpendiculo.us/?p=133</guid>
		<description><![CDATA[Somewhere out there on the internet, there&#8217;s a &#8220;showdown&#8221; between @synchronized, NSLock, pthread mutexes, and OSSpinLock. It aims to measure their performance relative to each other, but uses sloppy code to perform the measuring. As a result, while the performance ordering is correct (@synchronized is the slowest, OSSpinLock is the fastest), the relative cost is [...]]]></description>
				<content:encoded><![CDATA[<p>Somewhere out there on the internet, there&#8217;s a &#8220;showdown&#8221; between @synchronized, NSLock, pthread mutexes, and OSSpinLock. It aims to measure their performance relative to each other, but uses sloppy code to perform the measuring. As a result, while the performance ordering is correct (@synchronized is the slowest, OSSpinLock is the fastest), the relative cost is severely misrepresented. Herein I attempt to rectify that benchmark.</p>
<p><span id="more-133"></span>Locking is absolutely required for critical sections. These arise in multithreaded code, and sometimes their performance can have severe consequences in applications. The problem with the aforementioned benchmark is that it did a bunch of extraneous work while it was locking/unlocking. It was doing the same amount of extraneous work, so the relative order was correct (the fastest was still the fastest, the slowest still the slowest, etc), but it didn&#8217;t properly show just how much faster the fastest was.</p>
<p>In the benchmark, the author used autorelease pools, allocated objects, and then released them all.  While locking.  This is a pretty reasonable use-case, but by no means the only one.  For most high-performance, multithreaded code, you&#8217;ll spend a _bunch_ of time trying to make the critical sections as small and fast as possible.  Large, slow critical sections effectively undo the multithreading speed up by causing threads to block each other out unnecessarily.  So when you&#8217;ve trimmed the critical sections down to the minimum, another sometimes-justified optimization is to optimize the amount of time spent locking/unlocking itself.</p>
<p>Just to make things exciting though, not all locking primitives are created equal.  Two of the 4 mentioned have special properties that can affect how long they take, and how they operate under pressure.  I&#8217;ll get to that towards the end.</p>
<p>First up, here&#8217;s my &#8220;no-nonsense&#8221; microbench code:</p>
<pre>#import &lt;Foundation/Foundation.h&gt;
#import &lt;objc/runtime.h&gt;
#import &lt;objc/message.h&gt;
#import &lt;libkern/OSAtomic.h&gt;
#import &lt;pthread.h&gt;

#define ITERATIONS (1024*1024*32)

static unsigned long long disp=0, land=0;

int main()
{
 double then, now;
 unsigned int i, count;
 pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
 OSSpinLock spinlock = OS_SPINLOCK_INIT;

 NSAutoreleasePool *pool = [NSAutoreleasePool new];

 NSLock *lock = [NSLock new];
 then = CFAbsoluteTimeGetCurrent();
 for(i=0;i&lt;ITERATIONS;++i)
 {
 [lock lock];
 [lock unlock];
 }
 now = CFAbsoluteTimeGetCurrent();
 printf("NSLock: %f sec\n", now-then);    

 then = CFAbsoluteTimeGetCurrent();
 IMP lockLock = [lock methodForSelector:@selector(lock)];
 IMP unlockLock = [lock methodForSelector:@selector(unlock)];
 for(i=0;i&lt;ITERATIONS;++i)
 {
 lockLock(lock,@selector(lock));
 unlockLock(lock,@selector(unlock));
 }
 now = CFAbsoluteTimeGetCurrent();
 printf("NSLock+IMP Cache: %f sec\n", now-then);    

 then = CFAbsoluteTimeGetCurrent();
 for(i=0;i&lt;ITERATIONS;++i)
 {
 pthread_mutex_lock(&amp;mutex);
 pthread_mutex_unlock(&amp;mutex);
 }
 now = CFAbsoluteTimeGetCurrent();
 printf("pthread_mutex: %f sec\n", now-then);

 then = CFAbsoluteTimeGetCurrent();
 for(i=0;i&lt;ITERATIONS;++i)
 {
 OSSpinLockLock(&amp;spinlock);
 OSSpinLockUnlock(&amp;spinlock);
 }
 now = CFAbsoluteTimeGetCurrent();
 printf("OSSpinlock: %f sec\n", now-then);

 id obj = [NSObject new];

 then = CFAbsoluteTimeGetCurrent();
 for(i=0;i&lt;ITERATIONS;++i)
 {
 @synchronized(obj)
 {
 }
 }
 now = CFAbsoluteTimeGetCurrent();
 printf("@synchronized: %f sec\n", now-then);

 [pool release];
 return 0;
}</pre>
<p>We do 5 tests:  We test NSLock, NSLock with IMP caching, pthread mutexes, OSSpinLocks, and then finally @synchronized.  We simply lock and unlock 33554432 times (that&#8217;s 1024*1024*32 for those keeping score at home <img src='http://perpendiculo.us/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' />), and see how long it takes.  No allocation, no releases, no autorelease pools, nothing.  Just pure lock/unlock goodness.  I ran the test a few times, and averaged the results (so overall, the results are from something like 100 million lock/unlock cycles each).</p>
<ol>
<li>NSLock: 3.5175 sec</li>
<li>NSLock+IMP Cache: 3.1165 sec</li>
<li>Mutex: 1.5870 sec</li>
<li>SpinLock: 1.0893 sec</li>
<li>@synchronized: 9.9488 sec</li>
</ol>
<div class="wp-caption alignnone" style="width: 629px"><img title="Lock Performance" src="http://perpendiculo.us/wp-content/uploads/2009/09/LockPerformance.png" alt="Lock Performance" width="619" height="265" /><p class="wp-caption-text">Lock Performance</p></div>
<p>From the above graph, we can see a couple of things:  First, @synchronized is _Really_ expensive &#8212; like, nearly 3 times as expensive as the next slowest option.  We&#8217;ll get into why that is in a moment.  Otherwise, we see that NSLock and NSLock+IMP Cache are pretty close &#8212; these are built on top of pthread mutexes, but we have to pay for the extra ObjC overhead.  Then there&#8217;s Mutex (pthread mutexes) and SpinLock &#8212; these are pretty close, but even then SpinLock takes about 30% less time than Mutex.  We&#8217;ll get into that one too.  So from top to bottom we have almost an order of magnitude difference between the worst and best.</p>
<p>The nice part about these all is that they all take about the same amount of code &#8212; using NSLock takes as many lines as a pthread mutex, and the same number for a spinlock.  @synchronized saves a line or two, but with a cost like that it quickly looks unappealing in all but the most trivial of cases.</p>
<p>So, what makes @synchronized and SpinLock so different from the others?</p>
<p>@synchronized is very heavy weight because it has to set up an exception handler, and it actually ends up taking a few internal locks on its way there.  So instead of a simple cheap lock, you&#8217;re paying for a couple locks/unlocks just to acquire your measly lock.  Those take time.</p>
<p>OSSpinLock, on the other hand, doesn&#8217;t even enter the kernel &#8212; it just keeps reloading the lock, hoping that it&#8217;s unlocked.  This is terribly inefficient if locks are held for more than a few nanoseconds, but it saves a costly system call and a couple context switches.  Pthread mutexes actually use an OSSpinLock first, to keep things running smoothly where there&#8217;s no contention.  When there is, it resorts to heavier, kernel-level locking/tasking stuff.</p>
<p>So, if you&#8217;ve got hotly-contested locks, OSSpinLock probably isn&#8217;t for you (unless your critical sections are _Really_ _Fast_).  Pthread mutexes are a tiny bit more expensive, but they avoid the power-wasting effects of OSSpinLock.</p>
<p>NSLock is a pretty wrapper on pthread mutexes.  It doesn&#8217;t provide much else, so there&#8217;s not much point in using it over a pthread mutex.</p>
<p>Of course, standard optimization disclaimers apply:  don&#8217;t do it until you&#8217;re sure you&#8217;ve chosen the correct algorithms, have profiled to find hotspots, and have found locking to be one of those hot items.  Otherwise, you&#8217;re wasting your time on something that&#8217;s likely to provide minimal benefits.</p>
]]></content:encoded>
			<wfw:commentRss>http://perpendiculo.us/?feed=rss2&#038;p=133</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>
