<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Stata Things &#187; benchmarking</title>
	<atom:link href="http://enoriver.net/index.php/tag/benchmarking/feed/" rel="self" type="application/rss+xml" />
	<link>http://enoriver.net</link>
	<description>computing for fun and profit</description>
	<lastBuildDate>Mon, 07 May 2012 13:43:02 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Benchmarking</title>
		<link>http://enoriver.net/index.php/2008/11/29/benchmarking/</link>
		<comments>http://enoriver.net/index.php/2008/11/29/benchmarking/#comments</comments>
		<pubDate>Sat, 29 Nov 2008 16:33:24 +0000</pubDate>
		<dc:creator>Gabi Huiber</dc:creator>
				<category><![CDATA[Stata]]></category>
		<category><![CDATA[benchmarking]]></category>
		<category><![CDATA[reshape]]></category>
		<category><![CDATA[seq()]]></category>

		<guid isPermaLink="false">http://enoriver.net/?p=339</guid>
		<description><![CDATA[In my previous post I cautioned against looping across observations, then showed how to do it anyway, using the example of reshaping a list from long to wide. A reader suggested, not unreasonably, that one might want to use reshape for that. He then proceeded with a code example, under the reservation that he did [...]]]></description>
			<content:encoded><![CDATA[<p>In my previous post I cautioned against looping across observations, then showed how to do it anyway, using the example of reshaping a list from long to wide. A reader suggested, not unreasonably, that one might want to use <code>reshape</code> for that. He then proceeded with a code example, under the reservation that he did not know if it would be any faster. This brings me to the topic of benchmarking.</p>
<p>I seldom compare how fast alternative solutions to the same problem run. Benchmarking is routine in general-purpose programming, but in run-of-the-mill statistics and data management, speed is rarely a pressing concern. You want to write clear, reproducible code. How long it takes to run matters less than how easy it is to follow and replicate, because it typically has to run only once: you write your paper, send it to the publisher, and you're done.</p>
<p>But I don't publish for a living. I write code that has to run over and over again, so it's about time I put some thought into how to measure its performance. If you already have a favorite way of doing that, I'd like to hear about it. Below is my attempt: a comparison of my initial solution (looping across observations) and Phil's (using <code>reshape</code> and a couple of other clever Stata functions) on a data set of 1,000,000 observations.<br />
<code><br />
clear<br />
set mem 100m // needed in Stata 11 and earlier; later versions ignore it<br />
</code><code><br />
set obs 1000000<br />
gen x=uniform()<br />
</code><code><br />
// using the egen function seq()<br />
capture prog drop phil<br />
prog def phil<br />
</code><code><br />
local myvar `1'<br />
count<br />
local n=r(N)<br />
egen i=seq(), from(1) to(`n') block(2) // pair index: 1,1,2,2,...<br />
gen j=mod(_n,2)+1 // position in pair: 2 on odd rows, 1 on even rows<br />
reshape wide `myvar', i(i) j(j)<br />
destring `myvar'1, replace<br />
</code><code><br />
end<br />
</code><code><br />
// looping across observations<br />
capture prog drop gabi<br />
prog def gabi<br />
</code><code><br />
local myvar `1'<br />
count<br />
local obs=floor(r(N)/2) // number of complete pairs; safe if N is odd<br />
gen var2=.<br />
</code><code><br />
forvalues i=1/`obs' {<br />
  local there=`i'*2<br />
  local here=`there'-1<br />
  replace var2=`myvar'[`there'] in `here' // copy the even row's value up to the odd row<br />
}<br />
</code><code><br />
end<br />
</code><code><br />
// speed comparison<br />
foreach k in phil gabi {<br />
  preserve<br />
  di c(current_time) // check the clock<br />
  di "`k''s solution"<br />
  quietly `k' x<br />
  di c(current_time) // check again<br />
  restore<br />
}<br />
</code><br />
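The idea is to compare the clock time displayed before and after each program runs. Since <code>c(current_time)</code> reports only hours, minutes, and seconds, this measures elapsed time to within about a second, which is adequate when each run takes several seconds. On my machine (Dell Latitude D600, Intel Core 2 Duo, 2.0 GHz, 2 GB of RAM) I found this:<br />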
<code><br />
11:26:20<br />
phil's solution<br />
11:26:29<br />
11:26:29<br />
gabi's solution<br />
11:26:44<br />
</code><br />
On this test, <code>reshape</code> beats looping across observations: 9 seconds versus 15.</p>
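<p>Stata's built-in <code>timer</code> command is a more precise alternative to reading the clock by hand: it accumulates elapsed time between <code>timer on</code> and <code>timer off</code> at a resolution finer than one second, and <code>timer list</code> reports the totals and stores them in <code>r(t1)</code>, <code>r(t2)</code>, and so on. Below is a minimal sketch of the same comparison, assuming the <code>phil</code> and <code>gabi</code> programs above are defined and the 1,000,000-observation data set is in memory. The local macro <code>t</code> is an illustrative counter of my own that gives each program its own timer number; it is not part of the original code.<br />
<code><br />
timer clear<br />
local t=0<br />
foreach k in phil gabi {<br />
  local ++t<br />
  preserve<br />
  timer on `t'<br />
  quietly `k' x<br />
  timer off `t'<br />
  restore<br />
}<br />
// timer 1 holds phil's elapsed seconds, timer 2 holds gabi's<br />
timer list<br />
</code><br />
As in the loop above, <code>preserve</code> and <code>restore</code> sit outside the timed region, so only the programs themselves are measured.</p>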
]]></content:encoded>
			<wfw:commentRss>http://enoriver.net/index.php/2008/11/29/benchmarking/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

