<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Stata Things &#187; Stata</title>
	<atom:link href="http://enoriver.net/index.php/category/stata/feed/" rel="self" type="application/rss+xml" />
	<link>http://enoriver.net</link>
	<description>computing for fun and profit</description>
	<lastBuildDate>Fri, 13 Aug 2010 20:42:39 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Good to know</title>
		<link>http://enoriver.net/index.php/2010/07/15/good-to-know/</link>
		<comments>http://enoriver.net/index.php/2010/07/15/good-to-know/#comments</comments>
		<pubDate>Thu, 15 Jul 2010 18:46:38 +0000</pubDate>
		<dc:creator>Gabi Huiber</dc:creator>
				<category><![CDATA[Stata]]></category>

		<guid isPermaLink="false">http://enoriver.net/?p=1311</guid>
		<description><![CDATA[The user-written commands you download to your ado/plus directory are updated once in a while on that RePEc server they come from. So, after you findit and then net install it, your imported command might need to be refreshed occasionally. That is what adoupdate, update does. I was reminded of this when I tried to [...]]]></description>
			<content:encoded><![CDATA[<p>The user-written commands you download to your ado/plus directory are updated once in a while on that RePEc server they come from. So, after you <code>findit</code> a command and then <code>net install</code> it, your imported command might need to be refreshed occasionally. That is what <code>adoupdate, update</code> does.</p>
<p>I was reminded of this when I tried to run <code>freduse</code> today. Actually, the problem that reminded me of it -- a "not found" error message thrown when the command invoked the Mata function <code>_fredifinparse()</code> -- didn't go away after the update, but <code>adoupdate, update</code> is still a good thing to do. What did fix <code>_fredifinparse()</code> is described <a href="http://vhaguiar.wordpress.com/2010/02/07/stata-tip-importing-stock-info-from-yahoo-finance-and-fed-macroeconomic-data/">here</a>.</p>
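<p>For reference, here is a minimal sketch of the routine, assuming your Stata ships <code>adoupdate</code> and can reach the internet. Naming <code>freduse</code> in the last line is only for illustration; any installed package name would do.</p>
<p><code>
<pre>
// list the installed user-written packages and flag those with updates
adoupdate
// bring everything that is out of date up to date
adoupdate, update
// or refresh a single package only
adoupdate freduse, update
</pre>
<p></code></p>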
]]></content:encoded>
			<wfw:commentRss>http://enoriver.net/index.php/2010/07/15/good-to-know/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>The limits of encapsulation</title>
		<link>http://enoriver.net/index.php/2010/07/09/the-limits-of-encapsulation/</link>
		<comments>http://enoriver.net/index.php/2010/07/09/the-limits-of-encapsulation/#comments</comments>
		<pubDate>Fri, 09 Jul 2010 15:10:46 +0000</pubDate>
		<dc:creator>Gabi Huiber</dc:creator>
				<category><![CDATA[Stata]]></category>

		<guid isPermaLink="false">http://enoriver.net/?p=1304</guid>
		<description><![CDATA[I just read this. I liked it. It put some bit of anguish I've been having into clearer words than I could. My Stata code between 2000 and 2006 consisted exclusively of do-files that put to work either standard Stata commands or user-written commands from the SSC. There was not a single program definition anywhere [...]]]></description>
			<content:encoded><![CDATA[<p>I just read <a href="http://www.johndcook.com/blog/2010/06/30/where-the-unix-philosophy-breaks-down/">this</a>. I liked it. It put some bit of anguish I've been having into clearer words than I could.</p>
<p>My Stata code between 2000 and 2006 consisted exclusively of do-files that put to work either standard Stata commands or user-written commands from the <a href="http://www.stata.com/help.cgi?ssc">SSC</a>. There was not a single program definition anywhere and things worked alright. These do-files were pretty elaborate and their functionality overlapped a fair bit, but that was never that much of an inconvenience.</p>
<p>Then in early 2007, during my brief tenure at RTI Health Solutions, that way of working showed its limitations when I tried to program in plain Stata matrices something that was normally being done in GAUSS. It had to do with the design of factorial experiments and my project ended in an instructive kind of failure, because it got me started on using Stata programs. I still like those things. I can define them once and then nest and have them call each other every which way to my heart's content. They take arguments, return values, and generally they make you feel like a real programmer.</p>
<p>Then in 2008 I had my introduction to C++, and the instructions were clear: break down a problem in small morsels; use as many functions as you need; if a function definition fills up a screen, it's way too big, so break it down further; encapsulation is a good thing. Then came header files, namespaces, classes, templates, the works. It was an extreme kind of validation of the way I had started to do business, and my enthusiasm for modular code only grew from there.</p>
<p>Then, about a year ago, I started running into problems. Component programs can be debugged individually, sure, and you only need to fix them once, in one place, which is great. In fact, if they're small and simple enough, you don't even need to do that; they just work. But with complicated projects you're going to have so many interlocked small and simple programs that it will just be too hard to keep tabs on which programs call which, where, and why. It's also pretty expensive to write them in such a way that they can talk with not just one other program, but are truly universal within the context of the given problem. </p>
<p>So I'm not sure anymore that I would recommend my way of writing Stata code to everybody. It still has its uses, but I can see a growing number of circumstances where it's simply not worth the trouble.</p>
]]></content:encoded>
			<wfw:commentRss>http://enoriver.net/index.php/2010/07/09/the-limits-of-encapsulation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Using Mata for string processing</title>
		<link>http://enoriver.net/index.php/2010/05/28/using-mata-for-string-processing/</link>
		<comments>http://enoriver.net/index.php/2010/05/28/using-mata-for-string-processing/#comments</comments>
		<pubDate>Sat, 29 May 2010 02:29:46 +0000</pubDate>
		<dc:creator>Gabi Huiber</dc:creator>
				<category><![CDATA[Stata]]></category>

		<guid isPermaLink="false">http://enoriver.net/?p=1233</guid>
		<description><![CDATA[My friend Dan Blanchette showed me a little Mata function yesterday that he wrote for changing the case -- lower, upper, proper -- for strings longer than 244 characters. It was fresh in my head today as I went looking for something while babysitting my daughter -- can't remember what; babysitting requires undivided attention -- [...]]]></description>
			<content:encoded><![CDATA[<p>My friend <a href="http://faculty.fuqua.duke.edu/home/blanc004/data_programming/">Dan Blanchette</a> showed me a little Mata function yesterday that he wrote for changing the case -- lower, upper, proper -- for strings longer than 244 characters. It was fresh in my head today as I went looking for something while babysitting my daughter -- can't remember what; babysitting requires undivided attention -- and ended up <a href="http://codeandculture.wordpress.com/2010/04/29/grepmerge/">here</a>. </p>
<p>This post is the result of the conversation I started in the comment thread with Gabriel Rossman. I will attempt to use Mata for string processing within a suitably large text file, as opposed to just a blob of text you can call as a local. </p>
<p>Step 1: google "very large text file". This took me to <a href="http://www.textfiles.com/">a magical place</a> where the 1980s are preserved in perpetuity. I went through the categories, and picked <a href="http://www.textfiles.com/programming/dostech.pro">this one</a>. At exactly 12,133 lines, it should do nicely.</p>
<p>Step 2: get the Mata book -- because I still run Stata 10 at home, so no pdf documentation yet. </p>
<p>Step 3: muck around. Eventually I came up with this thing:</p>
<p><code>
<pre>
mata
real scalar checkmatch(string scalar theFile, string scalar thePattern)
{
   real scalar n,i,check
   string matrix A
   A=cat(theFile)
   n=rows(A)
   check=0
   for(i=1; i<=n; i++) {
      if(strmatch(A[i,1],thePattern)) {
         check=1
         return(check)
      }
   }
   return(check)
}
end
</pre>
<p></code><br />
This is a Mata function that returns 1 if a string pattern is found anywhere in a given text file, and 0 otherwise. It makes use of Mata's built-in cat() function, which reads an ASCII file of n lines into a column vector of n string elements, one for each line in the original file. I want checkmatch() to exit with 1 as soon as it first finds the string pattern it's looking for. The first return(check), inside the if clause, does that: in Mata, return() ends the function on the spot, so the loop stops at the first match. </p>
<p>With a text file this big, the 0 case might be the harder one to test, but if you're fishing for patterns you're unlikely to find in an English-language document no matter how big, a Hungarian word is a pretty good bet. So, this is the output:<br />
<code>
<pre>
. mata: checkmatch(`"dostech.pro"',`"*BIOS*"')
  1
. mata: checkmatch(`"dostech.pro"',`"*Kolozsvar*"')
  0
</pre>
<p></code><br />
Now for a real illustration of Mata's string and file processing capabilities, see <a href="http://www.stata-journal.com/article.html?article=pr0049">here</a>. </p>
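<p>As a variation on checkmatch(), and only as a sketch: strmatch() also accepts a whole string vector, so counting the matching lines needs no explicit loop. The name countmatch() is made up for this example, and unlike checkmatch() it always reads through the entire file.</p>
<p><code>
<pre>
mata
real scalar countmatch(string scalar theFile, string scalar thePattern)
{
   string colvector A
   // cat() returns one string element per line of the file
   A=cat(theFile)
   // given a vector, strmatch() returns a vector of 0s and 1s;
   // sum() then counts the lines that matched
   return(sum(strmatch(A,thePattern)))
}
end
</pre>
<p></code></p>
<p>Calling <code>mata: countmatch(`"dostech.pro"',`"*BIOS*"')</code> would then show how many lines mention BIOS, and a result of 0 means the same thing as a 0 from checkmatch().</p>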
]]></content:encoded>
			<wfw:commentRss>http://enoriver.net/index.php/2010/05/28/using-mata-for-string-processing/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Filling the gaps in your panel with winsor</title>
		<link>http://enoriver.net/index.php/2010/05/26/filling-the-gaps-in-your-panel-with-winsor/</link>
		<comments>http://enoriver.net/index.php/2010/05/26/filling-the-gaps-in-your-panel-with-winsor/#comments</comments>
		<pubDate>Wed, 26 May 2010 21:18:41 +0000</pubDate>
		<dc:creator>Gabi Huiber</dc:creator>
				<category><![CDATA[Stata]]></category>
		<category><![CDATA[extrapolation]]></category>
		<category><![CDATA[gaps]]></category>
		<category><![CDATA[winsor]]></category>

		<guid isPermaLink="false">http://enoriver.net/?p=1216</guid>
		<description><![CDATA[I recently worked on a project where I had to model groundwater salinity as an indirect function of population growth. The idea is that more people will draw more fresh water from the aquifer; other things equal, saline water will displace it. I had to do this for sixteen counties in Southern Florida and [...]]]></description>
			<content:encoded><![CDATA[<p>I recently worked on a project where I had to model groundwater salinity as an indirect function of population growth. The idea is that more people will draw more fresh water from the aquifer; other things equal, saline water will displace it. I had to do this for sixteen counties in Southern Florida and my data -- on population, salinity, and water withdrawals, by year and by county -- had some gaps here and there.</p>
<p>A search on how to best fill them had to start with the Statalist archive. I quickly found <a href="http://www.stata.com/statalist/archive/2008-08/msg00856.html">this simple solution</a> by Scott Merryman, which solved the core of the problem. But then things got complicated. First, I had several variables of interest and I had no reason to expect that they all would have a linear time trend. Second, some of my variables -- such as measured groundwater salinity -- showed pretty unbelievable outliers.</p>
<p>So I needed some kind of wrapper that would accommodate different variables, trends of different orders, and dummy variables to account for any differences between counties or groups of counties. And I needed a quick and easy way to deal with the outliers. The wrapper is below:</p>
<p><code>
<pre>
// #### higher-order predictions of anything against time.
// takes three arguments:
// 1. order, numeric= 1,2,3,4,5,etc. for linear, squared,
//    cubed, etc. trend
// 2. lhs, string, the name of the left-hand side variable:
//    salinity, usage, population, etc.
// 3. dummy, string -- name of the group identifier on the
//    right-hand side, e.g., county fips code.
capture prog drop getValueHat
program getValueHat

args order lhs dummy

levelsof `dummy', local(countem)
local checkit: list sizeof countem
// if `dummy' is not degenerated to one value, then use xi: regress
if `checkit'>1 {
   local regressors i.`dummy'*year1
   forvalues i=1/`order' {
      gen year`i'=year^`i'
      if `i'>1 {
         local regressors `regressors' i.`dummy'*year`i'
      }
   }
   xi: regress `lhs' `regressors'
}
// otherwise, just regress
else {
   local regressors year1
   forvalues i=1/`order' {
      gen year`i'=year^`i'
      if `i'>1 {
         local regressors `regressors' year`i'
      }
   }
   regress `lhs' `regressors'
}
predict `lhs'_hat
replace `lhs'_hat=max(0,`lhs'_hat)
if `checkit'>1 {
   drop _I*
}
forvalues i=1/`order' {
   drop year`i'
}

end
</pre>
<p></code></p>
<p>Now I can use the same code for different variables with different functional forms, which is nice. For example, if I want to fill gaps in salinity at the county level with a quadratic trend, this program call will get me the right salinity_hat:<br />
<code>
<pre>
getValueHat 2 salinity county
</pre>
<p></code></p>
<p>Dealing with the outliers was a far simpler matter: Nick Cox's <code>winsor</code> command -- <code>findit winsor</code> if you don't have it installed. The complete solution to my problem of gaps and outliers is:</p>
<p><code>
<pre>
winsor salinity, h(3) gen(x)
getValueHat 2 x county
rename x_hat salinity_hat
drop x
</pre>
<p></code></p>
<p>If, like me, you've never Winsorized before, <a href="http://en.wikipedia.org/wiki/Winsorising">here's the Wikipedia entry</a> on the procedure.</p>
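<p>To see what the <code>h(3)</code> option does, here is a hand-rolled sketch on the auto data. It assumes the variable has no missing values, and price_w is a name made up for this example; in real work <code>winsor</code> itself is the safer choice.</p>
<p><code>
<pre>
sysuse auto, clear
sort price
gen price_w=price
// the 3 lowest values take the value of the 4th lowest
replace price_w=price[4] in 1/3
// the 3 highest values take the value of the 4th highest
replace price_w=price[_N-3] in -3/l
</pre>
<p></code></p>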
]]></content:encoded>
			<wfw:commentRss>http://enoriver.net/index.php/2010/05/26/filling-the-gaps-in-your-panel-with-winsor/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Automated sanity checks</title>
		<link>http://enoriver.net/index.php/2010/04/04/automated-sanity-checks/</link>
		<comments>http://enoriver.net/index.php/2010/04/04/automated-sanity-checks/#comments</comments>
		<pubDate>Sun, 04 Apr 2010 06:30:27 +0000</pubDate>
		<dc:creator>Gabi Huiber</dc:creator>
				<category><![CDATA[Stata]]></category>

		<guid isPermaLink="false">http://enoriver.net/?p=1172</guid>
		<description><![CDATA[I am reading An Introduction to Stata Programming, by Christopher Baum. He suggests, in Chapter 5.2, a nice do-file method to validate your data: you use pairs of list and assert. For example, suppose you know that a variable v should have no missing values. If it indeed does not, then assert !missing(v) should run [...]]]></description>
			<content:encoded><![CDATA[<p>I am reading <a href="http://stata-press.com/books/isp.html">An Introduction to Stata Programming</a>, by Christopher Baum.</p>
<p>He suggests, in Chapter 5.2, a nice do-file method to validate your data: you use pairs of <code>list</code> and <code>assert</code>. For example, suppose you know that a variable v should have no missing values. If it indeed does not, then <code>assert !missing(v)</code> should run without error. If it does, you want to know where they are: <code>list if missing(v)</code>. Reversing the order of these two lines in a do-file will cause Stata to exit with an error, which alerts you that there are problems with your data, but not before it shows you where the problems are:<br />
<code>
<pre>sysuse auto, clear
list if missing(make)
assert !missing(make)
list if missing(rep78)
assert !missing(rep78)</pre>
<p></code><br />
This is neat because you can always edit this do-file with a new pair of list/assert lines as needed. But, as the author mentions, sometimes a <code>summarize</code> is plenty helpful too, especially if you remember that it accepts a list of variables as an argument. You could do this for example:<br />
<code>
<pre>sysuse auto, clear
sum make rep78</pre>
<p></code><br />
Wait. That didn't work too well, because <code>summarize</code> tells you nothing about make: it is a string variable, so <code>summarize</code> reports zero observations for it. You would have known that if you had run <code>describe</code> before <code>sum</code>.</p>
<p>So, I guess, before you do any kind of data validation, <code>describe</code> is a good first step; you might also like <code>codebook</code>; I don't. I find it too wordy. But it does do the job of giving you information about the whole data set.</p>
<p>One alternative to wordy output when you have a specific question regarding more than one variable is to use little custom programs for data checks. One such example is countIfMissing, shown in my previous post. Another, inspired by Christopher Baum's use of <code>summarize</code>, might be sumIfNumeric:<br />
<code>
<pre>capture prog drop sumIfNumeric
program sumIfNumeric

unab fullset: _all
local numset
foreach varble in `fullset' {
   capture confirm numeric variable `varble'
   if _rc==0 {
      local numset `numset' `varble'
   }
}
local check: list sizeof numset
if `check'&gt;0 {
   sum `numset'
}
else {
   di "No numeric variables found in this dataset."
}

end</pre>
<p></code><br />
This might be useful when your data set comes with a bunch of variables -- some numeric, some string. Though <code>describe</code> will tell them apart easily enough, you may not care to list them explicitly. The usage is straightforward:<br />
<code>
<pre>sysuse auto, clear
sumIfNumeric</pre>
<p></code></p>
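<p>A shorter route to the same result, offered as a sketch rather than a replacement: <code>ds</code> can select variables by type and leaves their names in r(varlist). One caveat is that if the data set has no numeric variables at all, r(varlist) is empty and <code>summarize</code> falls back to every variable, which is the case sumIfNumeric handles explicitly.</p>
<p><code>
<pre>
sysuse auto, clear
// ds leaves the names of the matching variables in r(varlist)
quietly ds, has(type numeric)
summarize `r(varlist)'
</pre>
<p></code></p>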
]]></content:encoded>
			<wfw:commentRss>http://enoriver.net/index.php/2010/04/04/automated-sanity-checks/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Count missing observations</title>
		<link>http://enoriver.net/index.php/2010/03/16/count-missing-observations/</link>
		<comments>http://enoriver.net/index.php/2010/03/16/count-missing-observations/#comments</comments>
		<pubDate>Wed, 17 Mar 2010 02:54:59 +0000</pubDate>
		<dc:creator>Gabi Huiber</dc:creator>
				<category><![CDATA[Stata]]></category>
		<category><![CDATA[missing()]]></category>
		<category><![CDATA[syntax]]></category>

		<guid isPermaLink="false">http://enoriver.net/?p=1144</guid>
		<description><![CDATA[With one variable, that's easy enough: count if missing(variable-name). If you have several variables, you can put them in a foreach loop. But if you have to do this for arbitrary lists of variables in several files, it may be interesting to package that foreach loop inside a quick command that might handle special display [...]]]></description>
			<content:encoded><![CDATA[<p>With one variable, that's easy enough: <code>count if missing(<em>variable-name</em>)</code>. If you have several variables, you can put them in a foreach loop. But if you have to do this for arbitrary lists of variables in several files, it may be interesting to package that foreach loop inside a quick command that might handle special display instructions as well. </p>
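<p>The bare loop might look like the sketch below, run on the auto data; the three variable names are picked only for illustration.</p>
<p><code>
<pre>
sysuse auto, clear
foreach v of varlist price rep78 foreign {
   // count leaves the number of matching observations in r(N)
   quietly count if missing(`v')
   di "`v': " r(N) " missing"
}
</pre>
<p></code></p>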
<p>Here is one suggestion:</p>
<p><code>
<pre>
// countIfMissing: display the total count of observations, then
// any counts of missing observations for each variable in a list.
capture prog drop countIfMissing
program countIfMissing

version 11
syntax varlist

quietly count
local count=r(N)

// now make things align nicely: count the digits in `count'
local sum=`count'
local tens=1
while `sum'/10>=1 {
   local sum=`sum'/10
   local tens=`tens'+1
}
local width=`tens'+int(`tens'/3)
local varct: list sizeof varlist

di ""
di "Observations:"
di %`width'.0fc `count'
di ""
di "Missing:"
foreach varble in `varlist' {
   qui count if missing(`varble')
   local ct=r(N)
   local pct: di %4.2fc 100*`ct'/`count'
   if `ct'>0 {
      di %`width'.0fc `ct' " `varble' (`pct'%)"
   }
   else {
      local varct=`varct'-1
   }
}
if `varct'==0 {
   local offset=`width'+2
   di _column(`offset') "none of `varlist'"
}
di ""

end
</pre>
<p></code></p>
<p>For an example of usage, you can try this:</p>
<p><code>
<pre>
sysuse auto
local myvars "make price foreign"
countIfMissing `myvars'
countIfMissing m*       // (1)
countIfMissing _all     // (2)
</pre>
<p></code></p>
<p>As you can see in (1) and (2), the usual varlist conveniences apply here. </p>
]]></content:encoded>
			<wfw:commentRss>http://enoriver.net/index.php/2010/03/16/count-missing-observations/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Making Vim run Stata and clean up after itself</title>
		<link>http://enoriver.net/index.php/2010/03/02/making-vim-run-stata-and-clean-up-after-itself/</link>
		<comments>http://enoriver.net/index.php/2010/03/02/making-vim-run-stata-and-clean-up-after-itself/#comments</comments>
		<pubDate>Tue, 02 Mar 2010 06:04:11 +0000</pubDate>
		<dc:creator>Gabi Huiber</dc:creator>
				<category><![CDATA[Stata]]></category>
		<category><![CDATA[text editors]]></category>
		<category><![CDATA[Vim]]></category>

		<guid isPermaLink="false">http://enoriver.net/?p=1071</guid>
		<description><![CDATA[Last week I mentioned that in the course of switching from Notepad++ to Vim I lost the ability to run Stata do-files or selected lines from within the text editor, and I asked my readers for help if they had a solution. What do you know, one of them did, and wrote to me all [...]]]></description>
			<content:encoded><![CDATA[<p>Last week I mentioned that in the course of switching from Notepad++ to Vim I lost the ability to run Stata do-files or selected lines from within the text editor, and I asked my readers for help if they had a solution. What do you know, one of them did, and wrote to me all the way from China with a Vim script that solves both problems. I won't repeat it here because it is almost identical to <a href="http://www.stata.com/statalist/archive/2006-06/msg00905.html">this one</a>. Clearly, I need to remember to always look into the Statalist archive first. </p>
<p>There was one line that I had to change, where Vim is ordered to clean up after itself. Every time you runDoLines() you produce a bunch of .tmp.do files in your %temp% directory, which Vim knows as $TEMP. Though this<br />
<code>
<pre>
au VimLeave * exe "!del -y " temp
</pre>
<p></code><br />
should have deleted them, on my computer it did not. This one did:<br />
<code>
<pre>
au VimLeave * silent exe '!del /Q "'.$TEMP.'\*.tmp.do"'
</pre>
<p></code><br />
Another thing I tinkered with was the way Vim handles these backup files that end in a tilde. You can disable them entirely, by adding<br />
<code>
<pre>
set nobackup nowritebackup
</pre>
<p></code><br />
to _vimrc as explained <a href="http://www.faqs.org/faqs/editor-faq/vim/">here</a>. You can also have Vim keep them somewhere you can delete them explicitly, in bulk, as explained <a href="http://vimdoc.sourceforge.net/cgi-bin/vimfaq2html3.pl#7.2">here</a>. Finally, you can use a combination of both: you save backup files while Vim is working, but then you make it delete them all when it starts next time. That is explained <a href="http://vim.wikia.com/wiki/Remove_swap_and_backup_files_from_your_working_directory">here</a>. I went with it, so after I created a temp folder in $VIMRUNTIME, I added this to my _vimrc file:<br />
<code>
<pre>
" Keeps backups in $VIMRUNTIME\temp folder, cleans them up
silent execute '!mkdir "'.$VIMRUNTIME.'\temp"'
silent execute '!del /Q "'.$VIMRUNTIME.'\temp\*~"'
set backupdir=$VIMRUNTIME\\temp\\
set directory=$VIMRUNTIME\\temp\\
</pre>
<p></code><br />
In the process of figuring this out, I also discovered that there's a quick way to clean up %temp% thoroughly, as shown <a href="http://www.css-networks.com/2008/08/how-to-clear-the-temp-directory.html">here</a>. Good to know. One of these days I might add that to _vimrc too. Now I'm on a roll.</p>
]]></content:encoded>
			<wfw:commentRss>http://enoriver.net/index.php/2010/03/02/making-vim-run-stata-and-clean-up-after-itself/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Program vs. include smackdown</title>
		<link>http://enoriver.net/index.php/2010/02/28/program-vs-include-smackdown/</link>
		<comments>http://enoriver.net/index.php/2010/02/28/program-vs-include-smackdown/#comments</comments>
		<pubDate>Mon, 01 Mar 2010 02:25:11 +0000</pubDate>
		<dc:creator>Gabi Huiber</dc:creator>
				<category><![CDATA[Stata]]></category>

		<guid isPermaLink="false">http://enoriver.net/?p=1054</guid>
		<description><![CDATA[When it comes to defining local macros in a different place from where you use them, you have two options: a do-file you include as needed or an r-class program that you call as needed. I talked about it here and said that a program is a better choice, without any evidence to back up [...]]]></description>
			<content:encoded><![CDATA[<p>When it comes to defining local macros in a different place from where you use them, you have two options: a do-file you <code>include</code> as needed or an r-class program that you call as needed. I talked about it <a href="http://enoriver.net/index.php/2010/02/18/define-local-macros-in-one-place-use-them-everywhere/">here</a> and said that a program is a better choice, without any evidence to back up that claim. A reader called me on it, so I went and checked. Turns out he's right. Below is how I went about it:</p>
<p>I wrote a do-file, called work.do, that just uses a dataset, nothing more. That data set's name is handled by a local macro, defined in a separate do-file, called locals.do, which work.do includes. Then I wrote locals_program.do which does the same job via a program, which then work_program.do calls by name. Finally, I wrote a profiler.do file that called all four files a few times, and measured the time they all took doing their thing. According to profiler.do, an include is usually faster than a program. Below is the code:</p>
<p><code>
<pre>
// work.do starts here:
include locals.do
use "`my_file'"
</pre>
<p></code><br />
<code>
<pre>
// locals.do starts here
local file_path "C:/work/romanian papers circulation figures/data/"
local file_name "file_combined.dta"
local my_file   "`file_path'`file_name'"
</pre>
<p></code><br />
<code>
<pre>
// this is locals_program.do
capture prog drop defineMyLocals
program defineMyLocals, rclass

local file_path "C:/work/romanian papers circulation figures/data/"
local file_name "file_combined.dta"
local my_file   "`file_path'`file_name'"

local things "file_path file_name my_file"
foreach thing in `things' {
   return local `thing' ``thing''
}

end
</pre>
<p></code><br />
<code>
<pre>
// this is work_program.do
defineMyLocals
local my_file `r(my_file)'
use "`my_file'"
</pre>
<p></code><br />
<code>
<pre>
// and finally, this is profiler.do

set more off

cd "c:/work/programming/putterin/profiler"

// SECTION 1: OVERVIEW
/*
   1. What's going on:
   "program" vs. "include" profiler. this do-file defines
   two programs that will measure the performance difference
   in defining locals separately using two alternate ways
   (1) locals are defined in a do-file called with include
   (2) they are defined in a program called by name

   The exercise is to "use" a file. This file is called via
   a handle defined as a local macro. That definition can be
   either in a do-file included (1) or in a program called
   by name (2).

   2. Programs defined here and their dependencies:
   runProfile1
      defineMyLocals // defined in locals_program.do
   runProfile2
*/

// SECTION 2: GLOBALS

// SECTION 3: PROGRAM DEFINITIONS

// ### programs defined elsewhere and called via "run"
// ### locals_program.do defines the program named
// ### defineMyLocals, which returns a file handle
// ### as an r() local.
run locals_program.do

// ### work_program.do uses a file whose handle
// ### comes from calling defineMyLocals and
// ### retrieving an r() local.
capture prog drop runProfile1
program runProfile1

args counter

local time_start=tc("`c(current_date)'" "`c(current_time)'")
forvalues i=1/`counter' {
	run work_program.do
}
drop _all
local time_end=tc("`c(current_date)'" "`c(current_time)'")
di ""
di "Profile 1 (program), `counter' reps"
di "Time elapsed (ms): "`time_end'-`time_start'

end

// ### work.do includes locals.do, which
// ### defines a file handle as a local.
// ### work.do uses that file by calling
// ### that local.
capture prog drop runProfile2
program runProfile2

args counter

local time_start=tc("`c(current_date)'" "`c(current_time)'")
forvalues i=1/`counter' {
	run work.do
}
drop _all
local time_end=tc("`c(current_date)'" "`c(current_time)'")
di ""
di "Profile 2 (include), `counter' reps"
di "Time elapsed (ms): "`time_end'-`time_start'

end

// SECTION 4: PROGRAM CALLS

local cycles "100 200 500 1000 1500"
foreach cycle in `cycles' {
	forvalues i=1/2 {
   		runProfile`i' `cycle'
	}
}
</pre>
<p></code><br />
And that's it. Put all five files into the same directory, make the profiler <code>cd</code> to it, change the path and name locals to your own data set, and see what you get on your machine. Below is my output:<br />
<code>
<pre>
Profile 1 (program), 100 reps
Time elapsed (ms): 2000

Profile 2 (include), 100 reps
Time elapsed (ms): 2000

Profile 1 (program), 200 reps
Time elapsed (ms): 4000

Profile 2 (include), 200 reps
Time elapsed (ms): 3000

Profile 1 (program), 500 reps
Time elapsed (ms): 9000

Profile 2 (include), 500 reps
Time elapsed (ms): 8000

Profile 1 (program), 1000 reps
Time elapsed (ms): 18000

Profile 2 (include), 1000 reps
Time elapsed (ms): 17000

Profile 1 (program), 1500 reps
Time elapsed (ms): 26000

Profile 2 (include), 1500 reps
Time elapsed (ms): 26000
</pre>
<p></code></p>
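<p>All of these figures are multiples of 1,000 because c(current_time) only resolves to the second, so small differences between the two approaches get rounded away. Stata's <code>timer</code> commands measure more finely. Below is a sketch of one such measurement, assuming work.do and locals.do sit in the current directory as above; timer number 1 and the 100 repetitions are arbitrary choices.</p>
<p><code>
<pre>
// time 100 runs of work.do with timer number 1
timer clear 1
timer on 1
forvalues i=1/100 {
	run work.do
}
timer off 1
// timer list shows the elapsed time in seconds and leaves it in r(t1)
timer list 1
di "Time elapsed (s): " r(t1)
</pre>
<p></code></p>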
]]></content:encoded>
			<wfw:commentRss>http://enoriver.net/index.php/2010/02/28/program-vs-include-smackdown/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>I switched to Vim</title>
		<link>http://enoriver.net/index.php/2010/02/26/i-switched-to-vim/</link>
		<comments>http://enoriver.net/index.php/2010/02/26/i-switched-to-vim/#comments</comments>
		<pubDate>Fri, 26 Feb 2010 17:08:34 +0000</pubDate>
		<dc:creator>Gabi Huiber</dc:creator>
				<category><![CDATA[Stata]]></category>
		<category><![CDATA[syntax]]></category>
		<category><![CDATA[text editors]]></category>
		<category><![CDATA[Vim]]></category>

		<guid isPermaLink="false">http://enoriver.net/?p=1040</guid>
		<description><![CDATA[I was looking for an excuse to try something new and I decided to pick on one Notepad++ shortcoming that was handy: the Stata syntax highlighting gets utterly mangled after compound quotes -- `"`like so'"' -- which do sometimes arise, usually in the process of file open/file write.  Vim does not get confused by compound quotes [...]]]></description>
			<content:encoded><![CDATA[<p>I was looking for an excuse to try something new and I decided to pick on one <a href="http://notepad-plus.sourceforge.net/uk/site.htm">Notepad++</a> shortcoming that was handy: the Stata syntax highlighting gets utterly mangled after compound quotes -- `"`like so'"' -- which do sometimes arise, usually in the process of file open/file write. </p>
<p><a href="http://www.vim.org/">Vim</a> does not get confused by compound quotes and it comes with Stata syntax highlighting out of the box. Integrating it with Stata is not hard at all, either. There are some general directions by Nick Cox <a href="http://fmwww.bc.edu/repec/bocode/t/textEditors.html#vim">here</a>, but on my Windows XP machine I just had to do two things.</p>
<p>First, I had to edit the PERSONAL/e/editors.ado file to make Vim replace the Notepad++ entry in the Editors... sub-menu of the User menu:</p>
<p><code>
<pre>
program define editors
	version 10
	window menu append submenu "stUser" "Editors"
	window menu append item "Editors" "Vim" "winexec gvim.exe"
end
</pre>
<p></code></p>
<p>Next, I wrote a PERSONAL/v/vim.ado file like so:</p>
<p><code>
<pre>
program vim
	version 10
	syntax anything
	winexec gvim.exe `anything'
end
</pre>
<p></code></p>
<p>This is based on Nick Cox's recipe, slightly altered so you can launch Vim from the Stata command line to edit an existing do-file that, say, has spaces in its name, as in <code>vim "my dofile.do"</code>. There's a little more on this "syntax anything" solution in my <a href="http://enoriver.net/index.php/2010/02/23/calling-irregular-arguments-with-syntax-anything/">previous Stata post</a>.</p>
<p>One thing I did give up -- for now -- is the ability to launch a Stata instance from within Vim after I'm done editing a do-file, either for running the entire do-file or some select lines, the way I used to be able to do in Notepad++. There's got to be a way for that, though. If you know it, please drop me a line.</p>
<p>That said, I'd never knock Notepad++. It is still a fine text editor that has served me well. The people at <a href="http://mathereconomics.com/">Mather Economics</a>, who adopted it a while back at my instigation, are sticking with it -- and I recommend it to anybody who does not insist on a modal editor.</p>
]]></content:encoded>
			<wfw:commentRss>http://enoriver.net/index.php/2010/02/26/i-switched-to-vim/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Calling irregular arguments with syntax anything</title>
		<link>http://enoriver.net/index.php/2010/02/23/calling-irregular-arguments-with-syntax-anything/</link>
		<comments>http://enoriver.net/index.php/2010/02/23/calling-irregular-arguments-with-syntax-anything/#comments</comments>
		<pubDate>Tue, 23 Feb 2010 17:04:39 +0000</pubDate>
		<dc:creator>Gabi Huiber</dc:creator>
				<category><![CDATA[Stata]]></category>
		<category><![CDATA[syntax]]></category>

		<guid isPermaLink="false">http://enoriver.net/?p=1031</guid>
		<description><![CDATA[The other day I wrote a program that needed to call a file as an argument -- with the full file path. My first pass at it was to capture the argument as usual, with say args input_file. But that would not have worked with file paths that have spaces in them. What might have [...]]]></description>
			<content:encoded><![CDATA[<p>The other day I wrote a program that needed to call a file as an argument -- with the full file path. My first pass at it was to capture the argument as usual, with say <code>args input_file</code>. But that would not have worked with file paths that have spaces in them. What might have worked, I guessed, was to use something like this:<br />
<code>
<pre>
syntax namelist
local input_file `namelist'
</pre>
<p></code><br />
That, however, choked on a Windows-style file path, because the colon in C:\ is not allowed in a name. Trial and error (and the [I] book) led me to <code>syntax anything</code>. Here's my scratchpad:<br />
<code>
<pre>
capture prog drop myHello
program myHello

syntax namelist
local names `namelist'
di "Hello `names'"

end

myHello Eenie: Meenie
</pre>
<p></code><br />
OK, so "illegal name" it is. Now let's see if "anything" might work better than "namelist":<br />
<code>
<pre>
capture prog drop myHello
program myHello

syntax anything
local names `anything'
di "Hello `names'"

end

myHello c:\my file path here\my file name here.txt
</pre>
<p></code><br />
Success. This may look trivial, but everything is after the fact. We have a long-standing policy at the Eno River Analytics worldwide headquarters that we should write toy programs for sketching things out before we go ahead and actually break something.</p>
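<p>When the argument is always a file, <code>syntax using/</code> is another option worth a toy program. In the sketch below the program name myHelloFile is made up; the slash after <code>using</code> tells <code>syntax</code> to store just the file name, without the word "using" and without the quotes, in the local macro <code>using</code>.</p>
<p><code>
<pre>
capture prog drop myHelloFile
program myHelloFile

syntax using/
// compound quotes keep the display safe whatever the path contains
di `"Hello `using'"'

end

myHelloFile using "c:\my file path here\my file name here.txt"
</pre>
<p></code></p>
<p>The price is that the caller has to type <code>using</code> and quote any path with spaces in it, which <code>syntax anything</code> does not require.</p>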
]]></content:encoded>
			<wfw:commentRss>http://enoriver.net/index.php/2010/02/23/calling-irregular-arguments-with-syntax-anything/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
