<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Stata Things &#187; code recycling</title>
	<atom:link href="http://enoriver.net/index.php/tag/code-recycling/feed/" rel="self" type="application/rss+xml" />
	<link>http://enoriver.net</link>
	<description>computing for fun and profit</description>
	<lastBuildDate>Wed, 08 Feb 2012 18:09:58 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Do-file rules &#8212; one suggestion</title>
		<link>http://enoriver.net/index.php/2009/01/03/do-file-rules-one-suggestion/</link>
		<comments>http://enoriver.net/index.php/2009/01/03/do-file-rules-one-suggestion/#comments</comments>
		<pubDate>Sun, 04 Jan 2009 03:05:42 +0000</pubDate>
		<dc:creator>Gabi Huiber</dc:creator>
				<category><![CDATA[Stata]]></category>
		<category><![CDATA[code recycling]]></category>
		<category><![CDATA[do-files]]></category>
		<category><![CDATA[global macros]]></category>

		<guid isPermaLink="false">http://enoriver.net/?p=446</guid>
		<description><![CDATA[I've gone through several iterations with my idea of best do-file practices, and I'm sure I'll go through some more before I retire. But right now, here's where I stand: My do-files start with a handful of header commands that I found useful at various times. They might look something like this: clear set more [...]]]></description>
			<content:encoded><![CDATA[<p>I've gone through several iterations with my idea of best do-file practices, and I'm sure I'll go through some more before I retire. But right now, here's where I stand:</p>
<p>My do-files start with a handful of header commands that I found useful at various times. They might look something like this:<br />
<code><br />
clear<br />
set more off<br />
set type double<br />
set mem 100m<br />
pause on<br />
</code><br />
This stuff varies a bit occasionally (I may <code>set matsize</code> or specify the <code>version</code>) but it's a bit like a cover sheet, in that it's almost always the same thing. I could have equally well put all of this in <code>profile.do</code>.</p>
<p>Next come the main sections of the do-file:</p>
<p>   Overview<br />
   Globals<br />
   Program definitions<br />
   Program execution</p>
<p>The <em>Overview</em> section is all comments. It's got a structure of its own. It always consists of three parts. The first is just a list of the names of the programs defined in this do-file. The second describes what each program does, in one paragraph per program. The third describes which programs are called explicitly (because some programs on the list can be components of others) and in what order. The first part, the quick list, is useful for when you're fishing for code you might want to recycle. If you give your programs descriptive names, a quick look at the top of any do-file is usually enough to tell you if you're going to find useful stuff there.</p>
<p>The <em>Globals</em> section defines the macros such as file paths that will be used by more than one program. Since local macros are local to programs, anything that you mean to be shared across multiple programs must be a global. Having them all bundled here has another advantage. If you send your do-file to be run on another computer, all your file path changes are made in one place, once.</p>
<p>The <em>Program definitions</em> section does just what you might expect. Programs here can be stand-alone things that you call explicitly in the last section, or can be components, called implicitly. Defining such components is useful when you need to use the same code more than once. If that code is broken, you only need to fix it in one place.</p>
<p>You might want two types of comments with your program definitions. One is a header before the <code>program define</code> line (or, if you're cautious, before the <code>capture program drop</code> line) that tells you at least whether your program takes any arguments, lists them if it does, and tells you a little about each of them. You'd want to know, at a minimum, which ones are string and which are numeric. The other is a set of in-line comments, throughout the program definition, as needed. I find it useful to explain any local macros I declare with a couple of words at least.</p>
<p>The <em>Program execution</em> section does the actual work. It has consequences on disk and on screen.</p>
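<p>To make the four sections concrete, here is a minimal sketch of a do-file laid out this way. The program names <code>make_sample</code> and <code>report_mean</code> and the globals <code>nobs</code> and <code>meanx</code> are made up for illustration; real programs would do real work:</p>
<p><code>* ---- Overview ----<br />
* Programs defined here: make_sample, report_mean<br />
* make_sample: builds a toy data set with n observations<br />
* report_mean: calls make_sample, then stores the mean of x in a global<br />
* Called explicitly, in order: report_mean<br />
<br />
* ---- Globals ----<br />
global nobs 10<br />
<br />
* ---- Program definitions ----<br />
* make_sample takes one numeric argument: n, the number of observations<br />
capture program drop make_sample<br />
program define make_sample<br />
args n<br />
clear<br />
set obs `n'<br />
generate x = _n<br />
end<br />
<br />
* report_mean takes no arguments; it reads the global nobs<br />
capture program drop report_mean<br />
program define report_mean<br />
make_sample $nobs<br />
summarize x<br />
global meanx = r(mean)<br />
end<br />
<br />
* ---- Program execution ----<br />
report_mean<br />
display "mean of x: $meanx"<br />
</code></p>
<p>Here <code>make_sample</code> is a component, called implicitly by <code>report_mean</code>, and only the last section has any effect on screen.</p>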
<p>I settled on this way of writing do-files after I took an online class on C++ at NC State. I treat Stata programs inside a do-file the way one would treat functions inside a .cpp file. My Program execution section is the equivalent of <code>main()</code>. In C++, a function must be declared before it is called, so function declarations (also known as prototypes) usually go at the top of the source file. My do-file equivalent for those is the Overview section. Except, of course, Overview is not mandatory at all. It consists solely of comments.</p>
<p>Having to submit to this sort of discipline might strike you as negating the benefits of Stata's easygoing nature. After all, if you pined for structure, you'd be programming in SAS, where it's mandatory. </p>
<p>Well, there are two very good reasons for structure: one is that your code must be portable across your team; another is that it must be readable two weeks later, when you will have forgotten all about it. But I still wouldn't want it imposed upon me by the design of the programming environment. Neither of those very good reasons overrides the importance of on-the-job fun. When you program, you want flow. You need to be free to write up things as they come to you. The programming environment should accommodate that. Only in the tidying-up stage, when your thinking's done and your problem's solved, should you need to worry about structure.</p>
<p>Programming environments that impose structure on you, as opposed to letting you volunteer it, result in beautiful code that takes a long time to write and robs you of most of the pleasure of solving the original problem. The latter might well cause you to do a mediocre job of it. When help like this also costs you more in licensing fees and specialized labor, that's just adding insult to injury.</p>
]]></content:encoded>
			<wfw:commentRss>http://enoriver.net/index.php/2009/01/03/do-file-rules-one-suggestion/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Qualify your commands with capture</title>
		<link>http://enoriver.net/index.php/2008/09/24/qualify-your-commands-with-capture/</link>
		<comments>http://enoriver.net/index.php/2008/09/24/qualify-your-commands-with-capture/#comments</comments>
		<pubDate>Wed, 24 Sep 2008 14:59:43 +0000</pubDate>
		<dc:creator>Gabi Huiber</dc:creator>
				<category><![CDATA[Stata]]></category>
		<category><![CDATA[capture]]></category>
		<category><![CDATA[code recycling]]></category>

		<guid isPermaLink="false">http://enoriver.net/?p=128</guid>
		<description><![CDATA[Suppose you are assembling a .dta file from disparate pieces of raw data -- an .xls workbook here, a .txt file there -- that must each meet some specific conditions. If you do this in a do-file (as you should, for the sake of reproducibility) you will find it useful to first save each of [...]]]></description>
			<content:encoded><![CDATA[<p>Suppose you are assembling a .dta file from disparate pieces of raw data -- an .xls workbook here, a .txt file there -- that must each meet some specific conditions. If you do this in a do-file (as you should, for the sake of reproducibility) you will find it useful to first save each of these components into a <code>tempfile</code> and assemble your tempfiles at the end into the .dta file of interest. Tempfiles are written to Stata's temporary directory, and are erased automatically as soon as your do-file has finished running. That is a very convenient way to avoid hard drive clutter.</p>
<p>Now suppose that your final .dta file does not care if some components are not present -- say because you are building it repeatedly, based on data arriving weekly, and it's OK if some of the components are missing in a given week. You would want to make this process a server job (cron job if you're UNIX-minded) but for that you would want to ensure that the do-file will not terminate with an error. </p>
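<p>As a minimal sketch of that tempfile workflow, the example below builds two pieces from the <code>auto</code> data set that ships with Stata and assembles them into one file. The tempfile names <code>piece1</code> and <code>piece2</code> and the output name <code>myfinalset</code> are placeholders, and the <code>merge 1:1</code> syntax assumes Stata 11 or later:</p>
<p><code>* reserve two temporary file names; they are stored in locals<br />
tempfile piece1 piece2<br />
<br />
* first component: prices<br />
sysuse auto, clear<br />
keep make price<br />
save "`piece1'"<br />
<br />
* second component: mileage<br />
sysuse auto, clear<br />
keep make mpg<br />
save "`piece2'"<br />
<br />
* assemble the pieces, keeping matched observations only<br />
use "`piece1'", clear<br />
merge 1:1 make using "`piece2'"<br />
keep if _merge==3<br />
drop _merge<br />
save myfinalset, replace<br />
</code></p>
<p>Only <code>myfinalset.dta</code> is left behind once the do-file finishes.</p>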
<p>Enter <code>capture</code>. Here's how it works:</p>
<p><code>foreach i in 1 2 {<br />
local tempfile`i'_ok=0<br />
capture confirm file "`tempfile`i''"<br />
if _rc==0 {<br />
local tempfile`i'_ok=1<br />
}<br />
}<br />
</code></p>
<p>So I'm assuming you have two files of interest. The <code>confirm file</code> command, unqualified, can check if a file exists, but if it does not it will exit with an error. The <code>capture</code> qualifier stores that error away, in a sense. It suppresses the error and puts the return code in the built-in scalar _rc, which is equal to 0 if the file exists, and to a nonzero error code if it does not. The point of qualifying <code>confirm file</code> by <code>capture</code> is to keep your do-file going. You can store the state of _rc in the locals tempfile`i'_ok shown above, and then do various things with your tempfiles depending on whether they exist:</p>
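<p><code>if `tempfile1_ok'==1 &amp; `tempfile2_ok'==1 {<br />
* both pieces arrived: merge them and keep the matches<br />
use "`tempfile1'", clear<br />
merge `mergevars' using "`tempfile2'"<br />
tab _merge<br />
keep if _merge==3<br />
drop _merge<br />
}<br />
else if `tempfile1_ok'==1 {<br />
* only the first piece arrived<br />
use "`tempfile1'", clear<br />
}<br />
else if `tempfile2_ok'==1 {<br />
* only the second piece arrived<br />
use "`tempfile2'", clear<br />
}<br />
if `tempfile1_ok'==1 | `tempfile2_ok'==1 {<br />
save myfinalset, replace<br />
}<br />
else {<br />
display "no tempfiles this week, so moving on"<br />
}<br />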
</code></p>
<p>That's it. Now your do-file will run unencumbered. Capture can be used for all sorts of checks.</p>
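<p>For instance, the same pattern works with other flavors of <code>confirm</code>. Here is a small sketch using the <code>auto</code> data set that ships with Stata; the locals <code>price_ok</code> and <code>missing_rc</code> and the variable name <code>no_such_var</code> are made up for illustration:</p>
<p><code>sysuse auto, clear<br />
<br />
* is price there, and is it numeric?<br />
capture confirm numeric variable price<br />
local price_ok = (_rc==0)<br />
<br />
* this variable does not exist, so _rc will be nonzero<br />
capture confirm variable no_such_var<br />
local missing_rc = _rc<br />
<br />
display "price is numeric: `price_ok'"<br />
display "return code for the missing variable: `missing_rc'"<br />
</code></p>
<p>Neither check stops the do-file; each one just leaves its verdict in a local that you can branch on later.</p>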
<p>As an aside, you may have noticed that I merged the two files using the local `mergevars'. It's nice to use local macros for things like combinations of variables, file paths, or file names. Declaring them all at the top of your do-files makes for easier code maintenance and code recycling.</p>
]]></content:encoded>
			<wfw:commentRss>http://enoriver.net/index.php/2008/09/24/qualify-your-commands-with-capture/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>

