<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Stata Things &#187; do-files</title>
	<atom:link href="http://enoriver.net/index.php/tag/do-files/feed/" rel="self" type="application/rss+xml" />
	<link>http://enoriver.net</link>
	<description>computing for fun and profit</description>
	<lastBuildDate>Wed, 08 Feb 2012 18:09:58 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Consider ado-files</title>
		<link>http://enoriver.net/index.php/2009/03/19/consider-ado-files/</link>
		<comments>http://enoriver.net/index.php/2009/03/19/consider-ado-files/#comments</comments>
		<pubDate>Thu, 19 Mar 2009 20:12:09 +0000</pubDate>
		<dc:creator>Gabi Huiber</dc:creator>
				<category><![CDATA[Stata]]></category>
		<category><![CDATA[do-files]]></category>

		<guid isPermaLink="false">http://enoriver.net/?p=631</guid>
		<description><![CDATA[A while ago I suggested a particular do-file architecture that seemed to work well for me at the time. The post is here. That architecture still works fine, but an obvious improvement suggested itself since I proposed it. I've been finding that some jobs that I had encapsulated in programs are so ubiquitous that I [...]]]></description>
			<content:encoded><![CDATA[<p>A while ago I suggested a particular do-file architecture that seemed to work well for me at the time. The post is <a href="http://enoriver.net/index.php/2009/01/03/do-file-rules-one-suggestion/">here</a>.</p>
<p>That architecture still works fine, but an obvious improvement has suggested itself since I proposed it. I've been finding that some jobs I had encapsulated in programs are so ubiquitous that I could just as well have saved them as ado-files. I had resisted doing that, because I tend to treat ado-files with a bit of reverence. Code, I think, is ado-file-worthy only if it is general enough and has proper online help.</p>
<p>Maybe I should ease up on that rule though. Maybe code that meets those requirements is actually package-worthy, ready to be distributed to the rest of the world, while the standards for ado-files should be more relaxed.</p>
<p>As long as your code does a job that's general within the scope of a given project, maybe it could go into an ado-file that is available to that project and not to others. There is an easy way to do that. Typing <code>adopath</code> at the Stata command line will list all the places where Stata looks for the code behind any command you throw at it. Ado-files that you write are meant to go into the PERSONAL folder by default, but you can put them anywhere, as long as you tell Stata where to look for them.</p>
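<p>To see this on your own machine, the commands below are a minimal sketch: <code>adopath</code> prints the search path in the order Stata tries it, and <code>sysdir</code> shows which actual directories names such as PERSONAL and PLUS stand for. The output will differ from one installation to the next.<br />
<code><br />
adopath   // list the directories Stata searches for ado-files, in order<br />
sysdir    // show what PERSONAL, PLUS, SITE, etc. point to<br />
display "`c(sysdir_personal)'"   // the PERSONAL directory on its own<br />
</code></p>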
<p>Time for an example. I am working on a project where some do-files are built automatically (this is seldom necessary, by the way). Stata has some internal system limits (type <code>help limits</code> to see them all), and one of them is that a program cannot have more than 3,500 lines. This is very seldom a binding restriction; in fact, it should never even be an issue. But if you're stuck with legacy ways of writing code and keep adding to existing do-files, you might run into it.</p>
<p>In such cases it may be useful to have a routine for counting the lines in any do-file. Since that is done in the same way regardless of the content of the do-file, this job lends itself to being packaged into an ado-file. That ado-file might look like this:<br />
<code><br />
// lineCount.ado: counts lines in a text file (.do, .txt, .csv, etc.)<br />
// one argument: file name with path as needed<br />
// (use the full path for safety; spaces are OK, quotes are optional)<br />
capture prog drop lineCount<br />
program lineCount, rclass<br />
version 9.2<br />
// strip any double quotes the caller put around the file name<br />
local filename : subinstr local 0 `"""' "", all<br />
tempname fh<br />
local linenum = 0<br />
file open `fh' using "`filename'", read text<br />
file read `fh' line<br />
while r(eof) == 0 {<br />
  local linenum = `linenum' + 1<br />
  file read `fh' line<br />
}<br />
file close `fh'<br />
return local count `linenum'<br />
end<br />
</code><br />
So now, in your current do-file, you can do things such as<br />
<code><br />
foreach client in `clients' {<br />
  local this_client_file "`my_file_path'`client'_data_cleaning.do"<br />
  lineCount `this_client_file'<br />
  display "lines in `client'_data_cleaning.do: " `r(count)'<br />
}<br />
</code><br />
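If you want to check lineCount before trusting it with real files, a quick sketch like the one below writes a throwaway three-line text file with <code>tempfile</code> and <code>file write</code>, counts its lines, and asserts the result. It assumes lineCount.ado is already on your adopath, or that the program above has been run in the current session.<br />
<code><br />
tempfile demo<br />
tempname out<br />
file open `out' using "`demo'", write text replace<br />
forvalues i = 1/3 {<br />
  file write `out' "line `i'" _n<br />
}<br />
file close `out'<br />
lineCount `demo'<br />
display "lines counted: `r(count)'"<br />
assert `r(count)' == 3<br />
</code><br />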
Now, under the old system, lineCount would have been a program defined in Section 3 (Program definitions) of either the current do-file or another do-file called by this one. But if I'm going to use it all the time and it never changes, making it a stand-alone ado-file makes more sense. There are two steps to that.</p>
<p>First, in the current project folder (let's say I defined it as the local `project_root') I set up a sub-folder called ado. Inside it, I set up a sub-folder called simply l, as in the first letter of lineCount. This may be overkill, but I like Stata's idea of grouping ado-files into sub-folders by first letter. It's cleaner. Then I save the program above into `project_root'/ado/l/ under the name lineCount.ado.</p>
<p>Second, I need to let Stata know where to look for it. That is as simple as adding<br />
<code><br />
adopath + "`project_root'/ado"<br />
</code><br />
at the top of my current do-file.</p>
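<p>Put together, the setup might look like the sketch below. The path assigned to <code>project_root</code> is a placeholder for your own project folder, and the two <code>mkdir</code> lines are only needed the first time. The closing <code>which</code> command is a cheap way to confirm that Stata resolves lineCount to the copy in your project folder rather than to some other file of the same name elsewhere on the adopath.<br />
<code><br />
local project_root "C:/projects/myproject"  // placeholder: your project folder<br />
capture mkdir "`project_root'/ado"      // no error if it already exists<br />
capture mkdir "`project_root'/ado/l"<br />
// save lineCount.ado into the ado/l sub-folder before going on<br />
adopath + "`project_root'/ado"<br />
which lineCount   // prints the path of the file Stata will run<br />
</code></p>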
<p>As time goes by and I find more jobs that could be handled this way, I can set their programs aside and save them as ado-files in sub-folders named after their first letters. Stata only needs you to point it in the right direction: for each directory on the adopath, it looks for your command both in that directory and in the sub-folder named after the command's first letter.</p>
<p>Ado-files will keep my do-files less cluttered and will make it easier both to recycle old code and to debug new code.</p>
]]></content:encoded>
			<wfw:commentRss>http://enoriver.net/index.php/2009/03/19/consider-ado-files/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Human-readable code</title>
		<link>http://enoriver.net/index.php/2009/02/06/human-readable-code/</link>
		<comments>http://enoriver.net/index.php/2009/02/06/human-readable-code/#comments</comments>
		<pubDate>Sat, 07 Feb 2009 02:05:07 +0000</pubDate>
		<dc:creator>Gabi Huiber</dc:creator>
				<category><![CDATA[Stata]]></category>
		<category><![CDATA[do-files]]></category>
		<category><![CDATA[encapsulation]]></category>
		<category><![CDATA[program]]></category>

		<guid isPermaLink="false">http://enoriver.net/?p=531</guid>
		<description><![CDATA[I just made a couple of changes to this theme's style sheet. I wanted a slightly wider page in order to accommodate longer lines of code. I needed it because some code lines in my Dummy variables post ran over when rendered in IE and Opera. If you cut and pasted the code, errant end-of-line [...]]]></description>
			<content:encoded><![CDATA[<p>I just made a couple of changes to this theme's style sheet. I wanted a slightly wider page in order to accommodate longer lines of code. I needed it because some code lines in my <a href="http://enoriver.net/index.php/2009/01/19/dummy-variables/">Dummy variables</a> post ran over when rendered in IE and Opera. If you cut and pasted the code, errant end-of-line characters could have crept in. Cutting and pasting code you find here is encouraged, by the way. It's supposed to be ready to use.</p>
<p>So, OK, you can widen the page, but should you? People comfortably read text that's about four inches wide, I would guess. Computers don't care, but code is meant to be read by people. The text you're reading used to sit in a box 540 pixels wide; the current width is 600 pixels. The difference was enough to make the problem go away. It would be nice to know that it's also not so big that it hurts readability.</p>
<p>If you tell me that it is, I will revert to a narrower column and start using all sorts of tricks to make Stata code fit inside it. Easiest, of course, would be to use <code>#delimit ;</code>. Then end-of-line characters don't matter, because Stata ignores them and treats the semicolon as the end of each command. But what if you're like me and you prefer <code>#delimit cr</code>?</p>
<p>You can use local macros as placeholders for whatever would otherwise result in long lines of code, as in<br />
<code><br />
local 1 The quick brown fox<br />
local 2 jumped over the lazy dog.<br />
di "`1' `2'"<br />
</code><br />
That's not too bad. I have used this workaround before, but now I'll make it an explicit policy: I will trade width for length in my code in future entries.</p>
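<p>For comparison, here is the same display written the two other ways Stata allows in a do-file. The first switches to <code>#delimit ;</code> for a single command and back; the second stays with <code>#delimit cr</code> and uses the <code>///</code> comment, which tells Stata that the command continues on the next line. Both are sketches meant to run from a do-file, since neither <code>#delimit</code> nor <code>///</code> works when typed at the command line.<br />
<code><br />
#delimit ;<br />
di "The quick brown fox "<br />
   "jumped over the lazy dog." ;<br />
#delimit cr<br />
<br />
di "The quick brown fox " ///<br />
   "jumped over the lazy dog."<br />
</code></p>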
<p>Maybe this should be a general standard of writing code. It might encourage diligent encapsulation. You don't want very long do-files, or at least you don't want very long programs, for all sorts of good reasons. So you already do spread large projects across multiple do-files, multiple programs or both, if you're really into that sort of thing. If you also force yourself to have short lines, your code will be all the better for it. What do you think?</p>
]]></content:encoded>
			<wfw:commentRss>http://enoriver.net/index.php/2009/02/06/human-readable-code/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Do-file rules &#8212; one suggestion</title>
		<link>http://enoriver.net/index.php/2009/01/03/do-file-rules-one-suggestion/</link>
		<comments>http://enoriver.net/index.php/2009/01/03/do-file-rules-one-suggestion/#comments</comments>
		<pubDate>Sun, 04 Jan 2009 03:05:42 +0000</pubDate>
		<dc:creator>Gabi Huiber</dc:creator>
				<category><![CDATA[Stata]]></category>
		<category><![CDATA[code recycling]]></category>
		<category><![CDATA[do-files]]></category>
		<category><![CDATA[global macros]]></category>

		<guid isPermaLink="false">http://enoriver.net/?p=446</guid>
		<description><![CDATA[I've gone through several iterations with my idea of best do-file practices, and I'm sure I'll go through some more before I retire. But right now, here's where I stand: My do-files start with a handful of header commands that I found useful at various times. They might look something like this: clear set more [...]]]></description>
			<content:encoded><![CDATA[<p>I've gone through several iterations with my idea of best do-file practices, and I'm sure I'll go through some more before I retire. But right now, here's where I stand:</p>
<p>My do-files start with a handful of header commands that I found useful at various times. They might look something like this:<br />
<code><br />
clear<br />
set more off<br />
set type double<br />
set mem 100m<br />
pause on<br />
</code><br />
This stuff varies occasionally (I may <code>set matsize</code> or specify the <code>version</code>), but it's a bit like a cover sheet, in that it's almost always the same thing. I could equally well have put all of this in <code>profile.do</code>.</p>
<p>Next come the main sections of the do-file:</p>
<p>   Overview<br />
   Globals<br />
   Program definitions<br />
   Program execution</p>
<p>The <em>Overview</em> section is all comments. It's got a structure of its own. It always consists of three parts. The first is just a list of the names of the programs defined in this do-file. The second describes what each program does, in one paragraph per program. The third describes which programs are called explicitly (because some programs on the list can be components of others) and in what order. The first part, the quick list, is useful when you're fishing for code you might want to recycle. If you give your programs descriptive names, a quick look at the top of any do-file is usually enough to tell you if you're going to find useful stuff there.</p>
<p>The <em>Globals</em> section defines the macros such as file paths that will be used by more than one program. Since local macros are local to programs, anything that you mean to be shared across multiple programs must be a global. Having them all bundled here has another advantage. If you send your do-file to be run on another computer, all your file path changes are made in one place, once.</p>
<p>The <em>Program definitions</em> section does just what you might expect. Programs here can be stand-alone things that you call explicitly in the last section, or can be components, called implicitly. Defining such components is useful when you need to use the same code more than once. If that code is broken, you only need to fix it in one place.</p>
<p>You might want two types of comments with your program definitions. One is a header before the <code>program define</code> line (or, if you're cautious, before the <code>capture program drop</code> line) that tells you at least whether your program takes any arguments, lists them if yes, and tells you a little about each of them. You'd want to know, at a minimum, which ones are string and which are numeric. The other is a set of in-line comments, throughout the program definition, as needed. I find it useful to explain any local macros I declare with a couple of words at least.</p>
<p>The <em>Program execution</em> section does the actual work. It has consequences on disk and on screen.</p>
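<p>As a rough illustration, a do-file laid out this way might look like the skeleton below. Everything in it is hypothetical: the program names loadData and summarizeVar, the global <code>keyvars</code>, and the use of the auto dataset that ships with Stata are placeholders chosen only so that the file runs as written.<br />
<code><br />
clear<br />
set more off<br />
<br />
// ---- Overview ----<br />
// Programs defined here: loadData, summarizeVar<br />
// loadData: loads the example data and describes the key variables; no arguments<br />
// summarizeVar: summarizes one numeric variable; one argument (a varname)<br />
// Called explicitly, in this order: loadData, then summarizeVar in a loop<br />
<br />
// ---- Globals ----<br />
global keyvars price mpg weight   // shared by more than one program<br />
<br />
// ---- Program definitions ----<br />
// loadData: takes no arguments<br />
capture program drop loadData<br />
program define loadData<br />
  sysuse auto, clear<br />
  describe $keyvars<br />
end<br />
<br />
// summarizeVar: one argument, the name of a numeric variable<br />
capture program drop summarizeVar<br />
program define summarizeVar<br />
  args varname   // the variable to summarize<br />
  summarize `varname', detail<br />
end<br />
<br />
// ---- Program execution ----<br />
loadData<br />
foreach v of global keyvars {<br />
  summarizeVar `v'<br />
}<br />
</code></p>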
<p>I settled on this way of writing do-files after I took an online class on C++ at NC State. I treat Stata programs inside a do-file the way one would treat functions inside a .cpp file. My Program execution section is the equivalent of <code>main()</code>. In C++, a function must be declared before it is used, so function declarations (also known as prototypes) conventionally go at the top of the source file. My do-file equivalent for those is the Overview section. Except, of course, Overview is not mandatory at all. It consists solely of comments.</p>
<p>Having to submit to this sort of discipline might strike you as negating the benefits of Stata's easygoing nature. After all, if you pined for structure, you'd be programming in SAS, where it's mandatory. </p>
<p>Well, there are two very good reasons for structure: one is that your code must be portable across your team; another is that it must be readable two weeks later, when you will have forgotten all about it. But I still wouldn't want it imposed upon me by the design of the programming environment. Neither of those very good reasons overrides the importance of on-the-job fun. When you program, you want flow. You need to be free to write things up as they come to you. The programming environment should accommodate that. Only in the tidying-up stage, when your thinking's done and your problem's solved, should you need to worry about structure.</p>
<p>Programming environments that impose structure on you, as opposed to letting you volunteer it, result in beautiful code that takes a long time to write and robs you of most of the pleasure of solving the original problem. Losing that pleasure might well cause you to do a mediocre job of it. When help like this also costs you more in licensing fees and specialized labor, that's just adding insult to injury.</p>
]]></content:encoded>
			<wfw:commentRss>http://enoriver.net/index.php/2009/01/03/do-file-rules-one-suggestion/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Do-file rules &#8212; a justification</title>
		<link>http://enoriver.net/index.php/2009/01/03/do-file-rules-a-justification/</link>
		<comments>http://enoriver.net/index.php/2009/01/03/do-file-rules-a-justification/#comments</comments>
		<pubDate>Sat, 03 Jan 2009 06:54:21 +0000</pubDate>
		<dc:creator>Gabi Huiber</dc:creator>
				<category><![CDATA[Stata]]></category>
		<category><![CDATA[do-files]]></category>

		<guid isPermaLink="false">http://enoriver.net/?p=421</guid>
		<description><![CDATA[There's apparently a programming principle going by the acronym DRY -- Don't Repeat Yourself. Somebody needs to teach it to Stata users. See, Stata is so flexible and easy to use that it's perverse in a way. Take for example the ability to generate new variables on the fly, either in interactive mode or wherever [...]]]></description>
			<content:encoded><![CDATA[<p>There's apparently a programming principle going by the acronym DRY -- Don't Repeat Yourself. Somebody needs to teach it to Stata users.</p>
<p>See, Stata is so flexible and easy to use that it's perverse in a way. Take for example the ability to generate new variables on the fly, either in interactive mode or wherever in your do-file fancy strikes. You would, of course, use Stata's <code>generate</code> command, or your favorite short version of it, as in<br />
<code><br />
gen my_new_variable = 0<br />
</code><br />
You couldn't do this in SAS. First, there's no such thing as interactive mode, and second, you'd have to be in the data step. SAS will impose a modular structure on your code, and if that's not your style, too bad. You'll find it awkward but you'll either have to live with it, or hire a SAS programmer. There is, by the way, such a job title. I just put "SAS programmer" in the search box at <a href="http://www.simplyhired.com">simplyhired.com</a> and got 518 hits. And what did "Stata programmer" get me? This:</p>
<blockquote><p><strong><span style="color: #808080; font-size: 1.5em">Dang. We didn't find anything for you.</span></strong><span style="color: #808080; font-size: .8em"><br />
We couldn't find any jobs for "Stata programmer".<br />
You're probably a good speller, but check the keyword terms you entered.<br />
You can also try using some other keywords, or enter fewer words to expand your search.<br />
It's also possible we made an error somewhere.<br />
Sometimes computers are human too... just shinier.</span></p></blockquote>
<p>This might look like a digression, but it's not. Where I was going is this: robust, friendly, and well-designed tools like Stata make programmers out of all of us. That's the upside. The downside is that while we might be fine social scientists, or epidemiologists, or what have you, we're almost sure to be pretty lousy programmers, because ease of use breeds incompetence.</p>
<p>Back to DRY: if you can do anything at any point in your do-file, you will not resist repeating yourself, because that's the most expedient thing to do sometimes. Worse, you will do the same thing but with slight variations, like spelling <code>2</code> in one place and <code>`two'</code> in another. That will produce ugly, unmaintainable code.</p>
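<p>A small sketch of the alternative, using the auto dataset that ships with Stata (the cutoff value and the program name summarizeAbove are made up for illustration): the number is typed once, in a local macro, and the commands that would otherwise be copied and pasted live in one program that is called as often as needed. Because local macros do not cross into programs, the cutoff is handed to the program as an argument.<br />
<code><br />
sysuse auto, clear<br />
local cutoff 25   // mpg threshold, typed in one place only<br />
<br />
capture program drop summarizeAbove<br />
program define summarizeAbove<br />
  args var cutoff   // variable to summarize; mpg threshold<br />
  summarize `var' if mpg &gt;= `cutoff'<br />
end<br />
<br />
summarizeAbove price `cutoff'<br />
summarizeAbove weight `cutoff'<br />
</code></p>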
<p>Yet DRY is as feasible in Stata as in the least user-friendly of its alternatives (I'm not saying that's SAS, by the way; machine code might be worse). You just need to set up and keep some sensible rules of your own for what's acceptable when writing a do-file. I will follow up with some suggestions over the next couple of days.</p>
]]></content:encoded>
			<wfw:commentRss>http://enoriver.net/index.php/2009/01/03/do-file-rules-a-justification/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Soup up your do-files: program</title>
		<link>http://enoriver.net/index.php/2008/09/30/soup-up-your-do-files-codeprogramcode/</link>
		<comments>http://enoriver.net/index.php/2008/09/30/soup-up-your-do-files-codeprogramcode/#comments</comments>
		<pubDate>Tue, 30 Sep 2008 16:25:51 +0000</pubDate>
		<dc:creator>Gabi Huiber</dc:creator>
				<category><![CDATA[Stata]]></category>
		<category><![CDATA[do-files]]></category>
		<category><![CDATA[program]]></category>

		<guid isPermaLink="false">http://enoriver.net/?p=157</guid>
		<description><![CDATA[Most Stata commands are programs written in Stata's ado-file language; they are saved as .ado files that you are free to browse. For example, on my Windows XP machine the guts of the simple describe command are here: C:\Program Files\Stata10\ado\base\d\describe.ado. Stata will let you write your own ado-files and treat them as first-class citizens of [...]]]></description>
			<content:encoded><![CDATA[<p>Most Stata commands are programs written in Stata's ado-file language; they are saved as .ado files that you are free to browse. For example, on my Windows XP machine the guts of the simple <code>describe</code> command are here: C:\Program Files\Stata10\ado\base\d\describe.ado.</p>
<p>Stata will let you write your own ado-files and treat them as first-class citizens of your Stata install. But: (1) Stata is a very accomplished piece of software right out of the box, (2) it is updated scrupulously and as often as needed, and (3) there are already thousands of user-created commands, tried and true and ready to <code>net install</code>. So that sort of effort is rarely warranted. Though this general assurance might not deter you from giving it a try, chances are that you're not an ado-file writer. You use do-files instead.</p>
<p>Do-files can get unwieldy with complicated projects. Yet the more complicated the job, the more useful your do-files are, because they organize your work and maintain your paper trail. There is no reproducible research without do-files. This is so important that the ability to work with Stata interactively is sometimes argued to be too much of a good thing.</p>
<p>So you need do-files and you don't want them to be too big. One way to make that compromise is to use hierarchical do-files. A very short master do-file calls the others, each of which is just a list of commands that could have just as well been all in one file. Your master do-file might look something like this:</p>
<p><code>clear<br />
set more off<br />
pause on<br />
<br />
cd c:/data/myproject<br />
do collect_data.do   // insheets data from source text files<br />
do estimate_model.do // runs my clever ML routine<br />
do write_report.do   // that's right, from Stata to LaTeX<br />
</code></p>
<p>This would do the job, and inline comments as shown above can add intuitive appeal beyond what you get from simply using good do-file names. But if you need to run a do-file inside a loop, there is a certain cost to reading every single line of Stata code every time it needs to be executed. It would be nice if you could read it in once and invoke it as needed. That's what <code>program</code> is for.</p>
<p>Usually, do-files are just text files full of Stata commands. Stata reads, interprets and executes these commands one by one. But any do-file can be turned into a program with three extra lines as follows:</p>
<p><code>capture program drop myProgram // this is optional, but you want it<br />
program define myProgram<br />
[...] // your old do-file goes here<br />
end<br />
</code></p>
<p>This doesn't mean that all do-files should get the <code>program</code> treatment. But suppose that you have a do-file called clean_my_data.do and you use it for preparing 20 source files in the same way -- generate new variables, turn dates from yy/mm/dd into Stata's elapsed time format, etc. Without these three extra lines, your master do-file would call it like so:</p>
<p><code>forvalues i=1/20 {<br />
drop _all<br />
use mysourcefile`i'<br />
do clean_my_data.do<br />
}<br />
</code></p>
<p>That's fine, but it will be a bit slow. In every cycle, each line in clean_my_data.do is read afresh, echoed to the screen, then executed, and its result is displayed, and so on. Declaring this do-file as a <code>program</code> allows Stata to read it once, keep it in memory, and execute it as many times as needed. You will, of course, choose some descriptive name for your program, say cleanMyData. Your master do-file will look like this:</p>
<p><code>do clean_my_data.do // Stata reads and memorizes your program, but does not run it yet<br />
<br />
forvalues i=1/20 {<br />
drop _all<br />
use mysourcefile`i'<br />
cleanMyData // Stata executes the commands inside your do-file<br />
}<br />
</code></p>
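<p>For concreteness, clean_my_data.do might look something like the sketch below once it gets the three extra lines. The variable names datestr, price and qty are hypothetical, and so is the assumption that dates arrive as yy/mm/dd strings. The third argument to <code>date()</code> tells Stata how to read two-digit years: each one is placed as late as possible without going past 2030.<br />
<code><br />
capture program drop cleanMyData<br />
program define cleanMyData<br />
  // turn yy/mm/dd strings into Stata elapsed dates<br />
  generate mydate = date(datestr, "YMD", 2030)<br />
  format mydate %td<br />
  // a new variable built from existing ones<br />
  generate revenue = price * qty<br />
end<br />
</code></p>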
<p>There is another advantage to declaring do-files as programs: clean logs. If you have Stata read the commands first and run them all at the same time from memory, then your logs will only contain the output of those commands, not the commands themselves. That will keep them small and readable.</p>
<p>The drawback is that programs are sticky: Stata remembers them until it is either explicitly told to forget them or shut down. In the same Stata session you might run two very different do-files that call two different programs, written at different times, that happen to share a name. If only one of them has been read into this session, it will be executed both times: once in its proper context, and once in a totally wrong one. Stata will assume that's what you meant to do.</p>
<p>The <code>capture prog drop myProgram</code> line might help a bit. Obviously you can also <code>capture prog drop _all</code>, with the caveat that this might drop programs that you don't want dropped. Use a blend of the two variants as circumstances warrant. Stata can live with either. The best thing to do, of course, is to use very descriptive names for programs. They usually have the side effect of being specific, which will avoid such ambiguity in the first place.</p>
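<p>To see what is actually sitting in memory before deciding what to drop, a sketch like this one may help; cleanMyData stands in for whatever program name you are unsure about.<br />
<code><br />
program dir                                // list the programs Stata currently remembers<br />
capture noisily program list cleanMyData   // show its code, if it is defined<br />
capture program drop cleanMyData           // forget this one; no error if absent<br />
</code></p>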
]]></content:encoded>
			<wfw:commentRss>http://enoriver.net/index.php/2008/09/30/soup-up-your-do-files-codeprogramcode/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Edit Stata do-files with Notepad</title>
		<link>http://enoriver.net/index.php/2008/09/03/edit-stata-do-files-with-notepad/</link>
		<comments>http://enoriver.net/index.php/2008/09/03/edit-stata-do-files-with-notepad/#comments</comments>
		<pubDate>Thu, 04 Sep 2008 03:14:30 +0000</pubDate>
		<dc:creator>Gabi Huiber</dc:creator>
				<category><![CDATA[Stata]]></category>
		<category><![CDATA[do-files]]></category>
		<category><![CDATA[text editors]]></category>

		<guid isPermaLink="false">http://host1.tld/?p=30</guid>
		<description><![CDATA[If you right-click on a do-file under Windows XP, you can either open it or edit it. Opening it means that Stata will launch and attempt to execute it. The editing, by default, will also trigger a Stata launch, under the assumption that you want to edit the do-file inside Stata's own do-file editor. That [...]]]></description>
			<content:encoded><![CDATA[<p>If you right-click on a do-file under Windows XP, you can either open it or edit it. Opening it means that Stata will launch and attempt to execute it. The editing, by default, will also trigger a Stata launch, under the assumption that you want to edit the do-file inside Stata's own do-file editor. That may not be your best choice. Notepad, for example, is lightweight. There is little reason to deploy all the Stata gear -- GUI, check for updates, maybe a launch of your profile.do that automatically allocates 1.5G of your RAM to Stata -- for a simple editing job of what is essentially a text file.</p>
<p>You can set Notepad as your default do-file editor as follows: open Windows Explorer, say My Documents. The last item under the Tools menu is "Folder Options..." In the File Types tab there is a list of all the file extensions that your Windows XP installation knows of. Highlight the .do extension and click the Advanced button. There are two actions you can take with do-files: open and edit. Either can be edited. I know it's ambiguous, but when you're editing "edit" you're simply telling Windows what program to launch for editing. By default, Stata's installation instructs Windows to open your w[...]stata.exe and orders it to "doedit %1". You can change the Edit entry to "C:\WINDOWS\notepad.exe" "%1". That tells Windows that you want to edit your do-files with Notepad. Click "OK" and you're all set.</p>
]]></content:encoded>
			<wfw:commentRss>http://enoriver.net/index.php/2008/09/03/edit-stata-do-files-with-notepad/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

