<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Stata Things &#187; arguments</title>
	<atom:link href="http://enoriver.net/index.php/tag/arguments/feed/" rel="self" type="application/rss+xml" />
	<link>http://enoriver.net</link>
	<description>computing for fun and profit</description>
	<lastBuildDate>Mon, 07 May 2012 13:43:02 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Dummy variables</title>
		<link>http://enoriver.net/index.php/2009/01/19/dummy-variables/</link>
		<comments>http://enoriver.net/index.php/2009/01/19/dummy-variables/#comments</comments>
		<pubDate>Tue, 20 Jan 2009 00:31:29 +0000</pubDate>
		<dc:creator>Gabi Huiber</dc:creator>
				<category><![CDATA[Stata]]></category>
		<category><![CDATA[arguments]]></category>
		<category><![CDATA[dummy variables]]></category>
		<category><![CDATA[tabulate]]></category>

		<guid isPermaLink="false">http://enoriver.net/?p=492</guid>
		<description><![CDATA[There are two straightforward ways to turn string variables into corresponding dummies -- also known as indicator variables -- using Stata. One is an extension of the tab command: tab stringvar, gen(dummy) Another makes use of the fact that you seldom need dummies for their own sake. Usually you want them used in some sort [...]]]></description>
			<content:encoded><![CDATA[<p>There are two straightforward ways to turn string variables into corresponding dummies -- also known as indicator variables -- using Stata. One is an extension of the <code>tab</code> command:<br />
<code><br />
tab stringvar, gen(dummy)<br />
</code><br />
Another makes use of the fact that you seldom need dummies for their own sake. Usually you want them used in some sort of regression model. The <code>xi:</code> extension to various estimation commands turns string variables into dummies automatically, as in<br />
<code><br />
xi: regress y x i.stringvar<br />
</code><br />
Both are described in detail <a href="http://www.stata.com/support/faqs/data/dummy.html">here</a> and they both work well when your string variable translates into dummies directly. That, however, is not always the case. Think of a data set where you have a string variable named "color" which is equal to "red" for the first observation, "blue" for the second and "yellow, blue" for the third. You would want the dummy "color_red" to be equal to 1 in the first observation; the dummy "color_blue" to be 1 in the second and the third; and you'd want a separate dummy, "color_yellow", to be equal to 1 in the third observation.</p>
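<p>To see the problem concretely, here is a minimal sketch with the three observations just described, typed in by hand (the stub "dummy" is the same one used above):</p>
<p><code>clear<br />
input str12 color<br />
"red"<br />
"blue"<br />
"yellow, blue"<br />
end<br />
tab color, gen(dummy)<br />
describe dummy*<br />
</code></p>
<p>Because <code>tab</code> treats each distinct string as its own category, "yellow, blue" gets a dummy of its own, and the third observation is not flagged as blue at all.</p>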
<p>I just ran across such a data set today. It had characteristics for a few hundred lottery games. The color variable described the ticket colors. A few other string variables could also hold comma-delimited lists. Moreover, those lists could include values that did not show up on their own in any other observation (like "yellow" in the example above).</p>
<p>So I thought I'd write a program that could deal with all of that without the need of any visual inspection or case-by-case manual labor on my part. I wanted it to be applicable to any string variable in this situation. My suggestion is below:</p>
<p><code>// ##### getDummies -- turns string to dummies. Takes one argument:<br />
// #####                      `1' -- string, the name of the variable of interest.<br />
capture prog drop getDummies<br />
prog def getDummies<br />
</code><code><br />
local stringvar `1'<br />
quietly count<br />
local fullset=r(N)<br />
quietly count if !regexm(`stringvar',",")<br />
local uniques=r(N) // cases where `stringvar' is not a list<br />
</code><code><br />
if `fullset'!=`uniques' {<br />
  quietly {<br />
    tab `stringvar' if regexm(`stringvar',",")<br />
    levelsof `stringvar' if !regexm(`stringvar',","), local(tags)<br />
    preserve<br />
    tempfile `stringvar'_lists<br />
    keep `stringvar'<br />
    keep if regexm(`stringvar',",")<br />
    duplicates drop<br />
    split `stringvar', p(",")<br />
    save "``stringvar'_lists'", replace<br />
    restore<br />
    describe `stringvar'* using "``stringvar'_lists'", varlist // note(1)<br />
    local `stringvar'_stubs=r(varlist)<br />
    split `stringvar', p(",")<br />
    local stubs: list sizeof `stringvar'_stubs<br />
    forvalues i=2/`stubs' {<br />
      local stub: word `i' of ``stringvar'_stubs'<br />
      replace `stub'=trim(`stub') // note(2)<br />
      levelsof `stub' if `stub'!="", local(extras)<br />
      local tags: list tags | extras<br />
    }<br />
    local tags: list sort tags<br />
  }<br />
}<br />
else {<br />
  di "for each value of `stringvar' there corresponds one dummy variable"<br />
  capture drop __*<br />
  quietly levelsof `stringvar' if !regexm(`stringvar',","), local(tags)<br />
  local `stringvar'_stubs `stringvar'<br />
}<br />
capture drop __*<br />
</code><code><br />
local stubnum: list sizeof `stringvar'_stubs<br />
local tagnum: list sizeof tags<br />
</code><code><br />
quietly {<br />
  forvalues i=1/`tagnum' {<br />
    local thistag: word `i' of `tags'<br />
    local thistag: list clean thistag<br />
    gen byte _`stringvar'_`i'=0<br />
    forvalues j=1/`stubnum' {<br />
      capture drop __*<br />
      local thisstub: word `j' of ``stringvar'_stubs'<br />
      replace _`stringvar'_`i'=1 if `thisstub'=="`thistag'"<br />
    }<br />
  }<br />
}<br />
drop `stringvar'*<br />
</code><code><br />
// this section is for listing stuff on screen and in the log<br />
local `stringvar'_stubs: list `stringvar'_stubs-stringvar<br />
local stubs: list sizeof `stringvar'_stubs<br />
di ""<br />
di "total number of observations: `fullset'"<br />
di "number of observations where `stringvar' is not a list: `uniques'"<br />
di "unique values of `stringvar': `tagnum'"<br />
if `stubs'&gt;0 {<br />
  di "where `stringvar' is a list, it is this long at most: `stubs'"<br />
}<br />
di ""<br />
forvalues i=1/`tagnum' {<br />
  local thistag: word `i' of `tags'<br />
  local thistag: list clean thistag<br />
  di "_`stringvar'_`i' is for `stringvar' == `thistag'"<br />
}<br />
</code><code><br />
end<br />
</code></p>
<p>That's it. This program collects all the possible values that your stringvar can take, whether inside comma-delimited lists or by themselves, and produces accurate dummies that are equal to one every time such a value is encountered, whether by itself or in a list, and regardless of its position in the list. With your data set in memory, you simply call<br />
<code><br />
getDummies color<br />
</code><br />
Now, I don't post programs unless they contain something I just learned in the process of writing them. Today's such thing is in line 24, next to the comment "note(1)". It turns out -- as <code>help describe</code> shows -- that there are two kinds of <code>describe</code>: one for data in memory, another for data <code>using</code> a file. The latter comes with a different set of options. One of them is <code>varlist</code>, which stores the names of the variables in <code>r(varlist)</code>. I chose to preserve/restore and create the tiny temporary file "``stringvar'_lists'" so I could apply <code>describe using</code> to it and get a variable list saved in the local `stringvar'_stubs, which the program uses later on.</p>
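<p>A minimal sketch of that trick on its own, assuming the auto data set that ships with Stata (the tempfile name "autocopy" is made up for this illustration):</p>
<p><code>sysuse auto, clear<br />
tempfile autocopy<br />
save "`autocopy'"<br />
describe m* using "`autocopy'", varlist<br />
// r(varlist) now holds the names of the variables in the file that match m*<br />
display "`r(varlist)'"<br />
</code></p>
<p>Nothing in memory is touched by the <code>describe using</code> call; it only reads the file on disk and leaves the matching names behind in <code>r(varlist)</code>.</p>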
<p>This may look like a lot of work, and it is, but it's all up-front. You do it once, and if it works now it works forever. The marginal cost of creating dummies out of any number of such variables is zero from here on out.</p>
<p>Update (February 4, 2009): the first version of getDummies had a bug. The line marked with the comment "note(2)" was missing. As a result, the program produced more dummies than it should have.  To use my color example, without this line getDummies will produce two separate dummies for the color "blue": one for the case where color was equal to "blue" strictly, and another for the case where color contained the string " blue". Notice the leading blank space. You want to <code>trim()</code> it.</p>
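<p>A quick way to see that leading blank for yourself, as a sketch on a one-observation data set typed in by hand:</p>
<p><code>clear<br />
input str12 color<br />
"yellow, blue"<br />
end<br />
split color, p(",")<br />
// the brackets make the leading blank in color2 visible<br />
display "[" color2[1] "]"<br />
replace color2=trim(color2)<br />
display "[" color2[1] "]"<br />
</code></p>
<p>The first <code>display</code> shows the blank between the opening bracket and the word; the second, after <code>trim()</code>, does not.</p>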
]]></content:encoded>
			<wfw:commentRss>http://enoriver.net/index.php/2009/01/19/dummy-variables/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Arguments</title>
		<link>http://enoriver.net/index.php/2008/11/26/arguments/</link>
		<comments>http://enoriver.net/index.php/2008/11/26/arguments/#comments</comments>
		<pubDate>Wed, 26 Nov 2008 06:04:09 +0000</pubDate>
		<dc:creator>Gabi Huiber</dc:creator>
				<category><![CDATA[Stata]]></category>
		<category><![CDATA[arguments]]></category>
		<category><![CDATA[program]]></category>

		<guid isPermaLink="false">http://enoriver.net/?p=321</guid>
		<description><![CDATA[In an older post I wrote about the program capability in Stata. One thing I didn't say and I find increasingly useful these days is that you can pass arguments to programs. Packaging a set of commands inside a program, to be read once and then invoked as many times as they are needed, is [...]]]></description>
			<content:encoded><![CDATA[<p>In an <a href="http://enoriver.net/index.php/2008/09/30/soup-up-your-do-files-codeprogramcode/">older post</a> I wrote about the <code>program</code> capability in Stata. One thing I didn't say and I find increasingly useful these days is that you can pass arguments to programs.</p>
<p>Packaging a set of commands inside a program, to be read once and then invoked as many times as they are needed, is nice enough. But what if you need to make those instructions apply to different variables in different instances?</p>
<p>For example, I have a set of parameter estimates for a survival model. I use them to calculate the lambda part of the hazard function under various scenarios for one variable of interest. You can do this immediately after estimating your hazard model, of course, using <code>predict</code> with appropriate modifiers, but you don't want to re-estimate the model every time you need to use the parameters. Besides, you may have to use one data set for estimating the model (say historical data) and other data sets (say current data) for making the predictions of interest.</p>
<p>So let's say that you have your parameters saved in a matrix, and declare a program, called e.g. getLambda, to pair them with the corresponding regressors as many times as you need to. But every time you call getLambda, instead of the first regressor you need to use a variable that is some sort of transformation of it, and you saved that variable with a different name to make it absolutely clear that this is not the actual regressor. It would be nice if you could pass that variable as an argument to getLambda, in effect telling Stata "I want to calculate lambda with this particular variable at the front of the regressor list this time".</p>
<p>The program getLambda might look something like this:</p>
<p><code><br />
capture prog drop getLambda<br />
prog def getLambda</code></p>
<p><code>local thiscase `1'<br />
local myregressors "`thiscase' var1 var2 var3 1"<br />
local regressorcount: list sizeof myregressors<br />
</code><code><br />
forvalues i=1/`regressorcount' {<br />
   local thisreg: word `i' of `myregressors'<br />
   gen param_`i'=beta[`i',1]*`thisreg'<br />
}<br />
egen lambda_`thiscase'=rsum(param_1-param_`regressorcount')<br />
drop param_*<br />
</code><code><br />
end<br />
</code><br />
I forced the last element of `myregressors' to be 1 because I am assuming that the constant is the last parameter saved in your parameter matrix. Notice the local `1' in the first line after the program definition. It is Stata's name for the first argument of your program. In our case, that argument is simply the name of the variable that you want in the first position in the regressor list. So the call to getLambda would be</p>
<p><code><br />
getLambda foo<br />
</code><br />
This would make Stata calculate the lambda with the variable foo at the front of the regressor list. If later you want the variable bar there instead, you just call<br />
<code><br />
getLambda bar<br />
</code><br />
Arguments can be strings, numbers, whatever. As long as your program is strictly for internal use and you know what you're using it for, you don't need to program the fancy stuff -- like error codes or help files. That, at least, is what I found. When I get to the point where I need those extras, I'll learn how to do them and then blog about it here.</p>
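<p>As a sketch of how the positional locals line up with what you type, here is a throwaway program (the name showArgs is made up for this illustration) that just echoes its first two arguments:</p>
<p><code>capture prog drop showArgs<br />
prog def showArgs<br />
di "first argument: `1'"<br />
di "second argument: `2'"<br />
end<br />
</code></p>
<p>Calling it with a name and a number,<br />
<code><br />
showArgs foo 42<br />
</code><br />
puts foo in `1' and 42 in `2'. Inside the program both are just text to be substituted wherever the local appears.</p>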
<p>You may have noticed the apparently superfluous use of the <code>list sizeof</code> extended macro function. After all, it's plain enough that the word count of the "myregressors" local is 5: the argument, three named regressors and the 1 for the constant. Well, sometimes you have more regressors than that. Do you care to count them by hand? I don't.</p>
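<p>A small sketch of the counting, with a longer regressor list whose variable names are made up for this illustration:</p>
<p><code>local myregressors "foo age income tenure region1 region2 region3 1"<br />
local regressorcount: list sizeof myregressors<br />
di "number of regressors, constant included: `regressorcount'"<br />
</code></p>
<p>Note that <code>list sizeof</code> takes the name of the local, not its contents, so there are no quotes around myregressors in the second line. Add or remove a regressor and the count follows without any other change to the code.</p>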
<p>There are all sorts of very good reasons for leaving to Stata as much of the grunt work as possible. The most obvious of them is that code written this way is more portable across different projects. But the honest truth is that I really like those list functions. I've been finding all sorts of uses for them lately. I am going to have a blog post dedicated to them sometime soon.</p>
]]></content:encoded>
			<wfw:commentRss>http://enoriver.net/index.php/2008/11/26/arguments/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

