Delayed macro substitution

In Stata, you have local and global macros that can encapsulate all sorts of specific text: names of variables, constant values, even entire chunks of Stata code. Stata will interpret macros as soon as you invoke them, so that if you define


local thedata sysuse auto
local themodel regress mpg foreign weight

you can simply call


`thedata'
`themodel'

You can also chain or nest local macros. It makes for a little extra work and produces code that's a little harder to read, but using macros is the best way to ensure code consistency, so it's a good thing to get used to them.

One neat feature of macros is that you can delay their substitution with a backslash. This allows you to nest macros in a very specific way. Let's expand on the example above. You could have defined the model macro, here called `model', as a nested one:


local rhs foreign weight
local model regress mpg `rhs'

From your standpoint, the local `rhs' is nested inside the local `model'. Stata does not care, because as soon as you have it read "`rhs'", it substitutes "foreign weight". From its standpoint, this `model' is identical to what the previous definition would have generated: the string "regress mpg foreign weight". But now suppose that you need a little flexibility in what should go in the right-hand side of this regression equation: suppose that headroom might also matter to fuel economy, presumably because gains in headroom come at a cost in aerodynamics. You could do this:


local rhs1 foreign weight
local rhs2 foreign weight headroom

Then you could do


local model regress mpg `rhs1'
`model'
local model regress mpg `rhs2'
`model'
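A minimal check of immediate substitution, meant to be run as one block in a do-file. It assumes nothing beyond the macro names already used above, and uses display only to show what `model' ends up storing:

```stata
* Immediate substitution: the contents of rhs are copied into model
* at the moment model is defined
local rhs foreign weight
local model regress mpg `rhs'

* Changing rhs afterwards does not touch what model already stores
local rhs foreign weight headroom
display "`model'"
```

The display line shows the regression command with the original two regressors, because `model' holds a copy of the text and keeps no link to `rhs'.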

You have to define the local `model' twice because Stata substitutes the values of `rhs1' and `rhs2' as soon as you invoke them. There's a way around that. You could nest `rhs' into the definition of `model' with a delayed, as opposed to immediate, substitution, using a backslash, and only change its content when needed:


local model regress mpg \`rhs'
local rhs foreign weight
`model'
local rhs foreign weight headroom
`model'
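To see what Stata actually stores in the delayed case, here is a small sketch to run as one block in a do-file. It uses macro list, which refers to a local macro by its name with a leading underscore, and display in place of running the regression:

```stata
* Delayed substitution: the backslash keeps `rhs' from being expanded
* when model is defined
local model regress mpg \`rhs'

* Inspect the stored text; it should still contain the placeholder `rhs'
macro list _model

* The placeholder is filled in only when model itself is expanded
local rhs foreign weight
display "`model'"

local rhs foreign weight headroom
display "`model'"
```

The two display lines should differ, because each expansion of `model' picks up whatever `rhs' holds at that moment.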

Delayed substitution is elegant. It lets you nest macros using their names as placeholders, and have Stata fill them in only when they are needed. Here's one final working demo that shows you how you can use delayed macro substitution in program definitions:


capture prog drop myDemo
program myDemo

syntax anything

local displaythis "Argument \`i' is \`addthis'"

local argct: list sizeof anything
forvalues i=1/`argct' {
   local addthis: word `i' of `anything'
   di "`displaythis'"
}

end

myDemo three blind mice

For more information, see here.

14 Responses to “Delayed macro substitution”

  1. Jess writes:

    I just came across this issue YESTERDAY! I gave up and just included repeated lines of code. I've never heard of this solution before. Thanks!

  2. Nick Cox writes:

    Interestingly provocative, but I think wrong-headed.

    This raises, first, questions of taste and style. It's my contention that this kind of coding makes programs unnecessarily difficult to read. If you are the only consumer of your programs, that may not matter so much. If you work collaboratively, it is likely to be a big deal.

    Beyond that, when this has arisen on Statalist, it has also been coupled with a myth that you sometimes need to do this, which I don't think has ever been exemplified in that forum. Often people had just written, and were trying to fix or maintain, very contorted code. A fresh look at their problem typically simplified the code to remove such constructs.

    The moral: If you don't want to define something now, do it later when you do.

    Your example is not convincing, as the program can be cut down in various ways with exactly the same result. Here is one:

    program myDemo
        syntax anything
        local i = 1
        foreach w of local anything {
            di "Argument `i++' is `w'"
        }
    end

    Of course, if your defence is that the example is just to show how it is done, then OK, but I still prefer my code.

    Please show me an example where this is really needed and then I will change my mind.

  3. Jess writes:

    Nick, I agree that code should be as readable as possible to you and anyone else who might want to use it. However, I do think this has a practical application. For instance, I needed to format 9 different graphs in a uniform format, but with small changes for each graph. Delayed macro substitution made this possible without repeating long lines of code. If I decided to change the format of all the graphs, it would have been a huge headache without it. Here is an example similar to what I did:

    webuse drug2, clear

    gen over50 = age >= 50

    *Macro for standardized Kaplan-Meier survival curve format
    local KMFORMAT title("Survival Probability by \`TITLE'") xtitle("Months") plotregion(style(none)) ysize(1) yscale(range(0 1)) ylabel(0(.25)1) xsize(1.5) xscale(range(0 40)) xlabel(0(5)40) xtick(0(1)40) legend(\`KEY' region(lwidth(none)) cols(2)) plotopts(lwidth(medthick))

    *Curve 1
    local TITLE Drug
    local KEY label(1 "Placebo") label(2 "Treatment")
    sts graph, by(drug) `KMFORMAT'
    *graph export "...", replace

    *Curve 2
    local TITLE Age
    local KEY label(1 "Under 50") label(2 "Over 50")
    quietly sts graph, by(over50) `KMFORMAT'
    *graph export "...", replace

    I would welcome any solution that would make more sense than the one above.

  4. Nick Cox writes:

    I said this is about taste and style. To my taste and style, this is clearer:

    webuse drug2, clear

    gen over50 = age >= 50

    *Macro for standardized Kaplan-Meier survival curve format
    local KMFORMAT xtitle("Months") xsize(1.5) xscale(range(0 40)) xlabel(0(5)40) xtick(0(1)40) ///
        ysize(1) yscale(range(0 1)) ylabel(0(.25)1) ///
        plotregion(style(none)) plotopts(lwidth(medthick))

    *Curve 1
    sts graph, by(drug) `KMFORMAT' title("Survival Probability by Drug") ///
        legend(label(1 "Placebo") label(2 "Treatment") region(lwidth(none)) cols(2))
    *graph export "...", replace

    *Curve 2
    quietly sts graph, by(over50) `KMFORMAT' title("Survival Probability by Age") ///
        legend(label(1 "Under 50") label(2 "Over 50") region(lwidth(none)) cols(2))
    *graph export "...", replace

    The principles are simple. Just because you can define a macro doesn't mean that you have to or that it makes code clearer. Your local TITLE is just one word for another. Often that is exactly the thing to use, but often it is just indirection.

  5. Nick Cox writes:

    Let me add that style preferences can vary between writing do-files and Stata programming in the strong sense.

    What Jess has here is a do-file style in which details of and for particular variables are wired in to the code. (For all I know, this is formalised within a -program-, but that's a choice.) In a do-file I will sometimes use a local to stand for a large chunk of code, as in the KMFORMAT example, or in a -foreach- or -forval- loop when the syntax requires it, but I see a lot of local macro use in do-file style on Statalist in which people are just making their code unnecessarily complicated. People pick up a lot of Stata style by imitation, which is mostly good, as it's clearly a major way to learn, and me too. But just as I've seen people pick up the idea that it is macho to sprinkle your code with macros, I've seen some evidently tempted by the idea that delayed substitution is one of the tricks you need. I don't think it is. If you think your code is clearer that way, and you understand what is being done, you're naturally free to use it, but it's a choice, never (I assert) inescapable as the way to solve a problem.

    Anyway, I can't see that it is necessarily bad style to repeat large chunks of code in a do-file if it makes things clearer.

  6. Nick Cox writes:

    An alternative to

    local model regress mpg \`rhs'
    local rhs foreign weight
    `model'
    local rhs foreign weight headroom
    `model'

    is just

    local model "regress mpg"
    `model' foreign weight
    `model' foreign weight headroom

    A slogan sometimes heard: cut out the middle macro.

  7. Gabi Huiber writes:

    Nick, thanks for stopping by. I agree that it's useful to repeat code when that's what it takes for it to be readable. Besides, I never found any use for delayed macro substitution. This particular post has some context. A while back I caught a tweet with a link to some Stata learning resources. Among them was this: http://personal.lse.ac.uk/lembcke/ecStata/2009/MResStataNotesFeb2009PartB.pdf. I filed it for future reference and its time came last week. That's when I ran across the trick ("deferred macro evaluation", on page 10) and I found it neat enough to google around for evidence of past use. As it turned out, it was featured in an old Stata FAQ. So I thought there had to be cases where it was useful. I think Jess brought up a perfect example. We all know how useful it is to put aside the common pieces in multiple graph commands in a local definition like KMFORMAT here. But even so the graph commands remain wordy enough to discourage waders. It's nice, then, to have the option of editing the title text on its own line.

  8. Nick Cox writes:

    I can't see what your bottom line is. You don't use it but you think it is useful sometimes?

  9. Gabi Huiber writes:

    That's right. I don't use it, but I don't know enough about all the likely cases to assume it's useless. How I write code is driven both by the job itself and what I know about the toolkit. What I know is a result of things I learned, things I forgot, and things I haven't learned yet. How I learn and forget has a chance component: new jobs come up, and I learn new tricks; of those, some end up remembered because new uses turn up. So, had I known about this particular trick in the past, I would have made use of it. I would have certainly tried it out once, just to give myself a fair chance to remember it.

  10. Nick Cox writes:

    I looked at the example in the course material by Alexander C. Lembcke. Correcting typos that are immaterial to the point here, his example is very similar to one of Gabi's.

    local rhs "gdp60 openk \`add_var'"
    local add_var "kc"
    reg grgdpch `rhs' if year == 1990
    local add_var "kg"
    reg grgdpch `rhs' if year == 1990

    The last line is implicit. This is said to be an illustration of using the trick "cleverly". But there is no cleverness worthy of the name. This is just teaching students to write unnecessarily indirect and complicated code:

    local rhs "gdp60 openk"
    reg grgdpch `rhs' kc if year == 1990
    reg grgdpch `rhs' kg if year == 1990

    will do fine. Of course, I know about using simplified examples to teach principles, but I don't see the need to teach this principle at all from this example.

    I agree with Gabi to this extent: Jess's example is the first I've seen in perhaps a dozen discussions of this device in which there is use in a real and realistic problem and the intent is clearly to make code more manageable, and not to show off a trick or imitate something wrongly thought to be essential. I always write to avoid unnecessary indirection, and that's a personal style choice.

    That's it from me on this one.

  11. daniel klein writes:

    I think it is good to know that there is such a thing as delayed macro substitution.

    As Nick has demonstrated there might not occur many problems where this "style" is needed (maybe none). I do also agree that delayed substitution makes the code less clear and less easy to read.

    However, I can think of a (stylized) example where I might want to use delayed substitution. Here is some pseudocode:

    prog foo

        syntax varlist [, Missing]

        if "`missing'" != "" {
            loc miss \`v' == . ,m
        }
        else loc miss

        foreach x in long_list {
            forval i = 1/large_number {
                foreach v of loc varlist {
                    ta `v' `miss'
                }
            }
        }
    end

    The idea behind delayed substitution here is that I do not want Stata to check if -missing- is specified for each x of the long_list, each i of the large_number and each v of the varlist over and over again. Since -missing- does not change once specified, it is sufficient to check it once. However, the content of -miss- changes within the loops (it is different for each variable).

    I do not think that it will slow the program considerably down if I checked the -missing- option over and over -- but I do also think it might not be a bad idea to check it once only.

  12. Nick Cox writes:

    Daniel:

    I am not clear on what this example is supposed to do. Some of the code is schematic, and some of the rest is illegal. Supposing for example that the first variable in varlist were -frog- then the call to -tabulate- would be

    tab frog frog == . , m

    which makes no sense.

    Also your -forval- loop is obscure.

    Given that the program has an option -missing- you can pass that to -tabulate- directly.

    What else do you want to do?

  13. daniel klein writes:

    Nick,

    I am sorry for being unclear and (worse) sloppy. There is an -if- qualifier missing.

    Let me correct and try to simplify the code. I hope that makes it easier to catch the idea.

    prog foo

        syntax varlist [, Missing]

        if ("`missing'" != "") {
            loc miss if (\`v' == .) ,m
        }
        else loc miss

        foreach v of loc varlist {
            ta `v' `miss'
        }

    end

    This code tabulates variables in varlist. If -missing- is not specified, only the non-missing observations are -tabulate-d. If option -missing- is specified, only the missing values of the variables are -tabulate-d.

    If we, for the argument's sake, pretend that there is a situation, where I want to do exactly this, then the idea behind delayed substitution is, that the -missing- option is checked once only. Compare with an alternative code

    prog foo2

        syntax varlist [, Missing]

        foreach v of loc varlist {
            if ("`missing'" != "") ta `v' if (`v' == .) ,m
            else ta `v'
        }

    end

    where the -missing- option is checked for every variable over and over again.

    It seems (at least to me) to be a good idea to check if -missing- is specified once only, since if it is, it will be every time the loop executes.

    I thought it might be computationally more burdensome to check whether the option is specified each time the loop executes.

    I admit, it is difficult to come up with a "real-life" problem, where one would see the slower execution time (if it is in fact slower to check a condition more than once).

  14. Nick Cox writes:

    I see what you mean. Your foo2 without delayed macro substitution does look clearer to me, and it is certainly shorter. Alternatively, I might program it this way:

    prog foo3
        syntax varlist [, Missing]

        if "`missing'" != "" {
            foreach v of loc varlist {
                ta `v' if (`v' == .) ,m
            }
        }
        else tab1 `varlist'
    end

    If there is a clash, clarity is often more important than brevity so that if necessary I would write slightly longer code to avoid these delayed substitutions.

    In practice, macro substitution is usually pretty fast. If you count up the number of macro evaluations to be made I think it's about the same with both of your solutions (don't forget to count the un-nesting), so the main issue still seems to be readability to humans.