Data from Excel to Stata, take two

I don't like spreadsheets. I like data sets and matrices. But I get data in Excel workbooks all the time, and it's a pain in the neck. This will never go away altogether, but if my Stata-using colleagues could be persuaded to quit sending me spreadsheets, that would at least cut down on the volume. I've tinkered with all kinds of ways to make data extraction from spreadsheets into Stata easier. This is the latest:

clear
set mem 10m
set more off
set type double

local here "`c(pwd)'/"

// declare name and date of Excel file to read from
local xlsfile  "revised PriceIncr20080815-TargetTempl.xls"
local filedate "20081110"

// list the worksheets of interest
local sheets ""Move these data" "And these too""

// Extract data from Excel to Stata. Leave this part alone.
local sheetno: list sizeof sheets
forvalues i=1/`sheetno' {
   local k: word `i' of `sheets'
   local my_path     "`here'`xlsfile'"
   local my_sheet    "`k'"
   local source "Excel Files;DBQ=`my_path'"

   clear
   odbc load, dsn("`source'") table("`my_sheet'$")

   unab varz: _all
   foreach z in `varz' {
      local lower=lower("`z'")
      if "`lower'"!="`z'" {
         rename `z' `lower'
      }
   }

   describe
   compress
   save "`here'`k'`filedate'.dta", replace
}

Notice the c(pwd) function. It returns the path to the working directory. Essentially, the do-file above translates the worksheets listed in the local `sheets' into Stata files saved in the same directory as the do-file is in. The original Excel file may be in the same directory, as in this example, or it may be somewhere else. The `filedate' local is optional. Also optional, of course, is the loop that turns variable names to lower case. I just like mine that way.

5 Responses to “Data from Excel to Stata, take two”

  1. Enrico writes:

    Hi,
    i have seen your routine (which) is great and i am trying to use it for an excel file of 1,357 KB size. It contains quite a few sheets and i think this is the reason why when i run the routine Stata gives me the following error message (after have worked successfully just on one sheet. )

    [Microsoft][ODBC Excel Driver] Too many fields defined.

    Do you know how to solve this problem?

    Thank you for your help.

    Regards
    Enrico

  2. Gabi Huiber writes:

    I've never seen that problem, but as it happens I'm in Washington, DC at the Stata conference. I'll ask around and get back to you. Somebody here must have run into this before.

  3. Gabi Huiber writes:

    I never got around to asking anybody else about this. From your message, it looks like it's an ODBC driver error, not a Stata error, so I tried to replicate it.

    I had no luck, but since it's an ODBC drive error, you know that the problem is either in finding the correct workbook, or in finding the correct worksheet. So here's what I would do:

    Change the code below the line that reads "leave this part alone" as follows:

    // first check that Stata found the workbook:
    local my_path "`here'`xlsfile'"
    capture confirm file "`my_path'"
    if _rc==0 {
    di "OK, my Excel file is in the right place."
    }
    else {
    di "I am not looking in the right place."
    }
    // Next, check that I listed the correct tabs:
    di "And these should be my tabs: "
    forvalues i=1/`sheetno' {
    local k: word `i' of `sheets'
    di "Tab `i' should be called `k'. Is it?"
    }

    Running this code should shed some light on what's wrong. Please let me know how this goes.

  4. Dadong writes:

    Hello Gabi:

    Really love your blog. Regularly I have to download some Excel files and load into Stata for data management. I set up odbc connection. But when I try to load it using "odbc load", it give me the following error message:

    The ODBC driver reported the following diagnostics
    [Microsoft][ODBC Excel Driver] External table is not in the expected format.
    SQLSTATE=HY000

    However, if I open these excel files using MS Excel, close it , then use "odbc load" in Stata, there is no problem. Any clue what's wrong?

    Thanks a lot

    Dadong

  5. Gabi Huiber writes:

    Strange. Let me see if I got this right:

    1. You receive a workbook -- .xls or .xlsx, I assume -- from a friend, and "odbc load" is throwing this error.
    2. But you can open that same workbook with your own copy of Excel, so whatever is wrong with it, it's not something Excel cannot handle.
    3. You then close it, and then run "odbc load", and all is well.

    If all of the above is right, my guess is that the problem is outside Stata, and I have no idea how to fix it. There seems to be some kind of Microsoft-specific mismatch between your friend's Excel version (say it's Office 2003) and yours (say it's Office 2007). Your computer's ODBC driver can handle files touched by your own Excel version, but it has a hard time with your friend's.

    If you use Stata 12, you may want to try "import excel" instead of "odbc load".

Leave a Reply