How many zeroes in that Poisson?

I have a data set, and some of the variables there are counts of a given event.

Four count outcomes, the easiest thing to do is a Poisson regression, but before you do that, it's worth asking if what you see there really is close enough to a Poisson process.

You could check whether the variance is more or less equal to the mean, but with real-life data you can bet that there will be a difference between the two, and you'll be left scratching your head as to whether it's too big for Poisson, or just about right to pass the smell test.

Another thing you can do is check whether the count variable shows the right number of zeroes. In a Poisson distribution, the marginal probability of a zero outcome is exp(-mean). If the proportion of zeroes that you see is a lot higher than this value, and it usually is when you're looking at counts of rare events, then you will have to consider a zero-inflated poisson or a finite-mixture model, as discussed with wonderful clarity in chapter 17 of Microeconometrics using Stata by Cameron and Trivedi.

That brings me to the immediate cause for this post. I thought I'd code up a quick program to check those zeroes for a given data set and count variable, and I did this:


capture prog drop checkZifPoisson
program checkZifPoisson

version 12

args y dataset

local f _col(60) %5.2fc

di ""
useData `dataset' // nevermind how this is coded up
di ""
di "Checking `dataset':"
qui {
   count
   local den r(N)
   replace `y'=0 if missing(`y')
   count if `y'==0
   local num r(N)
   sum `y'
   local zpois exp(-`r(mean)')
   local zobs `num'/`den'
}
di "Share of `y'=0 in a Poisson process: " `f' `zpois'
di "Share of `y'=0 observed: " `f' `zobs' 

end

Then I ran the thing, and it kept turning up a share of 1.0, that is 100% zeroes observed, no matter the data set or the variable of interest y. You know why? Because local den r(N) will evaluate to r(N), and that will be filled in by the last command that returns such a thing before `den' is invoked. That command is sum `y'. The same thing happens to `num'. So I took the ratio of the same number. The returned values from the calls to count that I had made right before defining both local num and local den were quietly obliterated. Isn't that a sneaky bug? The correct code is below:


capture prog drop checkZifPoisson
program checkZifPoisson

version 12

args y dataset

local f _col(60) %5.2fc

di ""
useData `dataset' // nevermind how this is coded up
di ""
di "Checking `dataset':"
qui {
   count
   local den `r(N)'
   replace `y'=0 if missing(`y')
   count if `y'==0
   local num `r(N)'
   sum `y'
   local zpois exp(-`r(mean)')
   local zobs `num'/`den'
}
di "Share of `y'=0 in a Poisson process: " `f' `zpois'
di "Share of `y'=0 observed: " `f' `zobs' 

end

Now `den' stores the actual value returned by the count before it was defined as local den `r(N)', as intended. Mind your apostrophes, is all I'm saying.

Mapping Durham

Today, Kirstin wanted to make a grocery trip to the Whole Foods at Bull City Market, then take Kate to the nearest playground. That seems to be Oval Drive Park, but it won't be obvious from querying the Durham Park Locator .

No worries. The Durham Park Locator gives you a pretty nice table with all 55 playgrounds as of today. You load it into Stata, hit it with geocode and writekml, and you get this Google map. Easy.

On this occasion I also discovered that my writekml, as submitted, had a tiny bug. I submitted the fix a few minutes ago. It should be up by your next update ado.

On work

Last week Ezra Klein wrote this. His main point, the way I read it, is that the Obama stimulus was insufficient. But he makes a few other claims, one of which is that the government should have stopped firing people and it should have even hired more, even in totally bogus make-work jobs -- one assistant to every park ranger, three trainees to every firefighter, etc. He reasons that in drawn-out recessions the labor market is not quick enough to reassign people to productive jobs. Instead, people stay unemployed until their skills erode, which makes them even less employable, which depresses demand further.

Skill erosion is a legitimate worry, but I don't think Ezra Klein gets it right. Drawing a paycheck is not the only kind of productive work. Whether employed or not, as long as we are in the workforce we always do a combination of learning and paid work. We change the mix as we go along according to the going price of our time at the moment. When we are out of a job, our time is cheap. This ensures the best possible return on learning. Basically, when we're not busy drawing a paycheck, we're busy growing our stock of human capital.

A make-work job increases the going rate of the beneficiary's time, which makes learning more expensive. The rational response is to learn less. If most beneficiaries of make-work jobs respond rationally, these jobs will cause skill erosion, not cure it.

That may not be all. Your stock of human capital isn't only good for making you ready to make a living when employers start hiring again. That is only its more obvious use. The less obvious one is this: the more you know, the more you know what makes you happy. That improves your odds of living a good life. It may be one of comfort, adventure, contemplation or service. Whatever it is, I would not be surprised if getting there took at least some learning: think of it as serendipity with a little help from you.

So it seems to me that make-work jobs only reduce the growth rate of human knowledge both about things that we can do for profit and about things that make us happy. This makes them a good way to keep us in the doldrums -- both in our heads and in our wallets.

 

From Stata to Google Maps

At the Stata command line, type "findit geocode". You will turn up a command that matches physical addresses with latitude and longitude coordinates using the Google Maps API.

Then if you type "findit writekml" you will turn up my first contribution to the SSC: a command that writes a KML file using latitude and longitude coordinates in a Stata data set. Enjoy.

Stata 12 with MacVim

I used to run Stata 10 with Vim on Windows. Now I run Stata 12 with MacVim.

In Windows, there is a nice way to integrate Stata and Vim based on the work of Friedrich Huebler and Dimitriy Masterov.

A fairly straightforward combination of bash scripts, Vim functions and Applescript calls can achieve the same behavior in MacVim. I got it to work last night. Thank you, Phil.

How I rooted my Nook Color

The job takes one microSD card and two files that you download from the Internet: a CWR .img file, that you use to make the microSD card bootable; and a ManualNooter .zip file, which you do not unzip; you simply copy it onto your bootable microSD.

You will google this, and instructions of different vintage will reference different version numbers for these two files. As of this writing, for a Nook Color that came in a box with a blue dot on it, you need the 3.2.0.1 (eyeballer) version of the CWR, and the 4.6.16 version of the ManualNooter. Further instructions are here and here.

I used the 1G microSD card from my decommissioned BlackBerry Pearl, an embarrassment of a smartphone if there ever was one. The card is over three years old and it's class 4, but it worked. You will google this, and you will read that it's best if the card is class 6 or better. I'm sure that's right.

I used a Mac to make this card bootable. There are instructions for this all over Google. I liked best the ones here. They came with a screenshot of the terminal.

The Mac terminal takes a bit of typing, especially if you saved the CWM .img file somewhere awkward, like /Users/you/Documents/root_the_nook. But if you locate the .img file with the finder, you can drag it and drop it right into the command line. Who knew?

My first attempt looked alright, but any Market download I tried failed. That's when I switched from CWR 3.0.2.8+ManualNooter 4.5.2 to CWR 3.2.0.1+ManualNooter 4.6.16 -- because the more you fail, the more you google. The second try fared worse than the first: Market shut down as soon as I tried to open it, and so did Gmail.

Before my third attempt I de-registered the NC and put it back in pristine factory condition using instructions I googled further and found here. If you want to do as I did, scroll down to the post by Colchiro that mentions "wipe Davlik cache". Do as it says.

Then I re-registered the Nook, and proceeded with the rooting once again. Three was the charm. Now I have an Android Froyo tablet. I have no idea what I'll do with it. I installed a doodling app; Kate might like it.

General lesson learned: forget about banks. I've seen too big to fail, and it's Google. If it shut down tomorrow, the world would end.

Factors in Stata and R

The quick version of this post goes like this:
-- # in Stata is : in R
-- ## in Stata is * in R.

The long version is that both Stata and R handle very nicely factor variables in regression models. If you want a full-factorial interaction between a factor variable x1 and a continuous variable x2, the Stata way is to say


regress y i.x1##c.x2

whereas the R way is to say


lm(y~factor(x1)*x2)

Now, if you just want to interact x1 with the slope of x2, the Stata way becomes


regress y i.x1#c.x2

whereas the R way becomes


lm(y~factor(x1):x2)

That's all.

All the SAS you need

You may find yourself on a job where people use SAS, but you would rather use Stata. If you have both SAS and Stata installed on your computer, you can simply put Dan Blanchette's usesas to work. That's all you need.

If you don't have SAS installed, make your colleagues' lives easier and give them a script that lets them send data from SAS format to csv in bulk. Below is an example. You could call it gimmedata.sas.

The script starts by defining some lists of stubs for file names that follow the pattern "stub_2011mmdd.sas7bdat". You can alter these lists to suit your situation. It then loops through them and turns SAS data sets into .csv files:


%let theuser  =Gabi;
%let thepath  =C:/Users/&theuser/myproject/;
%let datain   =&thepath.sas_data/;
%let csvout   =&thepath.sas_to_csv/;
%let stubs    =dw dw_mean hm matrix pop fore;
%let days05   =05 13 20 27; /* dates in May */
%let days06   =03 09 16 22; /* dates in June */

libname  mylib "&datain";
title;

*options mcompilenote=all; /* default is none */
*options mprint mlogic; 

/* check if a data set exists, send it out as csv */
%macro getfile(stub,month,day);
	%local thefile;
	%let thefile=&stub._2011&month.&day;
	%if %sysfunc(exist(mylib.&thefile.)) %then %do;
		PROC EXPORT DATA=mylib.&thefile.
        	OUTFILE="&csvout.&thefile..csv"
        	DBMS=CSV REPLACE;
    		PUTNAMES=YES;
		RUN;
	%end;
	%else %put Data set &thefile does not exist.;
%mend getfile;

/* files come in every week, at dates listed
in &&days&month. so, run the %getfile macro
through all of those dates */
%macro getfiles(stub,month);
	%local thelist;
	%local count;
	%let thelist=&&days&month; /* going through days this month */
	%let count=0;
	%do %while(%qscan(&thelist.,&count.+1,%str( )) ne %str());
		%let count=%eval(&count.+1);
		%let day=%scan(&thelist.,&count.);
		%getfile(&stub,&month,&day);
	%end;
%mend getfiles;

/* weekly files come with names that start
with the stubs listed in &stubs. so, run
getmonth through all those stubs */
%macro getmonth(month);
	%local thelist;
	%local count;
	%let thelist=&stubs; /* going through file name stubs */
	%let count=0;
	%do %while(%qscan(&thelist.,&count.+1,%str( )) ne %str());
		%let count=%eval(&count.+1);
		%let stub=%scan(&thelist.,&count.);
		%getfiles(&stub,&month);
	%end;
%mend getmonth;

/* now call your macros */
%getmonth(05);
%getmonth(06);

Building a repeater bridge

I turned a 7-year old Motorola WR850G (v3) 802.11b/g router into a repeater for a dual-channel Netgear WNDR3700 that I bought a few weeks ago. Dual-channel means that the thing works over both 2.4GHz and 5GHz. More on that later. The job took some googling, but flashing an old router and installing third-party firmware turned out to be a lot less scary than I thought it would be. Here are some lessons learned:

1. If your wifi signal is suddenly weaker and spottier after years of perfect service, that doesn't mean that your old router is dying. Maybe you moved it somewhere new, for example from the middle of the basement to the side of the living room under the TV set on the main floor. If your your home office is on the top floor, you might be expecting no loss in signal strength, because the difference in elevation is now smaller. But the middle of the basement may have been closer as the stone falls than the side of the living room is. Maybe new gear in the house interferes with the router. Or maybe existing gear that you have where the TV sits interferes with the router where it is now, and it didn't when the router was in the basement. All of that sounds plausible to me.

2. If the old router still works, the new router will probably not give you any better results, especially if you place it where you moved the old router. If moving the old router back to its old place is an option, just take the new one back to the store and don't bother reading further.

3. Otherwise, your best option may be to flash the old router, put DD-WRT on it, then turn it into a repeater bridge. The bridge picks up signal from the router and the computers pick it up from the bridge. This will work if the old router is better at receiving weak signal than the computers of interest are. If this is true, then a repeater sitting in the home office can pick up signal from the new modem more reliably than the computers there can, and re-broadcast it to them.

4. There are four kinds of 802.11 wifi signal that I know of: a, b, g and n; a works in the 5GHz band; b and g are 2.4GHz; n works over both. As you go up the alphabet from a to n, higher letters are newer. The data transfer speed also grows in that direction from a maximum of 11Mbps to 300Mbps. Fancy dual-channel routers transmit both 2.4GHz and 5GHz signal. 2.4GHz carries farther. Your repeater will pick up more n-signal bars in the 2.4GHz band than it will in the 5GHz band.

5. Your router drops the data transfer speed to that of the slowest device in the network: if your laptops, smartphones, etc. are 802.11n but the wireless printer is 802.11g, then while you print all of your devices exchange data at 802.11g speed. That is a theoretical maximum of 54Mbps, which is still a lot faster than the typical internet connection, so you will notice no loss in how well YouTube works while you print. Also, since n-grade speeds resume after you're done printing, and you probably don't print that often, you probably won't want to upgrade a perfectly good wireless printer on account of its wifi capabilities alone, but read the bit below, about encryption.

6. There is another reason why you might be getting g-grade speeds from an n-capable router: n-grade speeds are not possible if you use WPA-TKIP encryption. This means that if you choose this type of encryption, the g-grade speed you get is permanent. As of this writing, you should use WPA2-AES encryption. Some gear calls it WPA-AES. It's the same thing. AES encryption is the only consumer-grade encryption that's any use, though probably not for long. Your other choices, WEP and WPA-TKIP, are demonstrably useless. Both have been cracked a long time ago. If your cable guy set up your router with a WEP numeric key, change it. In particular, if your router proposes mixed WPA-TKIP/AES encryption (it might call it WPA-TKIP/WPA2-AES) don't pick that. Choose pure AES. The encryption on the repeater should also be AES. If your g-grade wireless printer offers WPA-TKIP but not WPA2-AES encryption, update its firmware. If that does not help, you do have two options that I can think of short of buying a new printer. You could connect it to the repeater bridge by USB or Ethernet cable, and have the bridge act as a wireless print server. Or you could make the printer join the wireless network through a wireless-n USB dongle. I didn't try either of them.

7. If in the process of installing new firmware your old router seems to be hanging in the "reboot" mode, with the power LED glowing red, and that's going on for a very long time, maybe you bricked your router. Then again, maybe you didn't. It happened to me, and when after about 10 minutes I still didn't see all the lit LED's settle into nice solid green, I gave up and unplugged the thing. When I plugged it back in, I was greeted by a working DD-WRT machine. Good surprises happen.

Supply chain magic

  \begin{tabular}{rll}  \hline  \hline  \multicolumn{3}{ c }{New computer on its way home: halfway across the world in one weekend} \\  \hline  {\bf Date/Time} & {\bf Activity} & {\bf Location} \\  5/16/2011 7:34	& On FedEx vehicle for delivery & DURHAM, NC \\  5/16/2011 6:26	& At local FedEx facility & DURHAM, NC \\  5/15/2011 22:10 & In transit & RALEIGH, NC \\  5/15/2011 16:08 & Departed FedEx location & MEMPHIS, TN	\\  5/15/2011 1:13 & Arrived at FedEx location & MEMPHIS, TN	\\  5/14/2011 15:56 & Departed FedEx location & ANCHORAGE, AK \\  5/14/2011 13:38 & Int'l shipment release & ANCHORAGE, AK \\  5/14/2011 8:24 & Arrived at FedEx location & ANCHORAGE, AK \\  5/14/2011 16:22 & In transit & SHANGHAI CN	\\  5/13/2011 22:50 & Picked up & SHANGHAI CN \\  5/13/2011 6:58 & Shipment information sent to FedEx & \\  \hline  \end{tabular}