Introducing syncR

A new and improved version of the syncPacks() function is now part of a GitHub package, which you can install through devtools::install_github('ghuiber/syncR'). If you're into that, you can help develop it further too.

Thanks go to Hilary Parker for her thorough instructions and to Hadley Wickham for devtools and roxygen2.

Some unresolved hiccups with R 3.1.0 on Mavericks, and a workaround

If you're going to download the Mac binaries for the latest R, you will see that they come in "Snow Leopard and higher" and "Mavericks and higher" flavors. If you run Mavericks, the latter is a natural choice, though the former clearly says "and higher" too, so it's got to be a valid option as well.

As it turns out, it's the better option, at least as of this writing.

The Mavericks build crashes with a segmentation fault upon attempting to load either the caret or data.table library, as reported here and here. A brief search through the R-SIG-Mac Archives returned no useful leads for fixing the problem.

Dropping the Mavericks build and installing the Snow Leopard one gave me back both caret and data.table. This works for me.

> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] RCurl_1.95-4.1   bitops_1.0-6     scales_0.2.4     ggplot2_1.0.0    reshape_0.8.5    data.table_1.9.2
[7] MASS_7.3-31     

loaded via a namespace (and not attached):
 [1] colorspace_1.2-4 digest_0.6.4     grid_3.1.0       gtable_0.1.2     htmltools_0.2.4  munsell_0.4.2   
 [7] plyr_1.8.1       proto_0.3-10     Rcpp_0.11.2      reshape2_1.4     rmarkdown_0.2.46 stringr_0.6.2   
[13] tools_3.1.0  
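
If you are not sure which build you ended up with, the Platform line tells you: Snow Leopard is Darwin 10 and Mavericks is Darwin 13. Here is a minimal sketch of that check. The helper name build_flavor is made up for illustration, and it only recognizes these two builds.

```r
# Hypothetical helper: classify an R platform string by its Darwin version.
# Snow Leopard builds report darwin10.x, Mavericks builds report darwin13.x.
build_flavor <- function(platform = R.version$platform) {
  if (grepl("apple-darwin10", platform, fixed = TRUE)) {
    "Snow Leopard build"
  } else if (grepl("apple-darwin13", platform, fixed = TRUE)) {
    "Mavericks build"
  } else {
    "other"
  }
}

# The platform string from the sessionInfo() output above.
build_flavor("x86_64-apple-darwin10.8.0")

# The build of whatever R is running this code.
build_flavor()
```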

MacBook Pro running hot, draining battery after upgrading to SSD?

Mine did. That was an unpleasant surprise. Googling for a solution brought up untold amounts of speculation and wasted time.

What ended up working for me was resetting the System Management Controller (SMC), as documented here and especially here. You should see that Reddit comment thread, especially if you're also wondering whether you're supposed to enable TRIM.

Resetting the SMC brought down the CPU core temperatures from about 90°C to about 60°C, low enough for the fan to not kick in. My Mac is once again as quiet as it used to be.

FreeNAS works as advertised

I decided to replace the HDD with an SSD in my Mac for Christmas, but I only got as far as buying the thing and backing up the computer with Time Machine, as explained here, to a poor man's FreeNAS server. I cobbled that server together from a USB stick (for the OS) and the old Fedora 14 home server, whose sole 500G HDD is now one big ZFS volume, with 2G of RAM.

That's right, ZFS on one HDD with 2G of RAM. I'm not saying that this is a good setup. The official hardware recommendation is 8G for ZFS. But this is the kit I had lying around, and I just wanted to move on with the actual disk replacement; my D510MO board won't even support more than 4G of RAM (though I'm not sure why, since it was made to accommodate a 64-bit CPU). Anyway, I managed to make one first complete Time Machine backup and a few incremental ones before leaving for work on Monday, January 6.

I flew back on Thursday, January 9, and found a non-responsive Mac with a HDD so sick that an erase-and-install OS restore was in order. That's what you end up having to do when, upon entering your password at boot-up, you see the Apple logo for a while, then that "prohibited access" barred circle, while the gear animation keeps spinning and spinning.

I have no idea how this happened. I felt very fortunate for having made that backup. I decided that the accident was a good excuse to just proceed with the SSD installation already.

The proof of this particular pudding was going to be in restoring the old system from that Time Machine backup, over the LAN, off the grossly inadequate NAS box. I am happy to report that the restore succeeded, and my Mac is back in business, now with an SSD.

What I'm saying is this: if you don't have a Time Capsule but do have some idle hardware, FreeNAS may be a good Time Machine backup solution for you too.

One thing you will want to know about is user quotas: a 500G NAS HDD will fill up quickly if you let Time Machine have its way with it. The solution is to set some reasonable user quotas for people in your house who might use the FreeNAS box as their Time Machine backup destination. You can do that from the web GUI. The Advanced Mode of the Create ZFS Dataset menu under Storage (or, for an existing dataset, the Advanced Mode of Edit ZFS Options) lets you set quotas four different ways; for specifics, google thin and thick provisioning. This seems to be advanced sysadmin stuff.

There is also a command-line recipe for setting user quotas here. You get to the FreeNAS shell from the web GUI: look at the bottom of the vertical navigation menu on the left.

Smaller quotas will force Time Machine to keep a shorter history. It deletes old backups as it runs out of space -- so, less room, shorter history. That is not a bad thing.

Invisible methods

R objects come with various methods that make them useful. I tend to stumble over these by googling something I want to do and finding some code example on StackOverflow. But today I learned (from @RLangTip) that there is a straightforward way to list them all: you simply call, e.g., methods(class='lm').
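
Here is a small sketch of that call using only base R, so it runs without any extra packages. The variable names are mine.

```r
# List every S3 method registered for objects of class "lm".
lm_methods <- methods(class = "lm")
print(lm_methods)

# The same function works the other way round:
# list every class that has its own summary() method.
summary_methods <- methods("summary")
print(summary_methods)
```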

That's nice, but mileage varies and I don't have a good explanation for it. Take Zelig, for example. It has this sim() function, which produces a simulation object with some methods of its own. One of these is illustrated here. Unfortunately, you won't find it with the methods() call:

> library("Zelig", lib.loc="C:/Program Files/R/library")
Loading required package: boot
Loading required package: MASS
Loading required package: sandwich
ZELIG (Versions 4.2-2, built: 2013-10-22)

|  Please refer to for full       |
|  documentation or help.zelig() for help with commands and      |
|  models support by Zelig.                                      |
|                                                                |
|  Zelig project citations:                                      |
|    Kosuke Imai, Gary King, and Olivia Lau.  (2009).            |
|    ``Zelig: Everyone's Statistical Software,''                 |
|                              |
|   and                                                          |
|    Kosuke Imai, Gary King, and Olivia Lau. (2008).             |
|    ``Toward A Common Framework for Statistical Analysis        |
|    and Development,'' Journal of Computational and             |
|    Graphical Statistics, Vol. 17, No. 4 (December)             |
|    pp. 892-913.                                                |
|                                                                |
|   To cite individual Zelig models, please use the citation     |
|   format printed with each model run and in the documentation. |

Attaching package: ‘Zelig’

The following object is masked from ‘package:utils’:


> methods(class='sim')
[1] plot.sim*   print.sim*   repl.sim*   simulation.matrix.sim*
[5] summary.sim           

   Non-visible functions are asterisked

See that? There's a non-visible plot() method listed, but not the method I was looking for, yet it exists and it works. I wonder why that is. Is it maybe that it is some kind of child of plot()? If so, how do you list such children?
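
This does not settle the question, but two facts about S3 help narrow it down. First, methods(class='sim') can only list functions registered or named as generic.sim, so a function attached to a different class name will not show up there. Second, the asterisked entries can still be inspected with getS3method() and getAnywhere(). Here is a minimal sketch using add1.lm from base R as a stand-in, since Zelig may not be installed. Whether add1.lm is visible depends on the R version, so the code checks rather than assumes.

```r
# Look up a method by generic and class, whether or not it is exported.
by_dispatch <- getS3method("add1", "lm")

# Search every loaded namespace for an object by its full name.
by_name <- getAnywhere("add1.lm")
by_name$where        # where definitions of add1.lm were found

# TRUE if typing add1.lm at the prompt would find it (a "visible" method).
is_visible <- exists("add1.lm")

# optional = TRUE returns NULL instead of an error when nothing is registered.
missing_method <- getS3method("plot", "no_such_class", optional = TRUE)
```

If a function turns up under getAnywhere() but not under getS3method() for the class you expected, it is most likely not a method for that class at all.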

How I backed up a bunch of old pictures to Amazon Glacier

This is from a home server that runs Fedora 14, to which I have ssh access from my MacBook Pro.

1. I git clone'd this.

2. Then, as super-user, I called

wget -O - | python

as instructed here, to install the setuptools module.

3. Then, also as super-user, I called

python setup.py install

4. At this point, it was time to fill out the .glacier-cmd configuration file, as shown in the

5. Bookkeeping using Amazon SimpleDB requires setting up an Amazon SimpleDB domain (= database) first. You cannot do this through the AWS Management Console.

6. So I googled, and found official directions here.

7. Unfortunately, my Chrome wouldn't render the SimpleDB Scratchpad web app properly. That caused some unnecessary confusion. The solution was to just run Scratchpad in Safari.

8. Your computer has folders and files. Amazon Glacier has vaults and archives. One archive = one upload. This can be an individual file, but it's more practical to bundle individual files into tarballs first, so one archive = one tarball.

9. I'm in business: two large tarballs uploaded and showing up in my SimpleDB domain that keeps tabs on this particular vault, one on the way.
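
The bundling in step 8 can be sketched as follows. On the server itself the plain tar command would be the natural tool. R's tar() is used here only to keep the examples in one language, and the folder and file names are made-up stand-ins created in a temporary directory.

```r
# Create a throwaway folder with two small stand-in files.
photo_dir <- file.path(tempdir(), "pictures-2005")   # hypothetical folder name
dir.create(photo_dir, showWarnings = FALSE)
writeLines("first", file.path(photo_dir, "a.txt"))
writeLines("second", file.path(photo_dir, "b.txt"))

# Bundle the whole folder into one gzip-compressed tarball:
# one tarball = one Glacier archive.
tarball <- file.path(tempdir(), "pictures-2005.tar.gz")
old_wd <- setwd(tempdir())           # use relative paths inside the archive
tar(tarball, files = "pictures-2005", compression = "gzip")
setwd(old_wd)

# List what went in, to compare against the original folder before uploading.
listing <- untar(tarball, list = TRUE)
print(listing)
```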

It looks like everything works, but I can't be sure until Amazon Glacier gets around to producing an inventory (this happens about once a day, it seems). I can then check SHA sums between what's on Glacier and what I thought I sent there. Next I will upload something small, then download it the next day.

Glacier is the digital equivalent of self-storage. You put stuff there that you don't really want anymore; you think you might, but you don't. It's a problem that comes with the ease of acquiring such stuff in the first place. I don't think there's a big self-storage industry in Zambia, and I'm sure that storing old photos wasn't much of a problem back when you had to take them on film and you only had 36 frames in a roll.

I have no idea why we bother with digital self-storage. I guess simply deleting old pictures and a bunch of music we no longer listen to makes us feel like jerks. It's a total trap.

I put up my first post on RPubs

Sure, it may be the 4chan of data analysis, but it's so nice to be able to do R Markdown right there in RStudio and just hit the Publish button.

Of course, this convenience has downsides. I know it's prudent to sit with your work a bit, just like thinking carefully before you go skinny-dipping, especially when you don't have the benefit of peer review.

On the other hand, it's no use to wait until nobody cares anymore. So, here goes.

Stata 13 is coming on June 24

Yellow color scheme is out, sky-blue is in, plus expanded capabilities, as one might expect. Notable among them: xtologit, xtoprobit and long strings -- 2 billion characters long, that is. One of these days you won't need an RDBMS anymore. Wouldn't that be nice?

See more details here.

Keeping knitr happy after upgrading to R 3.0.0

As noted here, after upgrading to R 3.0.0 you must run


This is because a bunch of packages have to be rebuilt under R 3.0.0 in order to keep working.
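
One way to see which installed packages were built under an older R is sketched below. The commented-out update.packages(checkBuilt = TRUE) call at the end is my assumption about the usual way to rebuild them, not necessarily the exact command from the linked note. It needs a network connection, so it is left for you to run.

```r
# Every installed package records the R version it was built under.
pkgs <- installed.packages()
built_under <- numeric_version(pkgs[, "Built"], strict = FALSE)

# Packages built before R 3.0.0 are the ones that need rebuilding.
stale <- unname(pkgs[which(built_under < "3.0.0"), "Package"])
print(stale)

# Assumed rebuild step (requires network access, so not run here):
# update.packages(checkBuilt = TRUE, ask = FALSE)
```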

So I did, but that was not enough for LyX to be able to compile my PDFs from knitr like it used to only a week ago. What I had to do besides was this:

install.packages("/Users/ghuiber/Downloads/tikzDevice_0.6.3.tar.gz", repos = NULL, type="source")

That is right. The package tikzDevice can no longer be installed directly from R-forge as a binary, as in

install.packages("tikzDevice", repos="")

Also, the source files are only available as a .tar.gz archive. To install from it on a Windows machine, you must have Rtools installed first.

A quick note on rJava

I recently had to set up a PC with kit similar to what I have on my Mac. On this PC the OS is Windows 7 64-bit but the browser is IE8 32-bit. This causes jucheck.exe to install (and occasionally update) 32-bit Java. This is unfortunate if you use 64-bit R, because it breaks the rJava package, which in turn breaks the xlsx package, with the practical consequence that you cannot read Excel worksheets into R.

There is a workaround. First, install Oracle's manual download of 64-bit Java. As of this writing, its Windows 7 home will be in C:\Program Files\Java\jre7. You should add this to the %path% environment variable. In addition, the rJava package depends on jvm.dll, and R might be looking for it in the wrong spot. It won't hurt, then, to add this to your %path% as well: C:\Program Files\Java\jre7\bin\server. There's more on this, as usual, on StackOverflow.
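
If you would rather not touch the system-wide %path%, a session-level variant is sketched below. It assumes the same two directories as above and changes PATH only for the running R session. I have not verified that this alone is enough for rJava on every setup, so treat it as a starting point.

```r
# Assumed install location, taken from the paragraph above.
java_home <- "C:\\Program Files\\Java\\jre7"
jvm_dir <- paste0(java_home, "\\bin\\server")   # where jvm.dll is expected

# Append both directories to PATH for this R session only.
new_path <- paste(c(Sys.getenv("PATH"), java_home, jvm_dir),
                  collapse = .Platform$path.sep)
Sys.setenv(PATH = new_path)

# 8 means this is 64-bit R, which needs the 64-bit Java.
pointer_bytes <- .Machine$sizeof.pointer
print(pointer_bytes)

# On a machine with rJava installed, loading it is the real test:
# library(rJava)
```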

As Oracle warns, your manually-installed 64-bit Java will not be automatically updated. That is a problem when security flaws hit Java, but I find being able to read Excel files into R so useful that I'm willing to just live with this risk, though I don't have a good idea how to best manage it. I'll just keep an eye on ArsTechnica for bug news. If anybody has a better way, I'm all ears.