My previous post promised some simulations. To refresh your memory, I am trying to see how reliably Multiple Systems Estimation, as described here, can guess the true number of fish in a pond.
The density plot below tells the story. The true number of fish is 150,000. Each catch limit lets you make a best guess, which is the x-coordinate of the peak of its associated bell curve. The shape of each bell curve measures the uncertainty surrounding the guess: the flatter the bell, the more uncertain the guess. Perfect foresight would be a spike at the 150,000 x-mark. Curves that peak away from that mark make biased guesses.
It is obvious that larger daily catch limits allow you to guess better. Very low catch limits set you up for severe downward bias. The red curve, corresponding to a daily catch of up to 500 fish, cannot help underestimating the true population size, for reasons discussed in the previous post. Then there seems to be a range of catch limits that improve on the bias, but increase the uncertainty horribly – that’s the green curve. So, you need higher limits, but you don’t have to go crazy. The gain in precision after some point is not worth it: though a limit of 20,000 is very accurate, a limit of half that is not much worse.
I was inspired to run this exercise by a class I’m taking. The work would have been slower without help from here (I recommend the book, I bought it) and here. The picture would not have looked this good without ggplot2 (you should get that book too). All errors are my own. Here’s the code:
If you made it here, you might also want to see one last thing on the matter.