Detailed description
Zero-truncated binomials appear in at least two research contexts:
1) capture-recapture field studies: Typically, the aim of such studies is to estimate the abundance of a species.
2) defect detection in (software) engineering: A number of inspectors independently examine a piece of software or a related artifact and try to identify weak points. In software engineering, capture-recapture models have been employed to decide whether all (or most) problems have been discovered or whether further inspections are required.
While the current focus lies on estimating the number of items (animals or defects), in both contexts it is also of interest to examine how the discovery rate is influenced by predictors. For example, one may want to track abundance over time or compare different inspection methods.
Both settings can be viewed as a binomial experiment with successes and misses over a fixed number of trials. Furthermore, the resulting data can be displayed as a dichotomous response matrix, as in the Rasch model. In terms of mixed-effects models, this is a crossed random effects situation (cf. [1,2]). However, the response matrix is incomplete, as it only comprises items (animals/defects) that have been discovered at least once; thus, the total number of trials is unknown. The undiscovered items are clearly not missing at random, and ignoring them biases the parameter estimates.
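The bias can be illustrated with a small simulation. The following is only a minimal sketch under simplifying assumptions that are not part of the setting described above: a single detection probability shared by all items and inspectors, no random effects, and no predictors. All names and settings (`simulate_counts`, `fit_zero_truncated`, 2000 items, 5 inspectors, p = 0.2) are hypothetical. It compares the naive detection rate computed from discovered items only with the maximum likelihood estimate under a zero-truncated binomial.

```python
import random


def simulate_counts(n_items, n_inspectors, p, seed=1):
    """Number of inspectors (out of n_inspectors) that find each item."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n_inspectors))
            for _ in range(n_items)]


def truncated_mean(p, n):
    """Expected number of detections of an item, given it was found at least once."""
    return n * p / (1.0 - (1.0 - p) ** n)


def fit_zero_truncated(observed, n, tol=1e-10):
    """MLE of p for a zero-truncated binomial.

    The score equation reduces to: sample mean = n*p / (1 - (1-p)^n).
    The right-hand side increases in p, so bisection is sufficient.
    """
    target = sum(observed) / len(observed)
    lo, hi = 1e-9, 1.0 - 1e-9
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if truncated_mean(mid, n) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical settings, chosen for illustration only.
N_ITEMS, N_INSPECTORS, P_TRUE = 2000, 5, 0.2

counts = simulate_counts(N_ITEMS, N_INSPECTORS, P_TRUE)
observed = [c for c in counts if c > 0]  # undiscovered items never enter the data

# Naive estimate: treats the discovered items as if they were all items.
naive_p = sum(observed) / (len(observed) * N_INSPECTORS)

# Zero-truncated estimate, plus the implied total number of items.
p_hat = fit_zero_truncated(observed, N_INSPECTORS)
n_hat = len(observed) / (1.0 - (1.0 - p_hat) ** N_INSPECTORS)

print(f"naive p: {naive_p:.3f}, zero-truncated p: {p_hat:.3f}, "
      f"estimated number of items: {n_hat:.0f}")
```

In this simplified setting the naive rate overestimates the true detection probability, while the zero-truncated fit recovers it approximately and also yields an estimate of the total number of items. It says nothing yet about predictors or random effects.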
Note that due to the crossed random effects, predictors can be associated with either "side": items (for example, types of defects, age or gender of animals) or inspectors (e.g., level of experience, method used), or be independent of both (e.g., weather conditions or type of system under examination).
I believe that the appropriate solution is a zero-truncated logistic regression with crossed random effects. On R-sig-ME I was informed by Ben Bolker that an implementation in glmmADMB is rather painless.
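To make the proposed model more concrete, one possible formulation is sketched below. The notation is assumed here and not taken from the references: items are indexed by i, inspectors by j = 1, ..., J, y_ij = 1 if inspector j discovered item i, x_ij is a vector of predictors, and u_i and v_j are the item and inspector random effects. The second line gives the probability of an item's response row conditional on the random effects and on the item having been discovered at least once.

```latex
\begin{align}
  \operatorname{logit} \pi_{ij} &= x_{ij}^\top \beta + u_i + v_j,
  \qquad u_i \sim N(0, \sigma_u^2), \quad v_j \sim N(0, \sigma_v^2), \\
  P\Bigl(y_{i1}, \dots, y_{iJ} \Bigm| \sum_{j=1}^{J} y_{ij} \ge 1\Bigr)
  &= \frac{\prod_{j=1}^{J} \pi_{ij}^{y_{ij}} (1 - \pi_{ij})^{1 - y_{ij}}}
          {1 - \prod_{j=1}^{J} (1 - \pi_{ij})}.
\end{align}
```

The denominator is the probability that at least one inspector discovers item i, so dividing by it renormalizes over the rows that can actually be observed. This sketch covers only the part that is conditional on the random effects. Treating u_i as normally distributed among the discovered items is itself an assumption, because hard-to-find items are underrepresented in the data. How best to integrate out the random effects is left open here.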
Since I am more of an R end user than a programmer, I probably won't be of much help with the implementation. What I can offer are some real data sets for a proof of concept and perhaps co-authorship on a publication (see [3,4,5] for my previous work).
Anybody?
[1] Quené, H., & van den Bergh, H. (2008). Examples of mixed-effects modeling with crossed random effects and with binomial data. Journal of Memory and Language, 59(4), 413-425. doi:10.1016/j.jml.2008.02.002
[2] Doran, H., Bates, D., Bliese, P., & Dowling, M. (2007). Estimating the Multilevel Rasch Model: With the lme4 Package. Journal of Statistical Software, 20(2), 1-18. Retrieved from http://www.jstatsoft.org/v20/i02
[3] Schmettow, M. (2009). Controlling the usability evaluation process under varying defect visibility. BCS HCI ’09: Proceedings of the 2009 British Computer Society Conference on Human-Computer Interaction (pp. 188-197). Swinton, UK: British Computer Society.
[4] Schmettow, M. (2012). Sample Sizes for Usability Studies: Magic Numbers Disenchanted. Communications of the ACM, 55(4). (to appear in April)
[5] Schmettow, M., & Vietze, W. (2008). Introducing item response theory for measuring usability inspection processes. Proceeding of the twenty-sixth annual CHI conference on Human factors in computing systems - CHI ’08 (p. 893). New York, New York, USA: ACM Press. doi:10.1145/1357054.1357196