Forum: help


RE: Error related to mob_grow: 'vector size cannot be NA' By: Achim Zeileis on 2021-08-08 09:45 [forum:49087]
If you find it useful, you can have empty factor levels in the regression variables in glmtree(); at least I didn't experience any problems with this. However, the split variables used for recursively partitioning the GLM require non-empty factor levels. (I never noticed this because it didn't occur to me to try to use empty factor levels.)
RE: Error related to mob_grow: 'vector size cannot be NA' By: Cesko Voeten on 2021-08-08 09:25 [forum:49086]
Thanks for your lightning-fast responses! I'm surprised that the names of the factor levels turned out to matter - I'm not used to this from a GLM context, where operations are normally performed only on the model matrix, not on the actual factor objects. But your explanations make sense, and I can just rename the empty level to something else. Thanks for the help!
RE: Error related to mob_grow: 'vector size cannot be NA' By: Achim Zeileis on 2021-08-07 22:56 [forum:49085]
Thanks, Marjolein, for spotting the source of the problems! Just a couple of further comments:

Base R treats the "" string differently when used as a name in a vector. See ?names:

    The name '""' is special: it is used to indicate that there is no name associated with an element of a (atomic or generic) vector. Subscripting by '""' will match nothing (not even elements which have no name).

This does not have anything to do with partykit. So the only thing we could do would be to throw an explicit error regarding this.

Example for what goes wrong:

    ## create vector with one zero-length name
    x <- 1:2
    names(x) <- c("", "a")
    x
    ##   a
    ## 1 2

    ## indexing by empty name leads to NA
    x[c("", "a")]
    ## <NA>    a
    ##   NA    2

    ## hence assignment in combination with indexing by "" name is unexpected
    x[""] <- 3
    x
    ##   a
    ## 1 2 3
    names(x)
    ## [1] ""  "a" ""
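The explicit error mentioned in this post could look roughly like the base-R sketch below. `assert_no_empty_levels()` is a hypothetical helper written for illustration; it is not part of partykit, and the toy data frame merely stands in for the real data.

```r
# Hypothetical helper (not part of partykit): stop early if any factor
# among the given columns has a level whose label is the empty string.
assert_no_empty_levels <- function(data, vars = names(data)) {
  for (v in vars) {
    x <- data[[v]]
    if (is.factor(x) && any(levels(x) %in% "")) {
      stop(sprintf("factor '%s' has a level named \"\"; rename it before partitioning", v),
           call. = FALSE)
    }
  }
  invisible(TRUE)
}

# Toy data with an empty-string level in the (assumed) split variable 'Lev'.
d_demo <- data.frame(Lev = factor(c("", "f", "m")), Y = 1:3)

# Capture the error message instead of aborting the script.
res <- tryCatch(assert_no_empty_levels(d_demo, "Lev"),
                error = function(e) conditionMessage(e))
print(res)
```

Calling such a check on the partitioning variables before glmtree() would surface the problem with a readable message instead of the NA propagation seen in the original report.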
RE: Error related to mob_grow: 'vector size cannot be NA' By: Marjolein Fokkema on 2021-08-07 19:08 [forum:49084]
Dear Cesko,

The error seems due to d$Lev having a level that is an empty string:

    > d <- read.csv('bug.csv')
    > d$C <- factor(d$C); d$Lev <- factor(d$Lev)
    > levels(d$Lev)
    ## [1] ""  "f" "m" "n" "s"
    > library("partykit")
    > m <- glmtree(Y ~ 0 + C | Lev + C, data=d, family=poisson) # fails

The following fixes the error:

    > levels(d$Lev) <- c("new", "f", "m", "n", "s")
    > m <- glmtree(Y ~ 0 + C | Lev + C, data=d, family=poisson) # works
    > m
    ## Generalized linear model tree (family: poisson)
    ##
    ## Model formula:
    ## Y ~ 0 + C | Lev + C
    ##
    ## Fitted party:
    ## [1] root
    ## |   [2] Lev in f, n: n = 4354
    ## |           Cresp1     Cresp2
    ## |       -0.8777681 -0.5373573
    ## |   [3] Lev in new, m, s
    ## |   |   [4] Lev in new, m: n = 10686
    ## |   |           Cresp1     Cresp2
    ## |   |       -1.0676535 -0.4213115
    ## |   |   [5] Lev in s: n = 390
    ## |   |           Cresp1     Cresp2
    ## |   |       -1.4228519 -0.2757873
    ##
    ## Number of inner nodes:    2
    ## Number of terminal nodes: 3
    ## Number of parameters per node: 2
    ## Objective function (negative log-likelihood): 12738.99

On a different note, using the same variable (C) both as a predictor for the node-specific GLM (Y ~ 0 + C) and as a partitioning variable (| Lev + C) is redundant. Because C is a factor, the resulting parameter stability test will always be zero. See e.g.:

    > strucchange::sctest(m)

This may only occur in this very simple testcase, though, and may not be representative of the final tree you are trying to fit. If you have multiple predictor variables for the node-specific GLM, or if the predictor is a continuous variable (and is also a possible partitioning variable), the parameter stability test(s) will not be zero.

Best,
Marjolein
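The renaming step in this post can be generalized so that the position of the "" level does not have to be known in advance. The following is a minimal base-R sketch; `rename_empty_levels()` and the replacement label "(empty)" are hypothetical choices made for illustration, and the toy data frame stands in for bug.csv, which is not reproduced here.

```r
# Hypothetical helper (not part of partykit): give every "" factor level
# in a data frame a non-empty label, leaving all other levels unchanged.
rename_empty_levels <- function(data, replacement = "(empty)") {
  for (v in names(data)) {
    if (!is.factor(data[[v]])) next
    lv <- levels(data[[v]])
    empty <- lv %in% ""
    if (!any(empty)) next
    # Refuse to merge the empty level into an existing level by accident.
    if (replacement %in% lv) {
      stop(sprintf("level '%s' already exists in factor '%s'", replacement, v),
           call. = FALSE)
    }
    lv[empty] <- replacement
    levels(data[[v]]) <- lv
  }
  data
}

# Toy data standing in for bug.csv.
toy <- data.frame(Lev = factor(c("", "f", "m", "")), Y = c(1, 0, 2, 1))
fixed <- rename_empty_levels(toy)
print(levels(fixed$Lev))
```

The cleaned data frame can then be passed to glmtree() in the same way as in the working call above.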
Error related to mob_grow: 'vector size cannot be NA' By: Cesko Voeten on 2021-08-07 09:34 [forum:49083]
Please find attached a very simplified testcase:

    library(partykit)
    d <- read.csv('bug.csv')
    d$C <- factor(d$C); d$Lev <- factor(d$Lev)
    m <- glmtree(Y ~ 0 + C | C + Lev, data=d, family=poisson)

Seems simple enough, right? But for some reason, this happens:

    Error in vector(mode = "list", length = max(kidids)) :
      vector size cannot be NA

I've tried very hard to debug this myself, and the farthest I've gotten is that somewhere deep within mob_grow() a p-value is computed to be NA, which gets propagated into kidids, so the code is indeed trying to set a length of NA. I couldn't find anything obviously wrong with my model and/or data. The problem only occurs when both 'C' and 'Lev' are in the partitioning field.

What I am really trying to do is much more complex (I'm trying to get a multinomial mixed model out of glmertree via the Poisson trick), but this seemed the smallest testcase I could reduce the problem to.