Forum: help

RE: Strange Bias Using cforest
By: Torsten Hothorn on 2017-01-13 14:09
[forum:43796]
I need to have a look at the code you are using. Could you please send it directly to <torsten.hothorn@R-project.org>? This web-based forum is rather inconvenient.

RE: Strange Bias Using cforest
By: Julian Karch on 2017-01-12 16:17
[forum:43795]

Attachment: nobias.pdf
Yes, I realized that. I forgot to add the nobias.pdf attachment, so I am providing it here.

RE: Strange Bias Using cforest
By: Achim Zeileis on 2017-01-12 16:12
[forum:43794]
Yes, it did arrive. But for some reason that I don't fully understand, we have to manually approve every message before it is released into the forum. I have just done so. But I'll leave the reply to the actual question to Torsten.

Best,
Z

RE: Strange Bias Using cforest
By: Julian Karch on 2017-01-12 15:23
[forum:43793]
Did the message I sent yesterday arrive?

RE: Strange Bias Using cforest
By: Julian Karch on 2017-01-11 17:41
[forum:43792]

Attachment: default.pdf
Thanks for the very swift reply! I have a rough understanding of what you mean now.

For my data, the OOB predictions (y_hat) using the default settings for partykit and cforest_unbiased for party look like default.pdf, where r denotes the correlation and y the true value. There are three things that I find strange.

1. The huge difference between party and partykit. I tried to find default settings in which the two packages differ but could not find any significant differences besides mtry, which is 5 for party and ceiling(sqrt(nvar)) for partykit. However, the difference remains the same after setting mtry to 5 for partykit (see the sketch after this list).

2. The high correlation of the OOB predictions with y and, at the same time, the strong bias towards the mean.

3. In general, I find the high correlation surprising; I did not expect such good predictions.
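
As a minimal sketch of the comparison in point 1 (the data frame dat and the formula y ~ . are placeholders, not my actual data):

library("party")      # provides cforest_unbiased()
library("partykit")   # both packages export cforest(), hence the explicit :: below

## party with its default mtry = 5
cf_party <- party::cforest(y ~ ., data = dat,
                           controls = party::cforest_unbiased(mtry = 5))

## partykit with mtry forced to 5 instead of the default ceiling(sqrt(nvar))
cf_partykit <- partykit::cforest(y ~ ., data = dat, mtry = 5)

## OOB predictions from both packages
yhat_party    <- predict(cf_party, OOB = TRUE)
yhat_partykit <- predict(cf_partykit, OOB = TRUE)
cor(yhat_party, yhat_partykit)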

In your example, if I set

mincriterion = 0, minbucket = 0, minsplit = 0

the last bit of bias vanished, as expected.


Thus, to get rid of the bias I also set

minsplit = 0,minbucket = 0, mincriterion = 0

for both packages; mincriterion is already 0 in the default settings. This results in nobias.pdf. There were two surprises here:

1. The bias diminished but did not disappear.
2. The correlation increased even further.

Any thoughts? Maybe I am not actually getting the OOB predictions from partykit? To obtain them I use

predict(model, OOB = TRUE)
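
For completeness, a sketch of what I am doing for both packages (dat is again a placeholder; that party forwards minsplit and minbucket from cforest_control() to the underlying ctree_control() is my assumption):

## partykit: tree controls go through ctree_control()
cf_pk <- partykit::cforest(y ~ ., data = dat, mtry = 5,
  control = partykit::ctree_control(mincriterion = 0, minsplit = 0, minbucket = 0))

## party: same settings via cforest_control() (unbiased setup spelled out);
## minsplit/minbucket are assumed to be passed on to ctree_control()
cf_p <- party::cforest(y ~ ., data = dat,
  controls = party::cforest_control(teststat = "quad", testtype = "Univ",
    replace = FALSE, fraction = 0.632, mtry = 5,
    mincriterion = 0, minsplit = 0, minbucket = 0))

## sanity check: with OOB = TRUE and no newdata the predictions should
## differ from the in-sample ones, otherwise I am not getting OOB at all
all.equal(predict(cf_pk, OOB = TRUE), predict(cf_pk))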

RE: Strange Bias Using cforest
By: Torsten Hothorn on 2017-01-06 08:36
[forum:43777]
library("partykit")
set.seed(29)

### linear regression
n <- 100
x <- sort(runif(n, min = -1, max = 1))
y <- rnorm(n, mean = 2 + 3 * x, sd = .1)
plot(x, y)

### stump: strong penalisation towards mean(y)
ct <- ctree(y ~ x, control = ctree_control(stump = TRUE))
lines(x, predict(ct), col = "blue")

### forest with small trees: same effect
cf <- cforest(y ~ x, control = ctree_control(minbucket = 20))
lines(x, predict(cf), col = "red")

### forest with large trees: much smaller effect but
### penalisation still visible for small/large x values
cf <- cforest(y ~ x, control = ctree_control(mincriterion = 0,
                                             minbucket = 5, minsplit = 5))
lines(x, predict(cf), col = "green")
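
A quick way to put a number on the shrinkage, reusing cf and y from above: the predictions have a smaller spread than y, and regressing y on the predictions gives a slope above 1.

### quantify the shrinkage for the large-tree forest
yhat <- predict(cf)
sd(yhat) / sd(y)            # below 1: predictions are compressed
coef(lm(y ~ yhat))["yhat"]  # above 1: the inverse of that compression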

RE: Strange Bias Using cforest
By: Julian Karch on 2017-01-05 16:42
[forum:43776]
Could you elaborate on why the shrinking happens? I don't understand why it should.

RE: Strange Bias Using cforest
By: Torsten Hothorn on 2016-12-22 08:38
[forum:43765]
This is what penalisation does to you. Implicitly, random forests shrink the predicted means. The effect will be more pronounced when smaller trees are aggregated. You can play with mincriterion, minsplit, minbucket and maxdepth a little.
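
A tiny made-up illustration of the mechanism: each terminal node predicts its node mean, and node means are necessarily closer to the overall mean than the extreme observations are, so the spread of the predictions is compressed.

## made-up numbers: a single split at x = 0 replaces the responses
## by the two half-sample means
y <- c(1, 2, 3, 8, 9, 10)
x <- c(-3, -2, -1, 1, 2, 3)
yhat <- ifelse(x < 0, mean(y[x < 0]), mean(y[x >= 0]))
range(y)     # 1 10
range(yhat)  # 2 9: pulled towards mean(y) = 5.5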

Best,
Torsten

Strange Bias Using cforest
By: Julian Karch on 2016-12-21 21:07
[forum:43764]

Attachment: randomForest.pdf
Hi,

Attached is a scatterplot with the out-of-sample predictions (obtained using predict(forestModel, OOB = TRUE)) from a cforest on the x-axis and the true values on the y-axis.

What I find strange is that the scale of the predictions is off; simply compare the limits of the x- and y-axes. Also, the best-fitting line has a slope of more than 6, whereas I expected it to be roughly 1 for any well-behaved model.

Do you have any insights about this problem? If necessary, I can also produce a minimal example that reproduces the error.
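
For reference, this is roughly how I arrive at that slope (forestModel is the fitted cforest as above; the name y for the vector of true values is only a placeholder):

yhat <- predict(forestModel, OOB = TRUE)  # out-of-sample (OOB) predictions
plot(yhat, y)                             # predictions on x, truth on y
fit <- lm(y ~ yhat)                       # best-fitting line
abline(fit)
coef(fit)                                 # slope comes out above 6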
