SCM

[#6816] glmrobMqle fails where it probably should not

Date:
2023-08-25 17:11
Priority:
3
State:
Open
Submitted by:
Christof Schötz (cschoetz)
Assigned to:
Nobody (None)
Hardware:
PC
Product:
None
Operating System:
None
Component:
None
Version:
None
Severity:
None
Resolution:
Accepted As Bug
URL:
Summary:
glmrobMqle fails where it probably should not

Detailed description
# This seems to be a bug as glmrobMqle fails in the most simple tasks:

n <- 10
x <- seq(0, 1, len=n)
y <- 1 - 2*x + rnorm(n, sd=0.2)
y[1] <- 3 # outlier

fitRobust <- robustbase::glmrob(y ~ x, family = 'gaussian')
fitOls <- lm(y ~ x)

# Robust regression fails to detect the outlier - it's result is the same as the OLS.
all.equal(fitOls$coefficients, fitRobust$coefficients)

y[1] <- 10 # more severe outlier
# The algorithm does not converge.
fitRobust <- robustbase::glmrob(y ~ x, family = 'gaussian')

Comments:

Message  ↓
Date: 2023-08-26 07:19
Sender: Christof Schötz

Thank you for your quick response!

> Well, "nobody" would use glmrob() for the normal ("gaussian") case.

Well, somebody did. In a now published paper. I basically came here in the search of the reason why some plots in that paper look odd and found that they use robustbase::glmrob(family = 'gaussian').

> Yes, we should probably warn for `family = "gaussian"` that using lmrob() with much more sophisticated algorithms and notably standard errors, confidence intervals etc, is much recommended for the normal distribution.

If using this method is discouraged, a warning would certainly be appreciated.

> Lastly: Your example is *not* reproducible! you forgot to set the random seed in your R code.

Right! With set.seed(0) the code works as I intended.

Date: 2023-08-25 21:29
Sender: Martin Maechler

Well, "nobody" would use glmrob() for the normal ("gaussian") case.
The very very strongly recommended function there is lmrob()
and it works nicely here.

Yes, we should probably warn for `family = "gaussian"` that using lmrob() with much more sophisticated algorithms and notably standard errors, confidence intervals etc, is much recommended for the normal distribution.

Lastly: Your example is *not* reproducible! you forgot to set the random seed in your R code.

Yes, it *is* still a bug here that the "Mqle" algorithm does not converge correctly; for some random seeds, I see it "looping" / "ping-pong"in between 2 or 4 coefficient-configurations;
Here where you set a leverage point, Huber regression is known not to be very good (if it is not started robustly); the `weights.on.x = "<opt>"` should help in principle, but in one case I tried, `weights.on.x = "hat"` did not help enough.


Attached Files:

Changes

Field Old Value Date By
ResolutionNone2023-08-25 21:29mmaechler
Thanks to:
Vienna University of Economics and Business Powered By FusionForge