Method of moments not robust to misspecified distribution

The following code is adapted from an example invented by @KieranDelamotte.

We try to fit a Gumbel distribution on a sample that is actually obtained from a mixture of a Gumbel and a Fréchet distribution.

GumbelFactory does not manage to find a sensible set of parameters to try to fit the sample.

WeibullMaxFactory, on the other hand, manages to find a Weibull distribution that kind of fits, but it has crazy parameters… because it is actually very close to a Gumbel. In fact, when put into a GEV, the GEV \xi parameter is 0: it is a Gumbel distribution.

The bottom line is that WeibullMaxFactory manages to find a better Gumbel than GumbelFactory.
The difference is that WeibullMaxFactory maximizes (by default) the likelihood whereas GumbelFactory fits the first and second order moments to the empirical moments of the sample. Likelihood maximization is more robust to (this particular) distribution misspecification than the method of moments.

# Try to fit a WeibullMax on the sample.
# WeibullMaxFactory maximizes the likelihood.
weibullmax = ot.WeibullMaxFactory().build(sample)
print("WeibullMaxFactory result: ", weibullmax)

# Express the estimated WeibullMax as a GEV
gev = ot.GeneralizedExtremeValue(weibullmax)
print("GEV actual distribution: ", gev.getActualDistribution())

# Recompute the GEV actual distribution
# This WeibullMax turns out to be a Gumbel
# because the GEV xi parameter is zero.
print("GEV xi parameter = ", gev.getXi())
print("More sensible GEV actual distribution: ", gev.getActualDistribution())

# Try to directly estimate a Gumbel distribution.
# GumbelFactory uses the method of moments.
gumbel = ot.GumbelFactory().build(sample)
print("GumbelFactory result: ", gumbel)

graph = ot.HistogramFactory().build(sample).drawPDF()
graph.setColors(["green", "red", "blue"])
graph.setLegends(["Sample", "MLE", "Moments"])
graph.setTitle("Estimated Gumbel distribution")
view = viewer.View(graph)
axes = view.getAxes()
_ = axes[0].set_xlim(-20.0, 20.0)


WeibullMaxFactory result:  
    WeibullMax(beta = 4.25531e+06, alpha = 1.45411e+06, gamma = 4.25531e+06)
GEV actual distribution:  
    WeibullMax(beta = 4.25531e+06, alpha = 1.45411e+06, gamma = 4.25531e+06)
GEV xi parameter =  -6.877037385642863e-07
More sensible GEV actual distribution:  
    Gumbel(beta = 2.92639, gamma = 2.5121)
GumbelFactory result:  Gumbel(beta = 15.1936, gamma = -3.88787)


Thank you Joseph for your analysis

I tried to find some background to support the affirmation that “Likelihood maximization is more robust to distribution misspecification than the method of moments.”, to see if it is a general property or an observation on this specific example. According to a discussion here it does not look like a general property, but it is the opinion of another user. Do you have some theoretical background to share about the comparison between MLE and MOM performances regarding distribution misspecification?

No, I do not have such theoretical background. I have edited this sentence in the original post because it only applies to the particular example discussed. I think such a claim would be difficult to establish (one way or another) in a general sense because it depends on how the distribution is misspecified. In @KieranDelamotte’s example there is a distribution at the “edge” of the WeibullMax family that tolerably fits the sample. If the true distribution were completely different from the family, it would likely be a different story.

I think we may consider implementing the MLE (\hat{\beta}, \hat{\gamma}) for the GumbelFactory. Formulas are given in Statistical Distributions by Forbes et al. (2010):

\hat{\beta} = \frac{1}{n} \sum_{i=1}^n x_i - \frac{\sum_{i=1}^n x_i \exp \left( - \frac{x_i}{\hat{\beta}} \right)} {\sum_{i=1}^n \exp \left( - \frac{x_i}{\hat{\beta}} \right)} ; \\ \hat{\gamma} = - \hat{\beta} \log \left[ \frac{1}{n} \sum_{i=1}^n \exp \left( - \frac{x_i}{\hat{\beta}} \right) \right] .

The expression of \hat{\gamma} is a function of \hat{\beta} so numerical optimization is only necessary to compute \hat{\beta}.

Interesting fact: if X \sim Gumbel(\beta, \gamma), then Y := \exp \left( \frac{\gamma - X}{\beta} \right) \sim \mathcal{E}(1). Note that the MLE (\hat{\beta}, \hat{\gamma}) satisfies

\frac{1}{n} \sum_{i=1}^n \exp \left( \frac{\hat{\gamma} - x_i}{\hat{\beta}} \right) = 1 = \mathbb{E}[Y].


There are situations where the MLE is undefined, e.g. when the maximum of the likelihood is not well separated (see Van Der Waart, p.45). This happens when the parameters cannot be identified and the likelihood is flat locally or at the infinity (i.e. there is an infinite number of solutions which reach the maximum).

I suggest two situations (based on a discussion with @MerlinKeller):

  • when one of the parameters is a bound parameter (where the PDF is constantly zero),
  • when we consider a mixture, which, depending on the parametrization, may lead to an infinite number of combinations which all lead to the same likelihood (but this is more an identifiability issue here).

Wasn’t this your motivation for Jeffrey’s prior as a way to improve the MLE for small sample sizes in kriging?

I am not sure that the MoM would perform better in these situations…