I want to estimate the parameters of a
Beta distribution from a sample based on maximum likelihood estimation (MLE). This distribution has four parameters : \alpha, \beta, a and b. The parameters a and b are bound parameters. In my case, these parameters are known a=0 and b=1, because my variable is a physical quantity between 0 and 1 (it is defined as a ratio where the denominator is greater than the numerator). The thing is that the sample contains an extreme realization “1”.
This can be computed this way:
import openturns as ot ot.RandomGenerator.SetSeed(1976) # Why not? # Get sample data = [0.2, 0.3, 0.5, 1.0] # Notice the "1.0" sample = ot.Sample([[x] for x in data]) # Fit : estimate all parameters (the wrong way) factory = ot.MaximumLikelihoodFactory(ot.Beta()) factory.setKnownParameter([0.0, 1.0], [2, 3]) beta = factory.build(sample) print(beta)
Beta(alpha = 2, beta = 2, a = 0, b = 1)
In other words, the MLE failed: the optimization algorithm failed to maximize the likelihood. Notice that no exception was raised: it seems to me that it should be so. The reason for the issue is that the sample contains the value “1.0” (the fourth value in the sample). At this point, the log-PDF is -infinity, because the PDF is zero:
logpdf = ot.Beta(2.0, 2.0, 0.0, 1.0).computeLogPDF(sample) print("logpdf=", logpdf) likelihood = logpdf.computeMean() print("likelihood=", likelihood)
The previous script prints:
logpdf= 0 : [ -0.040822 ] 1 : [ 0.231112 ] 2 : [ 0.405465 ] 3 : [ -1.79769e+308 ] likelihood= [-4.49423e+307]
Indeed, since OT1.16, the log-PDF can go down to -1.e307, but not -INF. This is a good consequence of the highly discussed PR https://github.com/openturns/openturns/pull/1566 In OT 1.15, the likelihood is approximately equal to -708, which was a numerically wrong answer.
There are several ways to fix the issue. One solution would be to drop the “1.0” value in the sample. It makes sense, since this realization does not match the distribution.
Another solution is to slightly increase the bounds of the
delta = 0.01 logpdf = ot.Beta(2.0, 2.0, 0.0 - delta, 1.0 + delta).computeLogPDF(sample) likelihood = logpdf.computeMean()
The previous script prints:
logpdf= 0 : [ -0.0390172 ] 1 : [ 0.218678 ] 2 : [ 0.385662 ] 3 : [ -2.86287 ] likelihood= [-0.574386]
Now the likelihood is finite.
Therefore, one way to estimate the parameters of the
distribution = ot.Beta(1.0, 1.0, 0.0 - delta, 1.0 + delta) factory = ot.MaximumLikelihoodFactory(distribution) factory.setKnownParameter([0.0 - delta, 1.0 + delta], [2, 3]) beta = factory.build(sample)
However, this way is not completely satisfactory, because it depends on the \delta=0.01 parameter. What if \delta \rightarrow 0?
Is there a satisfactory way to perform this? It sounds to me that my problem is wrongly established, in the sense that the sample does not match the distribution.
As a side comment, if the likelihood was “-INF” in place of “-1.e308”, it would seem to me that the code would be more compliant to the IEEE754 standard (but this does not change the mathematical issue).
This situation is very similar to estimating the parameters of a
Trapezoidal distributions from a sample using the MLE.
A similar situation would occur with MCMC simulation, as pointed by @MerlinKeller and @josephmure.