Hi,
I post below a summary of this morning’s discussion. I hope that this may allow to share the content and provide it for interested users.
The core topic is the evaluation of the PDF of a distribution which is defined on an interval, e.g. the Uniform distribution on [-1,1]. If x\not\in [-1,1], there are two approaches:
- return 0,
- generate an C++ exception.
The associated topic is the value of the LogPDF. If x\not\in [-1,1], there are two approaches:
- return
Specfun::LowestScalar
\approx -10^{308}, - return
-INF
.
In the current API, the method Distribution.getRange().contains(x)
returns True
if x
is in the range.
There are several places in the code where this issue appears.
- Several implementations of the PDF are based on the exponential of the log-PDF, e.g. NormalGamma:
- For distributions which are defined on an interval, the Log-PDF is currently set to
LowestScalar
when the input is not in the interval, e.g. Beta:
- The
RandomWalkMetropolisHastings::getRealization
evaluates the Log-PDF of the prior:
-
DistributionImplementation::computeMinimumVolumeLevelSetWithThreshold
is based on the Log-PDF too:
There are several applications of the management of INF and NANs in the library:
- The MCMC algorithm, a topic partially solved by Correct the formula of MCMC::computeLogLikelihood [1.16] by josephmure · Pull Request #1648 · openturns/openturns · GitHub
- The maximum likelihood estimation (see Estimating parameters of a bounded distribution with MLE? for a related topic)
Several Bayesian algorithms are planned to be developed, which may benefit from this management:
- A Metropolis-Hastings algorithm, independent from the current bayesian calibration context,
- A bayesian calibration algorithm, based on the “new”
CalibrationResult
object.
What was unclear, however, is how these new algorithms may benefit from a change in the management of INF/NAN, in the sense that the current state would not forbid these developements (see below for another point of view).
Based on these informations, 4 options were discussed.
- Option A: manage -INF everywhere necessary
- Create a new Specfunc::Infinity and set it with -INF
- This corresponds to updating and merging the experimental branch : GitHub - josephmure/openturns at infinity
- This option uses floating point arithmetic to manage a point which is not in the range.
- One issue may be that optimization solvers such as
Cobyla
may not manage consistently such a situation. For example, an infinite loop was found when estimating the three parameters of aTriangular
distribution with MLE. It is likely (!) that the bound parameters were not managed correctly by the algorithm.
- Option B: generate exceptions where necessary
- This would require to throw exceptions wherever the code currently returns
LowestScalar
- The exception would be
InvalidArgumentException
- This requires changes in all functions which currently use
exp(logPDF)
to catch the exception, if necessary.
- This would require to throw exceptions wherever the code currently returns
- Option C: a new
computeLogPDFUnsafe()
- This new
computeLogPDFUnsafe()
method of anyDistribution
would return-INF
when the input point is not in the range. - This would also allow to manage exceptional parameters of the distribution, e.g. \alpha=10^{50} in a
Beta
distribution. - The current behaviour of
computeLogPDF()
would be unchanged. - The new
computeLogPDFUnsafe()
would be used in MCMC, and in any algorithm which would need it. - If a
Sample
contains aPoint
which is not in the range,computeLogPDFUnsafe()
would allow to search for the index in theSample
; an exception would not allow this.
- This new
Three variants of the option C were discussed, but I am unfortunately unable to describe them .
Feel free to comment these options or suggest new ones.
Best regards,
Michaël