Should methods be allowed to output infinity?

Hi,

I post below a summary of this morning’s discussion. I hope that this may allow to share the content and provide it for interested users.

The core topic is the evaluation of the PDF of a distribution which is defined on an interval, e.g. the Uniform distribution on [-1,1]. If x\not\in [-1,1], there are two approaches:

  • return 0,
  • generate an C++ exception.

The associated topic is the value of the LogPDF. If x\not\in [-1,1], there are two approaches:

  • return Specfun::LowestScalar \approx -10^{308},
  • return -INF.

In the current API, the method Distribution.getRange().contains(x) returns True if x is in the range.

There are several places in the code where this issue appears.

  • Several implementations of the PDF are based on the exponential of the log-PDF, e.g. NormalGamma:

image

  • For distributions which are defined on an interval, the Log-PDF is currently set to LowestScalar when the input is not in the interval, e.g. Beta:

image

  • The RandomWalkMetropolisHastings::getRealization evaluates the Log-PDF of the prior:

image

  • DistributionImplementation::computeMinimumVolumeLevelSetWithThreshold is based on the Log-PDF too:

There are several applications of the management of INF and NANs in the library:

Several Bayesian algorithms are planned to be developed, which may benefit from this management:

  • A Metropolis-Hastings algorithm, independent from the current bayesian calibration context,
  • A bayesian calibration algorithm, based on the “new” CalibrationResult object.

What was unclear, however, is how these new algorithms may benefit from a change in the management of INF/NAN, in the sense that the current state would not forbid these developements (see below for another point of view).

Based on these informations, 4 options were discussed.

  • Option A: manage -INF everywhere necessary
    • Create a new Specfunc::Infinity and set it with -INF
    • This corresponds to updating and merging the experimental branch : GitHub - josephmure/openturns at infinity
    • This option uses floating point arithmetic to manage a point which is not in the range.
    • One issue may be that optimization solvers such as Cobyla may not manage consistently such a situation. For example, an infinite loop was found when estimating the three parameters of a Triangular distribution with MLE. It is likely (!) that the bound parameters were not managed correctly by the algorithm.
  • Option B: generate exceptions where necessary
    • This would require to throw exceptions wherever the code currently returns LowestScalar
    • The exception would be InvalidArgumentException
    • This requires changes in all functions which currently use exp(logPDF) to catch the exception, if necessary.
  • Option C: a new computeLogPDFUnsafe()
    • This new computeLogPDFUnsafe() method of any Distribution would return -INF when the input point is not in the range.
    • This would also allow to manage exceptional parameters of the distribution, e.g. \alpha=10^{50} in a Beta distribution.
    • The current behaviour of computeLogPDF() would be unchanged.
    • The new computeLogPDFUnsafe() would be used in MCMC, and in any algorithm which would need it.
    • If a Sample contains a Point which is not in the range, computeLogPDFUnsafe() would allow to search for the index in the Sample ; an exception would not allow this.

Three variants of the option C were discussed, but I am unfortunately unable to describe them :slight_smile: .

Feel free to comment these options or suggest new ones.

Best regards,

Michaël

1 Like