Inconsistencies in OpenTURNS API


I found some inconsistencies in the OpenTURNS API and wanted to share them.

There are inconsistencies between two OpenTURNS classes with related purposes:

  • ProbabilitySimulationResult,
  • ExpectationSimulationResult.

I spotted two differences:

  • in the way we can manage design of experiments,
  • in the way to compute confidence intervals.

Differences in the design of experiments

The ProbabilitySimulationAlgorithm class can be built on top of any Experiment:

import openturns as ot

experiment = ot.MonteCarloExperiment()
algo = ot.ProbabilitySimulationAlgorithm(event, experiment)

The only constructor of ExpectationSimulationAlgorithm takes a RandomVector:

algo = ot.ExpectationSimulationAlgorithm(random_vector)

so there is no way to use another type of design of experiments. It is Monte Carlo, period.

This is unfortunate, because we may want to use e.g. an LHS design or a randomized QMC (RQMC) design. The two cases must, however, be handled differently. LHS leads to an estimate of the mean which is asymptotically Gaussian: increasing the sample size reduces the length of the confidence interval. Therefore, the current algorithm would work fine with LHS, without any change.
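To illustrate why the plain Gaussian machinery still applies to LHS, here is a minimal one-dimensional sketch in plain Python (not the OpenTURNS API; the function g, the sample sizes, and the 1.96 quantile are illustrative placeholders):

```python
import math
import random

def lhs_mean_ci(g, n, rng, z=1.96):
    """Estimate the mean of g(U), U ~ Uniform(0,1), from a one-dimensional
    LHS sample: exactly one uniform draw per stratum [i/n, (i+1)/n)."""
    xs = [(i + rng.random()) / n for i in range(n)]
    ys = [g(x) for x in xs]
    mean = sum(ys) / n
    var = sum((y - mean) ** 2 for y in ys) / (n - 1)
    half = z * math.sqrt(var / n)  # same Gaussian formula as plain Monte Carlo
    return mean, half

rng = random.Random(0)
for n in [100, 1000, 10000]:
    mean, half = lhs_mean_ci(lambda u: u * u, n, rng)
    # the half-length shrinks as n grows; the true mean is 1/3
```

Note that for LHS the plain Monte Carlo variance estimator is typically conservative, so the resulting interval remains valid, just somewhat wide.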

The RQMC design has a randomized part, which works well with the algorithm, and a deterministic part, for which the algorithm will fail. I do not know how to manage that at the source code level: I do not have enough experience with OpenTURNS's RQMC sampling. In theory, at least, a splitting scheme should do the trick. For example, by splitting the n simulations into two parts n_R and n_{QMC} such that

n = n_R \times n_{QMC},

the Gaussian confidence interval should be based on n_R. This improves the accuracy of the estimate thanks to the n_{QMC} part, while still providing a confidence interval from the n_R part. The trade-off is how to pick n_R and n_{QMC}, but that is another story.
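A minimal, library-free sketch of that splitting could look as follows. It assumes a one-dimensional van der Corput sequence randomized by a Cranley-Patterson random shift; the function and the sample sizes are placeholders, not the OpenTURNS implementation:

```python
import math
import random

def van_der_corput(n, base=2):
    """Deterministic low-discrepancy sequence on [0, 1)."""
    seq = []
    for i in range(1, n + 1):
        x, denom = 0.0, 1.0
        k = i
        while k > 0:
            denom *= base
            x += (k % base) / denom
            k //= base
        seq.append(x)
    return seq

def rqmc_mean(g, n_R, n_QMC, rng, z=1.96):
    """Estimate E[g(U)], U ~ Uniform(0,1), from n_R randomized replications
    of one n_QMC-point QMC rule; the Gaussian CI uses n_R only."""
    base = van_der_corput(n_QMC)
    means = []
    for _ in range(n_R):
        shift = rng.random()  # Cranley-Patterson random shift
        means.append(sum(g((u + shift) % 1.0) for u in base) / n_QMC)
    mean = sum(means) / n_R
    var = sum((m - mean) ** 2 for m in means) / (n_R - 1)
    half = z * math.sqrt(var / n_R)
    return mean, half

rng = random.Random(0)
mean, half = rqmc_mean(lambda u: u * u, n_R=32, n_QMC=256, rng=rng)
# the confidence interval [mean - half, mean + half] is driven by n_R alone
```

Each replication is a full QMC estimate, so the n_{QMC} points drive the accuracy, while the n_R independent replications provide an honest variance estimate for the interval.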

Differences in the way to compute the confidence interval

There are inconsistencies in the way we compute a confidence interval.
With ProbabilitySimulationResult, here is a way to compute a level \alpha confidence interval:

alpha = 0.95
pf = result.getProbabilityEstimate()
pflen = result.getConfidenceLength(alpha)
print(
    "%.2f%% confidence interval = [%f,%f]"
    % (alpha * 100, pf - pflen / 2.0, pf + pflen / 2.0)
)

With ExpectationSimulationResult, there is no getConfidenceLength() method, but we can use the distribution of the sample mean:

alpha = 0.95
sample_mean_CI = sample_mean_distribution.computeBilateralConfidenceInterval(alpha)
print(
    "The probability that the interval %s" % (sample_mean_CI),
    "contains the true mean is equal to %.2f%%" % (100.0 * alpha),
)

There is, however, a significant difference between the two classes: the output of ProbabilitySimulationResult is a probability, which can be stored as a float. This cannot be done with ExpectationSimulationResult, which estimates a mean that is potentially multidimensional and must be stored as an ot.Point.

Did you spot other differences? Do you agree with this analysis?