SubsetSampling intermediary quantiles estimation

Hi everyone,

I recently started using the OpenTURNS implementation of the Subset sampling algorithm to estimate rare event probabilities. As described in the documentation example, I tried to isolate the different subset samples to generate nice plots. In this process, I also recomputed the intermediary quantiles returned by the accessor “getThresholdPerStep()”. In the following minimalist example, you will notice that the estimated values are different (intermediary quantiles returned by “getThresholdPerStep” vs. recomputed quantiles on samples from the algorithm).

import openturns as ot
## Problem definition
X = ot.RandomVector(ot.Normal([0.25] * 2, [1] * 2, ot.IdentityMatrix(2)))
g = ot.SymbolicFunction(["x1", "x2"], ["20-(x1-x2)^2-8*(x1+x2-4)^3"])
g = ot.MemoizeFunction(g)
Y = ot.CompositeRandomVector(g, X)
myEvent = ot.ThresholdEvent(Y, ot.LessOrEqual(), 0.0)
## Subset sampling
algo = ot.SubsetSampling(myEvent)
algo.setKeepEventSample(True)
algo.run()
inputSampleSubset = g.getInputHistory()
outputSampleSubset = g.getOutputHistory()
result = algo.getResult()
proba = result.getProbabilityEstimate()
print("Estimated failure probability = {:.2e}".format(proba))
levels = algo.getThresholdPerStep()
N = algo.getMaximumOuterSampling() * algo.getBlockSize()
Ns = algo.getStepsNumber()
## Retrieve subset samples from Memoize function
list_subSamples = list()
list_subSamples_outputs = list()
for i in range(Ns):
    list_subSamples.append(inputSampleSubset[i * N: i * N + N])
    list_subSamples_outputs.append(outputSampleSubset[i * N: i * N + N])
alpha = algo.getConditionalProbability()
quantiles = [ss_output.computeQuantile(alpha)[0] for ss_output in list_subSamples_outputs]
print("Quantiles estimated on subset samples: \n", ot.Point(quantiles))
print("Quantiles from SubsetSampling accessor: \n", levels)

I understand why the two computations of the last quantile are different but not the intermediary ones. Is it coming from the use of the MemoizeFunction to retrieve the algorithm’s samples? Thanks for your help!

Elias

Hello,

I also used the history sample to draw such graphs and for almost the same reason I would like to have access to the real sample used to compute the probabilities. Indeed due the Markov Chain process, the computed realization is not always selected as point to compute the intermediate probabilities, I think this is why there is such a difference.

Openturns provides 2 methods called getEventInputSample and getEventOutputSample, however it only returns the last step sample. Maybe we could suggest to return the full sample or a collection of sample.

Antoine

Hi @dumas,

Thanks for your answer. To be honest, I didn’t get the point about your sentende “indeed due to the Markov Chain process […] intermediate probabilities”. Could you please explain us a little bit more what you have in mind?

Thanks for your help,
Regards,
Vince

When I looked into the SubsetSampling class in the function generatePoints, there is this piece of code:

    Sample blockSample(standardEvent_.getFunction()(inputSample));

    for (UnsignedInteger j = 0; j < getBlockSize(); ++ j)
    {
      // 2. accept the new point if in the failure domain
      if (getEvent().getOperator()(blockSample(j, 0), threshold))
      {
        currentPointSample_[i * blockSize + j] = inputSample[j];
        currentLevelSample_[i * blockSize + j] = blockSample[j];
      }
    }

The function evaluates a new input sample but all realizations are not accepted for the evaluation of the the conditional probability of failure. This is why there is a difference when you get history from the function.
I did not try to modify OT but on my opinion it would be nice to get currentPointSample and currentLevelSample.

I modified subset to add accessors to the samples at each step

3 Likes