Updating an existing LH sample

Hi,
I’ve started working with your package with the aim of exploring the use of metamodels for civil engineering applications. I’m quite new to the topic, and this is my first time working with these probabilistic methods.
My problem is presumably quite simple. I’m planning a sampling design in order to run some FE simulations and then build a metamodel (e.g. a PCE to start with) to estimate some of the simulation outcomes. The problem has 5 parameters, each varying within a certain range according to a random distribution. The LHS method seems to fit this purpose perfectly, but my trouble now is choosing the number of samples to create.
Let’s say I initially opt for 100 points, but the final result of the analysis does not give me a robust prediction. Then I need to create a new sample; suppose I want to add another 100 inputs.
The union of the two samples will not be a Latin hypercube. Am I right? Is there any particular solution/suggestion you can propose?

Hello,

To enrich a sampling without a complex algorithm, I use a Sobol’ design of experiments:

import openturns as ot

experiment = ot.LowDiscrepancyExperiment(
    ot.SobolSequence(), composedDistrib, sample_size
)
sample = experiment.generate()  # generate() is a method of the experiment

A Sobol’ sample is an ordered sequence. If you generate 1000 points, you can begin with points 0 to 99, for instance, get your 100 outputs, and see whether the result is good enough. If it is not, you can use the following 100 points and, step by step, converge to a good result. But you have to keep the order: it ensures that each subsample fills the design space uniformly, without any repetition of the same experiment (plot the sample beforehand, with a different color for each group of 10 or 100, to see the different subsamples).
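This step-by-step consumption of an ordered sequence can be sketched without OpenTURNS. The snippet below is a minimal illustration, using a base-2 van der Corput sequence (the one-dimensional building block of low discrepancy sequences) as a stand-in for a Sobol’ sequence; with `ot.LowDiscrepancyExperiment` you would slice the generated sample in exactly the same way.

```python
def van_der_corput(i, base=2):
    """Base-2 radical inverse of integer i: the 1-D low-discrepancy
    sequence underlying Sobol'-type designs."""
    q, bk = 0.0, 1.0 / base
    while i > 0:
        i, r = divmod(i, base)
        q += r * bk
        bk /= base
    return q

# Generate one long ordered sequence up front...
points = [van_der_corput(i) for i in range(1000)]

# ...then consume it in consecutive chunks of 100, keeping the order.
chunks = [points[k:k + 100] for k in range(0, 1000, 100)]

# Each chunk spreads across [0, 1) -- no chunk is squeezed into a
# corner -- and no point is ever repeated across chunks.
for chunk in chunks:
    assert min(chunk) < 0.1 and max(chunk) > 0.9
assert len(set(points)) == len(points)  # no duplicated experiment
```

The same property holds in several dimensions for a true Sobol’ sequence, which is exactly why the order of the points must be preserved when enriching the design.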

You can also use other similar sequences, such as Halton, etc. See Designs of experiments — OpenTURNS 1.17dev documentation.

Other users may propose other solutions, but this one is, I think, pretty simple.

Regards


Hi!
Your answer seems excellent to me. Notice that the points of a Sobol’ sequence are best used when the sample size is a power of 2. This is because a Sobol’ sequence is based on the base-2 decomposition of integers in each dimension. Hence, the sequence fills elementary intervals which are defined in base 2.

This is perhaps easier to see in the graph taken from:

http://openturns.github.io/openturns/master/auto_reliability_sensitivity/design_of_experiments/plot_design_of_experiments.html#sobol-low-discrepancy-sequence

I paste the comment below: “We have elementary intervals in 2 dimensions, each having a volume equal to 1/8. Since there are 32 points, the Sobol’ sequence is so that each elementary interval contains exactly 32/8 = 4 points. Notice that each elementary interval is closed on the left (or bottom) and open on the right (or top).”

The previous figure only presents one particular choice of elementary intervals: there are many of them, which explains the performance of Sobol’ sequences and, more generally, of low discrepancy sequences.
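The 32/8 = 4 counting argument can be checked numerically in one dimension. Here is a dependency-free sketch, using the base-2 van der Corput sequence (the 1-D analogue of a Sobol’ sequence) in place of the 2-D design from the figure:

```python
def van_der_corput(i, base=2):
    # base-2 radical inverse: 1-D analogue of a Sobol' sequence
    q, bk = 0.0, 1.0 / base
    while i > 0:
        i, r = divmod(i, base)
        q += r * bk
        bk /= base
    return q

n, m = 32, 8                       # 32 points, 8 elementary intervals
points = [van_der_corput(i) for i in range(n)]

# Count points per interval [j/8, (j+1)/8): closed left, open right.
counts = [sum(j / m <= p < (j + 1) / m for p in points) for j in range(m)]
print(counts)  # [4, 4, 4, 4, 4, 4, 4, 4] -- exactly 32/8 = 4 per interval
```

In fact, the first 32 van der Corput points are exactly the grid {j/32, j = 0, …, 31}, so every base-2 elementary interval is filled as evenly as possible.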

In order to apply this idea to your situation, I would suggest creating a sample of size 1024, divided into 8 smaller DOEs of size 128, since 128 × 8 = 1024. Each time the sample is to be increased, its size should be doubled. This would allow you to get:

  • one sample of size 128 (points 1 to 128),
  • one sample of size 256 (128 new points, indices 129 to 256),
  • one sample of size 512 (256 new points, indices 257 to 512),
  • one sample of size 1024 (512 new points, indices 513 to 1024).

This is a nested hierarchy of low discrepancy sequences (L.D.S.), each one being a consistent Sobol’ sequence with a base-2 sample size. This is how I test the accuracy of sensitivity indices based on L.D.S.
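A possible sketch of this doubling schedule, again with the base-2 van der Corput sequence standing in for a Sobol’ sequence so that it runs without OpenTURNS (with `ot.LowDiscrepancyExperiment` you would slice the generated sample in the same way):

```python
def van_der_corput(i, base=2):
    # base-2 radical inverse: 1-D stand-in for a Sobol' sequence
    q, bk = 0.0, 1.0 / base
    while i > 0:
        i, r = divmod(i, base)
        q += r * bk
        bk /= base
    return q

full = [van_der_corput(i) for i in range(1024)]  # one ordered sequence

# Nested hierarchy: each design reuses all points of the previous one
# and doubles the size (128 -> 256 -> 512 -> 1024).
designs = {n: full[:n] for n in (128, 256, 512, 1024)}

# Each design of size 2^k puts exactly one point in each elementary
# interval [j / 2^k, (j+1) / 2^k): even stratification at every stage.
for n, design in designs.items():
    counts = [sum(j / n <= p < (j + 1) / n for p in design)
              for j in range(n)]
    assert counts == [1] * n
```

The key point is that the smaller designs are prefixes of the larger ones, so the simulations already run for the 128-point design are all reused when moving to 256, 512 or 1024 points.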

Regards,

Michaël

PS

More details on this topic are presented in:
https://www.researchgate.net/publication/351037438_Low_Discrepancy_Toolbox_Manual
If you are interested in the topic, I suggest this one, in French: Thiémard, Eric. Sur le calcul et la majoration de la discrépance à l’origine. EPFL, 2000.

Many thanks for your valuable advice. It is exactly what I was looking for!
Honestly, I cannot yet see how to divide the sample of 1024 points into the 8 smaller groups (basically there are 4 groups: two of 128, one of 256 and one of 512, aren’t there?) as suggested by Michaël, but I will have a look at your documentation to find out how it works. As I said, I’m quite new to this and still not familiar with all the available options.
Thanks again!
Best regards