PCE study with inputs distributions

Hi Flore,
Sorry for not replying sooner. You clearly get very good results here, whichever method is used.

I suggest adding a new method to your benchmark, based on the ComputeSparseLeastSquaresChaos() function presented at:

https://openturns.github.io/openturns/latest/auto_meta_modeling/polynomial_chaos_metamodel/plot_chaos_beam_sensitivity_degree.html

The ComputeSparseLeastSquaresChaos() function creates a sparse polynomial chaos using regression and the LARS selection method. It does not always work as is, but it is very effective in many cases. To get sparsity, the script uses the LeastSquaresMetaModelSelectionFactory class (to select LARS) and passes the result to the LeastSquaresStrategy class (to select regression) in order to create the projectionStrategy.

Below I reuse the message posted at Issue 1394, with extra comments. One of the keys to getting a good polynomial chaos expansion is to create a sparse expansion. There are currently three ways to get sparsity in the library:

  • the LeastSquaresMetaModelSelectionFactory class is a way to get sparsity from the LARS selection method,
  • the AdaptiveStrategy class selects the significant coefficients in the decomposition,
  • the last of the 3 ways of creating an enumerate function, the HyperbolicAnisotropicEnumerateFunction, is another way of getting sparsity.
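
To illustrate the third point, here is a minimal pure-Python sketch of how a hyperbolic truncation thins out the set of multi-indices. It does not call the library: the function name count_hyperbolic_indices is hypothetical, and only the isotropic case (all weights equal to 1) is considered, where a multi-index alpha is kept if (sum_i alpha_i^q)^(1/q) is at most the maximum degree.

```python
from itertools import product


def count_hyperbolic_indices(dimension, degree, q):
    """Count multi-indices alpha with (sum_i alpha_i**q)**(1/q) <= degree.

    Illustrative helper only (isotropic weights), not a library function.
    """
    count = 0
    for alpha in product(range(degree + 1), repeat=dimension):
        q_norm = sum(a ** q for a in alpha) ** (1.0 / q)
        if q_norm <= degree + 1e-12:  # small tolerance for rounding errors
            count += 1
    return count


# q = 1 gives the usual total-degree truncation (full basis).
full = count_hyperbolic_indices(3, 5, 1.0)
# q < 1 penalizes interaction terms, so fewer multi-indices survive.
sparse = count_hyperbolic_indices(3, 5, 0.5)
print(full, sparse)
```

With q below 1, high-order interactions are the first terms to be dropped, while the univariate terms up to the maximum degree are kept.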

Let us focus on the selection methods in AdaptiveStrategy.

  • The FixedStrategy uses a given number of coefficients. This creates a full PCE.
  • The SequentialStrategy uses a method to sequentially select the significant coefficients.
  • Once the coefficients are computed, the CleaningStrategy removes coefficients which are, e.g., smaller than a threshold.

The SequentialStrategy is mainly a proof of concept at this time (see the PS for details). Its current implementation cannot create a sparse PCE.

Depending on its parameters, the CleaningStrategy can create a sparse PCE. It can work pretty well, as shown in the "create a sparse chaos by integration" example. The first parameter, maximumDimension, is the number of candidate coefficients considered by the algorithm. I think that there is a naming slip in your script: you wrote maximumBasisSize, but the parameter is maximumDimension. Your particular way of using the class is too generous: in your script, this value is computed using getStrataCumulatedCardinal(). Hence, it contains the number of coefficients associated with the given maximum degree, which can be large when the polynomial degree gets large. You can compute the number of coefficients using the binomial coefficients (this is explained here). Following the help page, the other two parameters can help to produce a good metamodel:

  • maximumSize. This is the maximum number of coefficients kept in the sparse PCE, i.e. in the active basis. Its default value is 20.
  • significanceFactor. This parameter tunes the criterion that decides when a candidate coefficient enters the active basis. Its default value is 10^{-4}.

The default values of these parameters explain why you get a good Q² score. The very large value of maximumBasisSize you used explains why the script takes so long.
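
To get a feeling for how fast the number of candidate coefficients grows, here is a small sketch that evaluates the binomial formula: a full basis of total degree at most p in d input variables has C(d + p, p) terms. The dimension 4 below is only an assumption for the sake of the example (I do not know the dimension of your problem), and full_basis_size is a hypothetical helper name.

```python
from math import comb


def full_basis_size(dimension, total_degree):
    """Number of multivariate polynomials of total degree <= total_degree."""
    return comb(dimension + total_degree, total_degree)


# Assumed input dimension, for illustration only.
dimension = 4
for degree in (2, 5, 10, 15):
    print(degree, full_basis_size(dimension, degree))
```

Comparing these counts with the size of the training sample gives a quick check of whether maximumDimension is reasonable before running the algorithm.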

Let us analyse the results from your benchmark.

  • FixedStrategy. This strategy creates a full PCE, so the number of coefficients grows quickly with the degree. When the number of coefficients gets too large, they are estimated poorly and the Q² becomes negative.
  • SequentialStrategy. This creates a full PCE because of the current implementation. You did not show that particular script, so it is difficult to say why it is so fast. The results are not good in this case.
  • CleaningStrategy. This may create a sparse PCE in general. In your benchmark, the Q² score is good because the algorithm creates a sparse basis, but it is too slow because the functional basis you create is extremely large.
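
As a reminder of why the Q² can become negative, here is a minimal sketch assuming the common definition Q² = 1 - SS_res / SS_tot on a validation sample (the exact estimator used in your benchmark may differ). The function name q2_score and the numbers are made up for the illustration.

```python
def q2_score(observed, predicted):
    """Predictivity coefficient: 1 - residual sum of squares / total sum of squares."""
    n = len(observed)
    mean = sum(observed) / n
    residual_ss = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    total_ss = sum((y - mean) ** 2 for y in observed)
    return 1.0 - residual_ss / total_ss


# Made-up validation outputs and two sets of metamodel predictions.
y_test = [1.0, 2.0, 3.0, 4.0]
good = q2_score(y_test, [1.1, 1.9, 3.2, 3.9])   # close predictions
bad = q2_score(y_test, [4.0, -1.0, 6.0, 0.0])   # worse than predicting the mean
print(good, bad)
```

The score is negative as soon as the metamodel predicts worse than the constant equal to the mean of the validation outputs, which is what happens when too many coefficients are estimated from too few points.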

Regards,

Michaël

PS
The implementation of the SequentialStrategy is as follows: a candidate coefficient is added to the active basis if it reduces the residuals. But the residuals necessarily decrease when we add a new coefficient to the basis. Hence, the SequentialStrategy keeps adding coefficients until there is no new candidate: the PCE is therefore full. According to @regislebrun, this class was essentially created at an early point of the development of the library, in order to show that this was a feasible option.
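
The argument above can be checked numerically. The sketch below is not the library's implementation: it fits made-up data by least squares on a nested family of monomials (an assumption chosen for simplicity) and records the squared norm of the training residual each time a function is added. The helper names dot and residual_norms are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def residual_norms(columns, y):
    """Squared least-squares residual norms using the first k columns, k = 0, 1, ..."""
    residual = list(y)
    orthonormal = []
    norms = [dot(residual, residual)]
    for column in columns:
        # Modified Gram-Schmidt: orthogonalize the new column against the previous ones.
        v = list(column)
        for q in orthonormal:
            c = dot(v, q)
            v = [a - c * b for a, b in zip(v, q)]
        norm = math.sqrt(dot(v, v))
        if norm > 1e-12:
            q = [a / norm for a in v]
            orthonormal.append(q)
            # Remove from the residual its component along the new direction.
            c = dot(residual, q)
            residual = [a - c * b for a, b in zip(residual, q)]
        norms.append(dot(residual, residual))
    return norms


random.seed(0)
x = [random.uniform(-1.0, 1.0) for _ in range(30)]
y = [math.sin(3.0 * xi) + 0.1 * random.gauss(0.0, 1.0) for xi in x]
columns = [[xi ** k for xi in x] for k in range(8)]  # 1, x, ..., x^7
norms = residual_norms(columns, y)
print(norms)
```

The training residual never increases along the sequence, so a criterion based only on "does it reduce the residuals" accepts every candidate.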