Impossible truncation degree for Functional Chaos

Hello,

I am currently implementing a Functional Chaos (PCE) metamodel. I have 3 input parameters and around 800 samples. According to the rule of thumb to avoid matrix singularities, the truncation degree should not be larger than 11.
Still, when I run the following routine with larger degrees, no error is raised. I am even able to compute the predictivity factor Q2 and the LOO error, see below.

import openturns as ot

# productBasis, enumerateFunction, degree, distribution and the training samples
# are defined in the (not shown) part of the script.
indexMax = enumerateFunction.getStrataCumulatedCardinal(degree)
adaptiveStrategy = ot.SequentialStrategy(productBasis, indexMax)
sampling_size = 800
experiment = ot.MonteCarloExperiment(sampling_size)
projectionStrategy = ot.ProjectionStrategy(ot.LeastSquaresStrategy(experiment))
pce_algorithm = ot.FunctionalChaosAlgorithm(input_sample_training_rescaled,
    self.output_sample_training, distribution,
    adaptiveStrategy, projectionStrategy)

I guess that something is happening in the adaptiveStrategy or in the projectionStrategy, but I can't figure out what it is.
I first supposed that these methods already accounted for the rule of thumb and automatically reduced the degree to the maximal possible degree for the PCE, but if that were the case, Q2 and the LOO error should not change beyond a certain degree (close to 11), since the metamodel would be identical.

Thanks in advance for your insights,
Sarah

Hi,

It depends on the method you use to create the polynomial chaos. In this case, you compute the coefficients using least squares and use the SequentialStrategy as a selection method. This method uses all the coefficients it can. The number of coefficients in the basis is computed by your script using the getStrataCumulatedCardinal method of the enumeration function. In this particular case, I think there is no difference from FixedStrategy, i.e. setting the functional basis in advance; see the sketch below.
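To make the comparison concrete, here is a minimal, self-contained sketch of the two strategies side by side. The 3-d uniform distribution and the degree are placeholders of my own, not values taken from your script.

import openturns as ot

# Hypothetical stand-in for your setup: 3 inputs, orthonormal basis built from the marginals.
distribution = ot.ComposedDistribution([ot.Uniform(-1.0, 1.0)] * 3)
productBasis = ot.OrthogonalProductPolynomialFactory(
    [ot.StandardDistributionPolynomialFactory(distribution.getMarginal(i)) for i in range(3)]
)
enumerateFunction = productBasis.getEnumerateFunction()
degree = 11
indexMax = enumerateFunction.getStrataCumulatedCardinal(degree)

# With a plain least-squares projection, my understanding is that both strategies
# end up retaining all indexMax functions of the truncated basis.
adaptiveStrategySequential = ot.SequentialStrategy(productBasis, indexMax)
adaptiveStrategyFixed = ot.FixedStrategy(productBasis, indexMax)
print("Number of functions in the truncated basis:", indexMax)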

I initially thought that you used the LARS selection method, which is standard in that situation. This is what I did for the Ishigami function in selection-degree-chaos-ishigami.ipynb using LeastSquaresMetaModelSelectionFactory. I got this:
[figure from selection-degree-chaos-ishigami.ipynb]

But this is a little different from your script, since you use the LeastSquaresStrategy(experiment) constructor. My best guess is that the behavior here is better explained by @regislebrun's message below, i.e. by how the coefficients are actually computed using least squares. For reference, the sketch below shows the kind of LARS-based setup I used in the notebook.
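In this sketch, the Ishigami model, the sample size and the degree are stand-ins of my own, not your actual data.

import math
import openturns as ot

# Stand-in for the Ishigami test case (replace with your samples and distribution).
distribution = ot.ComposedDistribution([ot.Uniform(-math.pi, math.pi)] * 3)
model = ot.SymbolicFunction(
    ["x1", "x2", "x3"], ["sin(x1) + 7 * sin(x2)^2 + 0.1 * x3^4 * sin(x1)"]
)
inputSample = distribution.getSample(800)
outputSample = model(inputSample)

productBasis = ot.OrthogonalProductPolynomialFactory(
    [ot.StandardDistributionPolynomialFactory(distribution.getMarginal(i)) for i in range(3)]
)
totalDegree = 12
indexMax = productBasis.getEnumerateFunction().getStrataCumulatedCardinal(totalDegree)
adaptiveStrategy = ot.FixedStrategy(productBasis, indexMax)

# LARS + corrected leave-one-out selects a sparse subset of the candidate basis.
selectionAlgorithm = ot.LeastSquaresMetaModelSelectionFactory(ot.LARS(), ot.CorrectedLeaveOneOut())
projectionStrategy = ot.LeastSquaresStrategy(inputSample, outputSample, selectionAlgorithm)

algo = ot.FunctionalChaosAlgorithm(
    inputSample, outputSample, distribution, adaptiveStrategy, projectionStrategy
)
algo.run()
result = algo.getResult()
print("Candidate basis size:", indexMax)
print("Selected coefficients:", result.getCoefficients().getSize())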

Regards,
Michaël

Hi Sarah,
I can only guess since your script is incomplete, but based on the part you provided, the computation of the coefficients is done internally by the PenalizedLeastSquaresAlgorithm class. When the design matrix has more columns (i.e. more functions) than rows (the number of points), the resolution is done using a QR decomposition, and among the infinite set of possible solutions (as you noted) it picks the one with the smallest norm. This has no specific meaning in terms of approximation, but it still allows a solution to be returned.
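As a pure illustration of this point with NumPy (not the OpenTURNS internals, just the generic behavior of minimum-norm least squares on a random design matrix):

import numpy as np

# 10 points but 25 basis functions: an underdetermined least-squares problem.
rng = np.random.default_rng(0)
designMatrix = rng.normal(size=(10, 25))
outputs = rng.normal(size=10)

# np.linalg.lstsq returns the minimum-norm solution among all exact solutions.
coefficients, residuals, rank, singularValues = np.linalg.lstsq(designMatrix, outputs, rcond=None)
print("rank of the design matrix:", rank)  # at most 10
print("residual norm:", np.linalg.norm(designMatrix @ coefficients - outputs))  # essentially zero
print("coefficient norm:", np.linalg.norm(coefficients))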
If you send me a complete version of the script, I can give you a more detailed analysis.

Cheers
Régis

Hi,

I worked on this topic and wanted to see whether I could reproduce your experiments.

In your script, I notice that you use the variable name input_sample_training_rescaled. I do not know exactly what this means, but I guess that you rescaled the inputs into the unit cube [0,1]^p, where p is the dimension. In general, this is not necessary, because the PCE uses standardized variables anyway. This is straightforward in your example, since you provide the distribution as an input argument. In other words, rescaling the inputs seems unnecessary to me in this case; the sketch below illustrates why.
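This is a small sketch of what I mean, with a made-up 3-d distribution defined on its physical (unscaled) ranges: the chaos algorithm standardizes the inputs internally, which can be seen from the transformation stored in the result.

import openturns as ot

# Made-up example: inputs on their physical (unscaled) ranges.
distribution = ot.ComposedDistribution(
    [ot.Uniform(10.0, 20.0), ot.Normal(5.0, 2.0), ot.Uniform(-3.0, 3.0)]
)
model = ot.SymbolicFunction(["x1", "x2", "x3"], ["x1 + x2 * x3"])
X = distribution.getSample(200)
Y = model(X)

# No manual rescaling of X: the algorithm builds the basis from the distribution
# and applies an iso-probabilistic transformation to standardized variables.
algo = ot.FunctionalChaosAlgorithm(X, Y, distribution)
algo.run()
result = algo.getResult()
print(result.getTransformation())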

I wanted to see how the number of coefficients increases in this example. According to [1], page 34, eq. 2.53, the number of coefficients of a PCE with total degree d and p input variables is:

\textrm{Card}(\mathcal{J}) = \binom{p + d}{d}

where the binomial coefficient is:

\binom{p + d}{d} = \frac{(p + d)!}{p! \, d!}.
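For instance, with p = 3 inputs, the degree d = 11 you mentioned gives

\binom{3 + 11}{11} = \frac{14!}{3! \, 11!} = 364

coefficients. If the rule of thumb you used asks for roughly twice as many points as coefficients, this is consistent with the limit of 11 with 800 samples, since degree 12 already gives 455 coefficients, i.e. more than 800 / 2 = 400.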

The following script computes the number of coefficients when the dimension is equal to p = 3.

import numpy as np
import openturns as ot

# Number of PCE coefficients as a function of the total degree, for p = 3 inputs.
dimension = 3
degree_maximum = 15
degree_vs_coeffs = np.zeros((degree_maximum, 2))
for totalDegree in range(1, 1 + degree_maximum):
    degree_vs_coeffs[totalDegree - 1, 0] = totalDegree
    degree_vs_coeffs[totalDegree - 1, 1] = ot.SpecFunc_BinomialCoefficient(
        dimension + totalDegree, totalDegree
    )
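The plotting code is not reproduced here; a minimal version continuing the script above, using matplotlib (my own choice, not necessarily what produced the figure), could be:

import matplotlib.pyplot as plt

# Continues the script above: number of coefficients vs. total degree,
# with the training size shown as a reference line.
plt.plot(degree_vs_coeffs[:, 0], degree_vs_coeffs[:, 1], "o-", label="Full PCE")
plt.axhline(800, color="k", linestyle="--", label="Training size = 800")
plt.xlabel("Total polynomial degree")
plt.ylabel("Number of coefficients")
plt.legend()
plt.show()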

I got this figure:

[figure: number of coefficients vs. total polynomial degree, p = 3]

This shows that the polynomial degree required to get more than 800 coefficients is 15, which is a little larger than the polynomial degree 11 you mentioned.

I then wondered how many coefficients are obtained when we use a full polynomial chaos with total degree 50, as you used in your simulation. Increasing the maximum polynomial degree up to 50 and using a log scale for the Y axis produces the following figure.

[figure: number of coefficients vs. total polynomial degree, up to degree 50, log-scale Y axis]

This shows that for a model with p = 3 inputs, the total polynomial degree d = 50 produces more than 10^4 coefficients. This seems much larger than usual.

This indicates the number of coefficients of the full (non-sparse) PCE, but not the reduction obtained with the sparse PCE and the LARS selection method. So I created a sparse PCE with the LARS selection method and least squares, and counted the number of coefficients actually present in the selected basis. I compared that with the number of coefficients of the full PCE. For this experiment, I used a training size equal to 800 and a simple Monte Carlo DOE.

[figure: number of coefficients, sparse PCE (LARS) vs. full PCE]

We see that the sparse PCE drastically reduced the number of coefficients.

In order to reproduce your results, I performed the same experiment as before, this time using a full PCE with maximum polynomial degree equal to 12. This produces the following figure:

[figure: results for the full PCE with maximum polynomial degree 12]

I was surprised to see that the Q2 coefficient is much better than I would have guessed: the PCE performs rather well from this point of view. Looking more closely at the results, I was surprised that the coefficients are rather accurate with the full PCE. It is difficult to exhibit overfitting in this case, perhaps because the training sample is quite large: with a polynomial degree equal to 12, we have 455 coefficients to estimate with 800 points in the training DOE. This seems to be more than enough for the Ishigami test function.
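For completeness, here is a sketch of how such a Q2 check can be done on an independent test sample. The Ishigami stand-in and the sample sizes are mine, and the Q2 formula is written by hand rather than through a validation class, to stay independent of the OpenTURNS version.

import math
import numpy as np
import openturns as ot

# Ishigami stand-in, full (non-sparse) PCE of total degree 12.
distribution = ot.ComposedDistribution([ot.Uniform(-math.pi, math.pi)] * 3)
model = ot.SymbolicFunction(
    ["x1", "x2", "x3"], ["sin(x1) + 7 * sin(x2)^2 + 0.1 * x3^4 * sin(x1)"]
)
X_train = distribution.getSample(800)
Y_train = model(X_train)

productBasis = ot.OrthogonalProductPolynomialFactory(
    [ot.StandardDistributionPolynomialFactory(distribution.getMarginal(i)) for i in range(3)]
)
indexMax = productBasis.getEnumerateFunction().getStrataCumulatedCardinal(12)
adaptiveStrategy = ot.FixedStrategy(productBasis, indexMax)
projectionStrategy = ot.LeastSquaresStrategy()  # plain least squares, no selection
algo = ot.FunctionalChaosAlgorithm(X_train, Y_train, distribution, adaptiveStrategy, projectionStrategy)
algo.run()
metamodel = algo.getResult().getMetaModel()

# Q2 = 1 - (mean squared prediction error) / (output variance), on an independent test sample.
X_test = distribution.getSample(1000)
Y_test = np.array(model(X_test)).ravel()
Y_pred = np.array(metamodel(X_test)).ravel()
q2 = 1.0 - np.mean((Y_test - Y_pred) ** 2) / np.var(Y_test, ddof=1)
print("Q2 =", q2)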

I compared the time required to produce the previous figure with two different PCE decompositions:

  • a sparse PCE using least squares and the LARS selection method: 41 seconds,
  • a full PCE using least squares: 2 minutes 34 seconds.

Therefore, to create a full PCE with total degree up to 50, you had to compute more than 10^4 coefficients, which must have required much more than 10 minutes, didn't it? What CPU / wall-clock time was required to produce the figure you showed in your message?

Best regards,

Michaël


  1. Le Maître, O. and Knio, O. (2010). Spectral Methods for Uncertainty Quantification: With Applications to Computational Fluid Dynamics. Scientific Computation, Springer.

Hello Michaël and Régis,
Thank you so much for your kind and complete answers! They helped a lot!
Regarding the computation time, it was between one and two hours for the largest degrees. Quite long indeed, but still much less than the run time of the model I was approximating with the PCE!
Regards,
Sarah