Normalization of the input sample in KrigingAlgorithm in OT1.16

Hi !
After a discussion with colleagues on the features of OT 1.16, I mentioned the important fact that the normalization of the inputs has been dropped:

That might have been overlooked by some users, which is why I explain the issue here and how we may solve it.

What this line means is that OT1.15 and previous versions scaled the input sample so that the mean was zero and the standard deviation was equal to 1:

\boldsymbol{z} = \frac{\boldsymbol{x} - \overline{\boldsymbol{x}}}{sd(\boldsymbol{x})}

where \boldsymbol{x} is the input point, \overline{\boldsymbol{x}} is the sample mean and sd(\boldsymbol{x}) is the sample standard deviation.
This computation was performed by default in the constructor of the GeneralLinearModelAlgorithm, the class which does the job behind the KrigingAlgorithm class.
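To make the formula concrete, here is a minimal sketch of the same standardization in plain Python, applied column by column. It is not the OpenTURNS implementation: the `standardize` helper and the sample values are made up for illustration, and I assume the sample standard deviation uses the n - 1 denominator.

```python
import statistics


def standardize(sample):
    """Center and scale each column of a list of points (one point per row)."""
    columns = list(zip(*sample))
    means = [statistics.mean(c) for c in columns]
    # Sample standard deviation with the n - 1 denominator (assumption)
    sds = [statistics.stdev(c) for c in columns]
    scaled = [
        [(value - m) / s for value, m, s in zip(point, means, sds)]
        for point in sample
    ]
    return scaled, means, sds


# Hypothetical sample: two inputs with very different magnitudes
sample = [[1.0e9, 0.1], [2.0e9, 0.2], [3.0e9, 0.3]]
scaled, means, sds = standardize(sample)
print(scaled)  # both columns now have comparable magnitudes
```

After this transformation both columns have zero mean and unit standard deviation, whatever their original units.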

That feature had good effects, because using an unscaled input sample makes the learning process of the hyperparameters difficult in some cases. More precisely, it makes it difficult to estimate the parameters by maximum likelihood optimization. It also had, however, undesirable side effects, e.g. the scale parameter of the covariance model was related to the scaled input sample, and not to the original sample anymore.
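The side effect on the scale parameter can be checked numerically. The sketch below assumes a one-dimensional squared exponential correlation of the form exp(-0.5 ((x - x') / theta)^2); the `correlation` helper and all numerical values are hypothetical. Under that assumption, a scale theta_z estimated on the standardized inputs corresponds to theta_x = sd * theta_z in the original units.

```python
import math


def correlation(x1, x2, scale):
    """Squared exponential correlation in one dimension (assumed form)."""
    return math.exp(-0.5 * ((x1 - x2) / scale) ** 2)


mean, sd = 2.0e9, 1.0e9  # hypothetical sample mean and standard deviation
x1, x2 = 1.5e9, 2.5e9    # two input points in the original units
z1, z2 = (x1 - mean) / sd, (x2 - mean) / sd  # the same points, standardized

scale_z = 0.7            # scale estimated on the standardized inputs (made up)
scale_x = sd * scale_z   # equivalent scale in the original units

rho_z = correlation(z1, z2, scale_z)
rho_x = correlation(x1, x2, scale_x)
print(rho_z, rho_x)      # the two correlations agree up to rounding
```

So a scale reported on the standardized sample has to be multiplied by the standard deviation of the corresponding input before it can be interpreted in physical units.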

OpenTURNS 1.16 does not scale the input sample: this is the responsibility of the user. Without change, this may produce poor kriging metamodels because the scale parameter may be poorly estimated.

The code below presents one possibility to overcome this problem:

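import openturns as ot

# myDistribution, X_train and Y_train are assumed to be defined beforehand
dimension = myDistribution.getDimension()
basis = ot.ConstantBasisFactory(dimension).build()
covarianceModel = ot.SquaredExponential([1.0] * dimension, [1.0])
# Trick A: start the optimization at the order of magnitude of the inputs
covarianceModel.setScale(X_train.getMax())
algo = ot.KrigingAlgorithm(X_train, Y_train, covarianceModel, basis)
# Trick B: search for the scale within the range of the sample
scaleOptimizationBounds = ot.Interval(X_train.getMin(), X_train.getMax())
algo.setOptimizationBounds(scaleOptimizationBounds)
algo.run()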

There are two tricks, both of which try to account for potentially very different magnitudes in the input vector X.

  • Trick A sets the scale parameter of the covariance model before creating the kriging algorithm. Indeed, this value is used as a starting point for the optimization algorithm. Hence, it is much easier for the algorithm to find the optimum, because the starting point now has the correct order of magnitude (instead of using the default value of the scale parameter of the covariance model, which may be far away from the optimum).
  • Trick B sets the bounds of the optimization algorithm, so that they match the minimum and maximum of the sample. Hence, the optimization algorithm has reliable bounds to search for the scale parameter.

Using these tricks is not necessary if the input sample already has a scale close to 1. For example, when approximating the function y=\sin(x) for x\in[0,1], they are not needed. They are necessary, however, in the cantilever beam example of the documentation, where the first parameter is the Young modulus, which has an order of magnitude equal to 10^9.
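A quick numerical check shows why a scale of 1 is a poor starting point for an input of order 10^9. This is only a sketch: it assumes the squared exponential correlation has the form exp(-0.5 ((x - x') / theta)^2), and the two input values are hypothetical, not taken from the cantilever beam example.

```python
import math


def correlation(x1, x2, scale):
    """Squared exponential correlation in one dimension (assumed form)."""
    return math.exp(-0.5 * ((x1 - x2) / scale) ** 2)


x1, x2 = 1.0e9, 1.2e9  # two hypothetical values of an input of order 10^9

# With a scale of 1, the correlation underflows to zero: the two points
# look completely unrelated to the model.
rho_unit_scale = correlation(x1, x2, 1.0)

# With a scale of the same order of magnitude as the input (as in trick A),
# the correlation is informative again.
rho_sample_scale = correlation(x1, x2, max(x1, x2))

print(rho_unit_scale, rho_sample_scale)
```

When every pair of distinct points has a correlation that is numerically zero, the correlation matrix is the identity and the likelihood gives the optimizer very little information about the direction in which to move the scale.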

Using this particular normalization may work if the initial sample is large enough to represent the potential optimal value of the scale parameter. It may not work if the initial sample is too small, e.g. fewer than 10 points. This might be an issue for those of us who want to sequentially learn new points in the design of experiments.

More details and other implementations of scaling are provided at:

Please share any other scaling method that you commonly use, or any other way to initialize the parameters of the covariance model for optimization. Please correct me if I did not properly explain the scaling issue here. In any case, I guess that this might be interesting for those who are surprised by changes in their kriging metamodels in OT 1.16.

Best regards,

Michaël
