Difference observed on trend coefficients between OPENTURNS and DiceKriging

Hello

My team ran a training challenge on the data attached to this post. The goal is to learn the best Kriging model. We noticed a difference between the results provided by OpenTURNS and by DiceKriging (R). When the covariance parameters are fixed in both packages and the same trend formula is used, we do not obtain the same trend coefficients. As a result, the LOO Q2 measure obtained with OpenTURNS is worse than the one obtained with DiceKriging.
I cannot set the trend coefficients in OpenTURNS, so I cannot go further in my analysis. Is it possible to get some help on that issue?

**PYTHON SCRIPT**

```python

"""
Exemple pour comparaison OPENTURNS et DICEKRIGING
durantin | 24/11/2020 | V0
"""
import numpy as np
import openturns as ot
import matplotlib.pyplot as plt
from sklearn import preprocessing
#%% ##################################################
#### Read and store the data
######################################################
X = np.loadtxt('X.txt')
Y = np.loadtxt('Y.txt')
#%% #########################################################
#### Fit the conditioned Gaussian process
#############################################################

DimX = 6

#### Define the degree-2 trend with interaction terms only:
inputName = []
for i in range(DimX):
    inputName.append('X'+str(i+1))
functionBase = []
functionBase.append(ot.SymbolicFunction(inputName,['1']))
for i in range(DimX):
    functionBase.append(ot.SymbolicFunction(inputName,[inputName[i]]))
for i in range(DimX):
    for j in range(i,DimX):
        if i != j:
            functionBase.append(ot.SymbolicFunction(inputName,[inputName[i]+'*'+inputName[j]]))

basis = ot.Basis(functionBase)
####  Define the covariance model
covarianceModel = ot.MaternModel([0.05109184, 0.31894724, 0.10191949, 0.16632671, 0.05568251, 0.11347099],[np.sqrt(6.589306)], 5./2)
####  Fit the Kriging model
algo = ot.KrigingAlgorithm(ot.Sample(X), ot.Sample(np.atleast_2d(Y).T), covarianceModel, basis, False)  # False: no input normalization
algo.setOptimizeParameters(False)
algo.run()
result = algo.getResult()

print(result.getTrendCoefficients())
# Output:
#  6.47196,-13.4897,13.7795,-19.0711,18.2367,19.2923,
# -13.948,9.37527,8.03541,-9.24838,10.8714,5.55679,
# 3.42253,-17.9401,-11.1812,-1.26304,0.826423,17.2884,
# 5.91668,-23.7892,19.2513,-16.5288

####  Compute the LOO Q2
Covmat = np.array(result.getCovarianceModel().discretize(X))
Fmat = preprocessing.PolynomialFeatures(degree=2, interaction_only=True, include_bias=True).fit_transform(X)
dim0 = Fmat.shape[1]
# Bordered system matrix [[R, F], [F^T, 0]] appearing in Dubrule's LOO formulas
Smat = np.vstack((np.hstack((Covmat, Fmat)), np.hstack((Fmat.T, np.zeros([dim0, dim0])))))
SmatInv = np.linalg.inv(Smat)
SmatInvLimited = SmatInv[:X.shape[0], :X.shape[0]]
SmatInvDiag = np.diag(SmatInvLimited)
SmatInv0 = SmatInvLimited - np.diag(SmatInvDiag)

# Dubrule's virtual LOO predictions
muLOO = -np.divide(np.dot(SmatInv0, Y), SmatInvDiag)

Q2 = 1 - (np.mean((muLOO - Y)**2) / np.var(Y))

fig, ax = plt.subplots()
ax.scatter( Y, muLOO )
ax.plot( [Y.min(), Y.max()], [Y.min(), Y.max()], 'k--', lw=4)
ax.set_xlabel('Observed')
ax.set_ylabel('Predicted')    
ax.set_title('Q2 = '+str(Q2))
```
**R SCRIPT**

```r
require(DiceKriging)

X = read.table("X.txt")
X = cbind(X$V1, X$V2, X$V3, X$V4, X$V5, X$V6)
Y1 = read.table("Y.txt")$V1

N = dim(X)[1]
var1 = 6.589306
theta1 = c(0.05109184, 0.31894724, 0.10191949, 0.16632671, 0.05568251, 0.11347099)

model1 = km(~.^2, design = X, response = Y1, nugget.estim = F,
            coef.var = var1, coef.cov = theta1)

errLOO = matrix(0, N, 1)
for (i in 1:N) {
  modeltemp_LOO1 <- km(~.^2, design = X[-i, ], response = data.frame(z = Y1[-i]),
                       coef.var = var1, coef.trend = model1@trend.coef,
                       coef.cov = theta1, nugget.estim = F)
  errLOO[i, ] = predict(modeltemp_LOO1, newdata = matrix(X[i, ], nrow = 1),
                        type = "UK", checkNames = F)$mean
}

Q2 = 1 - mean((Y1 - errLOO)^2) / var(Y1)

print(Q2)

print(model1@covariance@range.val)
print(model1@covariance@sd2)
print(model1@trend.coef)
##### [1]   6.2872398 -13.6359917  14.0025015 -19.0124666  18.1585121  19.6508349
##### [7] -13.8427035   9.6053325   8.0803512  -9.0787169  10.8451498   5.4669461
##### [13]   3.3470293 -18.0829504 -11.5673840  -1.2309130   0.8232995  17.3156215
##### [19]   5.7652294 -23.9570578  19.4113520 -16.5715691

Covmat = t(model1@T) %*% model1@T
```

Data X:
X.txt (13.6 KB)

Data Y:
Y.txt (2.3 KB)

Hello @cdurantin, thanks for posting here.

I do not understand your script: the part below “Compute the LOO Q2” is completely independent from the estimation of the trend coefficients. It only depends on the OpenTURNS part of the script through

```python
Covmat = np.array(result.getCovarianceModel().discretize(X))
```

But the output of result.getCovarianceModel() does not depend on the estimation of the trend parameters, because you told the Kriging algorithm not to estimate the covariance parameters:

```python
algo.setOptimizeParameters(False)
```

In order to compute the LOO Q2 more easily, you could use the MetaModelValidation class within a for loop. At every step, you would exclude one data point from the training set and use MetaModelValidation on that point.

Allow me to continue the discussion in French to try to be as clear as possible.

  • Q2 computation without a loop

The Q2 computation is only there to give an idea of the metamodel's accuracy. The analytical formulation can be found in Dubourg's PhD thesis (https://tel.archives-ouvertes.fr/tel-00697026), page 38, eq. (1.132). It avoids looping over the training points and is therefore numerically more efficient. The equation can be expanded differently, and the trend coefficients then appear.

  • Q2 computation with a loop

If we loop with MetaModelValidation, the trend coefficients are not fixed. I ran the test: result.getTrendCoefficients() changes values, so we do not use exactly the same metamodel to predict each excluded point. In that case the LOO Q2 is not computed correctly.

To restate the purpose of my example script: it shows that there is a difference in the trend coefficients between OpenTURNS and DiceKriging when the covariance parameters are fixed. Yet these coefficients depend only on the covariance and can be computed analytically (generalized least squares solution):

\tilde{\beta} = \left(F^T R^{-1} F \right)^{-1} F^T R^{-1} y
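For instance, here is a minimal numpy check of this formula, assuming Fmat, Covmat, and Y as defined in the Python script above (beta_gls is a name introduced here just for illustration):

```python
import numpy as np

# Generalized least squares: beta = (F^T R^-1 F)^-1 F^T R^-1 y
RinvF = np.linalg.solve(Covmat, Fmat)   # R^{-1} F
RinvY = np.linalg.solve(Covmat, Y)      # R^{-1} y
beta_gls = np.linalg.solve(Fmat.T @ RinvF, Fmat.T @ RinvY)
print(beta_gls)  # to be compared with result.getTrendCoefficients()
```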

With the trend coefficient values estimated by OpenTURNS, I get worse prediction accuracy than with DiceKriging. Since I cannot choose to fix \beta to a given value, I cannot push my analysis further… I have already noticed that the small values of the covariance matrix are not exactly the same between DiceKriging and OpenTURNS, which may explain the subsequent difference in \beta.
So I wonder whether I am using OpenTURNS the wrong way. If my training script is correct, could you analyze my example to understand the observed difference?
Feel free to contact me for any further details.

Hello,

Change the covariance model you use.

Try

```python
coll = [ot.MaternModel([scale], 5./2) for scale in [0.05109184, 0.31894724, 0.10191949, 0.16632671, 0.05568251, 0.11347099]]
covarianceModel = ot.ProductCovarianceModel(coll)
covarianceModel.setAmplitude([np.sqrt(6.589306)])
```

instead of

```python
covarianceModel = ot.MaternModel([0.05109184, 0.31894724, 0.10191949, 0.16632671, 0.05568251, 0.11347099], [np.sqrt(6.589306)], 5./2)
```

You can also add this:

```python
ot.ResourceMap.SetAsBool(
    "GeneralLinearModelAlgorithm-UseAnalyticalAmplitudeEstimate", False)
```

so that the model does not re-estimate the amplitude (\sigma).

The model you fixed is different from the one in R. The reason is that ot.MaternModel with a vector of scales is a single anisotropic geometric kernel (one Matérn function of a weighted Euclidean distance), whereas DiceKriging builds a tensor product of one-dimensional Matérn kernels, which is what ProductCovarianceModel reproduces.
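Here is a small sketch illustrating the difference (parameter values are those from the thread; the comparison at an arbitrary lag is only for illustration):

```python
import openturns as ot

theta = [0.05109184, 0.31894724, 0.10191949, 0.16632671, 0.05568251, 0.11347099]
# One 6-D Matérn kernel on a weighted Euclidean distance (anisotropic geometric)
aniso = ot.MaternModel(theta, [1.0], 5.0 / 2.0)
# Tensor product of six 1-D Matérn kernels, as in DiceKriging
prod = ot.ProductCovarianceModel([ot.MaternModel([t], 5.0 / 2.0) for t in theta])
tau = [0.1] * 6  # an arbitrary lag
print(aniso(tau)[0, 0], prod(tau)[0, 0])  # the two values generally differ
```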

Doing this, I get the following coefficients:

```
[6.28724, -13.636, 14.0025, -19.0125, 18.1585, 19.6508, -13.8427, 9.60533, 8.08035, -9.07872, 10.8451, 5.46695, 3.34703, -18.083, -11.5674, -1.23091, 0.823299, 17.3156, 5.76523, -23.9571, 19.4114, -16.5716]
```
which seem similar to the ones you got with `DiceKriging`. However, the `Q2` is bad and I have not yet investigated why.

Hope this helps.

Cheers

Hello Sofiane,

Thanks a lot, it works: I obtain the same trend coefficients as DiceKriging.
Do you have any hints on how to set the trend coefficients in OpenTURNS?
Because on my problem's data, I have trouble computing the Q2 predictivity factor.
I implemented four different ways of computing it and I get three different results.
Two methods use different implementations of Dubrule's LOO formula (they give the wrong answer because of numerical errors in the matrix computations).
One method uses MetaModelValidation, but I am certainly using it wrong because the trend coefficients are re-estimated whenever I remove a point from my dataset.
The last method, which gives the reference value, rewrites the prediction of the Kriging conditional mean:

Extract the metamodel data

```python
beta = np.array(result.getTrendCoefficients()).T
Covmat = np.array(result.getCovarianceModel().discretize(X))
Fmat = preprocessing.PolynomialFeatures(degree=2, interaction_only=True, include_bias=True).fit_transform(X)
# Invert the covariance matrix through its eigendecomposition
eigenR = np.linalg.eig(Covmat)
Rinv = np.matmul(np.matmul(eigenR[1], np.diag(1 / eigenR[0])), eigenR[1].T)
```

LOO by explicit computation of the prediction

```python
nX = X.shape[0]  # number of design points
muDOE = np.matmul(Fmat, beta)
looPred = np.zeros(nX)
for i in range(nX):
    Fmati = preprocessing.PolynomialFeatures(degree=2, interaction_only=True, include_bias=True).fit_transform(np.atleast_2d(X[i, :]))
    Yi = np.delete(Y, i, 0)
    muDOEi = np.delete(muDOE, i, 0)
    Comati = np.delete(Covmat, i, 0)
    Comati = np.delete(Comati, i, 1)
    covX = Covmat[:, i]
    covX = np.atleast_2d(np.delete(covX, i, 0)).T
    looPred[i] = np.matmul(Fmati, beta) + np.matmul(np.add(Yi, -muDOEi.T), np.matmul(np.linalg.inv(Comati), covX))

Q2_1 = 1 - (np.mean((looPred - Y)**2) / np.var(Y))
```

One of Dubrule's formulas

```python
FRinv = np.matmul(Fmat.T, Rinv)
Q = Rinv - np.matmul(np.matmul(Rinv, Fmat), np.matmul(np.linalg.inv(np.matmul(FRinv, Fmat)), FRinv))
Qinv = 1 / np.diag(Q)
muLOOY = Qinv * np.matmul(Q, Y)   # LOO residuals

Q2_2 = 1 - (np.mean(muLOOY**2) / np.var(Y))
```

MetaModelValidation

```python
residuals = np.zeros(nX)
for i in range(nX):
    algo = ot.KrigingAlgorithm(ot.Sample(np.delete(X, i, 0)), ot.Sample(np.atleast_2d(np.delete(Y, i, 0)).T), covarianceModel, basis, False)
    algo.setOptimizeParameters(False)
    algo.run()
    result = algo.getResult()
    metaModel = result.getMetaModel()
    print(result.getTrendCoefficients())  # changes at every iteration: the trend is re-estimated
    val = ot.MetaModelValidation(ot.Sample(np.atleast_2d(X[i, :])), ot.Sample(np.atleast_2d(Y[i]).T), metaModel)
    residuals[i] = np.array(val.getResidualSample())

Q2_3 = 1 - (np.mean(residuals**2) / np.var(Y))
```


Now that it is possible to upload files, I have updated the data post.
I can also share the best Kriging model obtained on my data.
This is the “standard” procedure I use to learn a Gaussian process on any dataset.
It might help other users or give you a view of how users use your software.
There is certainly room for improvement, so don't hesitate to comment.

BestMetamodel.py (3.9 KB)


Thanks a lot for sharing this @cdurantin! I think it could be made into a great tutorial (perhaps with a few additional explanations :wink:). I have a few questions:

  • The script starts with a LASSO regression, but it doesn’t look like the result is reused later. Did you try to perform UniversalKriging with the restricted functional trend basis instead of the full one?
  • In your experience, is the log-likelihood function often heavily multimodal? I ask this because you used an optimal LHS multistart with a large number of starting points.
  • The matrix Rinv is not used in the script. You might be numerically better off using the OT native method CovarianceMatrix.computeCholesky() anyway (see the sketch after this list).
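Here is a minimal sketch of what I mean, assuming Covmat and Fmat as in the scripts above:

```python
import numpy as np
import openturns as ot

# Cholesky factor computed with the OT native method: R = L L^T
L = ot.CovarianceMatrix(Covmat).computeCholesky()
Lnp = np.array(L)
# R^{-1} F through two triangular solves, without forming the inverse
RinvF = np.linalg.solve(Lnp.T, np.linalg.solve(Lnp, Fmat))
```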

Would you be interested in contributing a tutorial? Or perhaps giving a talk at a User’s day?

About the issue with OpenTURNS recomputing trend coefficients, perhaps this example might help:

OTKriging_LOO.py (4.9 KB)

[Warning: I am using OpenTURNS version 1.16, which no longer accepts the normalize boolean in the KrigingAlgorithm constructor. It now always behaves like it did in 1.15 when normalize was set to False.]

Here is the Q2 I got with the anisotropic geometric Matern 5/2 kernel you used in your original script (using the trend coefficients the KrigingAlgorithm fitted on the whole data set):

[Figure: OpenTURNS Q2 plot]

The idea is that once you know the trend coefficients, you can “detrend” your output data Y and perform Simple Kriging. That way you trick OpenTURNS into not recomputing the trend coefficients.

I wonder if that is a sensible approach to Q2 computation in general, though. Shouldn’t you perform cross-validation by running the algorithm from scratch on the subset of your data? Otherwise you’re sort of still using the missing data.


Yes, sure, I can contribute to a tutorial if it helps other users, or give a talk on “my way of exploiting OT tools”.

  • I use the LASSO at line 49: functionBaseSelected = list(functionBase[np.nonzero(LassoCoef)]). I deleted elements of my full basis.

  • On this kind of dataset, if you launch the optimization multiple times, you do not find the same correlation lengths every time. One solution is to have a good first guess and perform only one local optimization from it. Here I tried multistart, but it did not always converge to the right solution.

  • Thanks for the hint about CovarianceMatrix.computeCholesky(). One of my colleagues advised me to use the eigendecomposition to preserve the matrix symmetry.

For the Q2 computation, I like the “detrend” idea and I will try it.
Regarding the “use of missing data”: the purpose of the LOO Q2 is to evaluate the predictivity of the model whose parameters have just been estimated. We therefore place ourselves in the framework where all the estimated parameters are fixed. This approach is well reflected in the articles dealing with the subject, and it is what is done when using Dubrule's formulas.

Hi @josephmure, you were too quick!

I implemented the same idea but got 81.9% instead of 82.45%.

But I am wondering: why should the Q2 here use var(y) instead of the conditional variance?

Something like:

```python
betaF = ot.LinearCombinationFunction(basis, result.getTrendCoefficients()[0])

nX = len(X)
residuals = np.zeros(nX)

# Subtract the trend ==> detrended observations
Yc = Y - np.array(betaF(X)).ravel()

for i in range(nX):
    algoi = ot.KrigingAlgorithm(np.delete(X, i, 0), np.atleast_2d(np.delete(Yc, i, 0)).T, covarianceModel, ot.Basis())
    algoi.setOptimizeParameters(False)
    algoi.run()
    resulti = algoi.getResult()
    metaModeli = resulti.getMetaModel()
    residuals[i] = (metaModeli(X[i])[0] - Yc[i])

Q2 = 1 - np.mean(np.square(residuals)) / np.var(Y)
```

For info, I have an old script that performs the virtual LOO; it needs an update today.

See the gist for more details.
It is similar to what is done above, but it improves the computations a bit by using Cholesky factorization / system solving.
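For reference, here is a rough numpy-only sketch of that virtual LOO, assuming Covmat (R), Fmat (F), and Y from the scripts above:

```python
import numpy as np

Lc = np.linalg.cholesky(Covmat)                 # R = Lc Lc^T
def rsolve(B):
    # R^{-1} B via two triangular solves instead of an explicit inverse
    return np.linalg.solve(Lc.T, np.linalg.solve(Lc, B))

Rinv = rsolve(np.eye(len(Y)))
RinvF = rsolve(Fmat)
Q = Rinv - RinvF @ np.linalg.solve(Fmat.T @ RinvF, RinvF.T)
loo_residuals = (Q @ Y) / np.diag(Q)            # Dubrule: y_i - muLOO_i
Q2 = 1 - np.mean(loo_residuals**2) / np.var(Y)
```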

In the following script, I let OpenTURNS estimate all Kriging hyperparameters “natively” (no change to the optimization algorithm, no multistart) and put all monomials of order \leqslant 2 in the trend basis. The Q2 is computed manually with @sofianehaddad’s version of the “detrend Y” method (no algebraic formulas) and compared to the Q2 of @cdurantin’s best metamodel: it is slightly lower.

OTKriging_LOO_FullQuadraticBasis.py (5.3 KB)

[Figures: Q2 plots for BestModel_reconstructed and myModel_fullQuadraticBasis]

It would be interesting to do the Q2 computation with @sofianehaddad's gist. In fact, I wonder if this method should not be implemented in OpenTURNS. Couldn't it work as a method of KrigingResult?

I use the LASSO at line 49: functionBaseSelected = list(functionBase[np.nonzero(LassoCoef)]). I deleted elements of my full basis.

But functionBaseSelected is not used anywhere in the script BestMetamodel.py, is it?

But I am wondering: why should the Q2 here use var(y) instead of the conditional variance?

How would you define the conditional variance in the Q2 formula? What would it be conditioned on?
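For reference, the Q2 used in all the scripts of this thread is

Q2 = 1 - \frac{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_{-i}\right)^2}{\operatorname{Var}(y)}

where \hat{y}_{-i} is the LOO prediction at the i-th design point, so the denominator is the unconditional variance of the observations.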

You are right, I made a mistake in BestMetamodel.py: the basis should be defined as basis = ot.Basis(functionBaseSelected).
So I propose to write two Python files to close this topic with the solutions we discussed:

  1. The validated comparison between OT and DiceKriging.
  2. The best Kriging obtained with the “detrend” Q2 computation and LASSO regression.

I will try to include as many comments as possible.

But I am wondering: why should the Q2 here use var(y) instead of the conditional variance?

My mistake. In fact, I was confusing Q_2 with standardized residuals. We might add an analysis of such residuals for Kriging diagnostics.
And I agree that these Python tools should one day be part of the API.