Hello
My team ran a training challenge on the data attached to this post. The goal is to learn the best Kriging model. We noticed a difference between the results given by OpenTURNS and by DiceKriging (R): when the covariance parameters are fixed to the same values in both packages and the same trend formula is used, we do not obtain the same trend coefficients. As a result, the leave-one-out Q2 obtained with OpenTURNS is worse than the one obtained with DiceKriging.
I can't set the trend coefficients in OpenTURNS, so I can't go further in my analysis. Is it possible to get some help on this issue?
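For reference, with fixed covariance parameters the trend coefficients of a universal Kriging model are the generalized least squares estimates beta = (F' C^-1 F)^-1 F' C^-1 y, where C is the covariance matrix of the design and F the trend design matrix. The sketch below is my own cross-check (not a documented feature of either package); it reuses the same X.txt / Y.txt files and fixed Matérn 5/2 parameters as the scripts that follow, and the column ordering of PolynomialFeatures may differ from DiceKriging's ~.^2, so the coefficients should be compared term by term:

```python
# Hedged cross-check: compute the GLS trend coefficients directly.
# Assumes the X.txt / Y.txt files and the fixed Matern 5/2 parameters used below.
import numpy as np
import openturns as ot
from sklearn import preprocessing

X = np.loadtxt('X.txt')
Y = np.loadtxt('Y.txt')

cov = ot.MaternModel([0.05109184, 0.31894724, 0.10191949, 0.16632671, 0.05568251, 0.11347099],
                     [np.sqrt(6.589306)], 5./2)
C = np.array(cov.discretize(X))     # covariance matrix of the design points
F = preprocessing.PolynomialFeatures(degree=2, interaction_only=True,
                                     include_bias=True).fit_transform(X)  # trend design matrix

# GLS estimate: beta_hat = (F' C^-1 F)^-1 F' C^-1 y
Cinv_F = np.linalg.solve(C, F)
Cinv_Y = np.linalg.solve(C, Y)
beta_hat = np.linalg.solve(F.T @ Cinv_F, F.T @ Cinv_Y)
print(beta_hat)  # compare with getTrendCoefficients() and model1@trend.coef
```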
**SCRIPT PYTHON**
```python
"""
Exemple pour comparaison OPENTURNS et DICEKRIGING
durantin | 24/11/2020 | V0
"""
import numpy as np
import openturns as ot
import matplotlib.pyplot as plt
from sklearn import preprocessing
#%% ##################################################
#### Read and store the data
######################################################
X = np.loadtxt('X.txt')
Y = np.loadtxt('Y.txt')
#%% #########################################################
#### Fit the conditioned Gaussian process
#############################################################
DimX = 6
#### Define the degree-2 trend with interaction terms only:
inputName = []
for i in range(DimX):
    inputName.append('X' + str(i + 1))
functionBase = []
functionBase.append(ot.SymbolicFunction(inputName, ['1']))
for i in range(DimX):
    functionBase.append(ot.SymbolicFunction(inputName, [inputName[i]]))
for i in range(DimX):
    for j in range(i, DimX):
        if i != j:
            functionBase.append(ot.SymbolicFunction(inputName, [inputName[i] + '*' + inputName[j]]))
basis = ot.Basis(functionBase)
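# Note: this basis contains 22 functions (1 constant + 6 linear + 15 two-way interactions),
# i.e. the same terms as the ~.^2 trend formula used in the R script below.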
#### Define the covariance model
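# MaternModel takes (scale, amplitude, nu); the amplitude is the standard deviation,
# hence sqrt(6.589306) so that the variance matches coef.var = 6.589306 in DiceKriging.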
covarianceModel = ot.MaternModel([0.05109184, 0.31894724, 0.10191949, 0.16632671, 0.05568251, 0.11347099],[np.sqrt(6.589306)], 5./2)
#### Fit the model
algo = ot.KrigingAlgorithm( ot.Sample(X), ot.Sample(np.atleast_2d(Y).T), covarianceModel, basis, False )
algo.setOptimizeParameters(False)
algo.run()
result = algo.getResult()
print(result.getTrendCoefficients())
# Output:
# [6.47196, -13.4897, 13.7795, -19.0711, 18.2367, 19.2923,
#  -13.948, 9.37527, 8.03541, -9.24838, 10.8714, 5.55679,
#  3.42253, -17.9401, -11.1812, -1.26304, 0.826423, 17.2884,
#  5.91668, -23.7892, 19.2513, -16.5288]
#### Compute Q2 by leave-one-out (LOO)
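# The block below evaluates the LOO predictions in closed form from the bordered
# universal-kriging system S = [[C, F], [F', 0]]: if B denotes the upper-left N x N
# block of S^-1, then muLOO[i] = Y[i] - (B @ Y)[i] / B[i, i].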
Covmat = np.array(result.getCovarianceModel().discretize(X))
Fmat = preprocessing.PolynomialFeatures(degree = 2, interaction_only = True, include_bias = True).fit_transform(X)
dim0 = Fmat.shape[1]
Smat = np.vstack( (np.hstack((Covmat,Fmat)), np.hstack((Fmat.T,np.zeros([dim0,dim0]))) ) )
SmatInv = np.linalg.inv(Smat)
SmatInvLimited = SmatInv[:X.shape[0], :X.shape[0]]
SmatInvDiag = np.diag(SmatInvLimited)
SmatInv0 = SmatInvLimited - np.diag(SmatInvDiag)
muLOO = -np.divide(np.dot(SmatInv0,Y),SmatInvDiag)
Q2 = 1 - (np.mean((muLOO-Y)**2)/np.var(Y))
fig, ax = plt.subplots()
ax.scatter( Y, muLOO )
ax.plot( [Y.min(), Y.max()], [Y.min(), Y.max()], 'k--', lw=4)
ax.set_xlabel('Observed')
ax.set_ylabel('Predicted')
ax.set_title('Q2 = '+str(Q2))
```
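As a sanity check on the closed-form LOO above, here is a brute-force variant that refits the OpenTURNS model with each point left out while keeping the covariance parameters fixed. This is only a sketch (it reuses the X, Y, covarianceModel and basis objects defined in the script above) and is much slower, but it should reproduce the same muLOO values:

```python
# Brute-force LOO with fixed covariance parameters (slow, but independent of the
# closed-form formula above). Assumes X, Y, covarianceModel and basis already exist.
muLOO_check = np.zeros(X.shape[0])
for i in range(X.shape[0]):
    mask = np.ones(X.shape[0], dtype=bool)
    mask[i] = False
    algo_i = ot.KrigingAlgorithm(ot.Sample(X[mask]), ot.Sample(np.atleast_2d(Y[mask]).T),
                                 covarianceModel, basis, False)
    algo_i.setOptimizeParameters(False)
    algo_i.run()
    # predict the left-out point with the refitted metamodel
    muLOO_check[i] = algo_i.getResult().getMetaModel()(X[i])[0]
print(np.allclose(muLOO, muLOO_check))
```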
**SCRIPT R**
> require(DiceKriging)
>
> X = read.table("X.txt");
> X=cbind(X$V1,X$V2,X$V3,X$V4,X$V5,X$V6)
> Y1 = read.table("Y.txt")$V1
>
>
> N = dim(X)[1]
> var1 = 6.589306
> theta1 = c( 0.05109184, 0.31894724, 0.10191949, 0.16632671, 0.05568251, 0.11347099)
>
> model1 = km(~.^2,design = X,response = Y1,nugget.estim = F,
> coef.var = var1, coef.cov = theta1)
>
> errLOO = matrix(0,N,1)
> for(i in 1:N){
> modeltemp_LOO1<-km(~.^2,design=X[-i,],response=data.frame(z=Y1[-i]),
> coef.var = var1, coef.trend = model1@trend.coef, coef.cov = theta1, nugget.estim = F)
> errLOO[i,]=predict(modeltemp_LOO1,newdata=matrix(X[i,],nrow=1),type="UK",checkNames=F)$mean
> }
>
> Q2=1-mean((Y1-errLOO)^2)/var(Y1)
>
> print(Q2)
>
> print(model1@covariance@range.val)
> print(model1@covariance@sd2)
> print(model1@trend.coef)
> ##### [1] 6.2872398 -13.6359917 14.0025015 -19.0124666 18.1585121 19.6508349
> ##### [7] -13.8427035 9.6053325 8.0803512 -9.0787169 10.8451498 5.4669461
> ##### [13] 3.3470293 -18.0829504 -11.5673840 -1.2309130 0.8232995 17.3156215
> ##### [19] 5.7652294 -23.9570578 19.4113520 -16.5715691
>
> Covmat = t(model1@T) %*% model1@T