Hi,
It would have been nice to attach your script, even if it is basic.
To make a long story short, this is an example of overfitting, and the change between the two versions of OT may be due to different default values in some algorithms (impossible to check, as you don’t mention which two versions of OT you used).
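By the way, you can compare your two installations yourself; here is a minimal sketch (the ResourceMap key below is just one example of a default that may differ between versions):

import openturns as ot

# Print the installed OT version
print("OT version:", ot.__version__)
# Print one of the defaults that may have changed between versions
print("Default basis size:",
      ot.ResourceMap.GetAsUnsignedInteger("FunctionalChaosAlgorithm-BasisSize"))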
In order to analyze the problem, I used the following script:
import openturns as ot
import openturns.viewer as otv
import openturns.experimental as otexp
# Load the training data
inTrain = ot.Sample.ImportFromTextFile("input_train.csv", ",")
outTrain = ot.Sample.ImportFromTextFile("output_train.csv", ",")
# Load the validation data
inVal = ot.Sample.ImportFromTextFile("input_val.csv", ",")
outVal = ot.Sample.ImportFromTextFile("output_val.csv", ",")
# Recover the input distribution. It should be known, so I use all
# the available information with no shame ;-)
indata = list(inTrain) + list(inVal)
distribution = ot.MetaModelAlgorithm.BuildDistribution(indata)
# Now the polynomial basis
enumFun = ot.LinearEnumerateFunction(distribution.getDimension())
basis = ot.OrthogonalProductPolynomialFactory([ot.LegendreFactory() for i in range(distribution.getDimension())], enumFun)
# Here is the cause of the trouble: results look nice with degree 2,
# and awful with degree 3
for deg in [2, 3]:
    print("#" * 50)
    print(f"{deg=}")
    basisSize = enumFun.getBasisSizeFromTotalDegree(deg)
    # Use the new, simplified algo
    algo = otexp.LeastSquaresExpansion(inTrain, outTrain, distribution, basis, basisSize, "SVD")
    algo.run()
    # Check the residuals
    print("residuals=", algo.getResult().getResiduals())
    meta = algo.getResult().getMetaModel()
    valid = ot.MetaModelValidation(inVal, outVal, meta)
    graph = valid.drawValidation()
    q2 = valid.computePredictivityFactor()
    q2_0 = q2[0]
    q2_1 = q2[1]
    graph.setTitle(f"{deg=}, {q2_0=:.3f}, {q2_1=:.3f}")
    view = otv.View(graph)
    view.save("Result_deg_" + str(deg) + ".png")
    view.close()
and the output is:
##################################################
deg=2
residuals= [0.00783466,0.899184]
##################################################
deg=3
residuals= [2.13319e-16,2.27965e-14]
With residuals essentially equal to zero for deg=3, you can be sure that you have overfitting somewhere.
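The mechanism is easy to check: as soon as the number of basis functions reaches the training sample size, the least-squares fit can interpolate the training points, and the residuals drop to machine precision. A quick count makes it explicit (a minimal sketch, assuming the same CSV files as above; most likely the degree-3 basis size is close to, or exceeds, your training sample size):

import openturns as ot

inTrain = ot.Sample.ImportFromTextFile("input_train.csv", ",")
enumFun = ot.LinearEnumerateFunction(inTrain.getDimension())
for deg in [2, 3]:
    # Compare the number of coefficients to estimate with the number of points
    basisSize = enumFun.getBasisSizeFromTotalDegree(deg)
    print(f"{deg=}, basis size={basisSize}, training size={inTrain.getSize()}")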
The validation graphs are:
[Result_deg_2.png and Result_deg_3.png: observed vs. predicted values for deg=2 and deg=3]
and you see the effect of overfitting.
You get the exact same results using the legacy PCE algorithm:
import openturns as ot
import openturns.viewer as otv
# Load the training data
inTrain = ot.Sample.ImportFromTextFile("input_train.csv", ",")
outTrain = ot.Sample.ImportFromTextFile("output_train.csv", ",")
# Load the validation data
inVal = ot.Sample.ImportFromTextFile("input_val.csv", ",")
outVal = ot.Sample.ImportFromTextFile("output_val.csv", ",")
# Now the polynomial basis
enumFun = ot.LinearEnumerateFunction(inTrain.getDimension())
# Here is the cause of the trouble: results look nice with degree 2,
# and awful with degree 3
for deg in [2, 3]:
    print("#" * 50)
    print(f"{deg=}")
    basisSize = enumFun.getBasisSizeFromTotalDegree(deg)
    # Force the legacy algorithm to use the same basis size
    ot.ResourceMap.SetAsUnsignedInteger("FunctionalChaosAlgorithm-BasisSize", basisSize)
    algo = ot.FunctionalChaosAlgorithm(inTrain, outTrain)
    algo.run()
    # Check the residuals
    print("residuals=", algo.getResult().getResiduals())
    meta = algo.getResult().getMetaModel()
    valid = ot.MetaModelValidation(inVal, outVal, meta)
    graph = valid.drawValidation()
    q2 = valid.computePredictivityFactor()
    q2_0 = q2[0]
    q2_1 = q2[1]
    graph.setTitle(f"{deg=}, {q2_0=:.3f}, {q2_1=:.3f}")
    view = otv.View(graph)
    view.save("Result_old_deg_" + str(deg) + ".png")
    view.close()
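If you really need degree 3, a classical cure is to let the algorithm select a sparse sub-basis instead of fitting all the coefficients, e.g. with LARS and a corrected leave-one-out criterion. Here is a minimal sketch with the legacy algorithm (the LARS/CorrectedLeaveOneOut combination is only a suggestion, not the only possibility):

import openturns as ot

inTrain = ot.Sample.ImportFromTextFile("input_train.csv", ",")
outTrain = ot.Sample.ImportFromTextFile("output_train.csv", ",")
distribution = ot.MetaModelAlgorithm.BuildDistribution(inTrain)
dim = distribution.getDimension()
enumFun = ot.LinearEnumerateFunction(dim)
basis = ot.OrthogonalProductPolynomialFactory([ot.LegendreFactory()] * dim, enumFun)
basisSize = enumFun.getBasisSizeFromTotalDegree(3)
# LARS selects the coefficients and corrected leave-one-out picks the best
# sub-basis: this limits overfitting even with a degree-3 candidate basis
adaptive = ot.FixedStrategy(basis, basisSize)
projection = ot.LeastSquaresStrategy(
    ot.LeastSquaresMetaModelSelectionFactory(ot.LARS(), ot.CorrectedLeaveOneOut()))
algo = ot.FunctionalChaosAlgorithm(inTrain, outTrain, distribution, adaptive, projection)
algo.run()
print("residuals=", algo.getResult().getResiduals())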
Please send us your script and the versions of OT you used if you want more insight into what happened between these two versions.
Cheers
Régis