Hi,
I am very surprised by the fact that you cannot use the full 100k sample with PCE in a reasonable amount of time. I suspect that you used the simplified interface, i.e. FunctionalChaosAlgorithm(inputSample, outputSample), and the algorithm is probably spending a huge amount of time trying to build the input distribution (which is done when the algorithm is created), not so much during the computation of the approximation.
You can check this by increasing the verbosity: add Log.Show(Log.ALL)
just before the creation of the algorithm, and it should pause on one of the most expansive distribution factories.
A better way to use this algorithm is to go deeper into its elements. If you already know the input distribution there is no need to estimate it. If you know the smoothness of your function there is no need to go to high degrees. Here is a full script allowing to play with the different elements:
from openturns import *
# Log.Show(Log.ALL)
# Generate some data. In your case, you just have to load it from file
# in_sample = Sample.ImportFromCSVFile("in_sample.csv")
# in_dim = in_sample.getDimension()
# out_sample = Sample.ImportFromCSVFile("out_sample.csv")
N = 100000
in_dim = 5
d_ref = Normal(in_dim)
in_sample = d_ref.getSample(N)
f = SymbolicFunction(["x1","x2","x3","x4","x5"], ["x1*sin(x2+x3)+sin(x4+sin(x5))"])
out_sample = f(in_sample)
####################################
# PCE using the detailed interface #
####################################
# Let's assume we don't know d_ref. Otherwise, use the actual input distribution
marginals = [HistogramFactory().build(in_sample.getMarginal(i)) for i in range(in_dim)]
d = ComposedDistribution(marginals)
# Build the polynomial basis
polynomials = [StandardDistributionPolynomialFactory(m) for m in marginals]
basis = OrthogonalProductPolynomialFactory(polynomials)
# Build the projection strategy
projection = LeastSquaresStrategy(in_sample, out_sample)
# Build the adaptive strategy
max_degree = 6
total_size = basis.getEnumerateFunction().getStrataCumulatedCardinal(max_degree)
adaptive = FixedStrategy(basis, total_size)
# Create the PCE algo
from time import time
t0 = time()
algo = FunctionalChaosAlgorithm(in_sample, out_sample, d, adaptive, projection)
# If you want to get the exact same results using the simplified interface replace all the preceding lines by
# ResourceMap.SetAsUnsignedInteger("FunctionalChaosAlgorithm-MaximumTotalDegree", 6)
# ResourceMap.SetAsScalar("FunctionalChaosAlgorithm-QNorm", 1.0)
# algo = FunctionalChaosAlgorithm(in_sample, out_sample)
t1 = time()
print("Creation t=", t1 - t0, "s")
t0 = time()
algo.run()
t1 = time()
result = algo.getResult()
print("Run t=", t1 - t0, "s")
print(FunctionalChaosSobolIndices(result).summary())
##############
# Validation #
##############
in_sample_valid = d_ref.getSample(N)
out_sample_valid = f(in_sample_valid)
validation = MetaModelValidation(in_sample_valid, out_sample_valid, result.getMetaModel())
print("Q2=", validation.computePredictivityFactor())
Using the simplified interface I get the following results:
Creation t= 146.34749007225037 s
Run t= 0.2926785945892334 s
input dimension: 5
output dimension: 1
basis size: 101
mean: [-0.00257423]
std-dev: [0.878122]
------------------------------------------------------------
Index | Multi-indice | Part of variance
------------------------------------------------------------
4 | [0,0,0,1,0] | 0.302016
17 | [1,0,1,0,0] | 0.176462
16 | [1,1,0,0,0] | 0.170647
5 | [0,0,0,0,1] | 0.135441
51 | [0,0,0,2,1] | 0.0700867
14 | [0,0,0,3,0] | 0.0496075
70 | [1,3,0,0,0] | 0.0331452
74 | [1,0,3,0,0] | 0.0287432
15 | [0,0,0,0,3] | 0.0235836
------------------------------------------------------------
------------------------------------------------------------
Component | Sobol index | Sobol total index
------------------------------------------------------------
0 | 0.000308946 | 0.409668
1 | 0.000163768 | 0.204437
2 | 0.000341013 | 0.205788
3 | 0.354638 | 0.428833
4 | 0.160817 | 0.235004
------------------------------------------------------------
Q2= [0.792232]
and using the detailed interface:
Creation t= 0.13089418411254883 s
Run t= 0.9613080024719238 s
input dimension: 5
output dimension: 1
basis size: 462
mean: [-0.00169308]
std-dev: [0.984487]
------------------------------------------------------------
Index | Multi-indice | Part of variance
------------------------------------------------------------
4 | [0,0,0,1,0] | 0.241599
7 | [1,1,0,0,0] | 0.141194
8 | [1,0,1,0,0] | 0.140077
5 | [0,0,0,0,1] | 0.107964
75 | [1,1,2,0,0] | 0.0709469
72 | [1,2,1,0,0] | 0.0691765
53 | [0,0,0,2,1] | 0.0526255
52 | [0,0,0,3,0] | 0.0403512
81 | [1,0,3,0,0] | 0.0238355
71 | [1,3,0,0,0] | 0.0218094
55 | [0,0,0,0,3] | 0.0184457
326 | [1,3,2,0,0] | 0.0127656
332 | [1,2,3,0,0] | 0.0125651
249 | [0,0,0,2,3] | 0.0115486
------------------------------------------------------------
------------------------------------------------------------
Component | Sobol index | Sobol total index
------------------------------------------------------------
0 | 4.75502e-06 | 0.508491
1 | 8.76558e-06 | 0.343322
2 | 1.65735e-05 | 0.344294
3 | 0.284018 | 0.363659
4 | 0.127906 | 0.207842
------------------------------------------------------------
Q2= [0.990007]
You can see the significant difference in terms of CPU time (147s vs 1s, on a quite powerful workstation) and the improved quality on this specific example (but you can get the exact same results by tweaking the behavior of PCE using ResourceMap, see the script).
Cheers
Régis