Use of Log Normal Distribution - metamodel building

Hello,

I’m building a metamodel using polynomial chaos to study seal leakage.

The metamodel must capture both the nominal behaviour and the abnormal behaviour.
In this case, one of the variables should vary between 1e-6 and 1e-4, with a nominal value of 1e-5.
It is hard to get good estimates from the metamodel for both nominal and abnormal values when using a uniform or normal distribution for this variable.

Here are my questions:

  • Which distribution best fits this problem (one variable spanning several orders of magnitude)?
  • How can I efficiently compute the parameters of a chosen distribution from the “practical” values?
    Here, for example, with a LogNormal distribution: the nominal value (1e-5) must be at the maximum of the probability density function, and [1e-6; 1e-4] are the bounds of the study.

Thank you for your answer

Hi!
Welcome to this forum!

Every time I hear about using the Log-Normal distribution with polynomial chaos expansion, I think of [1], which @regislebrun pointed out to me. The paper shows that not every family of orthogonal polynomials is complete. In particular, the orthogonal polynomials associated with the log-normal distribution do not form a complete basis, meaning that a PCE built on them may not converge to the true function. This does not mean that it never converges, because that depends on the actual function, but non-convergence is possible with that particular basis. In practice, many log-normal distributions are truncated, which greatly simplifies the problem: the basis is then complete (see [2], page 139). It is easy to create a truncated Log-Normal distribution using the TruncatedDistribution class.

In order to try to fix your problem, I think it might help to clarify the setting.

  • Do you work on simulated data, i.e. where the input X has a known distribution and the physical model g is known?
  • Or do you work on a given data set, e.g. a CSV text file?
  • In both cases, what is the sample size, what is the dimension? What is the type of input: vector or field?

This changes everything, because in the first case the distribution is known, while in the second case we have to infer it from the data.

In the meantime, please have a look at the FunctionalChaosAlgorithm.BuildDistribution static method. This method takes a sample as input and returns the multivariate distribution that best fits it. For each marginal, it selects the distribution with the highest p-value of the Kolmogorov-Smirnov test. If there is a dependence, then a normal copula is used.

Regards,

Michaël


  1. Ernst, O. G., Mugler, A., Starkloff, H. J., & Ullmann, E. (2012). On the convergence of generalized polynomial chaos expansions. ESAIM: Mathematical Modelling and Numerical Analysis, 46(2), 317-339.

  2. Sullivan, T. J. (2015). Introduction to uncertainty quantification (Vol. 63). Springer.

Hello!

Thank you for your answer.

Using the log-normal was a proposal, but it is open to discussion.

To answer your questions:

  • I use simulated data. At the moment, I have 7 variables, all defined with a uniform probability distribution.
    The physical model is a code that is run for each experiment.

  • The design of experiments uses LHS, with 500 experiments, 30% of which are used for validation.
    At the moment, the Q2 indicator is about 0.99 for both analytical and test sample values.

My question is not necessarily about how well the metamodel represents the simulation model, but rather about how to define the DOE so that it best captures both nominal and abnormal behaviour.

I will look at the document linked in your post.

Best regards,
Simon

Hello!
I think that I now understand what you are searching for.
I came up with the following code:

import openturns as ot
import openturns.viewer as otv
import math


# Distribution parameters
xMin = 1.e-6
xMax = 1.e-4

# Plot parameters
xDrawMin = 1.e-7
xDrawMax = 1.e-3
nPoints = 10000

# LogNormal
muLog = math.log(1.e-5)
print("muLog = ", muLog)
sigmaLog = 1.0  # How to choose this parameter?
logNormalDistribution = ot.LogNormal(muLog, sigmaLog)
graph = logNormalDistribution.drawPDF(xDrawMin, xDrawMax, nPoints)
graph.setLogScale(ot.GraphImplementation.LOGX)
view = otv.View(graph)

# Truncated LogNormal
distribution = ot.TruncatedDistribution(logNormalDistribution, xMin, xMax)
graph = distribution.drawPDF(xDrawMin, xDrawMax, nPoints)
graph.setLogScale(ot.GraphImplementation.LOGX)
view = otv.View(graph)

print("Distribution mean = ", distribution.getMean()[0])

This produces:

Figure 1. The LogNormal distribution.

Figure 2. The truncated LogNormal distribution.

Moreover, it prints:

Distribution mean =  1.5214744455169813e-05

This shows that truncating the distribution changes its mean.

My conclusion is that you have not provided enough parameters to completely specify the distribution. This is because a truncated log-normal distribution has 4 parameters:

  • the mean of the logarithm of the untruncated log-normal variable (muLog in the code above);
  • the standard deviation of the logarithm of the untruncated log-normal variable (sigmaLog in the code above);
  • the minimum;
  • the maximum.

You only specified the following parameters:

  • the mean of the truncated log-normal: 10^{-5};
  • the minimum of the truncated log-normal: 10^{-6};
  • the maximum of the truncated log-normal: 10^{-4}.

This is not enough. Do you have more information on the distribution you are looking for?
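
To illustrate what such extra information could look like, here is a small sketch using only the Python standard library. It assumes, purely hypothetically, that the bounds are meant to contain 98% of the untruncated log-normal with equal tails; the `coverage` value is my invention, not something you stated.

```python
import math
from statistics import NormalDist

xMin, xMax = 1.0e-6, 1.0e-4
coverage = 0.98  # hypothetical: P(xMin <= X <= xMax) for the untruncated law

# Standard normal quantile leaving (1 - coverage) / 2 in each tail.
z = NormalDist().inv_cdf(0.5 + 0.5 * coverage)

# log(X) is normal, so the two quantile conditions are linear in (muLog, sigmaLog).
muLog = 0.5 * (math.log(xMin) + math.log(xMax))
sigmaLog = (math.log(xMax) - math.log(xMin)) / (2.0 * z)

median = math.exp(muLog)               # median of the untruncated log-normal
mode = math.exp(muLog - sigmaLog**2)   # location of the maximum of its PDF
print("muLog =", muLog, "sigmaLog =", sigmaLog)
print("median =", median, "mode =", mode)
```

With these bounds the median falls at 1e-5, because 1e-5 is the geometric mean of 1e-6 and 1e-4, while the maximum of the PDF sits lower by a factor exp(-sigmaLog^2). Placing the maximum of the untruncated PDF at 1e-5 instead would require muLog = log(1e-5) + sigmaLog^2, and the bounds would then no longer be symmetric quantiles.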

If you do, you can proceed by searching for the parameters that fit the requirements, as the method of moments would do. This is easy with OpenTURNS, but you need to use an optimization algorithm to invert the equations (because there is currently no algorithm to solve a nonlinear system of equations).
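
As a rough illustration of this inversion, here is a plain-Python sketch that does not use OpenTURNS and replaces the optimization algorithm with a simple bisection. It assumes that sigmaLog is known (here 1.0, as in the script above) and searches for the muLog that gives a truncated mean of 1e-5. The helper names are mine, and the closed-form truncated mean comes from the standard formula for a truncated normal applied to log(X).

```python
import math
from statistics import NormalDist

normalCDF = NormalDist().cdf  # CDF of the standard normal distribution


def truncatedLogNormalMean(muLog, sigmaLog, xMin, xMax):
    """Mean of a LogNormal(muLog, sigmaLog) truncated to [xMin, xMax]."""
    a = (math.log(xMin) - muLog) / sigmaLog
    b = (math.log(xMax) - muLog) / sigmaLog
    mass = normalCDF(b) - normalCDF(a)  # probability kept by the truncation
    partial = normalCDF(b - sigmaLog) - normalCDF(a - sigmaLog)
    return math.exp(muLog + 0.5 * sigmaLog**2) * partial / mass


def solveMuLog(targetMean, sigmaLog, xMin, xMax, iterations=100):
    """Bisection on muLog, using that the truncated mean increases with muLog.

    Assumes the target lies between the truncated means obtained at
    muLog = log(xMin) and muLog = log(xMax).
    """
    low, high = math.log(xMin), math.log(xMax)
    for _ in range(iterations):
        middle = 0.5 * (low + high)
        if truncatedLogNormalMean(middle, sigmaLog, xMin, xMax) < targetMean:
            low = middle
        else:
            high = middle
    return 0.5 * (low + high)


xMin, xMax = 1.0e-6, 1.0e-4
sigmaLog = 1.0  # assumed known here; this is the missing piece of information

# Sanity check against the mean printed by the script above (muLog = log(1e-5)).
print(truncatedLogNormalMean(math.log(1.0e-5), sigmaLog, xMin, xMax))

muLog = solveMuLog(1.0e-5, sigmaLog, xMin, xMax)
print("muLog =", muLog)
print("truncated mean =", truncatedLogNormalMean(muLog, sigmaLog, xMin, xMax))
```

With two unknowns (muLog and sigmaLog) and a single requirement on the mean, the problem stays underdetermined, which is why one parameter has to be fixed or a second requirement added before solving.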

Regards,

Michaël