Multivariate distribution with combined relationship

Hi,

I am new to OpenTURNS and I am trying to create a multivariate distribution to generate a sample of 4 variables: A, B, C and D.
The first two (A & B) are independent, but the last two are linked to each other and depend on the first two variables:

The relationships are defined as:

  • A : Independent uniform variable between 300 & 6000
  • B : Independent uniform variable between 1.6 & 4
  • C : Uniform distribution between a minimal value depending on A & B and 12490
  • D : Uniform distribution between C+10 and 12500

The minimal value is given by the following equation:

\min \left( (305+x_0) \tan\left(\frac{\pi x_1}{180} \right) a + 152, \; b\right)

where a=3.28084 and b=12490.
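In plain Python, this lower bound can be sanity-checked directly (a quick sketch using the constants a=3.28084 and b=12490 above):

```python
import math

def c_lower_bound(x0, x1, a=3.28084, b=12490.0):
    """Lower bound of C as a function of A (x0) and B (x1, in degrees)."""
    return min((305.0 + x0) * math.tan(math.pi * x1 / 180.0) * a + 152.0, b)

# At the lower corner of the (A, B) domain the bound is around 207
print(c_lower_bound(300.0, 1.6))
# Even at the upper corner it stays well below b = 12490
print(c_lower_bound(6000.0, 4.0))
```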

I have tried to use BayesDistribution, but I didn't figure out how to define the relationship between C & D:

ADistribution = ot.Uniform(300, 6000)
BDistribution = ot.Uniform(1.6, 4)
ComposedDistribution = ot.ComposedDistribution([ADistribution, BDistribution])

fFunction = ot.SymbolicFunction(["x0", "x1"], ["y0", "y1", "y2", "y3"], 'y0 := min(((305+x0)*tan(x1*pi_/180)*3.28084) + 152,12500); y1 := 12490; y2 := y0+10; y3 := 12500')
distribution = ot.BayesDistribution(ot.ComposedDistribution([ot.Uniform(0, 12490), ot.Uniform(0, 12500)]), 
                                    ot.ComposedDistribution([ADistribution, BDistribution]), fFunction)

sample = distribution.getSample(1000).asDataFrame()

Maybe I am not using it correctly, or is there an easier way to do it?

I need the distribution and not only the samples, as I am using it in a Monte Carlo DoE.

Thanks in advance
Regards
Adil

Hi,

The key is to use the BayesDistribution class twice: once to define the distribution of (A,B,C), and once to complete it into the distribution of (A,B,C,D). Note that BayesDistribution gives you the distribution of (C,A,B) from the distribution of C|f(A,B) and the distribution of (A,B), and then the distribution of (D,C,A,B) from the distribution of D|g(C,A,B) and the distribution of (C,A,B). You get the distribution of (A,B,C,D) by extracting the relevant marginals of (D,C,A,B). It gives:

import openturns as ot

A = ot.Uniform(300.0, 6000.0)
A.setDescription(["A"])
B = ot.Uniform(1.6, 4.0)
B.setDescription(["B"])
AB = ot.ComposedDistribution([A, B])
print("AB=", AB.getDescription())
# Bounds of C given (A, B): [min(...), 12490]
linkAB_C = ot.SymbolicFunction(["x0", "x1"], ["min(((305+x0)*tan(x1*pi_/180)*3.28084) + 152,12490)", "12490"])
# Distribution of (C, A, B)
CAB = ot.BayesDistribution(ot.Uniform(), AB, linkAB_C)
CAB.setDescription(["C", "A", "B"])
print("CAB=", CAB.getDescription())
# Bounds of D given (C, A, B): [C+10, 12500] (only x0=C is used)
linkC_D = ot.SymbolicFunction(["x0", "x1", "x2"], ["x0+10", "12500"])
# Distribution of (D, C, A, B)
DCAB = ot.BayesDistribution(ot.Uniform(), CAB, linkC_D)
DCAB.setDescription(["D", "C", "A", "B"])
# Reorder the marginals to get (A, B, C, D)
ABCD = DCAB.getMarginal([2, 3, 1, 0])
sample = ABCD.getSample(1000)
# Check the multivariate sample
ot.Show(ot.VisualTest.DrawPairs(sample))
# Check the normalized ranks, aka the copula
ot.Show(ot.VisualTest.DrawPairs(sample.rank() / sample.getSize()))
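As an independent cross-check of the same conditional structure (a plain-Python Monte Carlo sketch, not using OpenTURNS), one can sample (A, B, C, D) directly and verify that every point satisfies the chained constraints:

```python
import math
import random

def c_min(a_val, b_val, a=3.28084, b=12490.0):
    # Lower bound of C as defined in the question
    return min((305.0 + a_val) * math.tan(math.pi * b_val / 180.0) * a + 152.0, b)

random.seed(0)
sample = []
for _ in range(1000):
    A = random.uniform(300.0, 6000.0)
    B = random.uniform(1.6, 4.0)
    C = random.uniform(c_min(A, B), 12490.0)   # C | (A, B)
    D = random.uniform(C + 10.0, 12500.0)      # D | C
    sample.append((A, B, C, D))

# Every point must respect the chained constraints
assert all(c_min(A, B) <= C <= 12490.0 for A, B, C, D in sample)
assert all(C + 10.0 <= D <= 12500.0 for A, B, C, D in sample)
print("all constraints satisfied")
```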

which produces the pair-plot figures of the sample and of its normalized ranks.

Hi Regis,

Thanks a lot for the answer and the trick. It helps a lot :wink:

Hello,

I also had to use BayesDistribution recently and I used the trick of reordering the marginals mentioned by Régis. However, I noticed a large difference in the time needed to compute the PDF. Here is a small example in dimension 2, coming from the documentation: it is instantaneous for the original Bayes distribution, but it takes 1.35 s for the distribution with reordered marginals.

In my real case I am in dimension 6; it is also fast for the original Bayes distribution, but with a reordered distribution I had to stop the process because the value was never returned (or I did not wait long enough).

Do you have an idea where this comes from?

import openturns as ot
conditioningDist = ot.Normal(0.0, 1.0)
g = ot.SymbolicFunction(['y'], ['y', '0.1+y^2'])
conditionedDist = ot.Normal()
finalDist = ot.BayesDistribution(conditionedDist, conditioningDist, g)
modifyDist = finalDist.getMarginal([1, 0])
%timeit finalDist.computePDF([0.5, 0.5])
# 3.86 μs on average
%timeit modifyDist.computePDF([0.5, 0.5])
# 1.35 s on average

Regards,
Antoine

Sorry for the late answer, I had some trouble these last weeks.
The problem you mention is already known to the developers, and a specific action is planned on this subject next Tuesday! It is due to the generic implementation of marginal distributions using the MarginalDistribution class. This class assumes that the CDF is fast to compute, and all the computations are done using CDF evaluations.

In your case:

  • The PDF of the MarginalDistribution is computed using centered finite differences of the CDF. In dimension 6, this involves 12 CDF evaluations.
  • But one CDF evaluation is a 6-d integration of the BayesDistribution PDF, which takes ages, as the complexity of the algorithm is exponential in the dimension and one evaluation of the BayesDistribution PDF is already costly.
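The evaluation count can be illustrated with a generic centered finite-difference scheme (a sketch, not the actual OpenTURNS implementation): differentiating a d-variate CDF once in each direction costs 2*d calls to the CDF.

```python
def centered_gradient(f, x, h=1e-6):
    """Gradient of f at x by centered finite differences: 2*len(x) calls to f."""
    grad = []
    for i in range(len(x)):
        xp = list(x); xp[i] += h
        xm = list(x); xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2.0 * h))
    return grad

calls = 0
def counted_cdf(x):
    # Stand-in for an expensive multivariate CDF; here a cheap product form
    global calls
    calls += 1
    out = 1.0
    for xi in x:
        out *= min(max(xi, 0.0), 1.0)
    return out

centered_gradient(counted_cdf, [0.5] * 6)
print(calls)  # 12 CDF evaluations in dimension 6
```

If each of those 12 calls is itself a 6-d integration, the cost of a single marginal PDF evaluation explodes, which matches the behaviour Antoine observed.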

What is planned is:

  • To improve the performance of the MarginalDistribution class; in particular, the user will be able to select the algorithm used for each service (do I use the CDF or the PDF of the underlying distribution to compute the CDF or PDF of the marginal distribution?). In your case, if you select a PDF-based approach, the computePDF() method of the marginal will be as fast as the original computePDF(), as it is only a permutation of the indices.
  • To better define the objectives of the BayesDistribution, ConditionalDistribution and PosteriorDistribution classes, and whether their names are correct wrt these objectives.
  • To test these classes in more detail, as they have been around for a long time with no actual use in large-scale applications.

A possible workaround is to use a Python reimplementation of the class. Michael Baudin has proposed such a reimplementation here: otbenchmark/otbenchmark/ConditionalDistribution.py at master · mbaudin47/otbenchmark · GitHub

Cheers

Régis

Hi Régis,

Thank you for your answer, I will wait for your updates.
In my case I was able to implement a simple workaround: I kept the original Bayes distribution, but I modified my evaluation function instead. I built a composed function whose first stage is a symbolic function that reorders the inputs into the original order.
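This reordering trick can be sketched in plain Python (hypothetical names; in OpenTURNS one would chain a reordering ot.SymbolicFunction with the model via ot.ComposedFunction):

```python
def permuted_input(model, order):
    """Wrap `model` so it accepts its inputs in a different order.
    `order[i]` gives, for the i-th input of `model`, its position in the new vector."""
    def wrapped(x):
        return model([x[j] for j in order])
    return wrapped

# Toy model expecting inputs in the order (A, B, C)
model = lambda x: x[0] + 10.0 * x[1] + 100.0 * x[2]
# The DoE provides (C, A, B); map it back to (A, B, C)
reordered = permuted_input(model, [1, 2, 0])
print(reordered([3.0, 1.0, 2.0]))  # same as model([1.0, 2.0, 3.0]) -> 321.0
```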

Cheers,
Antoine

Hi Antoine,

I finally took the time to improve the MarginalDistribution class. Now, with the following script:

import openturns as ot
import openturns.viewer as otv
from time import time

ot.Log.Show(ot.Log.ALL)
conditioningDist = ot.Normal(0.0, 1.0)
g = ot.SymbolicFunction(['y'], ['y', '0.1+y^2'])
conditionedDist = ot.Normal()
finalDist = ot.BayesDistribution(conditionedDist, conditioningDist, g)
modifyDist = finalDist.getMarginal([1, 0])

grid = ot.GridLayout(2, 3)

t0 = time()
g = finalDist.drawPDF([-1.0]*2, [1.0]*2)
g.setTitle("finalDist, t=%.3e" % (time() - t0) + "s")
grid.setGraph(0, 0, g)

t0 = time()
marginal = finalDist.getMarginal(0)
g = marginal.drawPDF(-1.0, 1.0)
g.setTitle("finalDist_0, t=%.3e" % (time() - t0) + "s")
grid.setGraph(0, 1, g)

t0 = time()
marginal = finalDist.getMarginal(1)
g = marginal.drawPDF(-1.0, 1.0)
g.setTitle("finalDist_1, t=%.3e" % (time() - t0) + "s")
grid.setGraph(0, 2, g)

t0 = time()
g = modifyDist.drawPDF([-1.0]*2, [1.0]*2)
g.setTitle("modifyDist, t=%.3e" % (time() - t0) + "s")
grid.setGraph(1, 0, g)

t0 = time()
marginal = modifyDist.getMarginal(0)
g = marginal.drawPDF(-1.0, 1.0)
g.setTitle("modifyDist_0, t=%.3e" % (time() - t0) + "s")
grid.setGraph(1, 1, g)

print(modifyDist)
t0 = time()
marginal = modifyDist.getMarginal(1)
g = marginal.drawPDF(-1.0, 1.0)
g.setTitle("modifyDist_1, t=%.3e" % (time() - t0) + "s")
grid.setGraph(1, 2, g)

view = otv.View(grid)
view.save("master.png")

I get with the current master:

On the first row you see the 2D PDF of finalDist, followed by the PDFs of its two marginal distributions. On the second row, you have the PDF of the permuted distribution as well as its marginals, which should be the same as in the first row but swapped. You see two things:

  • There is a lot of numerical noise in the 2D PDF, and a visible glitch on the first marginal of the permuted distribution.
  • The time taken to draw each graph is indicated in its title, and the 2D PDF took nearly 3 h to compute on my quite powerful laptop.

With my current branch I get:

and you see that the numerical noise has disappeared and the computation time has been drastically reduced (yes, it is 0.04 s instead of 8795 s for the 2D PDF, and 0.2 s for each marginal PDF instead of 0.042 s and 3.47e-4 s). The 1D marginal distributions are "true" marginal distributions, while the 2D PDF is just a permutation of the initial distribution (which is now correctly detected and taken into account). That is why the 2D PDF is so fast to compute, while there is a 1D integration for each evaluation of the PDF of the 1D marginal distributions.
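The difference between the two operations can be illustrated in plain Python: a permuted joint PDF is a trivial index swap, whereas a "true" marginal PDF requires integrating the joint PDF over the other variable. Here is a sketch with two independent standard normals, whose marginal is known analytically (simple trapezoidal rule, not the OpenTURNS algorithm):

```python
import math

def joint_pdf(x, y):
    # Joint PDF of two independent standard normals
    return math.exp(-0.5 * (x * x + y * y)) / (2.0 * math.pi)

def permuted_pdf(y, x):
    # "Marginal by permutation": just swap the argument order, no integration
    return joint_pdf(x, y)

def marginal_pdf(x, lo=-8.0, hi=8.0, n=2000):
    # "True" marginal: 1D integration of the joint PDF over y (trapezoidal rule)
    h = (hi - lo) / n
    s = 0.5 * (joint_pdf(x, lo) + joint_pdf(x, hi))
    for k in range(1, n):
        s += joint_pdf(x, lo + k * h)
    return s * h

exact = math.exp(-0.5 * 0.25) / math.sqrt(2.0 * math.pi)  # N(0,1) PDF at 0.5
print(marginal_pdf(0.5), exact)
```

One PDF value costs a single function call in the permuted case versus thousands of calls in the integrated case, which is why the 2D PDF is now so much faster than the 1D marginals.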

I hope this code will be merged for the upcoming release of OT.

Cheers

Régis

Hi Régis,

Nice evolution! We will be able to use it soon, then.

Thanks.