Dirichlet distribution

I have some questions regarding Dirichlet (or multivariate) distributions.
In my case I want a distribution where the constraint is, that the sum of all values is equal to 1, so I created a Dirichlet-Distribution as follows:

fuel = ot.Dirichlet([1,1,1,1])
fuel.setDescription(['n-alkanes', 'iso-alkanes', 'cyclo-alkanes'])

Now, I want to do the following two things with it:

Truncate the 1st marginal with a lower bound of 0.5

If I try this with

Y1 = ot.TruncatedDistribution(fuel.getMarginal(0),0.5)

I have to recombine the resulting Distribution with the other 2 marginals:

composed = ot.ComposedDistribution([Y1,fuel.getMarginal(2),fuel.getMarginal(2)])

but then the constraint of sum=1 is not fulfilled anymore. Is there any way to do this or is it just not possible?

Combine the dirichlet distribution with a second distribution (in this case a uniform distribution)

If I want to add another uncertain variable to the dirichlet-distribution to do a sampling, I found in the examples to use the ot.ComposedDistribution, but this does only work with univariate distributions:

mass_fl = ot.Uniform(0.0, 1.0)
mass_fl.setDescription(['mass_fl'])
combined= ot.ComposedDistribution([fuel, mass_fl])

gives
TypeError: InvalidArgumentException : The marginal distribution 0 is of dimension 3, which is different from 1.
and when I try to do it with the marginals of the Dirichlet-Distribution

ot.ComposedDistribution([fuel.getMarginal(0), fuel.getMarginal(1), fuel.getMarginal(2),  mass_fl])

then again the constraint for sum(fuel)=1 is not fullfilled anymore.

Am I missing a basic point here or is this also not possible or not implemented?

Hi,
First of all, it is not possible to model the constraint x_1+…+x_d=1 using the Dirichlet class in OT, as stated in the documentation (https://openturns.github.io/openturns/latest/user_manual/_generated/openturns.Dirichlet.html. It is because:

  • the last component can be deduced from the others using the constraint
  • this way, the distribution is absolutely continuous wrt the Lebesgue measure, meaning that it has a well-defined PDF function

The good news is that it is easy to embed the Dirichlet class into the PythonDistribution class to get the behavior you want… but the CDF is complex to compute due to the lack of a proper PDF, preventing the basic use of numerical integration. I used Monte Carlo integration instead:

import openturns as ot

class SingularDirichlet(ot.PythonDistribution):

    def __init__(self, theta=[1.0, 1.0], Ncdf=10000):
        super(SingularDirichlet, self).__init__(len(theta))
        self.dirichlet = ot.Dirichlet(theta)
        self.theta = theta
        self.cdfSample = self.getSample(Ncdf)

    def getRange(self):
        return ot.Interval(len(self.theta))

    def getRealization(self):
        point = self.dirichlet.getRealization()
        point.add(1.0 - sum(point))
        return point

    def getSample(self, size):
        sample = self.dirichlet.getSample(size)
        last_row = ot.Sample([[1-sum(x)] for x in sample])
        sample.stack(last_row)
        sample.setDescription(ot.Distribution(self).getDescription())
        return sample

    def computeCDF(self, X):
        # Here we use crude Monte Carlo, waiting for a better implementatio
        return self.cdfSample.computeEmpiricalCDF(X)

    def computePDF(self, X):
        if sum(X) != 1.0:
            return 0.0
        X = ot.Point(X)
        return self.dirichlet.computePDF(X[0:-2])

    def getMarginal(self, indices):
        subTheta = []
        for i in indices:
            subTheta.append(self.theta[i])
        py_dist = SingularDirichlet(subTheta)
        return ot.Distribution(py_dist)

    def getParameter(self):
        return self.dirichlet.getParameter()

    def getParameterDescription(self):
        return self.dirichlet.getParameterDescription()

    def setParameter(self, parameter):
        self.dirichlet.setParameter(parameter)

    def isContinuous(self):
        return False

    def isDiscrete(self):
        return False

The two last methods are mandatory to get a correct implementation of the computeProbability() method.

Concerning the truncation, it cannot work the way you do it because you break the dependence structure when you extract the margins then truncate one of them then reassemble everything using ComposedDistribution. What you must do is to use the multivariate constructor of the TruncateDistribution class:

Nmc = 10000
fuel_unbounded = ot.Distribution(SingularDirichlet([1, 1, 1], Nmc))
bounds = ot.Interval([0.5, 0.0, 0.0], [1.0, 1.0, 1.0])
fuel = ot.TruncatedDistribution(fuel_unbounded, bounds)
fuel.setDescription(['n-alkanes', 'iso-alkanes', 'cyclo-alkanes'])
print(fuel.getSample(5))

gives you:

    [ n-alkanes     iso-alkanes   cyclo-alkanes ]
0 : [ 0.790229      0.201768      0.00800307    ]
1 : [ 0.741614      0.137147      0.121239      ]
2 : [ 0.506817      0.234319      0.258864      ]
3 : [ 0.796385      0.17514       0.0284751     ]
4 : [ 0.507067      0.136147      0.356786      ]

Concerning the combination of the fuel distribution and the mass distribution, the class you have to use is the BlockIndependentDistribution class:

mass_fl = ot.Uniform(0.0, 1.0)
mass_fl.setDescription(['mass_fl'])
combined= ot.BlockIndependentDistribution([fuel, mass_fl])
print(combined.getSample(5))

which gives you:

    [ n-alkanes     iso-alkanes   cyclo-alkanes mass_fl       ]
0 : [ 0.888138      0.0517162     0.060146      0.548626      ]
1 : [ 0.555712      0.141518      0.30277       0.419056      ]
2 : [ 0.772359      0.153878      0.0737633     0.747742      ]
3 : [ 0.564452      0.204366      0.231182      0.0667618     ]
4 : [ 0.607373      0.190895      0.201732      0.821981      ]

The SingularDirichlet class is fully correct if you use it to perform Monte Carlo simulations. If you plan to use more advanced methods such as QMC, FORM/SORM or any other method relying on an accurate evaluation of its CDF you will get an approximate result, equivalent to;

fuel_approximate = ot.UserDefined(fuel.getSample(Nmc))

I have ideas to implement an accurate evaluation of the CDF but it needs (much) more work.

Cheers

Régis

1 Like

Wow, Thank you very much for the long and detailed response!

The multivariate constructor of the TruncateDistribution and the BlockIndependentDistribution were the two functions that I needed!

Also, the SingularDirichlet helps a lot!