Hi,
First of all, it is not possible to model the constraint x_1+…+x_d=1 using the Dirichlet class in OT, as stated in the documentation (https://openturns.github.io/openturns/latest/user_manual/_generated/openturns.Dirichlet.html. It is because:
- the last component can be deduced from the others using the constraint
- this way, the distribution is absolutely continuous wrt the Lebesgue measure, meaning that it has a well-defined PDF function
The good news is that it is easy to embed the Dirichlet class into the PythonDistribution class to get the behavior you want… but the CDF is complex to compute due to the lack of a proper PDF, preventing the basic use of numerical integration. I used Monte Carlo integration instead:
import openturns as ot
class SingularDirichlet(ot.PythonDistribution):
def __init__(self, theta=[1.0, 1.0], Ncdf=10000):
super(SingularDirichlet, self).__init__(len(theta))
self.dirichlet = ot.Dirichlet(theta)
self.theta = theta
self.cdfSample = self.getSample(Ncdf)
def getRange(self):
return ot.Interval(len(self.theta))
def getRealization(self):
point = self.dirichlet.getRealization()
point.add(1.0 - sum(point))
return point
def getSample(self, size):
sample = self.dirichlet.getSample(size)
last_row = ot.Sample([[1-sum(x)] for x in sample])
sample.stack(last_row)
sample.setDescription(ot.Distribution(self).getDescription())
return sample
def computeCDF(self, X):
# Here we use crude Monte Carlo, waiting for a better implementatio
return self.cdfSample.computeEmpiricalCDF(X)
def computePDF(self, X):
if sum(X) != 1.0:
return 0.0
X = ot.Point(X)
return self.dirichlet.computePDF(X[0:-2])
def getMarginal(self, indices):
subTheta = []
for i in indices:
subTheta.append(self.theta[i])
py_dist = SingularDirichlet(subTheta)
return ot.Distribution(py_dist)
def getParameter(self):
return self.dirichlet.getParameter()
def getParameterDescription(self):
return self.dirichlet.getParameterDescription()
def setParameter(self, parameter):
self.dirichlet.setParameter(parameter)
def isContinuous(self):
return False
def isDiscrete(self):
return False
The two last methods are mandatory to get a correct implementation of the computeProbability() method.
Concerning the truncation, it cannot work the way you do it because you break the dependence structure when you extract the margins then truncate one of them then reassemble everything using ComposedDistribution. What you must do is to use the multivariate constructor of the TruncateDistribution class:
Nmc = 10000
fuel_unbounded = ot.Distribution(SingularDirichlet([1, 1, 1], Nmc))
bounds = ot.Interval([0.5, 0.0, 0.0], [1.0, 1.0, 1.0])
fuel = ot.TruncatedDistribution(fuel_unbounded, bounds)
fuel.setDescription(['n-alkanes', 'iso-alkanes', 'cyclo-alkanes'])
print(fuel.getSample(5))
gives you:
[ n-alkanes iso-alkanes cyclo-alkanes ]
0 : [ 0.790229 0.201768 0.00800307 ]
1 : [ 0.741614 0.137147 0.121239 ]
2 : [ 0.506817 0.234319 0.258864 ]
3 : [ 0.796385 0.17514 0.0284751 ]
4 : [ 0.507067 0.136147 0.356786 ]
Concerning the combination of the fuel distribution and the mass distribution, the class you have to use is the BlockIndependentDistribution class:
mass_fl = ot.Uniform(0.0, 1.0)
mass_fl.setDescription(['mass_fl'])
combined= ot.BlockIndependentDistribution([fuel, mass_fl])
print(combined.getSample(5))
which gives you:
[ n-alkanes iso-alkanes cyclo-alkanes mass_fl ]
0 : [ 0.888138 0.0517162 0.060146 0.548626 ]
1 : [ 0.555712 0.141518 0.30277 0.419056 ]
2 : [ 0.772359 0.153878 0.0737633 0.747742 ]
3 : [ 0.564452 0.204366 0.231182 0.0667618 ]
4 : [ 0.607373 0.190895 0.201732 0.821981 ]
The SingularDirichlet class is fully correct if you use it to perform Monte Carlo simulations. If you plan to use more advanced methods such as QMC, FORM/SORM or any other method relying on an accurate evaluation of its CDF you will get an approximate result, equivalent to;
fuel_approximate = ot.UserDefined(fuel.getSample(Nmc))
I have ideas to implement an accurate evaluation of the CDF but it needs (much) more work.
Cheers
Régis