Skew distributions in OT?

Hello,

First, warm congrats and thanks for the huge effort in building such an exhaustive and advanced open-source library for uncertainty modelling. Very impressive and super helpful to the community.

I have browsed the documentation but I haven’t found mentions of skew distributions modeled in OT. I am particularly interested in univariate Gaussian skew and T-skew distributions and multivariate Gaussian skew and T-skew distributions.

They are relatively standard and extensively studied by Azzalini and Della Valle (for instance: http://azzalini.stat.unipd.it/SN/)

I wonder if there is a way to include them in the OT distributions and use the nice ot.FittingTest. set of criteria and functions to include them into the list of candidates distributions.

Thank you very much for your help and best regards

Romain

Hi,

I had a look at the sn R package. These distributions are very interesting and should definitely be part of the OpenTURNS library, but we are currently short of resources to implement them rapidly. It will be difficult to add them for the next release as it will be frozen soon. Is it ok for you if you get them for the next release (~november)?

Cheers

Régis

Dear Regis,

Thank you very much for your prompt and kind response.

Absolutely no problems, I fully understand the constraints and I am very grateful that you put the skew distribution on your November release schedule.

Thank you very much again

Best regards,

Romain

Hi!
Thank you for this suggestion of development.
In the meantime, notice that you have several workarounds available.

  • You can implement your own SN distribution using PythonDistribution. This can be efficient, depending on your needs. You only need to implement the CDF and the range to get a working distribution. If you can implement any other methods available, e.g. the PDF or the LogPDF, then the library can take that into account.
  • You can convert any SciPy distribution into OpenTURNS using SciPyDistribution. The same is true with the ChaospyDistribution.

I hope that this may help you.
Regards,
Michaël

Thank you so much Michael for the pointers on how to implement custom distribution with OpenTURNS framework.

From my (limited) understanding, it seems that ot.SciPyDistribution implements a Distribution object but not a distribution factory object. Ideally, I would like to fit skewed distributions using ot.FittingTest.BestModelBIC. Is it something feasible using a distribution imported from ot.SciPyDistribution or coded from ot.PythonDistribution ?

Thank you very much for your time and help.

Best regards,

Romain

You can use one of the generic factories with a SciPyDistribution. Be careful! You have to create your model (ie an instance of the distribution you want to fit) using unnamed arguments otherwise they will not be counted as parameters at the OpenTURNS level. See the following example for clarification:

import openturns as ot
import openturns.viewer as otv
import scipy.stats as st
import numpy as np

np.random.seed(seed=1234)

# Activate the logs if you want to observe the convergence process
# ot.Log.Show(ot.Log.ALL)

# Create a known skewnormal from scipy, to generate a sample
py_dist = ot.SciPyDistribution(st.skewnorm(2.0, 1.5, 2.5))
distribution = ot.Distribution(py_dist)

size = 250
sample = distribution.getSample(size)

# Fit a skewnormal using the maximum likelihood factory
# Here the 3 parameters will be fitted
model = ot.Distribution(ot.SciPyDistribution(st.skewnorm(1, 1, 1)))
factory = ot.MaximumLikelihoodFactory(model)
opt = ot.Cobyla()
opt.setMaximumCallsNumber(100000)
factory.setOptimizationAlgorithm(opt)
res3 = factory.build(sample)
print("Fitted parameters=", res3.getParameter())

# Here, only the first parameter will be fitted
model = ot.Distribution(ot.SciPyDistribution(st.skewnorm(1, loc=1.5, scale=2.5)))
factory = ot.MaximumLikelihoodFactory(model)
factory.setOptimizationAlgorithm(opt)
res1 = factory.build(sample)
print("Fitted parameters=", res1.getParameter())

# Draw PDFs
g = distribution.drawPDF()
c = res3.drawPDF().getDrawable(0)
c.setLineStyle("dashed")
g.add(c)
c = res1.drawPDF().getDrawable(0)
c.setLineStyle("dashed")
g.add(c)
g.setLegends(["model", "fitted (3 params)", "fitted (1 param)"])
view = otv.View(g)
view.save("Fitted_scipy_dist.png")
view.close()

and you get:

Fitted parameters= [2.31794,1.37217,2.48509]
Fitted parameters= [2.13504]

with the following graphical comparison:
Fitted_scipy_dist

The MaximumLikelihoodFactory class is not very robust (e.g some parameters may need additional constraints), and Cobyla may need quite a lot of logPDF evaluations before to converge. You can try other generic factories and see if you get better results (LeastSquaresDistributionFactory, MethodOfMomentsFactory, QuantileMatchingFactory).

Cheers

Régis