Skew distributions in OT?

Romain · March 28, 2024, 3:54am

Hello,

First, warm congrats and thanks for the huge effort in building such an exhaustive and advanced open-source library for uncertainty modelling. Very impressive and super helpful to the community.

I have browsed the documentation but I haven’t found mentions of skew distributions modeled in OT. I am particularly interested in univariate Gaussian skew and T-skew distributions and multivariate Gaussian skew and T-skew distributions.

They are relatively standard and extensively studied by Azzalini and Della Valle (for instance: http://azzalini.stat.unipd.it/SN/)

I wonder if there is a way to include them in the OT distributions and use the nice ot.FittingTest. set of criteria and functions to include them into the list of candidates distributions.

Thank you very much for your help and best regards

Romain

regislebrun · April 3, 2024, 12:59pm

Hi,

I had a look at the sn R package. These distributions are very interesting and should definitely be part of the OpenTURNS library, but we are currently short of resources to implement them rapidly. It will be difficult to add them for the next release as it will be frozen soon. Is it ok for you if you get them for the next release (~november)?

Cheers

Régis

Romain · April 5, 2024, 2:15am

Dear Regis,

Thank you very much for your prompt and kind response.

Absolutely no problems, I fully understand the constraints and I am very grateful that you put the skew distribution on your November release schedule.

Thank you very much again

Best regards,

Romain

MichaelBaudin · May 2, 2024, 1:44pm

Hi!
Thank you for this suggestion of development.
In the meantime, notice that you have several workarounds available.

You can implement your own SN distribution using PythonDistribution. This can be efficient, depending on your needs. You only need to implement the CDF and the range to get a working distribution. If you can implement any other methods available, e.g. the PDF or the LogPDF, then the library can take that into account.
You can convert any SciPy distribution into OpenTURNS using SciPyDistribution. The same is true with the ChaospyDistribution.

I hope that this may help you.
Regards,
Michaël

Romain · May 9, 2024, 1:48am

Thank you so much Michael for the pointers on how to implement custom distribution with OpenTURNS framework.

From my (limited) understanding, it seems that ot.SciPyDistribution implements a Distribution object but not a distribution factory object. Ideally, I would like to fit skewed distributions using ot.FittingTest.BestModelBIC. Is it something feasible using a distribution imported from ot.SciPyDistribution or coded from ot.PythonDistribution ?

Thank you very much for your time and help.

Best regards,

Romain

regislebrun · May 14, 2024, 8:43pm

You can use one of the generic factories with a SciPyDistribution. Be careful! You have to create your model (ie an instance of the distribution you want to fit) using unnamed arguments otherwise they will not be counted as parameters at the OpenTURNS level. See the following example for clarification:

import openturns as ot
import openturns.viewer as otv
import scipy.stats as st
import numpy as np

np.random.seed(seed=1234)

# Activate the logs if you want to observe the convergence process
# ot.Log.Show(ot.Log.ALL)

# Create a known skewnormal from scipy, to generate a sample
py_dist = ot.SciPyDistribution(st.skewnorm(2.0, 1.5, 2.5))
distribution = ot.Distribution(py_dist)

size = 250
sample = distribution.getSample(size)

# Fit a skewnormal using the maximum likelihood factory
# Here the 3 parameters will be fitted
model = ot.Distribution(ot.SciPyDistribution(st.skewnorm(1, 1, 1)))
factory = ot.MaximumLikelihoodFactory(model)
opt = ot.Cobyla()
opt.setMaximumCallsNumber(100000)
factory.setOptimizationAlgorithm(opt)
res3 = factory.build(sample)
print("Fitted parameters=", res3.getParameter())

# Here, only the first parameter will be fitted
model = ot.Distribution(ot.SciPyDistribution(st.skewnorm(1, loc=1.5, scale=2.5)))
factory = ot.MaximumLikelihoodFactory(model)
factory.setOptimizationAlgorithm(opt)
res1 = factory.build(sample)
print("Fitted parameters=", res1.getParameter())

# Draw PDFs
g = distribution.drawPDF()
c = res3.drawPDF().getDrawable(0)
c.setLineStyle("dashed")
g.add(c)
c = res1.drawPDF().getDrawable(0)
c.setLineStyle("dashed")
g.add(c)
g.setLegends(["model", "fitted (3 params)", "fitted (1 param)"])
view = otv.View(g)
view.save("Fitted_scipy_dist.png")
view.close()

and you get:

Fitted parameters= [2.31794,1.37217,2.48509]
Fitted parameters= [2.13504]

with the following graphical comparison:
Fitted_scipy_dist

The MaximumLikelihoodFactory class is not very robust (e.g some parameters may need additional constraints), and Cobyla may need quite a lot of logPDF evaluations before to converge. You can try other generic factories and see if you get better results (LeastSquaresDistributionFactory, MethodOfMomentsFactory, QuantileMatchingFactory).

Cheers

Régis

Topic		Replies	Views
Discrete Distribution with Non-Numeric Values Methodology distribution	2	415	May 26, 2021
Quality of student parameter estimation Python usage	2	292	June 22, 2022
OpenTURNS ships 1.24 RC1 Announcements	0	276	May 6, 2024
OpenTURNS 1.24 released Announcements	0	73	November 26, 2024
Discrete uniform distribution Development distribution	3	492	January 7, 2021

Skew distributions in OT?

Related topics