# Kolmogorov-Smirnov test in OT

Hello,
This might be a naive question, but I was surprised to find that the KS test in OT does not behave deterministically. For a given sample and a given theoretical `DistributionFactory` object, the resulting p-value is sometimes fairly large and sometimes very close to 0.

Am I missing something here, or is it just a numerical approximation issue?

Thanks,

Sanaa


Hi,
You will get a lot of information here on this topic. To make a long story short, when some parameters are estimated from the sample rather than known in advance, the exact p-value is itself estimated with a Monte Carlo method. This was Lilliefors's idea for the normal distribution, and it has been extended (and implemented this way) elsewhere, e.g. in the Matlab Statistics Toolbox.
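To see why two calls can return different p-values, here is a minimal pure-Python sketch of the Lilliefors idea (the helper names are mine, not the OpenTURNS API): the KS statistic is computed against a distribution fitted to the sample, and its null distribution is approximated by simulating many samples from the fitted distribution and refitting on each one. The p-value is therefore a Monte Carlo estimate, so two runs with different random states give slightly different values.

```python
import math
import random
import statistics


def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic_fitted(sample):
    """KS distance between the empirical CDF and a normal fitted to the sample."""
    mu = statistics.fmean(sample)
    sigma = statistics.stdev(sample)
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = normal_cdf(x, mu, sigma)
        # The empirical CDF jumps at each point: check both sides of the step.
        d = max(d, abs((i + 1) / n - f), abs(i / n - f))
    return d


def lilliefors_pvalue(sample, n_rep=200, seed=0):
    """Monte Carlo p-value: simulate from the fitted normal, refit each time."""
    rng = random.Random(seed)
    mu = statistics.fmean(sample)
    sigma = statistics.stdev(sample)
    d_obs = ks_statistic_fitted(sample)
    n = len(sample)
    count = 0
    for _ in range(n_rep):
        fake = [rng.gauss(mu, sigma) for _ in range(n)]
        if ks_statistic_fitted(fake) >= d_obs:
            count += 1
    return count / n_rep


rng = random.Random(42)
data = [rng.gauss(0.0, 1.0) for _ in range(50)]
# Same sample, same fitted distribution, different Monte Carlo states:
p_a = lilliefors_pvalue(data, seed=1)
p_b = lilliefors_pvalue(data, seed=2)
print(p_a, p_b)  # two estimates of the same p-value; they need not coincide
```

This is exactly the behavior observed in the question: the p-value is a random estimate of a fixed quantity, and fixing the random generator seed makes it reproducible.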

Cheers

Régis

Great! And sorry for not taking the time to check whether the issue had popped up before.

No problem ;-)! This platform is here to exchange information, and that is exactly what we are doing!

Hi !

There is a series of examples in the doc which presents the topic and the issue that arises when parameters are estimated.

The principles are presented here

The theory is described here http://openturns.github.io/openturns/master/theory/data_analysis/kolmogorov_test.html

Notice that one of the examples has a bug (the KS statistic is drawn incorrectly), which is identified and fixed here: https://github.com/openturns/openturns/pull/1701

Best regards,

Michaël

PS
Notice that there is a third case which is not presented in the doc: the parameters are estimated from the sample, but the user wrongly uses the `Kolmogorov` class:

```python
import openturns as ot

# Wrong: the parameters of `dist` were estimated from `data`,
# but Kolmogorov treats them as if they were known a priori.
dist = ot.NormalFactory().build(data)
test_result = ot.FittingTest.Kolmogorov(data, dist)
```

This could be used to improve the speed, as presented in https://github.com/openturns/openturns/issues/1061. Denote by p_1 the p-value evaluated assuming that the parameters are known; p_1 is fast to compute. Denote by p_2 the p-value evaluated assuming that the parameters are estimated. We always have p_2 < p_1, so if p_1 is already below the significance level, the test rejects without running the costly Monte Carlo simulation.
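Here is a sketch of the two quantities in pure Python (my own helper names, not the OpenTURNS API): p_1 uses the fast asymptotic Kolmogorov formula, treating the fitted parameters as if they were known, while p_2 refits the parameters on every Monte Carlo replicate, as in the Lilliefors procedure.

```python
import math
import random
import statistics


def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic(sample, mu, sigma):
    """KS distance between the empirical CDF and a given normal."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = normal_cdf(x, mu, sigma)
        d = max(d, abs((i + 1) / n - f), abs(i / n - f))
    return d


def p1_asymptotic(d, n, terms=100):
    """Asymptotic Kolmogorov p-value, valid when the parameters are known."""
    t = math.sqrt(n) * d
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * t * t)
            for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))


def p2_monte_carlo(sample, n_rep=200, seed=0):
    """Lilliefors-style p-value: refit the parameters on every replicate."""
    rng = random.Random(seed)
    mu, sigma = statistics.fmean(sample), statistics.stdev(sample)
    d_obs = ks_statistic(sample, mu, sigma)
    n = len(sample)
    count = 0
    for _ in range(n_rep):
        fake = [rng.gauss(mu, sigma) for _ in range(n)]
        mu_f, sigma_f = statistics.fmean(fake), statistics.stdev(fake)
        if ks_statistic(fake, mu_f, sigma_f) >= d_obs:
            count += 1
    return count / n_rep


rng = random.Random(123)
data = [rng.gauss(0.0, 1.0) for _ in range(50)]
mu, sigma = statistics.fmean(data), statistics.stdev(data)
d = ks_statistic(data, mu, sigma)
p1 = p1_asymptotic(d, len(data))  # fast, ignores that mu/sigma were estimated
p2 = p2_monte_carlo(data)         # slower, accounts for the estimation
print(p1, p2)
```

If p_1 already falls below the chosen significance level, p_2 < p_1 implies the test rejects as well, which is the speed-up suggested in the issue above.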