Wilks quantile in OpenTURNS

hello!
i’ve had some trouble using the Wilks method in OpenTURNS. It actually works on a RandomVector object, whereas i already had my Sample over which i wanted the calculation to be performed.

One workaround would be to create a RandomVector object by applying a constant function over my sample. But this seems to be a lot of trouble and i would think that the wilks method should be able to handle sample objects.

thanks!
sanaa

Indeed Wilks works the other way around; you specify the quantile level and the confidence interval and it tells you the sample size you need (using the Wilks.ComputeSampleSize(a,b,i) method).
If you want to reuse a sample you could try to tweak the confidence interval until it matches your sample size (unless there’s a direct way of computing this ?).

Then you would just need to do what Wilks.computeQuantileBound does:

 sample.sort(0)[-i]

Hi Sanaa,

I don’t understand the computation you want to perform with your sample. Basically, Wilk’s method could tell you the quantile level you get for a given confidence level if you take the largest value in your sample (or the second largest value aso), or it could tell you the confidence level you get for a given quantile level if you take the largest value in your sample (or the second largest value aso), or the maximum margin you can take to get an estimate of the quantile of a given quantile level for a given confidence level. As you see, the four quantities (sample size, margin, quantile level, confidence level) are linked and you may want to get any of them given the three others. The current implementation is to give you the sample size given the other parameters, but it is not so difficult to implement the 3 other combinations.

Hi to all! I stumbled upon this conversation because I was facing the exact same issue as Sanaa: I have a sample with a given size, and I need to compute the Wilks quantile… There are two solutions to date:

  1. Use the openturns.wilks() class, but:
  • It is a shame the constructor takes only RandomVectors, given that Samples is such a commun usecase;

  • the Wilks.computeQuantileBound “marginIndex” keyword argument takes 0 as a default value, which is hard to comprehend. Indeed, since it is described as

Rank of the maximum which will evaluate the empirical quantile

a more logical default value would have been int(alpha*n), where:

  • alpha is the quantile order, already an input of the computeQuantileBound() method
  • n is the sample size.
  1. Code your own inverse of Wilks_ComputeSampleSize(alpha, beta, i), to determine, given alpha, beta and a sample size n, what is the rank i of the order statistic evaluating the Wilks quantile. In my opinion, this is a not entirely trivial task. Here is my solution:
def Wilks_ComputeIndex(p, beta, N):
    i = 0
    n_of_i = ot.Wilks_ComputeSampleSize(p, beta, i)
    if n_of_i > n:
        raise NameError('Insufficient sample size')
    while n_of_i < N:
        i += 1
        n_of_i = ot.Wilks_ComputeSampleSize(p, beta, i)
        # print('i=%s, minimal size=%s'%(i,n_of_i))
    if n_of_i > n:
        i -= 1
    return i

No big deal in either case, but the implementation could be a great deal more user-friendly with a few ameliorations / automatization.

Cheers,
Merlin

It’s Open Source, feel free to contribute ;-)!

1 Like

More seriously, the marginIndex argument is here to indicate how many upper statistics you want to put at the right of your estimate. If it is equal to zero, then you consider the max of the sample as your quantile estimate, if it is equal to 1, the second largest value, and so on. It was defined during the CEMRACS 2006 when OpenTURNS was not even publicly released, so it is quite an old piece of code. And yes, the description is not correct. Concerning the constructor, the objective of this class was to provide a sampler of Wilk’s upper bound, hence the choice of a random vector. If you already have a sample at hand, it is too late: what is the interest of computing a sample size in this case? But if you provide an actual use-case for this class I can move things to ease your job :slight_smile:.

Concerning your suggestion, it CANNOT work except for beta=0.5. If you link the rank of the order statistics to take as an estimate of a quantile upper bound to the quantile level, then the confidence level is fixed! In your case, if you choose the rank of the empirical quantile as the rank of your quantile estimate, you will get an asymptotic confidence of 0.5 as this estimator is consistent and asymptotically Gaussian (thus symmetrical).

I started to see what appends in this class. Here is what I get (the OT output is in blue, the analytical result for the maximum is in orange). So everything is good except for the low confidence levels. I suspect the initialization of the sequential search, not the easiest part of the code…Figure_1

I am surely considering doing so!

Thanks for the additional feedback, though I’m not really sure what the figure is about? It seems that we have failed with Sanaa to make clear the use-case we have in mind, so let me try again:

Suppose you have a sample of size N, based on which you want to estimate the p-th order quantile of the underlying distribution. This is handily done by computing the empirical quantile, that is, the rank [N*p] order statistic (if [x] stands for “the integer closest to x”). The Wilks quantile allows to determine a second rank i for the order statistics, which will yield an upper bound with confidence level beta (given that the sample size is not too small). Our problem is: how to determine the corresponding rank i?

The current implementation gives us the reverse information: if I use the rank i order statistic as an upper confidence bound for the p-th order quantile, what is the minimal sample size such that the confidence level is at least beta? This is a natural question if you want to plan a study, but not if you want to compute that actual Wilks quantile on a given sample…

(Just as a reminder : In order to format your source code, you can “triple backquote” your Python code. )

Hi @MerlinKeller @sanaaZ,
Would you be able to provide a use-case as a Python script showing the use of the feature you would need?
Of course, this script will not work, but this does not matter at the current stage of the development.
Regards,
Michaël

Hi Michaël,

Sorry for an overly belated answer!!

Here is an example script where I compute on a simulated sample:

  • the empirical quantile
  • the Wilks quantile

Wilks_quantile_exemple_script.py (1.1 KB)

In it I also suggest to added to the Sample class a method computeWilksQuantile, in analogy to the existing computeEmpiricalQuantile.

Best,
Merlin