Screening with HSIC coeff

Hello!

I am trying to perform screening with HSIC coefficients, based on results that are already available (a DOE such as LHS…).
I am wondering:

  • which estimatorType to use: HSICVStat or HSICUStat (I tried both)
  • what to use for covarianceModelCollection: I used SquaredExponential for each input and output parameter

My first results with HSIC coefficients confirm the sensitivity analyses performed with Sobol indices.

In another case study, one of my students performed a Morris screening, but the results seem difficult to analyze. So we chose to run another DOE in order to compute HSIC coefficients, and we hope this will make it easier to sort influential from non-influential input parameters.

Thanks in advance for your advice!

Flore

Hello Flore,

I apologize if this reply comes way too late. But just in case, here are a few notes:

  • Please note that one of the underlying hypotheses required to perform HSIC-based analyses is that the considered sample is IID. Therefore, an optimized LHS (which does not comply with said hypothesis) will give partially biased results, and the bias is difficult to assess.
  • There is no clear preference regarding which estimator to use in the literature. In general, I tend to use the V-stat one as it is more commonly used in the related papers, and therefore feels a little bit more robust, but both should provide you with very similar results, unless you are working with extremely small samples.
  • The squared exponential kernel with the empirical parameterization proposed in the documentation (The HSIC sensitivity indices: the Ishigami model — OpenTURNS 1.19 documentation) is, to my knowledge, a fairly robust option. Be careful not to forget to specify a different lengthscale parameter for every input and output dimension.
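
To make the last two notes concrete, here is a minimal sketch in plain Python of the usual V-statistic and U-statistic forms of the HSIC estimator, with a squared exponential kernel whose lengthscale is set per variable to its empirical standard deviation. It is a hand-written illustration, not the OpenTURNS implementation: the function names (gram, hsic_v, hsic_u, r2_hsic), the toy model y = x1², the sample size and the seed are all assumptions made for the example, and the library's normalization may differ.

```python
import math
import random


def gram(values, zero_diagonal=False):
    """Squared exponential Gram matrix; lengthscale = empirical std of `values`."""
    n = len(values)
    mean = sum(values) / n
    scale = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    matrix = [[math.exp(-0.5 * ((a - b) / scale) ** 2) for b in values]
              for a in values]
    if zero_diagonal:  # the U-statistic excludes the i == j terms
        for i in range(n):
            matrix[i][i] = 0.0
    return matrix


def hsic_terms(K, L):
    """The three sums shared by both estimators (K and L are symmetric)."""
    n = len(K)
    trace_term = sum(K[i][j] * L[i][j] for i in range(n) for j in range(n))
    rows_K = [sum(row) for row in K]
    rows_L = [sum(row) for row in L]
    cross_term = sum(a * b for a, b in zip(rows_K, rows_L))
    return trace_term, sum(rows_K) * sum(rows_L), cross_term


def hsic_v(x, y):
    """Biased V-statistic: trace(K H L H) / n^2 with H the centering matrix."""
    n = len(x)
    t, s, c = hsic_terms(gram(x), gram(y))
    return t / n**2 + s / n**4 - 2.0 * c / n**3


def hsic_u(x, y):
    """Unbiased U-statistic, computed from Gram matrices with zeroed diagonals."""
    n = len(x)
    t, s, c = hsic_terms(gram(x, True), gram(y, True))
    return (t + s / ((n - 1) * (n - 2)) - 2.0 * c / (n - 2)) / (n * (n - 3))


def r2_hsic(estimator, x, y):
    """Normalized index: HSIC(x, y) / sqrt(HSIC(x, x) * HSIC(y, y))."""
    return estimator(x, y) / math.sqrt(estimator(x, x) * estimator(y, y))


# Hypothetical toy case: IID uniform inputs, x1 is influential, x2 is inert.
rng = random.Random(0)
n = 150
x1 = [rng.uniform(-1.0, 1.0) for _ in range(n)]
x2 = [rng.uniform(-1.0, 1.0) for _ in range(n)]
y = [a**2 for a in x1]

r2_v = [r2_hsic(hsic_v, xi, y) for xi in (x1, x2)]
r2_u = [r2_hsic(hsic_u, xi, y) for xi in (x1, x2)]
print("R2-HSIC, V-stat:", r2_v)
print("R2-HSIC, U-stat:", r2_u)
```

On this toy case both estimators rank x1 well above x2, even though x1 and y are linearly uncorrelated. Note that each variable gets its own lengthscale, because gram recomputes the standard deviation from the column it receives.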

Best regards,
Julien

Hello Flore,

First of all, I would like to apologize for this late answer to this quite old post.

However, I would like to provide a few additional remarks to Julien’s answer.

As he said, HSIC-based screening should rely on an IID sample. If this is not the case, to my understanding, you can still compute HSIC estimators, e.g., on space-filling designs. However, the problem arises when you decide to perform statistical tests to check whether inputs have a significant influence or not. In this case, there is no guarantee on the level of the test, so you cannot draw a proper conclusion from the screening.
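
As an illustration of the kind of test I mean, here is a minimal plain-Python sketch of a permutation test on the V-statistic. It is not the OpenTURNS implementation: the helper names (gram, center, hsic_v, permutation_p_value), the toy model y = x1², the sample size and the number of permutations are assumptions made for the example. The p-value it returns is only meaningful under the IID hypothesis discussed above.

```python
import math
import random


def gram(values):
    """Squared exponential Gram matrix; lengthscale = empirical std of `values`."""
    n = len(values)
    mean = sum(values) / n
    scale = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [[math.exp(-0.5 * ((a - b) / scale) ** 2) for b in values]
            for a in values]


def center(K):
    """Double-center a symmetric Gram matrix, i.e. compute H K H."""
    n = len(K)
    row_means = [sum(row) / n for row in K]
    grand_mean = sum(row_means) / n
    return [[K[i][j] - row_means[i] - row_means[j] + grand_mean
             for j in range(n)] for i in range(n)]


def hsic_v(Kc, L, perm):
    """V-statistic trace(H K H L) / n^2, with the output sample reordered by perm."""
    n = len(L)
    return sum(Kc[i][j] * L[perm[i]][perm[j]]
               for i in range(n) for j in range(n)) / n**2


def permutation_p_value(x, y, n_perm, rng):
    """Share of permuted statistics at least as large as the observed one."""
    Kc, L = center(gram(x)), gram(y)
    perm = list(range(len(x)))
    observed = hsic_v(Kc, L, perm)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(perm)  # shuffling y breaks any link between x and y
        if hsic_v(Kc, L, perm) >= observed:
            exceed += 1
    return (1 + exceed) / (1 + n_perm)


# Hypothetical toy case: IID uniform inputs, x1 is influential, x2 is inert.
rng = random.Random(1)
n = 60
x1 = [rng.uniform(-1.0, 1.0) for _ in range(n)]
x2 = [rng.uniform(-1.0, 1.0) for _ in range(n)]
y = [a**2 for a in x1]

p_values = [permutation_p_value(xi, y, 199, rng) for xi in (x1, x2)]
print("permutation p-values (x1, x2):", p_values)
```

Screening then amounts to comparing each p-value with the chosen level (say 5%). The permutation argument relies on the observations being exchangeable, which is exactly what an optimized design breaks.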

Recent work has proposed solutions to this problem, especially in the case of space-filling designs (see here: https://hal-cea.archives-ouvertes.fr/cea-03406956/), but we do not have these solutions implemented yet in the OT software. If you really need them, please tell us and we will do our best.

Regards,
Vincent