Empirical Bernstein copula on a weighted sample?

Hi everyone,

I was wondering if there is a way in OpenTURNS to fit an empirical Bernstein copula (EBC) using a weighted sample.

Let us consider two samples \mathbf{X}_{0, n}, \mathbf{X}_{1, n} (each in \mathbb{R}^d, with size n) drawn independently from two known distributions: \mathbf{X}_{0, n} \sim h_0 and \mathbf{X}_{1, n} \sim h_1. I would like to use both samples to fit the copula associated with the distribution h_0.

Would it be legitimate to apply importance sampling weights to the sample \mathbf{X}_{1, n} and fit an EBC using the union of the weighted \mathbf{X}_{1, n} and \mathbf{X}_{0, n}? And is there a way to use the ot.EmpiricalBernsteinCopula class to do so?

Thanks in advance for your answers!
Elias

Hi Elias, for now the only way I see to do that is to repeat the points with larger weights in the sample. That is because the EmpiricalBernsteinCopula constructor requires a Sample, whereas a weighted sample would be represented in the library by a WeightedExperiment.
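
A minimal sketch of the repetition trick, in plain Python so that it runs without OpenTURNS. The helper name replicate_by_weights and its total argument (the target size of the replicated sample) are hypothetical and not part of the library; the rounding rule is just one possible choice, and points with very small weights may end up with zero copies.

```python
def replicate_by_weights(points, weights, total):
    """Repeat point i about total * w_i / sum(w) times (hypothetical helper)."""
    if len(points) != len(weights):
        raise ValueError("points and weights must have the same length")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be nonnegative with a positive sum")
    weight_sum = sum(weights)
    replicated = []
    for point, w in zip(points, weights):
        # Number of copies is proportional to the normalized weight.
        copies = round(total * w / weight_sum)
        replicated.extend(list(point) for _ in range(copies))
    return replicated


# Toy data: three 2-d points, the middle one carrying twice the weight.
points = [[0.1, 0.9], [0.4, 0.2], [0.8, 0.5]]
weights = [1.0, 2.0, 1.0]
replicated = replicate_by_weights(points, weights, total=8)
print(len(replicated), "points after replication")

# Hand-off to OpenTURNS (not executed here; bin_number is a tuning choice):
# import openturns as ot
# copula = ot.EmpiricalBernsteinCopula(ot.Sample(replicated), bin_number)
```

A larger total reproduces the weights more faithfully, at the price of a bigger sample and therefore a more expensive copula. The repeated points also create ties in the ranks, which is one reason this workaround is probably not optimal numerically.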

Hi Joseph,

Thanks for your answer. Working with repetitions works perfectly to emulate weights even if it’s probably not optimal numerically.

Best,
Elias

In addition to Joseph’s excellent answer (the repetition of points according to their weights), I would like to confirm that the approach you describe is perfectly sound. With w_i=h_0(\mathbf{X}_{1,n}^i)/h_1(\mathbf{X}_{1,n}^i), the weighted sample (w,\mathbf{X}_{1,n}) is distributed according to h_0.
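
As a rough illustration of these weights, here is a toy one-dimensional sketch using only the Python standard library. The two Gaussian densities standing in for h_0 and h_1 are assumptions chosen for the example, not the distributions from the question, and the pooling rule (weight 1 for points drawn from h_0, weight w_i for points drawn from h_1) is one simple way to form the union. The weights are only meaningful if h_1 is positive wherever h_0 is.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of the univariate normal distribution N(mean, std^2)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def h0(x):
    # Target density (assumed for this example): N(0, 1).
    return normal_pdf(x, 0.0, 1.0)


def h1(x):
    # Auxiliary density (assumed for this example): N(1, 1.5^2).
    return normal_pdf(x, 1.0, 1.5)


rng = random.Random(0)
n = 20000
x0 = [rng.gauss(0.0, 1.0) for _ in range(n)]  # sample drawn from h0
x1 = [rng.gauss(1.0, 1.5) for _ in range(n)]  # sample drawn from h1

# Importance weights w_i = h0(x_i) / h1(x_i) for the points drawn from h1.
w = [h0(x) / h1(x) for x in x1]

# Sanity checks: the weights average to about 1 and the weighted mean of
# the h1 sample approximates the mean of h0, which is 0 here.
mean_weight = sum(w) / n
weighted_mean = sum(wi * xi for wi, xi in zip(w, x1)) / sum(w)

# Pooled weighted sample: weight 1 for the h0 points, w_i for the h1 points.
pooled_points = x0 + x1
pooled_weights = [1.0] * n + w
pooled_mean = sum(wi * xi for wi, xi in zip(pooled_weights, pooled_points)) / sum(pooled_weights)

print("mean weight:", mean_weight)
print("weighted mean of the h1 sample:", weighted_mean)
print("pooled weighted mean:", pooled_mean)
```

The pooled points and weights are kept as two separate lists, which mirrors how the sample and the weights are handled separately in the library.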

Unfortunately, we don’t have the concept of a weighted sample in OT yet. We manipulate the sample and the weights separately, as produced e.g. by a WeightedExperiment (which is indeed a weighted sample generator).

The current implementation of EmpiricalBernsteinCopula relies heavily on the weights being uniform. It could be adapted to nonuniform weights (and BernsteinCopulaFactory too), but the cost in terms of performance (sampling, PDF/CDF computation) will probably be significant. It can be added to the wish list on GitHub with a short description of the context, in particular the dimension and the size of the samples you want to use.