KernelSmoothing Warning

Hello,
I allow myself to ask a question in this forum because I want to use the Kernel Smoothing method to build a generic PDF for my experimental data. I used this statement to create the distribution:

loaded_sample = ot.Sample.ImportFromTextFile(path)
ks = ot.KernelSmoothing()
fitted_dist = ks.build(loaded_sample)

But doing this gives me a warning that I never had when I used the KDE method from SciPy:
WRN - Warning! The distribution number 1023 has a too small weight=0 for a relative threshold equal to Mixture-SmallWeight=1e-12 with respect to the maximum weight=13.0606. It is removed from the collection.

I don’t really understand it, and it does this for 1023 points.

I hope I’ll find some help here.
Thank you very much
PS: I did not know which language to write in, so if you want to answer in French, I understand.

Hi @WinterSpark and sorry for the late answer. This warning is probably not important, but maybe you can tell us a little more about your dataset. What is its dimension? Are there repeated points in the dataset? If you could share it, we could perhaps try to reproduce the warning.

Hi!
I can reproduce the warnings with:

import openturns as ot
import openturns.viewer as otv

distribution = ot.Exponential()
size = 10000
sample = distribution.getSample(size)

factory = ot.KernelSmoothing()
fitted_distribution = factory.build(sample)
graph = fitted_distribution.drawPDF()
_ = otv.View(graph)

Regards,
Michaël

Hi !

This warning is generated inside the Mixture class. It appears because the KernelSmoothing.build() method produces a Mixture in which each atom is a KernelMixture; the number of atoms is equal to the number of bins.

I assume that some kernels in the mixture have a very small weight. This might be because the sample size is so large that the contribution of some bins to the value of the PDF at a given point x is negligible.

Is that correct @regislebrun?

@WinterSpark: what is the sample size that produces that message?

Regards,

Michaël

PS
The same question was asked on Stack Overflow a few months ago.

Hi all,

@MichaelBaudin and @josephmure are right. The message is produced by the above class (Mixture) and is not important (it is just a warning).
The weights of the kernel smoothing function are evaluated and passed to a Mixture distribution, which keeps only significant coefficients (> 1e-12). In your example, one weight was smaller than the threshold and was not kept in the collection, so you have a collection of 1023 atoms instead of 1024.
If you don’t want OpenTURNS to drop these coefficients, call ot.ResourceMap.SetAsScalar("Mixture-SmallWeight", 0.0), for example, before performing the kernel smoothing.


Hi all,

I have also noticed that KS on samples with size above 1000 tends to struggle in OpenTURNS. For example, something very simple like this does not converge on my machine:

>>> import openturns as ot
>>> sample = ot.WeibullMin(1.3, 1.2).getSample(2000)
>>> ks = ot.KernelSmoothing()
>>> ks_weibull = ks.build(sample)

By increasing the value of the following setting above its default, the algorithm converges:

Default setting: ot.ResourceMap.SetAsUnsignedInteger('KernelSmoothing-BinNumber', 1024)

Do you think that it would be worth adjusting default settings depending on the sample size?

Best regards,
Elias