KernelSmoothing Warning

Hello,
I allow myself to ask a question in this forum because I want to use the Kernel Smoothing method to build a generic PDF for my experimental data. I used this statement to create the distribution:

loaded_sample = ot.Sample.ImportFromTextFile(path)
ks = ot.KernelSmoothing()
fitted_dist = ks.build(loaded_sample)

But doing this gives me a warning that I never had when I used the KDE method from SciPy:
WRN - Warning! The distribution number 1023 has a too small weight=0 for a relative threshold equal to Mixture-SmallWeight=1e-12 with respect to the maximum weight=13.0606. It is removed from the collection.

I don’t really understand it, and it does this for 1023 points.

I hope I’ll find some help here.
Thank you very much
PS: I did not know which language to write in, so if you want to answer in French, I understand.

Hi @WinterSpark and sorry for the late answer. This warning is probably not important, but maybe you can tell us a little more about your dataset. What is its dimension? Are there repeated points in the dataset? If you could share it, we could perhaps try to reproduce the warning.

Hi!
I can reproduce the warnings with:

import openturns as ot
import openturns.viewer as otv

distribution = ot.Exponential()
size = 10000
sample = distribution.getSample(size)

factory = ot.KernelSmoothing()
fitted_distribution = factory.build(sample)
graph = fitted_distribution.drawPDF()
_ = otv.View(graph)

Regards,
Michaël

Hi !

This warning is generated inside the Mixture class. It appears because the KernelSmoothing.build() method produces a Mixture in which each atom is a KernelMixture; the number of atoms is equal to the number of bins.

I assume that some kernels in the mixture have a very small weight. This might be because the sample size is so large that the contribution of some bins to the value of the PDF at a given point x is negligible.

Is that correct @regislebrun?

@WinterSpark: what is the sample size that produces that message?

Regards,

Michaël

PS
The same question was asked on Stack Overflow a few months ago.

Hi all,

@MichaelBaudin and @josephmure are right. The message is produced by the above class (Mixture) and is not important (it is just a warning).
The weights of the kernel smoothing function are evaluated and passed to a Mixture distribution, which keeps only significant coefficients (> 1e-12). In your example, one weight was smaller than the threshold and was not kept in the collection, so you have a collection of 1023 atoms instead of 1024.
If you don’t want OpenTURNS to drop these coefficients, call ot.ResourceMap.SetAsScalar("Mixture-SmallWeight", 0.0), for example, before performing the kernel smoothing.


Hi all,

I have also noticed that KS on samples with size above 1000 tends to struggle in OpenTURNS. For example, something very simple like this does not converge on my machine:

>>> import openturns as ot
>>> sample = ot.WeibullMin(1.3, 1.2).getSample(2000)
>>> ks = ot.KernelSmoothing()
>>> ks_weibull = ks.build(sample)

By increasing the value of the following setting above its default, the algorithm converges:

Default setting: ot.ResourceMap.SetAsUnsignedInteger('KernelSmoothing-BinNumber', 1024)

Do you think that it would be worth adjusting default settings depending on the sample size?

Best regards,
Elias