Sobol sensitivity analysis of a machine learning model

Hi!
I am trying to do a second-order Sobol sensitivity analysis of a gradient boosted model from the scikit-learn package. However, OpenTURNS is giving this message:

model = ot.SymbolicFunction(feature_list,[GB1(feature_list)])
'GradientBoostingRegressor' object is not callable

Kindly help

with regards
Saurav

Hi,
Indeed, a GradientBoostingRegressor is not an analytical function, so it cannot be cast like that. You need to rely on a PythonFunction instead. Here is an example for that purpose:

import openturns as ot
import numpy as np

class SklearnPyFunction(ot.OpenTURNSPythonFunction):
    """
    Define an OpenTURNS Function using a machine learning algorithm from scikit-learn.
    Parameters
    ----------
    algo : a scikit algo
        Algo for response surface, already trained/validated
    in_dim : int
        Input dimension
    out_dim: int
        Output dimension
    """
    def __init__(self, algo, in_dim, out_dim):
        super(SklearnPyFunction, self).__init__(in_dim, out_dim)
        self.algo = algo

    def _exec(self, x):
        X = np.reshape(x, (1, -1))
        return self.algo.predict(X).ravel()

    def _exec_sample(self, x):
        X = np.array(x)
        size = len(X)
        return self.algo.predict(X).reshape(size, self.getOutputDimension())

class GradientBoosting(ot.Function):
    """
    Define an OpenTURNS Function using sklearn algorithms
    Parameters
    ----------
    algo : a scikit algo
        Algo for response surface, already trained/validated
    in_dim : int
        Input dimension
    out_dim: int
        Output dimension
    """
    def __new__(cls, algo, in_dim, out_dim):
        python_function = SklearnPyFunction(algo, in_dim, out_dim)
        return ot.Function(python_function)

As an example:

    import openturns as ot
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor
    size = 10
    model = ot.SymbolicFunction(["x"], ["(1.0 + sign(x)) * cos(x) - (sign(x) - 1) * sin(2*x)"])
    dataX = ot.Uniform().getSample(size)
    dataY = model(dataX)
    algo = GradientBoostingRegressor()
    algo.fit(dataX, np.ravel(dataY))  # sklearn expects a 1-d target array
    f = GradientBoosting(algo, 1, 1)
    print(f(dataX))

Hope this helps

BR
Sofiane

Dear Sir

Thank you very much for your kind response. Now I am able to run the sensitivity analysis. My data set has six input parameters, which follow a log-normal distribution (as they cannot take negative values). The input parameters are correlated, so I am trying to run the sensitivity analysis based on ANCOVA indices. However, I am getting negative and very small ANCOVA indices for all the parameters, with the following output.

ANCOVA indices [-0.00427258,0.0114251,0.00705848,0.00992637,0.010744,0.00159234]
ANCOVA uncorrelated indices [0.000100539,0.000217306,0.00039698,0.00229071,0.000511348,0.000109577]
ANCOVA correlated indices [-0.00437312,0.0112077,0.0066615,0.00763566,0.0102326,0.00148276]

I am not able to interpret these results. Kindly help.

Saurav