Hi!

If you are at improving the lib on this topic, I suggest to use R. S. Stepleman and N. D. Winarsky algorithm:

Stepleman, R. S., and N. D. Winarsky. “Adaptive numerical differentiation.” *Mathematics of Computation* 33.148 (1979): 1257-1264.

https://www.jstor.org/stable/2006459

The algorithm uses a decreasing sequence of step sizes. It uses the property that the absolute value of successive finite differences is monotonic. The algorithm stops when the monotonic property fails.

```
"""
Adaptive numerical differentiation
Authors: R. S. Stepleman and N. D. Winarsky
Journal: Math. Comp. 33 (1979), 1257-1264
"""
import numpy as np
import openturns as ot
import openturns.viewer as otv
g = ot.SymbolicFunction(["x"], ["exp(-x / 1.e6)"])
def central_finite_difference(g, x, h):
x1 = x + h
x2 = x - h
y1 = g([x1])
y2 = g([x2])
g_fd = (y1 - y2) / (x1 + (- x2) ) # Magic trick?
# g_fd = (y1 - y2) / (2.0 * h)
return g_fd
x = 1.0
number_of_points = 1000
h_array = ot.Sample([[x] for x in np.logspace(-7.0, 5.0, number_of_points)])
error_array = ot.Sample(number_of_points, 1)
for i in range(number_of_points):
g_gradient = g.gradient([x])
h = h_array[i, 0]
g_fd = central_finite_difference(g, x, h)
error_array[i, 0] = abs(g_fd[0] - g_gradient[0, 0])
graph = ot.Graph("Finite difference", "h", "Error", True)
curve = ot.Curve(h_array, error_array)
graph.add(curve)
graph.setLogScale(ot.GraphImplementation.LOGXY)
otv.View(graph)
# Algorithm to detect h*
h0 = 1.e5
h_previous = h0
g_fd_previous = central_finite_difference(g, x, h_previous)
diff_previous = np.inf
for i in range(53):
h_current = h_previous / 2.0
g_fd_current = central_finite_difference(g, x, h_current)
diff_current = abs(g_fd_current[0] - g_fd_previous[0])
print("i=%d, h=%.4e, |FD(h_current) - FD(h_previous)=%.4e" % (
i, h_current, diff_current))
if diff_previous < diff_current:
print("Stop!")
break
g_fd_previous = g_fd_current
h_previous = h_current
diff_previous = diff_current
print("Optimum h=", h_current)
```

It seems to work quite well on the nontrivial exp example you gave:

```
i=0, h=5.0000e+04, |FD(h_current) - FD(h_previous)=1.2508e-09
i=1, h=2.5000e+04, |FD(h_current) - FD(h_previous)=3.1255e-10
i=2, h=1.2500e+04, |FD(h_current) - FD(h_previous)=7.8128e-11
i=3, h=6.2500e+03, |FD(h_current) - FD(h_previous)=1.9531e-11
i=4, h=3.1250e+03, |FD(h_current) - FD(h_previous)=4.8828e-12
i=5, h=1.5625e+03, |FD(h_current) - FD(h_previous)=1.2207e-12
i=6, h=7.8125e+02, |FD(h_current) - FD(h_previous)=3.0518e-13
i=7, h=3.9062e+02, |FD(h_current) - FD(h_previous)=7.6294e-14
i=8, h=1.9531e+02, |FD(h_current) - FD(h_previous)=1.9074e-14
i=9, h=9.7656e+01, |FD(h_current) - FD(h_previous)=4.7689e-15
i=10, h=4.8828e+01, |FD(h_current) - FD(h_previous)=1.1909e-15
i=11, h=2.4414e+01, |FD(h_current) - FD(h_previous)=2.9786e-16
i=12, h=1.2207e+01, |FD(h_current) - FD(h_previous)=7.7307e-17
i=13, h=6.1035e+00, |FD(h_current) - FD(h_previous)=1.8190e-17
i=14, h=3.0518e+00, |FD(h_current) - FD(h_previous)=9.0950e-18
i=15, h=1.5259e+00, |FD(h_current) - FD(h_previous)=1.8190e-17
Stop!
Optimum h= 1.52587890625
```

The paper describes how to choose the initial step size h_0 and the reduction factor \beta. However, I suggest to let the user choose h_0 and use \beta=2 as default, since the correct order of magnitude is sufficient in many cases. A \beta closer to 1 would allow a finer estimate of h^\star, but the extra evaluations are not worth in terms of improvement in accuracy of the gradient, given the cost in the evaluations of the function. At most a 52 / 2 = 26 evaluations should be required in many situations, provided a rough estimate of h_0 is provided.

I tried the trick you presented, but was not able to make it work: did I miss something? Why should it work?

Best regards,

Michaël