Hi!
If you are interested in improving the lib on this topic, I suggest using the algorithm of R. S. Stepleman and N. D. Winarsky:
Stepleman, R. S., and N. D. Winarsky. “Adaptive numerical differentiation.” Mathematics of Computation 33.148 (1979): 1257-1264.
https://www.jstor.org/stable/2006459
The algorithm uses a decreasing sequence of step sizes. It relies on the property that the absolute differences between successive finite difference approximations decrease monotonically as long as truncation error dominates. The algorithm stops as soon as this monotone decrease fails, i.e. when rounding error takes over.
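To make the stopping rule explicit (my paraphrase of the paper, with \beta > 1 the step reduction factor): for the step sequence h_n = h_0 / \beta^n and the central finite differences FD(h_n), the iteration stops at the first n such that

|FD(h_n) - FD(h_{n-1})| >= |FD(h_{n-1}) - FD(h_{n-2})|

which is exactly what the small script below implements with \beta = 2.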
"""
Adaptive numerical differentiation
Authors: R. S. Stepleman and N. D. Winarsky
Journal: Math. Comp. 33 (1979), 1257-1264
"""
import numpy as np
import openturns as ot
import openturns.viewer as otv
g = ot.SymbolicFunction(["x"], ["exp(-x / 1.e6)"])
def central_finite_difference(g, x, h):
    x1 = x + h
    x2 = x - h
    y1 = g([x1])
    y2 = g([x2])
    g_fd = (y1 - y2) / (x1 + (- x2))  # Magic trick?
    # g_fd = (y1 - y2) / (2.0 * h)
    return g_fd
x = 1.0
number_of_points = 1000
h_array = ot.Sample([[h] for h in np.logspace(-7.0, 5.0, number_of_points)])
error_array = ot.Sample(number_of_points, 1)
g_gradient = g.gradient([x])  # exact gradient, used as the reference
for i in range(number_of_points):
    h = h_array[i, 0]
    g_fd = central_finite_difference(g, x, h)
    error_array[i, 0] = abs(g_fd[0] - g_gradient[0, 0])
graph = ot.Graph("Finite difference", "h", "Error", True)
curve = ot.Curve(h_array, error_array)
graph.add(curve)
graph.setLogScale(ot.GraphImplementation.LOGXY)
otv.View(graph)
# Algorithm to detect h*
h0 = 1.e5
h_previous = h0
g_fd_previous = central_finite_difference(g, x, h_previous)
diff_previous = np.inf
for i in range(53):
    h_current = h_previous / 2.0
    g_fd_current = central_finite_difference(g, x, h_current)
    diff_current = abs(g_fd_current[0] - g_fd_previous[0])
    print("i=%d, h=%.4e, |FD(h_current) - FD(h_previous)|=%.4e" % (
        i, h_current, diff_current))
    if diff_previous < diff_current:
        print("Stop!")
        break
    g_fd_previous = g_fd_current
    h_previous = h_current
    diff_previous = diff_current
print("Optimum h=", h_current)
It seems to work quite well on the nontrivial exp example you gave:
i=0, h=5.0000e+04, |FD(h_current) - FD(h_previous)|=1.2508e-09
i=1, h=2.5000e+04, |FD(h_current) - FD(h_previous)|=3.1255e-10
i=2, h=1.2500e+04, |FD(h_current) - FD(h_previous)|=7.8128e-11
i=3, h=6.2500e+03, |FD(h_current) - FD(h_previous)|=1.9531e-11
i=4, h=3.1250e+03, |FD(h_current) - FD(h_previous)|=4.8828e-12
i=5, h=1.5625e+03, |FD(h_current) - FD(h_previous)|=1.2207e-12
i=6, h=7.8125e+02, |FD(h_current) - FD(h_previous)|=3.0518e-13
i=7, h=3.9062e+02, |FD(h_current) - FD(h_previous)|=7.6294e-14
i=8, h=1.9531e+02, |FD(h_current) - FD(h_previous)|=1.9074e-14
i=9, h=9.7656e+01, |FD(h_current) - FD(h_previous)|=4.7689e-15
i=10, h=4.8828e+01, |FD(h_current) - FD(h_previous)|=1.1909e-15
i=11, h=2.4414e+01, |FD(h_current) - FD(h_previous)|=2.9786e-16
i=12, h=1.2207e+01, |FD(h_current) - FD(h_previous)|=7.7307e-17
i=13, h=6.1035e+00, |FD(h_current) - FD(h_previous)|=1.8190e-17
i=14, h=3.0518e+00, |FD(h_current) - FD(h_previous)|=9.0950e-18
i=15, h=1.5259e+00, |FD(h_current) - FD(h_previous)|=1.8190e-17
Stop!
Optimum h= 1.52587890625
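As a sanity check (not part of the run above, so I do not paste its output), one could append the following lines to the script to compare the finite difference at the detected step with the exact gradient computed by OpenTURNS:

exact = g.gradient([x])[0, 0]
fd_at_optimum = central_finite_difference(g, x, h_current)[0]
print("FD at optimum h = %.17e" % fd_at_optimum)
print("Exact gradient  = %.17e" % exact)
print("Absolute error  = %.4e" % abs(fd_at_optimum - exact))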
The paper describes how to choose the initial step size h_0 and the reduction factor \beta. However, I suggest letting the user choose h_0 and using \beta = 2 as the default, since the correct order of magnitude is sufficient in many cases. A \beta closer to 1 would allow a finer estimate of h^\star, but the extra function evaluations are not worth the marginal improvement in the accuracy of the gradient. At most 52 / 2 = 26 evaluations should be required in many situations, provided a rough estimate of h_0 is available.
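As a rough sketch of what such an interface could look like (the name find_step and its signature are placeholders of mine, not an existing OpenTURNS API), this simply wraps the loop above with \beta exposed as an optional parameter, reusing the central_finite_difference function defined earlier:

def find_step(g, x, h0, beta=2.0, maximum_iteration=53):
    # Reduce the step by a factor beta until the difference between
    # successive central finite differences stops decreasing.
    h_previous = h0
    g_fd_previous = central_finite_difference(g, x, h_previous)
    diff_previous = np.inf
    h_current = h_previous
    for i in range(maximum_iteration):
        h_current = h_previous / beta
        g_fd_current = central_finite_difference(g, x, h_current)
        diff_current = abs(g_fd_current[0] - g_fd_previous[0])
        if diff_previous < diff_current:
            break
        g_fd_previous = g_fd_current
        h_previous = h_current
        diff_previous = diff_current
    return h_current

h_star = find_step(g, x, 1.e5, beta=2.0)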
I tried the trick you presented (the division by x1 + (- x2) instead of 2.0 * h, marked "Magic trick?" in the code above), but I was not able to make it work: did I miss something? Why should it work?
Best regards,
Michaël