Memory issues for MCMC in high dimension

MerlinKeller · December 17, 2024, 3:41pm

Hi to all Open Turners !

I have been trying to perform Bayesian calibration with a high dimensional parameter (several thousand) using the MCMC classes but ran in what seems to be a memory overflow problem, which causes the the Python kernel to crash.

Here is a sample code reproducing the problem:

dim = 10000
nsimu = 10000
blocksize = 10
initial_state = ot.Point([0.0]*dim)
samplers = [ot.RandomVectorMetropolisHastings( ot.RandomVector( ot.Normal() ), initial_state, [k] ) for k in range(dim)]
Gibbs = ot.Gibbs(samplers)
samples = []
nblocks = nsimu // blocksize
for i in range( nblocks ):
    print( "%s blocks out of %s"%(i*nblocks, nsimu) )
    samples.append( Gibbs.getSample(blocksize) )

The cause of the problem is quite simple: every MetropolisHastings object passed to the Gibbs constructor stores the history of all the visited states of the Markov chain in the working memory. In the example, each iteration generates hundreds of millions of floats, quickly leading to an overflow.

The only practical solution I see would be to periodically save the current state of the chain (and adaptation factors in the case of RandomWalkMetropolisHastings objects), then re-define the MH classes from scratch, thereby removing the history of the chains, but this seems tedious.

Do you have any ideas of suggestion dealing with this issue ?

MerlinKeller · December 18, 2024, 1:21pm

Re,

Writing down the test code helped me find a simple answer which I share with you: when initializing the MetropolisHastings objects, simply define the associated HistoryStrategy to null, minimizing the memory load of the thread, as shown below:

import openturns as ot

dim = 10000
nsimu = 10000
blocksize = 10

initial_state = ot.Point([0.0]*dim)

samplers = []
for k in range(dim):
    sampler = ot.RandomVectorMetropolisHastings( ot.RandomVector( ot.Normal() ), initial_state, [k] )
    sampler.setHistory( ot.HistoryStrategy() )
    samplers.append(sampler)

Gibbs = ot.Gibbs(samplers)

samples = []
nblocks = nsimu // blocksize
for i in range( nblocks ):
    # break
    print( "%s blocks out of %s"%(i*nblocks, nsimu) )
    samples.append( Gibbs.getSample(blocksize) )

Topic		Replies	Views
Problem while using RandomWalkMetropolisHastings Python usage	1	385	March 25, 2022
MCMC proposal distribution error Python usage calibration	2	61	October 24, 2024
Version 1.19 released Announcements	0	404	May 10, 2022
Version 1.19 RC1 released Announcements	0	370	April 14, 2022
How should i memoize my function in a calibration setting? Python usage calibration	4	403	May 25, 2021

Memory issues for MCMC in high dimension

Related topics