Processing large NumPy arrays with memory mapping
Sometimes, we need to deal with NumPy arrays that are too big to fit in system memory. A common solution is to use memory mapping and implement out-of-core computations. The array is stored in a file on the hard drive, and we create a memory-mapped object pointing to this file that can be used like a regular NumPy array. Accessing a portion of the array causes the corresponding data to be automatically fetched from the hard drive. Therefore, we only consume the memory we actually use.
How to do it...
Let's create a memory-mapped array in write mode:
>>> import numpy as np
>>> nrows, ncols = 1000000, 100
>>> f = np.memmap('memmapped.dat', dtype=np.float32,
...               mode='w+', shape=(nrows, ncols))
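As a quick sanity check, we can verify the size of the file on disk (this sketch assumes the file was created in the current working directory): one million rows times one hundred columns times 4 bytes per float32 value gives 400,000,000 bytes. On many file systems the file may be sparse, so the space actually allocated can be smaller than this logical size.

>>> import os
>>> os.path.getsize('memmapped.dat')  # nrows * ncols * 4 bytes
400000000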
Let's fill the array with random values, one column at a time, because our system's memory is limited!
>>> for i in range(ncols):
...     f[:, i] = np.random.rand(nrows)
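NumPy flushes pending modifications to disk when the memory-mapped object is deleted, and we can also trigger this explicitly with the memmap's flush() method. A minimal illustration:

>>> f.flush()  # synchronize in-memory changes with the file on disk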
We save the last column of the array:
>>> x = f[:, -1]
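Note that basic slicing of a memmap returns another memmap, so x is still a view backed by the file rather than an in-memory copy. As a short sketch (x_mem is just an illustrative name), we can copy the column into RAM and then delete the memmap, which flushes any remaining changes to disk:

>>> x_mem = np.array(x)  # true in-memory copy of the column
>>> del f                # deleting the memmap flushes pending changes to disk

The file can later be reopened for out-of-core processing without loading it entirely, for example with np.memmap('memmapped.dat', dtype=np.float32, mode='r', shape=(nrows, ncols)).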