Manipulating large heterogeneous tables with HDF5 and PyTables
PyTables can store homogeneous blocks of data as NumPy-like arrays in HDF5 files. It can also store heterogeneous tables, as we will see in this recipe.
Getting ready
You need PyTables for this recipe (see the previous recipe for installation instructions).
How to do it...
Let's import NumPy and PyTables:
In [1]: import numpy as np
        import tables as tb
Let's create a new HDF5 file:
In [2]: f = tb.open_file('myfile.h5', 'w')
We will create an HDF5 table with two columns: the name of a city (a string of at most 64 characters), and its population (a 32-bit integer). We can specify the columns by creating a structured data type with NumPy:
In [3]: dtype = np.dtype([('city','S64'), ('population', 'i4')])
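To make the layout of this data type concrete, here is a minimal sketch that uses NumPy alone. The variable rows is a hypothetical name introduced for illustration; the two sample rows reuse the values that appear later in this recipe.

```python
import numpy as np

# Same structured dtype as in the recipe: a 64-byte string and a 32-bit integer.
dtype = np.dtype([('city', 'S64'), ('population', 'i4')])

# A small in-memory structured array with that layout (illustration only).
rows = np.array([('Brussels', 1138854), ('London', 8308369)], dtype=dtype)

print(dtype.names)               # field names, in declaration order
print(dtype.itemsize)            # bytes per row: 64 + 4
print(rows['city'])              # 'S64' fields come back as byte strings
print(rows['population'].sum())  # each field behaves like a regular array
```

Each field can be accessed by name, which is the same view of the data that a table column gives.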
Now, we create the table in /table1:
In [4]: table = f.create_table('/', 'table1', dtype)
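As a quick check of what this call produces, the following sketch creates the same table in a scratch file and inspects it. The temporary path and the file name sketch.h5 are assumptions made for illustration, chosen so that myfile.h5 is left untouched.

```python
import os
import tempfile

import numpy as np
import tables as tb

dtype = np.dtype([('city', 'S64'), ('population', 'i4')])

# Hypothetical scratch location, so the recipe's myfile.h5 is left alone.
path = os.path.join(tempfile.mkdtemp(), 'sketch.h5')

# The file is closed automatically when the with block ends.
with tb.open_file(path, 'w') as f:
    table = f.create_table('/', 'table1', dtype)
    pathname = table._v_pathname     # location of the node in the file
    colnames = list(table.colnames)  # column names taken from the dtype
    nrows = table.nrows              # a freshly created table has no rows
    print(pathname, colnames, nrows)
```

The table starts out empty; its columns are defined entirely by the structured data type passed to create_table.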
Let's add a few rows:
In [5]: table.append([('Brussels', 1138854), ('London', 8308369), ...