Handling missing data in a pandas DataFrame
In this section, we look at how to handle missing data in a pandas DataFrame. There are a few ways of detecting missing data that work for both Series and DataFrames. We could use NumPy's isnan function; we could also use the isnull or notnull method supplied with Series and DataFrames. NaN detection can be useful for custom approaches to handling missing information.
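To make the three detection approaches concrete, here is a minimal sketch. The Series s, the DataFrame df, and their values are made up for illustration; note also that NumPy's isnan only works on numeric data, whereas isnull and notnull also handle object columns.

```python
import numpy as np
import pandas as pd

# Illustrative data (assumed, not from the original Notebook).
s = pd.Series([1.0, np.nan, 3.0])
df = pd.DataFrame({"a": [1.0, np.nan], "b": [np.nan, 4.0]})

# NumPy's isnan applied element-wise to a numeric Series.
nan_mask_np = np.isnan(s)

# The pandas methods: isnull flags missing entries, notnull flags present ones.
nan_mask_pd = s.isnull()
present_mask = s.notnull()

# The same methods work on a DataFrame and return a Boolean DataFrame.
df_mask = df.isnull()

# Summing the Boolean mask counts the missing values in each column.
missing_per_column = df.isnull().sum()

print(nan_mask_pd)
print(missing_per_column)
```

A Boolean mask like this is the usual starting point for a custom approach: it can be used to select, count, or overwrite the missing entries.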
In this Notebook, we're going to look at ways of managing missing information. First, we generate a DataFrame containing missing data, as illustrated in the following screenshot:
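As a stand-in sketch of how such a DataFrame might be generated (the column names, the random values, and the positions of the missing entries are assumptions, not those of the original Notebook):

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is reproducible.
rng = np.random.default_rng(0)

# A small numeric DataFrame; the shape and column names are hypothetical.
df = pd.DataFrame(rng.normal(size=(5, 3)), columns=["A", "B", "C"])

# Knock out a few entries by assigning NaN at chosen (row, column) positions.
df.iloc[1, 0] = np.nan
df.iloc[3, 1] = np.nan
df.iloc[4, 2] = np.nan

print(df)
```

The missing entries are displayed as NaN when the DataFrame is printed.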

As mentioned before, pandas encodes missing information as NumPy's NaN. This is not necessarily how missing information is encoded everywhere. For example, in some surveys, missing data is encoded by an impossible numeric value. Say the number of children a mother has is recorded as 999; this is obviously not correct. This is an example of using a sentinel value to indicate...