How to work with missing data
Data is missing in pandas when it has a value of NaN (also seen as np.nan, the form from NumPy). A NaN value means that no value is specified for a particular index label in a particular Series.
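A minimal sketch, assuming a recent pandas and NumPy, of how NaN shows up in a Series and how to detect it. The labels and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# A Series in which the value for index label 'b' is missing (NaN)
s = pd.Series([1.0, np.nan, 3.0], index=["a", "b", "c"])

# isnull() returns a boolean Series that is True wherever the value is NaN
print(s.isnull())

# count() skips NaN, so it counts only the two labels that hold a value
print(s.count())

# NaN never compares equal to itself, so test for it with isnull()/notnull()
# (or np.isnan) rather than with ==
print(np.nan == np.nan)  # prints False
```

Because NaN is not equal to anything, including itself, a filter such as `s[s == np.nan]` matches nothing; the dedicated null-checking methods are the reliable way to find missing values.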
How can data be missing? There are a number of reasons why a value can be NaN:
- A join of two sets of data does not have matched values
- Data that you retrieved from an external source is incomplete
- The value is not known at a given point in time and will be filled in later
- There is a data collection error retrieving a value, but the event must still be recorded in the index
- Reindexing the data has produced index labels that have no corresponding value
- The shape of the data has changed and there are now additional rows or columns whose values could not be determined at the time of reshaping
There are likely more reasons, but the general point is that these situations do occur, and as a user of pandas you will need to address them to be able to perform effective data...
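The sketch below reproduces three of the situations listed above (reindexing, a join whose keys do not all match, and a reshape) using small made-up frames; the column names and values are illustrative only, and the calls assume a recent pandas release.

```python
import numpy as np
import pandas as pd

# Reindexing: label 'd' did not exist in the original Series,
# so pandas fills it with NaN (and upcasts the integers to float)
s = pd.Series([1, 2, 3], index=["a", "b", "c"])
reindexed = s.reindex(["a", "b", "c", "d"])
print(reindexed)

# Join: an outer merge keeps keys found on only one side and
# fills the columns from the other side with NaN
left = pd.DataFrame({"key": ["k1", "k2"], "lval": [10, 20]})
right = pd.DataFrame({"key": ["k2", "k3"], "rval": [200, 300]})
joined = pd.merge(left, right, on="key", how="outer")
print(joined)

# Reshaping: pivoting creates a cell for every (row, col) pair,
# even where the long-format data had no entry (here r2/y)
long_df = pd.DataFrame({"row": ["r1", "r1", "r2"],
                        "col": ["x", "y", "x"],
                        "val": [1.0, 2.0, 3.0]})
wide = long_df.pivot(index="row", columns="col", values="val")
print(wide)
```

In each case pandas chooses to keep the index label or cell and mark its value as NaN rather than drop it, which is why handling missing data is a routine step after joins, reindexing, and reshaping.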