Importing the dataset
Before we load the dataset, there are a few important facts about the data to keep in mind:
- The data is in a fixed-width format, meaning that there is no delimiter. Column widths will have to be specified manually.
- There is no header row that has column names.
- If you were to open the data file using a text editor, you would see rows of data simply containing numbers.
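To make these points concrete, here is a minimal sketch of how pandas reads a fixed-width file. The two-row sample, the widths, and the column names are all made up for illustration; they are not taken from the actual dataset.

```python
import io

import pandas as pd

# Hypothetical sample: two rows of digits, no delimiter, no header row.
raw = "0120231\n0220240\n"

# Hypothetical schema: three columns that are 2, 4, and 1 characters wide.
widths = [2, 4, 1]
names = ["id", "year", "flag"]

# header=None tells pandas that the first line is data, not column names.
df = pd.read_fwf(io.StringIO(raw), widths=widths, header=None, names=names)
print(df)
```

Without the widths, pandas would have no reliable way to tell where one column ends and the next begins. Note also that type inference turns the `id` values into integers here, so leading zeros are lost unless a string type is requested explicitly.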
Because importing .fwf files requires column widths, we must first load those widths into our session. We have therefore made a helper .csv file, titled ED_metadata.csv, that contains the width, name, and variable type of each column. Our data has only 579 columns, so making such a file took just a couple of hours. If you have a bigger dataset, you may have to rely on automated width-detection methods and/or more team members to do the grunt work of creating a schema for your data.
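As a rough sketch of how a helper file like this can drive the import, the example below builds a tiny stand-in metadata table in memory and uses it to supply widths, names, and types to `pd.read_fwf`. The column headers `name`, `width`, and `dtype` are assumptions made for this illustration; the real ED_metadata.csv may be laid out differently.

```python
import io

import pandas as pd

# Hypothetical stand-in for the helper file; real column headers may differ.
metadata_csv = io.StringIO(
    "name,width,dtype\n"
    "id,2,str\n"
    "year,4,int64\n"
    "flag,1,int64\n"
)
meta = pd.read_csv(metadata_csv)

# Pull the schema out of the metadata table.
widths = meta["width"].tolist()
names = meta["name"].tolist()
dtypes = dict(zip(meta["name"], meta["dtype"]))

# Hypothetical fixed-width data matching the schema above.
raw = "0120231\n0220240\n"

# Sanity check: the widths should add up to the length of a data row.
row_length = len(raw.splitlines()[0])
assert sum(widths) == row_length

df = pd.read_fwf(
    io.StringIO(raw), widths=widths, header=None, names=names, dtype=dtypes
)
print(df)
```

Keeping the schema in a separate file means the import code stays the same when columns are added or corrected, and the width check catches a mistyped width before it silently shifts every later column.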
Loading the metadata
With our first cell, let's import the metadata and print a small preview of it:
import pandas as...