Data wrangling
If you have worked with data of any kind, you will recall that it usually has to be preprocessed before it can be used as part of a larger analysis. This process is called data wrangling.
Let's see what the typical flow in this process looks like:
- Data acquisition
- Data structure analysis
- Information extraction
- Unwanted data removal
- Data transformation
- Data standardization
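One way to picture this flow is as a chain of stages, where each stage consumes the output of the previous one. The following is a minimal Python sketch with several assumptions. The records are plain dictionaries. The stage functions (`acquire`, `analyze_structure`, `extract`, `remove_unwanted`, `transform`, `standardize`, `wrangle`) and the sample data are hypothetical placeholders for illustration, not part of any library.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]

def acquire() -> List[Record]:
    # Hypothetical in-memory source standing in for a central store or shared file.
    return [
        {"name": " alice ", "age": "34", "internal_id": "a1"},
        {"name": "BOB", "age": "", "internal_id": "b2"},
        {"name": "Carol", "age": "29", "internal_id": "c3"},
    ]

def analyze_structure(rows: List[Record]) -> List[Record]:
    # Check that every record has the fields the later stages rely on.
    required = {"name", "age"}
    for row in rows:
        missing = required - set(row)
        if missing:
            raise ValueError(f"record {row} is missing fields {missing}")
    return rows

def extract(rows: List[Record]) -> List[Record]:
    # Information extraction: keep only the fields needed for the analysis.
    return [{"name": r["name"], "age": r["age"]} for r in rows]

def remove_unwanted(rows: List[Record]) -> List[Record]:
    # Unwanted data removal: drop records whose age is empty.
    return [r for r in rows if r["age"].strip()]

def transform(rows: List[Record]) -> List[Record]:
    # Data transformation: convert age from text to an integer.
    return [{**r, "age": int(r["age"])} for r in rows]

def standardize(rows: List[Record]) -> List[Record]:
    # Data standardization: trim whitespace and use title case for names.
    return [{**r, "name": r["name"].strip().title()} for r in rows]

def wrangle(source: Callable[[], List[Record]],
            stages: List[Callable[[List[Record]], List[Record]]]) -> List[Record]:
    # Run the source, then feed its output through each stage in order.
    data = source()
    for stage in stages:
        data = stage(data)
    return data

result = wrangle(acquire, [analyze_structure, extract, remove_unwanted,
                           transform, standardize])
print(result)  # [{'name': 'Alice', 'age': 34}, {'name': 'Carol', 'age': 29}]
```

In practice each stage is far more involved. Structure analysis in particular is usually an exploratory step rather than an automated check, and it is modeled as a function here only to keep the sketch compact.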
Let's try to understand these in detail.
Data acquisition
Although it is not strictly part of data wrangling, this phase deals with acquiring the data from its source. Typically, the data is generated and stored in a central location, or it is available as files on some shared storage.
Understanding this step helps us either build an interface or use an existing library to pull the data from its source location.
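As an example of using an existing library, the following minimal sketch pulls data from a file. It assumes the data is stored as a CSV file and reads it with Python's standard `csv` module. The `load_csv` helper and the sample file contents are hypothetical. A temporary file stands in for a real location on shared storage.

```python
import csv
import os
import tempfile
from typing import Dict, List

def load_csv(path: str) -> List[Dict[str, str]]:
    # Read a CSV file into a list of dicts keyed by the header row.
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

# Create a small sample file standing in for data on shared storage (hypothetical data).
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False,
                                 newline="", encoding="utf-8") as tmp:
    tmp.write("id,city,temperature\n1,Oslo,4.5\n2,Lima,19.0\n")
    sample_path = tmp.name

rows = load_csv(sample_path)
os.remove(sample_path)  # clean up the temporary file
print(rows)
```

`csv.DictReader` returns every value as a string, so converting types such as `temperature` to numbers is left to a later step. Libraries such as pandas offer `read_csv` as a higher-level alternative.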
Data structure analysis
Once the data is acquired, we have to understand its structure. Remember that the data we receive can be in any...