Chapter 5. Retrieving, Processing, and Storing Data
Data can be found everywhere, in all shapes and forms. We can get it from the Web, by e-mail and FTP, or create it ourselves in a lab experiment or marketing poll. An exhaustive overview of how to acquire data in various formats would require many more pages than we have available. Sometimes we need to store data before we can analyze it, or after we are done with our analysis, so this chapter also discusses storing data. Chapter 8, Working with Databases, gives information about various databases (relational and NoSQL) and related APIs. The following is a list of the topics that we are going to cover in this chapter:
Writing CSV files with NumPy and pandas
The binary .npy and pickle formats
Reading and writing to Excel with pandas
JSON
REST web services
Parsing RSS feeds
Scraping the Web
Parsing HTML
Storing data with PyTables
HDF5 pandas I/O