Learning pandas: High performance data manipulation and analysis using Python

Product type: Paperback
Published: Jun 2017
ISBN-13: 9781787123137
Length: 446 pages
Edition: 2nd Edition
Author: Michael Heydt
Table of Contents

Title Page
Credits
About the Author
About the Reviewers
www.PacktPub.com
Customer Feedback
Preface
1. pandas and Data Analysis
2. Up and Running with pandas
3. Representing Univariate Data with the Series
4. Representing Tabular and Multivariate Data with the DataFrame
5. Manipulating DataFrame Structure
6. Indexing Data
7. Categorical Data
8. Numerical and Statistical Methods
9. Accessing Data
10. Tidying Up Your Data
11. Combining, Relating, and Reshaping Data
12. Data Aggregation
13. Time-Series Modelling
14. Visualization
15. Historical Stock Price Analysis

Preface

pandas is a popular Python package for practical, real-world data analysis. It provides efficient, high-performance data structures that make data exploration and analysis very easy. This learner's guide walks you through the comprehensive set of features the pandas library provides for efficient data manipulation and analysis.

What this book covers

Chapter 1, pandas and Data Analysis, is a hands-on introduction to the key features of pandas. It provides context for using pandas in statistics and data science, introduces several data science concepts, and shows how pandas supports them. This sets the context for each of the subsequent chapters, noting how each chapter relates to data science and data science processes.

Chapter 2, Up and Running with pandas, instructs the reader on how to obtain and install pandas, and introduces a few of the basic concepts in pandas. We will also look at how the examples are presented using IPython and Jupyter Notebook.

Chapter 3, Representing Univariate Data with the Series, walks the reader through the use of the pandas Series, which provides one-dimensional, indexed data representations. The reader will learn how to create Series objects and how to manipulate the data they hold. They will also learn about indexes and data alignment, and about how a Series can be used to slice data.
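As a small taste of the Series features described above, the following sketch (with made-up values and labels) creates two Series, adds them so that pandas aligns the values by index label, and slices one of them by label. It assumes a reasonably recent pandas release rather than the exact version used in the book.

```python
import pandas as pd

# A Series is a one-dimensional array of values paired with an index.
s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# Arithmetic aligns on index labels, not on position; labels present
# in only one of the two Series produce NaN in the result.
total = s1 + s2
print(total)

# Label-based slicing with .loc includes both endpoints.
print(s1.loc['a':'b'])
```

Labels `a` and `d` appear in only one operand, so their sums are missing; `b` and `c` are added pairwise.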

Chapter 4, Representing Tabular and Multivariate Data with the DataFrame, walks the reader through the basic use of the pandas DataFrame, which provides indexed, multivariate data representations. This chapter shows the reader how to create DataFrame objects from various sets of static data, and how to select specific columns and rows within them. Complex queries, manipulation, and indexing are handled in the following chapter.
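A minimal sketch of creating a DataFrame from static data and selecting columns and rows is shown next. The column names, row labels, and temperatures are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Build a DataFrame from a dict of equal-length lists; keys become columns.
df = pd.DataFrame(
    {'city': ['Oslo', 'Lima', 'Pune'],
     'temp_c': [4.5, 19.0, 31.2]},
    index=['r1', 'r2', 'r3'])

temps = df['temp_c']      # a single column name returns a Series
subset = df[['city']]     # a list of column names returns a DataFrame
row = df.loc['r2']        # one row selected by index label
first_two = df.iloc[:2]   # rows selected by integer position
print(first_two)
```

Using `.loc` for labels and `.iloc` for positions keeps the intent of each selection explicit.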

Chapter 5, Manipulating DataFrame Structure, expands on the previous chapter and instructs you on how to perform more complex manipulations of a DataFrame. We start by learning how to add and delete columns and rows; modify data within a DataFrame (or create a modified copy); perform calculations on the data within; create hierarchical indexes; and calculate common statistical results on DataFrame contents.
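The structural operations listed above can be sketched as follows, on hypothetical data. Note the assumption that `drop(columns=...)` is available, which requires pandas 0.21 or later.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

df['c'] = df['a'] + df['b']       # add a computed column in place
trimmed = df.drop(columns=['b'])  # modified copy; df still has column 'b'
del df['c']                       # delete a column in place

# A hierarchical (MultiIndex) row index built from two columns.
sales = pd.DataFrame({'region': ['east', 'east', 'west'],
                      'year': [2016, 2017, 2017],
                      'units': [5, 7, 3]}).set_index(['region', 'year'])
print(sales.loc['east'])          # all rows under the 'east' outer label

print(df.describe())              # common summary statistics per column
```

`drop()` returns a new object by default, while `del` mutates the original, which is the distinction between modifying data and creating a modified copy.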

Chapter 6, Indexing Data, shows how data can be loaded into both Series and DataFrame objects from external sources, and saved back out. The chapter also covers data access from multiple sources such as files, HTTP servers, database systems, and web services. Also covered is the processing of data in CSV, HTML, and JSON formats.
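The loading and saving described above can be sketched without touching the file system by using an in-memory text buffer as a stand-in for a file or HTTP response. The CSV contents and ticker symbols here are hypothetical.

```python
import io
import pandas as pd

# An in-memory stand-in for a CSV file or an HTTP response body.
csv_text = "symbol,price\nAAA,10.5\nBBB,20.25\n"
df = pd.read_csv(io.StringIO(csv_text))

# Round-trip through JSON; orient='records' produces a list of row objects.
json_text = df.to_json(orient='records')
df2 = pd.read_json(io.StringIO(json_text), orient='records')

# Write back to CSV text, omitting the integer index column.
out = df.to_csv(index=False)
print(out)
```

`pd.read_csv()` and `pd.read_json()` also accept file paths and URLs, so the same calls apply to files and web sources.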

Chapter 7, Categorical Data, instructs the reader on how to use the various tools provided by pandas for managing dirty and missing data.
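A few of the standard pandas tools for missing data are sketched below on a small hypothetical DataFrame: counting missing values, dropping incomplete rows, filling with a constant, and interpolating.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1.0, np.nan, 3.0],
                   'y': [np.nan, np.nan, 6.0]})

missing = df.isnull().sum()     # count of missing values per column
dropped = df.dropna()           # keep only rows with no missing values
filled = df.fillna(0)           # replace every NaN with a constant
interp = df['x'].interpolate()  # linear interpolation between neighbours
print(missing)
```

Which strategy is appropriate depends on the data: dropping discards information, while filling and interpolation make assumptions about the missing values.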

Chapter 8, Numerical and Statistical Methods, covers various techniques for combining, splitting, joining, and merging data located in multiple pandas objects, and then demonstrates how to reshape data using concepts such as pivots, stacking, and melting.
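The reshaping concepts mentioned above (pivots, stacking, and melting) can be sketched on a tiny hypothetical price table. Depending on your pandas version, `stack()` may emit a `FutureWarning` about its new implementation; the result here is the same either way.

```python
import pandas as pd

# "Long" data: one row per (date, symbol) observation.
long_df = pd.DataFrame({
    'date': ['d1', 'd1', 'd2', 'd2'],
    'symbol': ['A', 'B', 'A', 'B'],
    'close': [10, 20, 11, 21]})

# Pivot to "wide": one row per date, one column per symbol.
wide = long_df.pivot(index='date', columns='symbol', values='close')

# stack() moves the column labels into the innermost row index level.
stacked = wide.stack()

# melt() turns the wide table back into long form.
back = wide.reset_index().melt(id_vars='date', var_name='symbol',
                               value_name='close')
print(back)
```

`pivot` and `melt` are inverses of each other here, which makes them a convenient pair for moving between long and wide layouts.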

Chapter 9, Accessing Data, talks about grouping and performing aggregate data analysis. In pandas, this is often referred to as the split-apply-combine pattern. The reader will learn how to use this pattern to group data in various configurations and apply aggregate functions to calculate results for each group of data.
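The split-apply-combine pattern described above maps onto `groupby()` followed by an aggregation. This minimal sketch uses hypothetical team scores.

```python
import pandas as pd

df = pd.DataFrame({'team': ['red', 'blue', 'red', 'blue', 'red'],
                   'points': [3, 5, 4, 2, 2]})

# Split by 'team', apply several aggregations per group, combine the results.
summary = df.groupby('team')['points'].agg(['sum', 'mean', 'count'])
print(summary)

# transform() returns one value per original row, aligned to df's index.
df['team_mean'] = df.groupby('team')['points'].transform('mean')
```

`agg()` reduces each group to a single row, whereas `transform()` broadcasts each group's result back to the group's original rows.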

Chapter 10, Tidying Up Your Data, explains how to organize data into a tidy form that is usable for data analysis.

Chapter 11, Combining, Relating, and Reshaping Data, shows readers how to take data in multiple pandas objects and combine it through concepts such as joins, merges, and concatenation.
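Joins, merges, and concatenation can be sketched with two small hypothetical tables that share a `symbol` key.

```python
import pandas as pd

prices = pd.DataFrame({'symbol': ['A', 'B', 'C'], 'price': [10, 20, 30]})
names = pd.DataFrame({'symbol': ['A', 'B', 'D'],
                      'name': ['Alpha', 'Beta', 'Delta']})

# merge() performs a database-style join on a shared key column.
inner = pd.merge(prices, names, on='symbol', how='inner')  # A and B only
outer = pd.merge(prices, names, on='symbol', how='outer')  # A, B, C, and D

# concat() stacks objects along an axis (rows by default).
more = pd.DataFrame({'symbol': ['E'], 'price': [50]})
combined = pd.concat([prices, more], ignore_index=True)

# join() aligns on the index instead of a column (a left join by default).
joined = prices.set_index('symbol').join(names.set_index('symbol'))
print(joined)
```

Symbols without a match on the other side receive `NaN` in outer and left joins, which is often the first sign of inconsistent keys between datasets.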

Chapter 12, Data Aggregation, dives into the integration of pandas with matplotlib to visualize pandas data. The chapter will demonstrate how to present many common statistical and financial data visualizations including bar charts, histograms, scatter plots, area plots, density plots, and heat maps.

Chapter 13, Time-Series Modelling, covers representing time-series data in pandas, along with the extensive capabilities pandas provides for analyzing such data.
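A few common time-series operations are sketched below on six hypothetical daily observations: slicing by date strings, resampling to a coarser frequency, and shifting values in time.

```python
import numpy as np
import pandas as pd

# Six consecutive daily observations with hypothetical values 0..5.
idx = pd.date_range('2017-06-01', periods=6, freq='D')
ts = pd.Series(np.arange(6), index=idx)

# Slice by date strings; with a DatetimeIndex both endpoints are included.
window = ts['2017-06-02':'2017-06-03']

# Downsample into 3-day buckets, summing the values in each bucket.
by_three_days = ts.resample('3D').sum()

# Shift values forward by one period; the first position becomes NaN.
lagged = ts.shift(1)
print(by_three_days)
```

Because the index holds real timestamps, pandas can interpret date strings and frequency aliases such as `'3D'` directly.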

Chapter 14, Visualization, teaches you how to create data visualizations based on data stored in pandas data structures. We start with the basics: how to create a simple chart from data and control several of its attributes (such as legends, labels, and colors). We then examine several common plot types, the kinds of data each represents, and how each can convey meaning in the underlying data. We also learn how to integrate pandas with D3.js so that we can create rich web-based visualizations.

Chapter 15, Historical Stock Price Analysis, shows you how to apply pandas to basic financial problems. It focuses on data obtained from Yahoo! Finance and demonstrates a number of concepts in financial data, such as calculating returns, moving averages, and volatility. The reader will also learn how to apply data visualization to these financial concepts.
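The financial calculations named above can be sketched on a short series of hypothetical closing prices (not real Yahoo! Finance data). The annualization factor of 252 trading days is a common convention, stated here as an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices on five consecutive business days.
close = pd.Series([100.0, 102.0, 101.0, 105.0, 107.0],
                  index=pd.date_range('2017-01-02', periods=5, freq='B'))

daily_returns = close.pct_change()              # p_t / p_(t-1) - 1
cumulative = (1 + daily_returns).cumprod() - 1  # growth since the first day
ma3 = close.rolling(window=3).mean()            # 3-period moving average
volatility = daily_returns.std()                # sample std of daily returns
annualized_vol = volatility * np.sqrt(252)      # assumes 252 trading days
print(ma3)
```

The first return and the first two moving-average values are `NaN` because they lack enough prior observations.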

What you need for this book

This book assumes some familiarity with programming concepts, but even readers without programming experience, or without Python experience specifically, should be comfortable with the examples, as they focus on pandas constructs more than on Python or programming in general. The examples are based on Anaconda Python 2.7 and pandas 0.15.1. If you do not have either installed, Chapter 2, Up and Running with pandas, gives guidance on installing both on Windows, OS X, and Ubuntu systems. For those not interested in installing any software, instructions are also given for using the Wakari.io online Python data analysis service.

Who this book is for

This book is ideal for data scientists, data analysts, and Python programmers who want to plunge into data analysis using pandas, and anyone curious about analyzing data. Some knowledge of statistics and programming will help you to get the most out of this book but that's not strictly required. Prior exposure to pandas is also not required.

Conventions

In this book, you will find a number of styles of text that distinguish between different kinds of information. Here are some examples of these styles, and an explanation of their meaning.

Code words in text are shown as follows: "This information can be easily imported into DataFrame using the pd.read_csv() function as follows."

A block of code entered in a Python interpreter is set as follows:

import pandas as pd
# Build a one-column DataFrame from a dict of lists
df = pd.DataFrame({'column1': [1, 2, 3]})
print(df)

Any command-line input or output is written as follows:

mh@ubuntu:~/Downloads$ chmod +x Anaconda-2.1.0-Linux-x86_64.sh
mh@ubuntu:~/Downloads$ ./Anaconda-2.1.0-Linux-x86_64.sh

New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: "The shortcuts in this book are based on the Mac OS X 10.5+ scheme."

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book: what you liked or disliked. Reader feedback is important to us, as it helps us develop titles that you will really get the most out of. To send us general feedback, simply e-mail [email protected], and mention the book's title in the subject of your message. If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you. You can download the code files by following these steps:

  1. Log in or register to our website using your e-mail address and password.
  2. Hover the mouse pointer over the SUPPORT tab at the top.
  3. Click on Code Downloads & Errata.
  4. Enter the name of the book in the Search box.
  5. Select the book for which you're looking to download the code files.
  6. Choose where you purchased this book from the drop-down menu.
  7. Click on Code Download.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

  • WinRAR / 7-Zip for Windows
  • Zipeg / iZip / UnRarX for Mac
  • 7-Zip / PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Learning-Pandas-Second-Edition. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books, maybe a mistake in the text or the code, we would be grateful if you could report it to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title. To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy. Please contact us at [email protected] with a link to the suspected pirated material. We appreciate your help in protecting our authors and our ability to bring you valuable content.

Questions

If you have a problem with any aspect of this book, you can contact us at [email protected], and we will do our best to address the problem.
