
Practical Data Science Cookbook, Second Edition: Data pre-processing, analysis and visualization using R and Python

Product type Paperback

Published in Jun 2017

Publisher Packt

ISBN-13 9781787129627

Length 434 pages

Edition 2nd Edition

Languages

Python

Concepts

Data Analysis

Authors (5):

Tattar

Bhushan Purushottam Joshi

Sean P Murphy

ABHIJIT DASGUPTA

Anthony Ojeda

+1 more


Repeating the analysis in R

This brief section replicates most of the data analysis discussed in the preceding section, this time in R. It is self-contained: it depends on no add-on R packages, only on the functions that ship with R.

Getting ready

The functions in a default R installation suffice for the analysis performed earlier in the chapter. The income_dist.csv file must be present in the current working directory.
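A quick check before starting can confirm that R sees the file from its working directory. The following is a minimal sketch using only base R; the helper name check_input_file is hypothetical and introduced purely for illustration.

```r
# Minimal sketch: confirm the input file is visible from the working directory.
# "income_dist.csv" is the file name used in this recipe; the helper name
# check_input_file is hypothetical and not part of the original recipe.
check_input_file <- function(path = "income_dist.csv") {
  found <- file.exists(path)
  if (!found) {
    # Tell the user where R is currently looking
    message("File not found in ", getwd(),
            "; use setwd() or pass a full path to read.csv().")
  }
  found
}

check_input_file()
```

If the check returns FALSE, either change the working directory with setwd() or supply the full path of the file when reading it.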

How to do it...

The analysis of the income_dist.csv file can be carried out step by step as follows.
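As a warm-up before loading the real file, the inspection functions used in this recipe can be tried on a small made-up data frame. This is only a sketch: the values and the Income column are invented for illustration, and only the Country and Year column names mirror those used with income_dist.csv.

```r
# Toy data frame standing in for income_dist.csv. The values and the
# Income column are made up; only Country and Year mirror the recipe.
toy <- data.frame(
  Country = c("United States", "France", "United States", "France"),
  Year    = c(1990, 1990, 1991, 1991),
  Income  = c(10, 20, 30, 40),
  stringsAsFactors = TRUE  # keep Country as a factor so levels() works
)

nrow(toy)             # number of rows
ncol(toy)             # number of columns, same as length(names(toy))
unique(toy$Country)   # distinct countries, in order of first appearance
levels(toy$Country)   # factor levels, sorted alphabetically
range(toy$Year)       # smallest and largest year in one call
```

Note that levels() is only meaningful for factor columns. From R 4.0.0 onwards, data.frame() and read.csv() no longer convert strings to factors by default, which is why stringsAsFactors = TRUE is set explicitly here.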

Load the income_dist.csv dataset with the read.csv function, then inspect it with nrow, str, length, unique, and related functions:

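# Read the CSV. stringsAsFactors=TRUE keeps Country a factor on R >= 4.0.0,
# where the default changed to FALSE, so that levels() below still works.
id <- read.csv("income_dist.csv", header=TRUE, stringsAsFactors=TRUE)
nrow(id)              # number of rows
str(names(id))        # column names
length(names(id))     # number of columns
ncol(id)              # equivalent to the previous line
unique(id$Country)    # distinct countries
levels(id$Country)    # alternatively, the factor levels
min(id$Year)          # earliest year in the data
max(id$Year)          # latest year in the data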
id_us <- id[id$Country=...