
Data visualization

  • 2014-10-27 00:00:00


Data visualization is one of the most important tasks in data science. Through effective visualization, we can easily uncover underlying patterns among variables without doing any sophisticated statistical analysis. In this cookbook, we focus on graphical analysis using R in a very simple way, with each example standing on its own. We cover default R functionality along with more advanced visualization techniques such as lattice, ggplot2, and three-dimensional plots. Readers will not only learn the code to produce a graph but also learn why that code has been written, with specific examples.

R Graphs Cookbook Second Edition, written by Jaynal Abedin and Hrishi V. Mittal, is a book in which the reader will learn how to produce various graphs using R, how to customize them, and finally how to make them ready for publication. This practical recipe book starts with a very brief description of the R graphics system and then gradually moves from basic to advanced plots with examples. Besides the default R graphics, this recipe book introduces advanced graphics systems such as lattice and ggplot2, the latter being an implementation of the grammar of graphics. We have also provided examples of how to inspect large datasets using advanced visualizations such as tableplot and three-dimensional visualizations. We also cover the following topics:

  • How to create various types of bar charts using default R functions, lattice, and ggplot2
  • How to produce density plots along with histograms using lattice and ggplot2, and customize them for publication
  • How to produce graphs of frequency-tabulated data
  • How to inspect large datasets by simultaneously visualizing numeric and categorical variables in a single plot
  • How to annotate graphs using ggplot2


This recipe book is aimed at readers who have already been exposed to R programming and want to learn effective graphics with the power of R and its various libraries.

This hands-on guide starts with a very short introduction to the R graphics system and then gets straight to the point: actually creating graphs, instead of just theoretical learning. Each recipe is specifically tailored to fulfill the reader's appetite for visually representing data in the best way possible.

Now, we will present a few examples so that you can get an idea of the content of this recipe book:

The ggplot2 R package is based on The Grammar of Graphics (Leland Wilkinson, Springer). Using this package, we can produce a variety of traditional graphics, and users can build their own customized graphs as well. The beauty of this package is in its layered graphics facilities; by stacking layers, we can produce almost any kind of data visualization. At the time of writing, ggplot2 is the most searched keyword in the R community, including on the most popular R blog (www.r-bloggers.com). The comprehensive theme system allows the user to produce publication-quality graphs with a variety of themes. If we had to explain this package in a single sentence, we would say that if whatever we want to visualize can be structured in a data frame, the visualization is a matter of a few seconds.
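
To make the layering idea concrete, here is a minimal sketch using the built-in mtcars dataset. The choice of variables, the straight-line smoother, and theme_bw() are illustrative assumptions and are not taken from the book.

```r
library(ggplot2)

# Layer 1: the raw observations; layer 2: a fitted straight line.
# Both layers share the data and aesthetic mapping set in ggplot().
layered <- ggplot(data = mtcars, aes(x = disp, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  theme_bw()   # one of the complete themes shipped with ggplot2

print(layered)
```

Each `+` adds one component to the plot object, so a layer can be added or dropped without touching the rest of the specification.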

In the chapter dedicated to ggplot2, we will see different examples and use themes to produce publication-quality graphs. However, in this introductory chapter, we will show you one of the important features of the ggplot2 package: producing various types of graphs from the same plot object. The main function is ggplot(), and by adding different geom functions we can easily produce different types of graphs, such as the following:

  • geom_point(): This will create a scatter plot
  • geom_line(): This will create a line chart
  • geom_bar(): This will create a bar chart
  • geom_boxplot(): This will create a box plot
  • geom_text(): This will write certain text inside the plot area

Now, we will see a simple example of the use of different geom functions with the default R mtcars dataset:

# loading ggplot2 library
library(ggplot2)
# creating a basic ggplot object
p <- ggplot(data=mtcars)
# Creating scatter plot of mpg and disp variable
p1 <- p+geom_point(aes(x=disp,y=mpg))
# creating line chart from the same ggplot object but different
# geom function
p2 <- p+geom_line(aes(x=disp,y=mpg))
# creating bar chart of mpg variable
p3 <- p+geom_bar(aes(x=mpg))
# creating boxplot of mpg over gear
p4 <- p+geom_boxplot(aes(x=factor(gear),y=mpg))
# writing certain text into the scatter plot
p5 <- p1+geom_text(x=200,y=25,label="Scatter plot")

The preceding five plots will look like the following figure:

[Figure: the five plots p1 to p5 produced from mtcars]
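
The code above only creates the five plot objects; it does not draw them. One possible way to place them in a single figure is sketched below with the gridExtra package. The two-column layout is an assumption, and the book's figure may have been arranged differently.

```r
library(ggplot2)
library(grid)        # grid.newpage() and grid.draw()
library(gridExtra)   # arrangeGrob()

# Recreate the five plots from the example above
p  <- ggplot(data = mtcars)
p1 <- p + geom_point(aes(x = disp, y = mpg))
p2 <- p + geom_line(aes(x = disp, y = mpg))
p3 <- p + geom_bar(aes(x = mpg))
p4 <- p + geom_boxplot(aes(x = factor(gear), y = mpg))
p5 <- p1 + geom_text(x = 200, y = 25, label = "Scatter plot")

# Assumed layout: two columns, filled row by row
combined <- arrangeGrob(p1, p2, p3, p4, p5, ncol = 2)

# Draw the combined layout on a fresh page of the current device
grid.newpage()
grid.draw(combined)
```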

Visualizing an empirical Cumulative Distribution function

The empirical Cumulative Distribution function (CDF) is the non-parametric maximum-likelihood estimate of the CDF. In this recipe, we will see how the empirical CDF can be produced.
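
As a reminder of what is being plotted, the empirical CDF evaluated at a point t is the proportion of observations less than or equal to t. The base R sketch below checks this with ecdf() from the stats package; the simulated vector and the evaluation point 20 are arbitrary choices made for illustration.

```r
set.seed(12345)
x <- rnorm(n = 100, mean = 20, sd = 3)

Fn <- ecdf(x)             # ecdf() returns a step function
manual <- mean(x <= 20)   # proportion of observations at or below 20

# Both expressions compute the same quantity
Fn(20)
manual
```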

Getting ready

To produce this plot, we need the latticeExtra library. We will use a simulated dataset, created with the following code:

# Set a seed value to make the data reproducible
set.seed(12345)
qqdata <- data.frame(disA=rnorm(n=100,mean=20,sd=3),
                     disB=rnorm(n=100,mean=25,sd=4),
                     disC=rnorm(n=100,mean=15,sd=1.5),
                     age=sample(c(1,2,3,4),size=100,replace=TRUE),
                     sex=sample(c("Male","Female"),size=100,replace=TRUE),
                     econ_status=sample(c("Poor","Middle","Rich"),
                                        size=100,replace=TRUE))

How to do it…

To plot an empirical CDF, we first need to load the latticeExtra library (note that this library has a dependency on RColorBrewer). We can then produce the plot, with one panel per level of sex, using the following simple code:

library(latticeExtra)
ecdfplot(~disA|sex,data=qqdata)

[Figure: empirical CDF of disA, one panel per level of sex]
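
For comparison, a similar plot can be sketched with ggplot2's stat_ecdf(), which draws both groups in one panel instead of separate panels. This is an alternative approach and not the recipe's latticeExtra method. The data frame below is a reduced version of qqdata, so the values drawn for sex will differ from those in the recipe.

```r
library(ggplot2)

# Reduced version of the simulated data: only the two columns needed here
set.seed(12345)
qqdata <- data.frame(
  disA = rnorm(n = 100, mean = 20, sd = 3),
  sex  = sample(c("Male", "Female"), size = 100, replace = TRUE)
)

# One step curve per level of sex, drawn in a single panel
ecdf_gg <- ggplot(qqdata, aes(x = disA, colour = sex)) +
  stat_ecdf(geom = "step") +
  labs(x = "disA", y = "Empirical CDF")

print(ecdf_gg)
```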

Graph annotation with ggplot

To produce publication-quality data visualization, we often need to annotate the graph with various texts, symbols, or even shapes. In this recipe, we will see how we can easily annotate an existing graph.

Getting ready

In this recipe, we will use the disA and disD variables from the ggplotdata data frame, which is not defined in this excerpt and must already be loaded in the session. We also need to load the grid and gridExtra libraries for this recipe.
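
Because this excerpt does not include the code that creates ggplotdata, the following is a purely hypothetical stand-in so that the recipe can be tried out. The seed, sample size, means, and standard deviations are all assumptions, so the resulting plot will not match the book's figure and the annotation coordinates may need adjusting.

```r
# Hypothetical stand-in for ggplotdata; the book's actual dataset is not shown here.
# All distribution parameters below are assumptions chosen for illustration.
set.seed(1234)
ggplotdata <- data.frame(
  disA = rnorm(n = 100, mean = 20, sd = 3),
  disD = rnorm(n = 100, mean = 20, sd = 3)
)

# Inspect the structure of the stand-in data
str(ggplotdata)
```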

How to do it...

In this recipe, we will add the following annotations to an existing scatter plot. The whole procedure is as follows:

  1. Create a scatter plot
  2. Add customized text within the plot
  3. Highlight a certain region to indicate extreme values
  4. Draw a line segment with an arrow within the scatter plot to indicate a single extreme observation

Now, we will implement each of the steps one by one:

library(ggplot2)
library(grid)       # provides unit(), used to size the arrow head
library(gridExtra)
# Create the scatter plot and print it
annotation_obj <- ggplot(data=ggplotdata,aes(x=disA,y=disD))+geom_point()
annotation_obj
# Add custom text at position (18,29)
annotation_obj1 <- annotation_obj +
  annotate(geom="text",x=18,y=29,label="Extreme value",size=3)
annotation_obj1
# Highlight a region with a semi-transparent box
annotation_obj2 <- annotation_obj1 +
  annotate("rect",xmin=24,xmax=27,ymin=17,ymax=22,alpha=.2)
annotation_obj2
# Draw a line segment with an arrow
# (ggplot2 3.4.0 and later prefer linewidth over size for lines)
annotation_obj3 <- annotation_obj2 +
  annotate("segment",x=16,xend=17.5,y=25,yend=27.5,colour="red",
           arrow=arrow(length=unit(0.5,"cm")),size=2)
annotation_obj3

The preceding four steps are displayed in the following single graph:

[Figure: scatter plot of disA against disD with the text, box, and arrow annotations]

How it works...

The annotate() function takes the name of a geom, such as "segment" or "text", followed by arguments giving the position of that geom, that is, where to draw it. In this recipe, we used three geoms: text to write customized text within the plot, rect to highlight a certain region of the plot, and segment to draw an arrow. The alpha argument sets the transparency of the highlighted region, and the size argument sets the size of the text and the line width of the line segment.
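
One practical consequence of this design can be sketched on the built-in mtcars data (an illustration, not an example from the book): annotate() builds its own small data set from the arguments it is given, whereas a geom function such as geom_text() inherits the plot's data and so repeats a constant label once for every row.

```r
library(ggplot2)

base_plot <- ggplot(data = mtcars) +
  geom_point(aes(x = disp, y = mpg))

# geom_text() inherits mtcars, so the constant label is drawn once per row
with_geom <- base_plot +
  geom_text(x = 200, y = 25, label = "Scatter plot")

# annotate() creates its own data from the x, y, and label arguments
with_annotate <- base_plot +
  annotate(geom = "text", x = 200, y = 25, label = "Scatter plot")

# Number of text items each approach draws (layer 2 is the text layer)
nrow(layer_data(with_geom, 2))
nrow(layer_data(with_annotate, 2))
```

Both plots look similar on screen, but the overplotted labels from geom_text() can appear heavier or jagged, which is why annotate() is usually the better fit for a single fixed label.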

Summary

This article gives a sample of the kinds of recipes included in the book and shows how each recipe is structured.
