





















































Data visualization is one of the most important tasks in data science track. Through effective visualization we can easily uncover underlying pattern among variables with doing any sophisticated statistical analysis. In this cookbook we have focused on graphical analysis using R in a very simple way with each independent example. We have covered default R functionality along with more advance visualization techniques such as lattice, ggplot2, and three-dimensional plots. Readers will not only learn the code to produce the graph but also learn why certain code has been written with specific examples.
R Graphs Cookbook Second Edition written by Jaynal Abedin and Hrishi V. Mittal is such a book where the user will learn how to produce various graphs using R and how to customize them and finally how to make ready for publication. This practical recipe book starts with very brief description about R graphics system and then gradually goes through basic to advance plots with examples. Beside the R default graphics this recipe book introduces advance graphic system such as lattice and ggplot2; the grammar of graphics. We have also provided examples on how to inspect large dataset using advanced visualization such as tableplot and three dimensional visualizations. We also cover the following topics:
(For more resources related to this topic, see here.)
This recipe book is targeted to those reader groups who already exposed to R programming and want to learn effective graphics with the power of R and its various libraries.
This hands-on guide starts with very short introduction to R graphics system and then gets straight to the point – actually creating graphs, instead of just theoretical learning. Each recipe is specifically tailored to full fill reader’s appetite for visually representing the data in the best way possible.
Now, we will present few examples so that you can have an idea about the content of this recipe book:
The ggplot2 R package is based on The Grammar of Graphics by Leland Wilkinson, Springer). Using this package, we can produce a variety of traditional graphics, and the user can produce their customized graphs as well. The beauty of this package is in its layered graphics facilities; through the use of layered graphics utilities, we can produce almost any kind of data visualization. Recently, ggplot2 is the most searched keyword in the R community, including the most popular R blog (www.r-bloggers.com). The comprehensive theme system allows the user to produce publication quality graphs with a variety of themes of choice. If we want to explain this package in a single sentence, then we can say that if whatever we can think about data visualization can be structured in a data frame, the visualization is a matter of few seconds.
In the specific chapter on ggplot2 , we will see different examples and use themes to produce publication quality graphs. However, in this introductory chapter, we will show you one of the important features of the ggplot2 package that produces various types of graphs. The main function is ggplot(), but with the help of a different geom function, we can easily produce different types of graphs, such as the following:
Now, we will see a simple example of the use of different geom functions with the default R mtcars dataset:
# loading ggplot2 library library(ggplot2) # creating a basic ggplot object p <- ggplot(data=mtcars) # Creating scatter plot of mpg and disp variable p1 <- p+geom_point(aes(x=disp,y=mpg)) # creating line chart from the same ggplot object but different # geom function p2 <- p+geom_line(aes(x=disp,y=mpg)) # creating bar chart of mpg variable p3 <- p+geom_bar(aes(x=mpg)) # creating boxplot of mpg over gear p4 <- p+geom_boxplot(aes(x=factor(gear),y=mpg)) # writing certain text into the scatter plot p5 <- p1+geom_text(x=200,y=25,label="Scatter plot")
The visualization of the preceding five plot will look like the following figure:
The empirical Cumulative Distribution function (CDF) is the non-parametric maximum-likelihood estimation of the CDF. In this recipe, we will see how the empirical CDF can be produced.
To produce this plot, we need to use the latticeExtra library. We will use the simulated dataset as shown in the following code:
# Set a seed value to make the data reproducible set.seed(12345) qqdata <-data.frame(disA=rnorm(n=100,mean=20,sd=3), disB=rnorm(n=100,mean=25,sd=4), disC=rnorm(n=100,mean=15,sd=1.5), age=sample((c(1,2,3,4)),size=100,replace=T), sex=sample(c("Male","Female"),size=100,replace=T), econ_status=sample(c("Poor","Middle","Rich"), size=100,replace=T))
To plot an empirical CDF, we first need to call the latticeExtra library (note that this library has a dependency on RColorBrewer). Now, to plot the empirical CDF, we can use the following simple code:
library(latticeExtra) ecdfplot(~disA|sex,data=qqdata)
To produce publication-quality data visualization, we often need to annotate the graph with various texts, symbols, or even shapes. In this recipe, we will see how we can easily annotate an existing graph.
In this recipe, we will use the disA and disD variables from ggplotdata. Let's call ggplotdata for this recipe. We also need to call the grid and gridExtra libraries for this recipe.
In this recipe, we will execute the following annotation on an existing scatter plot. So, the whole procedure will be as follows:
Now, we will implement each of the steps one by one:
library(grid) library(gridExtra) # creating scatter plot and print it annotation_obj <- ggplot(data=ggplotdata,aes(x=disA,y=disD))+geom_point() annotation_obj # Adding custom text at (18,29) position annotation_obj1 <- annotation_obj + annotate(geom="text",x=18,y=29,label="Extreme value",size=3) annotation_obj1 # Highlight certain regions with a box annotation_obj2 <- annotation_obj1+ annotate("rect", xmin = 24, xmax = 27,ymin=17,ymax=22,alpha = .2) annotation_obj2 # Drawing line segment with arrow annotation_obj3 <- annotation_obj2+ annotate("segment",x = 16,xend=17.5,y=25,yend=27.5,colour="red", arrow = arrow(length = unit(0.5, "cm")),size=2) annotation_obj3
The preceding four steps are displayed in the following single graph:
The annotate() function takes input of a geom such as “segment”, “text” etc, and then it takes another input regarding position of that geom that is where to draw or where to place.. In this particular recipe, we used three geom instances, such as text to write customized text within the plot, rect to highlight a certain region in the plot, and segment to draw an arrow. The alpha argument represents the transparency of the region and size argument to represent the size of the text and line width of the line segment.
This article just gives a sample recipe of what kind of recipes are included in the book, and how the structure of each recipe is.
Further resources on this subject: