In this article by Atmajitsinh Gohil, author of the book R Data Visualization Cookbook, we will cover the following topics:

A simple bar plot
A simple line plot
Line plot to tell an effective story
Merging histograms
Making an interactive bubble plot

The main motivation behind this article is to introduce the basics of plotting in R and an element of interactivity via the googleVis package. The basic plots are important because many R packages reuse the basic plot arguments, so understanding them gives new R users a good foundation. We will start with a basic bar plot in R and then move on to interactive plots, using the googleVis package to combine R's analytical capabilities with interactive charts.

The googleVis package provides an R interface to the Google Chart API, which it uses to create interactive plots. A range of plots is available with the googleVis package, which lets us plot the same data on various charts and select the one that delivers the most effective message. The package undergoes regular updates and releases, and new chart types have been added over successive releases.

Readers should note that there are other ways to create interactive plots in R, but it is not possible to explore all of them here, and hence I have selected googleVis to display interactive elements in a chart. I have selected it purely based on my experience with interactivity in plots. Another good option for interactive graphics is GGobi.

A simple bar plot

A bar plot can often be confused with a histogram. Histograms are used to study the distribution of numeric data, whereas bar plots are used to study categorical data. The two plots may look similar at first glance, but the main difference is that the width of a bar in a bar plot carries no meaning, whereas in a histogram the width of each bar represents the interval of values it covers and its height (or area) reflects the frequency of the data in that interval.
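
The difference can be sketched with a few lines of base R. The values below are made up purely for illustration and are not part of the recipe's data: table() counts categories for a bar plot, while hist() counts numeric values that fall into intervals.

```r
# Categorical data (made up): one bar per category, height = count
fruit <- c("apple", "pear", "apple", "plum", "apple", "pear")
counts <- table(fruit)
barplot(counts, main = "Bar plot of categories")

# Numeric data (made up): each bar covers an interval of values
x <- c(1.2, 1.9, 2.4, 2.6, 3.1, 3.3, 3.8, 4.5)
h <- hist(x, breaks = c(1, 2, 3, 4, 5), main = "Histogram of x")

# The interval boundaries and the number of values in each interval
print(h$breaks)
print(h$counts)
```

Changing the breaks vector changes the shape of the histogram, whereas the bar plot has no comparable setting because its categories are fixed.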

In this recipe, I have made use of the infant mortality rate in India. The data is made available by the Government of India. The main objective is to study the basics of a bar plot in R as shown in the following screenshot:

basic-and-interactive-plots-img-0

How to do it…

We start the recipe by importing our data in R using the read.csv() function. R will search for the data under the current directory, and hence we use the setwd() function to set our working directory:

setwd("D:/book/scatter_Area/chapter2")
data = read.csv("infant.csv", header = TRUE)

Once we import the data, we would like to process it by ordering it. We order the data using the order() function in R. We would like R to order the rows by the column Total2011 in decreasing order:

data = data[order(data$Total2011, decreasing = TRUE),]
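
As a minimal sketch of what this line does, the following example applies order() to a small made-up data frame (the state names and rates are hypothetical, not taken from infant.csv):

```r
# Toy data frame; names and values are invented for illustration
toy <- data.frame(State = c("A", "B", "C", "D"),
                  Total2011 = c(9.5, 14.1, 12.2, 7.8))

# order() returns the row positions that sort the column
idx <- order(toy$Total2011, decreasing = TRUE)
print(idx)

# Indexing the rows with those positions reorders the whole data frame
toy_sorted <- toy[idx, ]
print(toy_sorted)
```

Note the trailing comma in toy[idx, ]: it tells R to reorder rows and keep every column.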

We use the ifelse() function to create a new column. We would utilize this new column to add different colors to bars in our plot. We could also write a loop in R to do this task but we will keep this for later. The ifelse() function is quick and easy. We instruct R to assign yes if values in the column Total2011 are more than 12.2 and no otherwise. The 12.2 value is not randomly chosen but is the average infant mortality rate of India:

new = ifelse(data$Total2011>12.2,"yes","no")

Next, we would like to join the vector of yes and no to our original dataset. In R, we can join columns using the cbind() function. Rows can be combined using rbind():

data = cbind(data,new)
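
The two steps above can be tried on a small made-up data frame first. The values and the variable name flag are assumptions for illustration only:

```r
# Hypothetical rates, not taken from infant.csv
rates <- data.frame(State = c("A", "B", "C"),
                    Total2011 = c(14.1, 12.2, 7.8))

# Element-wise test: "yes" where the rate exceeds 12.2, "no" otherwise
flag <- ifelse(rates$Total2011 > 12.2, "yes", "no")

# cbind() appends the vector as a new column named after the variable
rates <- cbind(rates, flag)
print(rates)
```

A value exactly equal to 12.2 is labelled no, because the test uses a strict greater-than comparison.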

When we initially plot the bar plot, we observe that we need more space at the bottom of the plot. We adjust the margins of a plot in R by passing the mar argument to the par() function. The mar argument takes a vector of four values: the bottom, left, top, and right margins, measured in lines of text:

par(mar = c(10,5,5,5))

Next, we generate a bar plot in R using the barplot() function. The abline() function is used to add a horizontal line on the bar plot:

barplot(data$Total2011, las = 2, names.arg = data$India, width = 0.80,
        border = NA, ylim = c(0, 20), col = "#e34a33",
        main = "Infant Mortality Rate of India in 2011")
abline(h = 12.2, lwd = 2, col = "white", lty = 2)

How it works…

The order() function returns a permutation of the row indices that sorts the data (in decreasing or increasing order) by the chosen variable; indexing the data frame with this permutation rearranges the rows. We would like to plot the bars from highest to lowest, and hence we need to sort the data. The ifelse() function is used to generate a new column. We will use this column under the There's more… section of this recipe. The first argument of the ifelse() function is the logical test to be performed. The second argument is the value to be assigned if the test is true, and the third argument is the value to be assigned if the logical test fails.

The first argument in the barplot() function defines the height of the bars, and horiz = TRUE (not used in our code) instructs R to plot the bars horizontally. The default setting in R will plot the bars vertically. The names.arg argument is used to label the bars. We also specify border = NA to remove the borders, and las = 2 is specified to set the orientation of our labels. Try replacing the las value with 0, 1, 2, or 3 and observe how the orientation of our labels changes.

The h argument in the abline() function gives the y value at which a horizontal line is drawn; the v argument would draw a vertical line instead. The lwd, lty, and col arguments are used to define the width, line type, and color of the line.

There's more…

While plotting a bar plot, it's a good practice to order the data in ascending or descending order. An unordered bar plot does not convey the right message, and the plot is hard to read when there are many bars involved. When we observe a plot, we are interested in getting the most information out of it, and ordering the data is the first step toward achieving this objective.

We have not yet shown how the column created with the ifelse() and cbind() functions can be used in the plot. If we would like to color the bars differently to let the readers know which states have infant mortality above the country level, we can replace col = "#e34a33" with a vector of colors derived from data$new. Passing col = (data$new) directly relies on the column being stored as a factor, which was the default before R 4.0; with a character column, the yes and no values are not valid colors, so it is safer to map them explicitly, for example with col = ifelse(data$new == "yes", "#e34a33", "grey70").
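
A minimal sketch of the coloring idea follows. It uses made-up values in place of data$Total2011 and data$new, and the grey shade is an arbitrary choice rather than something taken from the recipe:

```r
# Hypothetical values standing in for data$Total2011 and data$new
rate <- c(14.1, 13.0, 12.2, 7.8)
flag <- ifelse(rate > 12.2, "yes", "no")

# Map each flag to a color; bars above the average keep the recipe's red
bar_cols <- ifelse(flag == "yes", "#e34a33", "#bdbdbd")

barplot(rate, names.arg = c("A", "B", "C", "D"), las = 2,
        border = NA, ylim = c(0, 20), col = bar_cols)
abline(h = 12.2, lwd = 2, lty = 2, col = "white")
```

Mapping the flags to explicit color strings keeps the result the same regardless of whether the column is stored as a factor or as character.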

A simple line plot

Line plots are simply lines connecting all the x and y points. They are very easy to interpret and are widely used to display an upward or downward trend in data. In this recipe, we will use the googleVis package to create an interactive line plot in R. We will learn how we can emphasize certain variables in our data. The following line plot shows the fertility rate:

basic-and-interactive-plots-img-1

Getting ready

We will use the googleVis package to generate a line plot.

How to do it…

In order to construct a line chart, we will install and load the googleVis package in R. We would also import the fertility data using the read.csv() function:

install.packages("googleVis")
library(googleVis)
frt = read.csv("fertility.csv", header = TRUE, sep =",")

The fertility data is downloaded from the OECD website. We can construct our line object using the gvisLineChart() function:

# Series are numbered by position in yvar, starting at 0, so with these
# six columns OECD34 is series 5 (it is series 34 only when all 35 columns are plotted)
line = gvisLineChart(frt, xvar = "Year",
  yvar = c("Australia", "Austria", "Belgium", "Canada", "Chile", "OECD34"),
  options = list(width = 1100, height = 500, backgroundColor = "#FFFF99",
    title = "Fertility Rate in OECD countries",
    vAxis = "{title : 'Total Fertility Rate', gridlines : {color:'#DEDECE', count : 4}, ticks : [0,1,2,3,4]}",
    series = "{0:{color:'black', visibleInLegend :false},
      1:{color:'#BDBD9D', visibleInLegend :false},
      2:{color:'#BDBD9D', visibleInLegend :false},
      3:{color:'#BDBD9D', visibleInLegend :false},
      4:{color:'#BDBD9D', visibleInLegend :false},
      5:{color:'#3333FF', visibleInLegend :true}}"))

We can construct the visualization using the plot() function in R:

plot(line)

How it works…

The first three arguments of the gvisLineChart() function are the data and the name of the columns to be plotted on the x-axis and y-axis. The options argument lists the chart API options to add and modify elements of a chart.

For the purpose of this recipe, we will use part of the dataset. Hence, while we assign the series to be plotted under yvar = c(), we will specify the column names that we would like to be plotted in our chart. Note that the series starts at 0, and hence Australia, which is the first column, is in fact series 0 and not 1.
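
The zero-based numbering is easy to get wrong, so it can help to compute the index rather than count by hand. The sketch below assumes that series are numbered by their position among the columns listed in yvar:

```r
# Columns passed as yvar in this recipe, in the same order
yvar <- c("Australia", "Austria", "Belgium", "Canada", "Chile", "OECD34")

# match() is 1-based in R; subtract 1 for the chart's 0-based series index
series_index <- match(c("Australia", "OECD34"), yvar) - 1
names(series_index) <- c("Australia", "OECD34")
print(series_index)
```

Under that assumption, the position of a column changes whenever columns are added to or removed from yvar, so the series option has to be updated along with it.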

For the purpose of this exercise, let's assume that we would like to show our audience the mean fertility rate among all OECD economies. We can achieve this using series under options = list(). The series option allows us to customize a specific series in our dataset. In the gvisLineChart() call, we instruct the Google Chart API to give the OECD series and Australia (series 0) their own colors and to make the legend visible only for the OECD series rather than for every series. Series are numbered by their position among the columns listed in yvar, so with the six columns above OECD34 is series 5; the index 34 applies only when all 35 columns are plotted, as in the next recipe.

It would be best to display all the legends but we use this to show the flexibility that comes with the Google Chart API. Finally, we can use the plot() function to plot the chart in a browser. The following screenshot displays a part of the data. The dim() function gives us a general idea about the dimensions of the fertility data:

basic-and-interactive-plots-img-2

Visualizations in the New York Times often combine line plots with bar charts and pie charts. Readers should try constructing such a visualization. We can use the gvisMerge() function to merge plots. The function merges just two plots at a time, and hence readers would have to nest multiple gvisMerge() calls to create a similar visualization. The same can also be constructed with basic R plots, but we would lose the interactive element.

Line plot to tell an effective story

In the previous recipe, we learned how to plot a very basic line plot and use some of the options. In this recipe, we will go a step further and make use of specific visual cues such as color and line width for easy interpretation.

Line charts are a great tool to visualize time series data. The fertility data is discrete but connecting points over time provides our audience with a direction. The visualization shows the amazing progress countries such as Mexico and Turkey have achieved in reducing their fertility rate.

The OECD defines the fertility rate as "the number of children that would be born per woman, assuming no female mortality at child-bearing ages and the age-specific fertility rates of a specified country and reference period."

Line plots have been widely used by the New York Times to create very interesting infographics. This recipe is inspired by one of the New York Times visualizations. It is important to understand that many of the infographics created by professionals are built using D3.js or Processing. We will not go into the details here, but it is good to know how these tools work and how they can be used to create visualizations.

basic-and-interactive-plots-img-3

Getting ready

We would need to install and load the googleVis package to construct a line chart.

How to do it…

To generate an interactive plot, we will load the fertility data in R using the read.csv() function. To generate a line chart that plots the entire dataset, we will use the gvisLineChart() function:

line = gvisLineChart(frt, xvar = "Year",
  yvar = c("Australia", "Austria", "Belgium", "Canada", "Chile", "Czech.Republic",
    "Denmark", "Estonia", "Finland", "France", "Germany", "Greece", "Hungary",
    "Iceland", "Ireland", "Israel", "Italy", "Japan", "Korea", "Luxembourg", "Mexico",
    "Netherlands", "New.Zealand", "Norway", "Poland", "Portugal", "Slovakia", "Slovenia",
    "Spain", "Sweden", "Switzerland", "Turkey", "United.Kingdom", "United.States", "OECD34"),
  options = list(width = 1200, backgroundColor = "#ADAD85",
    title = "Fertility Rate in OECD countries",
    vAxis = "{gridlines:{color:'#DEDECE',count : 3}, ticks : [0,1,2,3,4]}",
    series = "{0:{color:'#BDBD9D', visibleInLegend :false},
      20:{color:'#009933', visibleInLegend :true},
      31:{color:'#996600', visibleInLegend :true},
      34:{color:'#3333FF', visibleInLegend :true}}"))

To display our visualization in a browser, we use the generic R plot() function:

plot(line)

How it works…

The arguments passed to the gvisLineChart() function are the same as those discussed under the simple line plot, with some minor changes. We would like to plot the entire dataset for this exercise, and hence we have to state all the column names in yvar = c().

Also, we would like to color all the series with the same color but highlight Mexico, Turkey, and OECD average. We have achieved this in the previous code using series {}, and further specify and customize colors and legend visibility for specific countries.

In this particular plot, we have made use of the same color for all the economies but have highlighted Mexico and Turkey to signify the development and growth that took place in the 5-year period. It would also be effective if our audience could compare the OECD average with Mexico and Turkey. This provides the audience with a benchmark they can compare with.

If we plot all the legends, it may make the plot too crowded and 34 legends may not make a very attractive plot. We could avoid this by only making specific legends visible.
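
Writing the series string by hand becomes error-prone with this many columns. As a rough sketch, it could be generated from the column names instead. The helper make_series() below is hypothetical and not part of googleVis; it assumes series are numbered by position in yvar, and it greys out every column that is not highlighted, which is one possible reading of the intent described above:

```r
# Hypothetical helper (not a googleVis function): builds the series option
make_series <- function(yvar, highlight, base = "#BDBD9D") {
  idx <- seq_along(yvar) - 1L                      # 0-based series positions
  is_hl <- yvar %in% names(highlight)              # columns to emphasize
  color <- ifelse(is_hl, unname(highlight[yvar]), base)
  legend <- ifelse(is_hl, "true", "false")
  entries <- sprintf("%d:{color:'%s', visibleInLegend:%s}", idx, color, legend)
  paste0("{", paste(entries, collapse = ", "), "}")
}

# Shortened column list for illustration; the recipe uses all 35 columns
yvar <- c("Australia", "Mexico", "Turkey", "OECD34")
highlight <- c(Mexico = "#009933", Turkey = "#996600", OECD34 = "#3333FF")
series <- make_series(yvar, highlight)
cat(series, "\n")
```

The resulting string could then be passed as series = series inside options = list(), so the indices stay correct when the column list changes.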

Merging histograms

Histograms help in studying the underlying distribution of data. They are even more useful when we compare more than one histogram side by side; this provides us with greater insight into the skewness and the overall distribution.

In this recipe, we will study how to plot a histogram using the googleVis package and how we merge more than one histogram on the same page. We will only merge two plots but we can merge more plots and try to adjust the width of each plot. This makes it easier to compare all the plots on the same page. The following plot shows two merged histograms:

basic-and-interactive-plots-img-4

How to do it…

In order to generate a histogram, we will install the googleVis package as well as load the same in R:

install.packages("googleVis")
library(googleVis)

We have downloaded the prices of three different stocks and have calculated their daily returns over the entire period. We can load the data in R using the read.csv() function. Our main aim in this recipe is to create two different histograms and plot them side by side in a browser. Hence, we need to divide our data into three different data frames. For the purpose of this recipe, we will plot the aapl and msft data frames:

stk = read.csv("stock_cor.csv", header = TRUE, sep = ",")
aapl = data.frame(stk$AAPL)
msft = data.frame(stk$MSFT)
googl = data.frame(stk$GOOGL)

To generate the histograms, we implement the gvisHistogram() function:

al = gvisHistogram(aapl, options = list(histogram = "{bucketSize : 1}",
  legend = "none", title = "Distribution of AAPL Returns", width = 500,
  hAxis = "{showTextEvery: 5, title: 'Returns'}",
  vAxis = "{gridlines : {count:4}, title : 'Frequency'}"))
mft = gvisHistogram(msft, options = list(histogram = "{bucketSize : 1}",
  legend = "none", title = "Distribution of MSFT Returns", width = 500,
  hAxis = "{showTextEvery: 5, title: 'Returns'}",
  vAxis = "{gridlines : {count:4}, title : 'Frequency'}"))

We combine the two gvis objects in one browser using the gvisMerge() function:

mrg = gvisMerge(al,mft, horizontal = TRUE)
plot(mrg)

How it works…

The data.frame() function is used to construct a data frame in R. We require this step as we do not want to plot all the three histograms on the same plot. Note the use of the $ notation in the data.frame() function.
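
One side effect of the $ notation is worth seeing in isolation. The sketch below uses made-up returns in place of stock_cor.csv and shows how data.frame() names the resulting column, next to single-bracket indexing as an alternative:

```r
# Hypothetical returns; the real values come from stock_cor.csv
stk <- data.frame(AAPL = c(0.5, -1.2, 0.8),
                  MSFT = c(0.1, 0.4, -0.3),
                  GOOGL = c(-0.6, 0.9, 0.2))

# stk$AAPL is a plain vector, so data.frame() derives the column name
aapl <- data.frame(stk$AAPL)
print(names(aapl))

# Single-bracket indexing keeps a one-column data frame and its name
aapl2 <- stk["AAPL"]
print(names(aapl2))
```

Either form gives a one-column data frame; they differ only in the column name that the result carries.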

The first argument in the gvisHistogram() function is our data stored as a data frame. We can display the individual histograms by calling plot(al) and plot(mft), but in this recipe we will plot only the final merged output.

We observe that most of the attributes of a histogram function are the same as discussed in previous recipes. The histogram functionality will use an algorithm to create buckets, but we can control this using the bucketSize as histogram = "{bucketSize :1}".
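
To get a feel for what a fixed bucket size means, the idea can be imitated with base R's hist(). This is only an illustration with made-up returns; the Google Chart API computes its buckets on its own, so its counts need not match these exactly:

```r
# Hypothetical daily returns in percent; not taken from stock_cor.csv
returns <- c(-2.3, -0.4, 0.1, 0.6, 0.9, 1.2, 1.7, 2.4, -1.1, 0.3)

bucket_size <- 1
# Breaks at multiples of the bucket size that cover the data range
breaks <- seq(floor(min(returns) / bucket_size) * bucket_size,
              ceiling(max(returns) / bucket_size) * bucket_size,
              by = bucket_size)

# Count observations per bucket without drawing anything
h <- hist(returns, breaks = breaks, plot = FALSE)
print(data.frame(from = head(h$breaks, -1),
                 to = tail(h$breaks, -1),
                 count = h$counts))
```

Setting bucket_size to 0.5 or 2 in this sketch shows how narrower buckets reveal more detail while wider ones smooth the shape.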

Try using different bucket sizes and observe how the buckets in the histograms change. More options related to histograms can also be found in the following link under the Controlling Buckets section:

https://developers.google.com/chart/interactive/docs/gallery/histogram#Buckets

We have also utilized the showTextEvery option of the horizontal axis. This option controls how often horizontal axis labels are shown: with a value of 5, only every fifth label is displayed, which keeps the axis uncluttered. Our main objective is to observe the distribution, and the plot serves our purpose. Finally, we call plot() to display the chart in our favorite browser.

We do the same steps to plot the return distribution of Microsoft (MSFT). Now, we would like to place both the plots side by side and view the differences in the distribution. We will use the gvisMerge() function to generate histograms side by side.

In our recipe, we have two plots, for AAPL and MSFT. The default setting stacks the charts vertically, but we can specify horizontal = TRUE to place the charts side by side.

Making an interactive bubble plot

My first encounter with a bubble plot was while watching a TED video of Hans Rosling. The video led me to search for ways of creating bubble plots in R; a very good introduction to this is available on the Flowing Data website. The advantage of a bubble plot is that it allows us to visualize a third variable, which in our case is the size of the bubble.

In this recipe, I have made use of the googleVis package to plot a bubble plot, but you can also implement this with basic R plots. The advantage of the Google Chart API is the interactivity and the ease with which the charts can be attached to a web page. Note that we could also use squares instead of circles, but this is not implemented in the Google Chart API yet.

In order to implement a bubble plot, I have downloaded the crime dataset by state. The details regarding the link and definition of crime data are available in the crime.txt file and are shown in the following screenshot:

basic-and-interactive-plots-img-5

How to do it…

As with the other interactive plots in this article, we will install and load the googleVis package. We will also import our data file in R using the read.csv() function:

crm = read.csv("crimeusa.csv", header = TRUE, sep =",")

We can construct our bubble chart using the gvisBubbleChart() function in R:

bub1 = gvisBubbleChart(crm, idvar = "States", xvar = "Robbery", yvar = "Burglary",
  sizevar = "Population", colorvar = "Year",
  options = list(legend = "none", width = 900, height = 600,
    title = "Crime per State in 2012",
    sizeAxis = "{maxSize : 40, minSize : 0.5}",
    vAxis = "{title : 'Burglary'}", hAxis = "{title : 'Robbery'}"))
bub2 = gvisBubbleChart(crm, idvar = "States", xvar = "Robbery", yvar = "Burglary",
  sizevar = "Population",
  options = list(legend = "none", width = 900, height = 600,
    title = "Crime per State in 2012",
    sizeAxis = "{maxSize : 40, minSize : 0.5}",
    vAxis = "{title : 'Burglary'}", hAxis = "{title : 'Robbery'}"))
# Place the two charts side by side, as described under How it works
mrg = gvisMerge(bub1, bub2, horizontal = TRUE)
plot(mrg)

How it works…

The gvisBubbleChart() function uses six attributes to create a bubble chart, which are as follows:

data: This is the data defined as a data frame, in our example, crm
idvar: This is the column that is used to assign IDs to the bubbles, in our example, States
xvar: This is the column in the data to plot on the x-axis, in our example, Robbery
yvar: This is the column in the data to plot on the y-axis, in our example, Burglary
sizevar: This is the column used to define the size of the bubble
colorvar: This is the column used to define the color

We can define the minimum and maximum sizes of each bubble using minSize and maxSize, respectively, under options. Note that we have used gvisMerge() to portray the differences between the two bubble plots. In the plot on the right, we have not made use of colorvar, and hence all the bubbles are of the same color.

There's more…

The Google Chart API makes it easier for us to plot a bubble chart, but the same can be achieved using basic R plot functions. We can make use of the symbols() function to create such a plot. The symbols need not be circles; they can be squares as well. By this time, you should have watched Hans Rosling's TED lecture and may be wondering how you could create a motion chart with bubbles floating around. The Google Chart API has the ability to create motion charts, and readers can use the googleVis reference manual to learn about this.
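
A base R version might look like the following sketch. The figures are invented stand-ins for crimeusa.csv, and the sizing and colors are arbitrary choices rather than anything prescribed by the recipe:

```r
# Hypothetical figures; the recipe's values come from crimeusa.csv
crime <- data.frame(States = c("A", "B", "C", "D"),
                    Robbery = c(120, 80, 150, 60),
                    Burglary = c(700, 500, 900, 400),
                    Population = c(5e6, 2e6, 9e6, 1e6))

# Scale radii by the square root so that bubble area tracks population
radius <- sqrt(crime$Population / pi)

# inches sets the size of the largest bubble; the rest scale relative to it
symbols(crime$Robbery, crime$Burglary, circles = radius, inches = 0.35,
        fg = "white", bg = "#e34a33", xlab = "Robbery", ylab = "Burglary",
        main = "Crime per state (made-up data)")
text(crime$Robbery, crime$Burglary, crime$States, cex = 0.8)

# Replacing circles = radius with squares = radius draws squares instead
```

Using the square root of the size variable is a common design choice: readers tend to judge bubbles by area, so scaling the radius directly would exaggerate large values.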

Summary

This article introduces some of the basic R plots, such as line and bar charts. It also discusses the basic elements of interactive plots using the googleVis package in R. This article is a great resource for understanding the basic R plotting techniques.