So far we've seen the basics of how D3 works. In this last section of the first chapter, we'll create a simple visualization of some real data: the popularity of baby names in the USA. The final result will look like this:
As you can see in this figure, we create pink bars for the girls' names and blue bars for the boys' names, and add an axis at the top and the bottom that shows the number of times each name was chosen. First, though, let's take a look at the data.
Sanitizing and getting the data
For this example, we'll download data from https://www.ssa.gov/oact/babynames/limits.html. This site provides data for all the baby names in the US since 1880. On this page, you can find national and state-specific data. For this example, download the national dataset. Once you've downloaded and extracted it, you'll see one file per year:
$ ls -1
NationalReadMe.pdf
yob1880.txt
yob1881.txt
yob1882.txt
yob1883.txt
yob1884.txt
yob1885.txt
...
yob2013.txt
yob2014.txt
yob2015.txt
As you can see, we have data from 1880 until 2015. For this example, I've used the data from 2015, but you can use pretty much anything you want. Now let's look a bit closer at the data:
$ cat yob2015.txt
Emma,F,20355
Olivia,F,19553
Sophia,F,17327
Ava,F,16286
Isabella,F,15504
Mia,F,14820
Abigail,F,12311
Emily,F,11727
Charlotte,F,11332
Harper,F,10241
...
Zynique,F,5
Zyrielle,F,5
Noah,M,19511
Liam,M,18281
Mason,M,16535
Jacob,M,15816
William,M,15809
Ethan,M,14991
James,M,14705
Alexander,M,14460
Michael,M,14321
Benjamin,M,13608
Elijah,M,13511
Daniel,M,13408
In this data, we've got a large number of rows where each row shows the name, the sex (M or F), and the number of times the name was given. First, all the girls' names are listed, followed by all the boys' names. The data already looks pretty usable, so we don't need to do much processing before we can use it. The only thing we do is add a header row to this file, so that it looks like this:
name,sex,amount
Emma,F,20355
Olivia,F,19553
Sophia,F,17327
Ava,F,16286
This will make parsing this data into D3 a little bit easier, since the default way of parsing CSV data with D3 assumes the first line is a header. The sanitized data we use in this example can be found here: <DVD3>/src/chapter-01/data/yob2015.txt.
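Adding the header by hand in a text editor works fine. If you'd rather script it, the following is a minimal sketch of the same step as a plain JavaScript string operation. The addHeader helper is a hypothetical name introduced here for illustration, and the two-row sample stands in for the real file contents.

```javascript
// Hypothetical helper (not part of D3): prepend a header row to raw CSV
// text, unless the text already starts with that header.
function addHeader(csvText, header) {
  if (csvText.indexOf(header + '\n') === 0) {
    return csvText;
  }
  return header + '\n' + csvText;
}

// Two rows copied from yob2015.txt stand in for the whole file.
var raw = 'Emma,F,20355\nOlivia,F,19553\n';
var sanitized = addHeader(raw, 'name,sex,amount');
console.log(sanitized);
```

Calling addHeader a second time on the result leaves the text unchanged, so the step is safe to repeat.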
Creating the visualization
Now that we've got the data we want to work with, we can start creating the example. The files used in this example are the following:
- <DVD3>/src/chapter-01/D01-02.html: The HTML template that loads the correct CSS and JavaScript files for this example
- <DVD3>/src/chapter-01/js/D01-02.js: The JavaScript which uses the D3 APIs to draw the chart
- <DVD3>/src/chapter-01/css/D01-02.css: Custom CSS to color the bars and format the text elements
- <DVD3>/src/chapter-01/data/yob2015.txt: The data that is visualized
Let's start with the complete JavaScript file first. It might seem complex, and it introduces a couple of new concepts, but the general idea should be clear from the code (if you open the source file in your editor, you can also see inline comments for additional explanation):
function show() {
  'use strict';
  // Margins around the drawing area; width and height are the inner size
  var margin = { top: 30, bottom: 20, right: 40, left: 40 },
      width = 800 - margin.left - margin.right,
      height = 600 - margin.top - margin.bottom;
  // Size the .chart element and add a group shifted by the margins
  var chart = d3.select('.chart')
      .attr('width', width + margin.left + margin.right)
      .attr('height', height + margin.top + margin.bottom)
      .append('g')
      .attr('transform', 'translate(' + margin.left + ','
          + margin.top + ')');
  var namesToShow = 10;
  var barWidth = 20;
  var barMargin = 5;
  // Load the data, converting the amount of each row to a number
  d3.csv('data/yob2015.txt', function (d) { return { name: d.name, sex: d.sex, amount: +d.amount }; }, function (data) {
    // Keep the top names per sex; the boys' list is reversed
    var grouped = _.groupBy(data, 'sex');
    var top10F = grouped['F'].slice(0, namesToShow);
    var top10M = grouped['M'].slice(0, namesToShow);
    var both = top10F.concat(top10M.reverse());
    // One g element per name, moved down according to its index
    var bars = chart.selectAll("g").data(both)
        .enter()
        .append('g')
        .attr('transform', function (d, i) {
          var yPos = ((barWidth + barMargin) * i);
          return 'translate( 0 ' + yPos + ')';
        });
    // Map amounts onto the available width
    var yScale = d3.scaleLinear()
        .domain([0, d3.max(both, function (d) { return d.amount; })])
        .range([0, width]);
    bars.append('rect')
        .attr("height", barWidth)
        .attr("width", function (d) { return yScale(d.amount); })
        .attr("class", function (d) { return d.sex === 'F' ? 'female' : 'male'; });
    // The label class is what the CSS uses to style and right-align the names
    bars.append("text")
        .attr("class", "label")
        .attr("x", function (d) { return yScale(d.amount) - 5 ; })
        .attr("y", barWidth / 2)
        .attr("dy", ".35em")
        .text(function(d) { return d.name; });
    // Axes below and above the bars, sharing the same scale
    var bottomAxis = d3.axisBottom().scale(yScale).ticks(20, "s");
    var topAxis = d3.axisTop().scale(yScale).ticks(20, "s");
    chart.append("g")
        .attr('transform', 'translate( 0 ' + both.length * (barWidth + barMargin) + ')')
        .call(bottomAxis);
    chart.append("g")
        .attr('transform', 'translate( 0 ' + -barMargin + ' )')
        .call(topAxis);
  });
}
In this JavaScript file, we perform the following steps:
- Set up the main chart element, like we did in the previous example.
- Load the data from the CSV file using d3.csv.
- Group the loaded data so we only have the top 10 names for both sexes. Note that we use the groupBy function from the lodash library (https://lodash.com/) for this. This library provides a lot of additional functions to deal with common array operations. Throughout this book, we'll use this library in places where the standard JavaScript APIs don't provide enough functionality.
- Add g elements that will hold the rect and text elements for each name.
- Create the rect elements with the correct width corresponding to the number of times the name was used.
- Create the text elements to show the name at the end of the rect elements.
- Add some CSS styles for the rect and text elements.
- Add an axis to the top and the bottom for easy referencing.
We'll skip the first step since we've already explained that before, and move on to the usage of the d3.csv API call. Before we do that, there are a couple of variables in the JavaScript that determine how the bars look, and how many we show:
var namesToShow = 10;
var barWidth = 20;
var barMargin = 5;
These variables will be used throughout the explanation in the following sections. They mean that we're going to show 10 (namesToShow) names for each sex, that each bar is 20 (barWidth) pixels thick (the bars are horizontal, so this is the height of each rect), and that we put a five-pixel (barMargin) margin between bars.
To load data asynchronously, D3 provides a number of helper functions. In this case, we've used the d3.csv function:
d3.csv('data/yob2015.txt',
  function (d) { return { name: d.name, sex: d.sex, amount: +d.amount }; },
  function (data) {
    ...
  });
The d3.csv function we use takes three parameters. The first one, data/yob2015.txt, is a URL that points to the data we want to load. The second argument is a function that is applied to each row read by D3. The object that's passed into this function is based on the header row of the CSV file. This callback-based signature matches D3 version 4; from version 5 onward, d3.csv returns a promise instead of taking a callback. In our case, the row object looks like this:
{
  name: 'Sophie',
  sex: 'F',
  amount: '1234'
}
This (optional) function allows you to modify the data in the row before it is passed on, as part of an array (data), to the last argument of the d3.csv function. In this example, we use this second argument to convert the string value d.amount to a numeric value. Once the data is loaded and, in this case, converted, the function provided as the third argument is called with an array of all the read and converted rows, ready for us to visualize.
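To make the role of the header row and the row function more concrete, here is a simplified plain JavaScript sketch of header-based parsing. It is not how D3 implements d3.csv; parseCsv is a hypothetical helper that assumes comma-separated values without quoted fields, which is true for this dataset.

```javascript
// Simplified stand-in for header-based CSV parsing (assumption: no quoted
// fields and no commas inside values).
function parseCsv(text, convertRow) {
  var lines = text.trim().split('\n');
  var columns = lines[0].split(',');   // the header row names the properties
  return lines.slice(1).map(function (line) {
    var values = line.split(',');
    var row = {};
    columns.forEach(function (column, i) {
      row[column] = values[i];         // every value starts out as a string
    });
    // The optional row function gets the chance to convert each row
    return convertRow ? convertRow(row) : row;
  });
}

var sample = 'name,sex,amount\nEmma,F,20355\nNoah,M,19511\n';
var rows = parseCsv(sample, function (d) {
  return { name: d.name, sex: d.sex, amount: +d.amount };
});
console.log(rows);
```

Without the row function, amount would stay a string such as '20355', and calculations like finding the maximum could silently give wrong results.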
D3 provides a number of functions like d3.csv to load data and resources. These are listed in the following table:
If your files use a different delimiter-separated format, you can also process them manually: load them using the d3.text function, and use the functions from the d3-dsv module to parse the data. You can find more information on the d3-dsv module here: https://github.com/d3/d3-dsv.
Grouping the loaded data so we only have the top 10 names for both sexes
At this point, we've only loaded the data. If you look back at the figure, you can see that we create a chart using the top 10 female and male names. With the following lines of code, we convert the big incoming data array to an array that contains just the top 10 female and male names:
var grouped = _.groupBy(data, 'sex');
var top10F = grouped['F'].slice(0, namesToShow);
var top10M = grouped['M'].slice(0, namesToShow);
var both = top10F.concat(top10M.reverse());
Here we use lodash's groupBy function to group our data by the sex property of each row. Because the file lists the names for each sex in descending order of popularity, the first 10 (namesToShow) elements of each group are the top 10 names. We take those elements and create a single array from them using the concat function. We also reverse the top10M array so that the most popular boy's name appears at the bottom of the chart (as you can see when you look at the example).
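The effect of these four lines can be reproduced without lodash. The sketch below uses a hypothetical groupByProperty helper as an approximation of _.groupBy, a six-row sample taken from the 2015 file, and namesToShow set to 2 to keep the output short.

```javascript
// Plain JavaScript approximation of _.groupBy(rows, 'sex'): collect rows
// into arrays keyed by the value of the given property.
function groupByProperty(rows, property) {
  return rows.reduce(function (groups, row) {
    var key = row[property];
    (groups[key] = groups[key] || []).push(row);
    return groups;
  }, {});
}

var data = [
  { name: 'Emma', sex: 'F', amount: 20355 },
  { name: 'Olivia', sex: 'F', amount: 19553 },
  { name: 'Sophia', sex: 'F', amount: 17327 },
  { name: 'Noah', sex: 'M', amount: 19511 },
  { name: 'Liam', sex: 'M', amount: 18281 },
  { name: 'Mason', sex: 'M', amount: 16535 }
];

var namesToShow = 2;   // 2 instead of 10 to keep the sample short
var grouped = groupByProperty(data, 'sex');
var topF = grouped.F.slice(0, namesToShow);
var topM = grouped.M.slice(0, namesToShow);
var both = topF.concat(topM.reverse());   // reverse() reorders topM in place

console.log(both.map(function (d) { return d.name; }));
```

The girls' names run from most to least popular and the boys' names from least to most popular, so the two most popular names end up at the outer edges of the chart.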
At this point, we've got the data into a form that we can use. The next step is to create a number of containers, to which we can add the rect that represents the number of times the name was used, and a text element that displays the name:
var bars = chart.selectAll("g").data(both)
    .enter()
    .append('g')
    .attr('transform', function (d, i) {
      var yPos = ((barWidth + barMargin) * i);
      return 'translate( 0 ' + yPos + ')';
    });
Here, we bind the both array to a number of g elements. We only need to use the enter function here, since we know that there aren't any g elements that can be reused. We position each g element using the translate operation of the transform attribute. We translate the g element along the y-axis based on the barWidth, the barMargin, and the index (i) of the data element (d) in our data (both) array. If you use the Chrome developer tools, you'll see something like this, which nicely shows the calculated translate values:
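The same values can also be worked out without a browser. The following sketch repeats the arithmetic from the transform callback for a few indices; barTransform is a name introduced here for illustration only.

```javascript
var barWidth = 20;
var barMargin = 5;

// Same arithmetic as the transform callback: the g element at index i is
// moved down by i times (bar thickness + margin).
function barTransform(i) {
  var yPos = (barWidth + barMargin) * i;
  return 'translate( 0 ' + yPos + ')';
}

// First three bars and the last of the 20 bars (index 19)
var transforms = [0, 1, 2, 19].map(barTransform);
console.log(transforms);
```

With 20 bars of 25 pixels each, the last bar starts at 475 and ends at 495, which fits comfortably inside the 550-pixel-high drawing area.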
All that is left to do now is draw the rectangles and add the names.
Adding the bar chart and baby name
In the previous section, we added the g elements and assigned those to the bars variable. In this section, we're going to calculate the width of the individual rectangles and add those and some text to the g:
var yScale = d3.scaleLinear()
    .domain([0, d3.max(both, function (d) { return d.amount; })])
    .range([0, width]);
bars.append('rect')
    .attr("height", barWidth)
    .attr("width", function (d) { return yScale(d.amount); })
    .attr("class", function (d) { return d.sex === 'F' ? 'female' : 'male'; });
bars.append("text")
    .attr("class", "label")
    .attr("x", function (d) { return yScale(d.amount) - 5 ; })
    .attr("y", barWidth / 2)
    .attr("dy", ".35em")
    .text(function(d) { return d.name; });
Here we see something new: the d3.scaleLinear function. With d3.scaleLinear, we can let D3 calculate how the number of times a name was given (the amount property) maps to a specific width. We want to use the full width (the width variable, which has a value of 720) of the chart for our bars, which means that the highest value in our input data should map to that value:
- The name Emma, which occurred 20355 times, should map to a value of 720
- The name Olivia, which occurred 19553 times, should map to a value of 720 * (19553/20355)
- The name Mia, which occurred 14820 times, should map to a value of 720 * (14820/20355)
- And so on...
Now, we could calculate this ourselves and set the size of the rect accordingly, but using d3.scaleLinear is much easier, and provides additional functionality. Let's look at the definition a bit closer:
var yScale = d3.scaleLinear()
    .domain([0, d3.max(both, function (d) { return d.amount; })])
    .range([0, width]);
Here we define a linear scale whose input domain runs from 0 to the maximum amount in our data. This input domain is mapped to an output range starting at 0 and ending at width. The result, yScale, is a function that we can use to map a value from the input domain to the output range: for example, yScale(1234) returns 43.64922623434046. (Despite its name, yScale determines the horizontal length of the bars.)
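To see that there is no magic involved, here is a hand-rolled sketch of the same linear mapping. linearScale is a hypothetical stand-in, not D3's implementation, and it omits everything d3.scaleLinear offers beyond the basic calculation.

```javascript
// Hand-rolled linear mapping from an input domain to an output range.
// This is an illustration only, not how D3 implements d3.scaleLinear.
function linearScale(domain, range) {
  return function (value) {
    // t is the position of value within the domain, from 0 to 1
    var t = (value - domain[0]) / (domain[1] - domain[0]);
    return range[0] + t * (range[1] - range[0]);
  };
}

var width = 720;         // 800 minus the left and right margins of 40
var maxAmount = 20355;   // Emma, the largest amount among the 2015 top 10s
var scale = linearScale([0, maxAmount], [0, width]);

console.log(scale(20355));  // the maximum maps to the full width
console.log(scale(1234));   // roughly 43.65, in line with the yScale(1234) value quoted in the text
console.log(scale(0));      // the start of the domain maps to 0
```

Every bar width in the chart is this same proportion of 720 pixels, which is exactly the 720 * (amount/20355) calculation from the list above.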
Once you've got a scale, you can use a couple of functions to change its behavior:
This is just a small part of the scales support provided by D3. In the rest of the book, we'll explore more of the scales options that are available.
With the scale defined, we can use that to create our rect and text elements in the same way we did in our previous example:
bars.append('rect')
    .attr("height", barWidth)
    .attr("width", function (d) { return yScale(d.amount); })
    .attr("class", function (d) { return d.sex === 'F' ? 'female' : 'male'; });
Here we create a rect with a fixed height, and a width which is defined by the yScale and the number of times the name was used. We also add a class to the rect so that we can set its colors (and other styling attributes) through CSS. In the case where sex is F, we set the class female and in the other case we set the class male.
To position the text element, we do pretty much the same:
bars.append("text")
    .attr("class", "label")
    .attr("x", function (d) { return yScale(d.amount) - 5 ; })
    .attr("y", barWidth / 2)
    .attr("dy", ".35em")
    .text(function(d) { return d.name; });
We create a new text element, position it at the end of the bar, set a custom CSS class, and finally set its value to d.name. The dy attribute might seem a bit strange, but it shifts the text down slightly so that it sits nicely in the vertical middle of the bar. If we opened the example at this point, we'd see something like this:
We can see that all the information is in there, but it still looks kind of ugly. In the following section, we add some CSS to improve how the chart looks.
Adding some CSS classes to style the bars and text elements
When we added the rect elements, we added a female class attribute for the girls' names and a male one for the boys' names, and we also set the class of our text elements to label. In our CSS file, we can now define colors and other styles based on these classes:
.male {
  fill: steelblue;
}
.female {
  fill: hotpink;
}
.label {
  fill: black;
  font: 10px sans-serif;
  text-anchor: end;
}
With these CSS properties, we set the fill color of our rectangles. The elements with the male class will be filled steelblue and the elements with the female class will be filled hotpink. We also change how the elements with the label class are rendered. For these elements, we change the font and the text-anchor. The text-anchor is especially important here, since it makes sure that the end (right side) of the text element is positioned at the x value, instead of the start (left side). The effect is that the text element is nicely aligned at the end of our bars.
Adding the axis on the top and bottom
The final step we need to take to get the figure from the beginning of this section is to add the top and bottom axes. D3 provides the d3.axis<orientation> functions (d3.axisTop, d3.axisBottom, d3.axisLeft, and d3.axisRight), which allow you to create an axis at the top, bottom, left, or right side. When creating an axis, we pass in a scale (the same one we used for the width of the rectangles), and tell D3 how the axis should be formatted. In this case, we ask for about 20 ticks (D3 treats the count as a hint) and use the s format, which tells D3 to use the International System of Units (SI). This means that D3 will use metric prefixes to format the tick values (more info can be found here: https://en.wikipedia.org/wiki/Metric_prefix).
var bottomAxis = d3.axisBottom().scale(yScale).ticks(20, "s");
var topAxis = d3.axisTop().scale(yScale).ticks(20, "s");
chart.append("g")
    .attr('transform', 'translate( 0 ' + both.length * (barWidth + barMargin) + ')')
    .call(bottomAxis);
chart.append("g")
    .attr('transform', 'translate( 0 ' + -barMargin + ' )')
    .call(topAxis);
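To give a rough idea of what metric-prefix formatting does to tick values, the sketch below implements a deliberately simplified formatter. formatSi is a hypothetical helper that only knows the k and M prefixes; D3's own s format covers more prefixes and handles precision differently, so its labels can differ from the ones produced here.

```javascript
// Deliberately simplified SI-prefix formatter (assumption: only the k and
// M prefixes matter for this data). This is not D3's formatting algorithm.
function formatSi(value) {
  if (Math.abs(value) >= 1e6) {
    return (value / 1e6) + 'M';
  }
  if (Math.abs(value) >= 1e3) {
    return (value / 1e3) + 'k';
  }
  return String(value);
}

// A few values in the range covered by the chart's scale
var tickValues = [0, 5000, 10000, 20000];
var labels = tickValues.map(formatSi);
console.log(labels);
```

Short labels such as 10k and 20k leave much more room between ticks than the full numbers would, which matters when roughly 20 ticks share a 720-pixel axis.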
And with that, we've recreated the example we saw at the beginning of this section:
If you look back at the code we showed at the beginning of this section, you can see that we only need a small number of lines of code to create a nice visualization.