You're reading from Mastering Machine Learning with R Master machine learning techniques with R to deliver insights for complex projects

Product type Paperback

Published in Oct 2015

ISBN-13 9781783984527

Length 400 pages

Edition 1st Edition

Tools

RStudio

Concepts

Machine Learning

Author (1):

Lesmeister

Data frames and matrices

We will now create a data frame, which is a collection of variables (vectors) of equal length. We will first create a vector of 1, 2, and 3 and another vector of 1, 1.5, and 2.0. Once this is done, the rbind() function will allow us to combine them as the rows of a matrix:

> p = seq(1, 3)

> p
[1] 1 2 3

> q = seq(1,2, by=0.5)

> q
[1] 1.0 1.5 2.0

> r = rbind(p,q)

> r
  [,1] [,2] [,3]
p    1  2.0    3
q    1  1.5    2

The result is a numeric matrix of two rows with three values each. You can always determine the structure of your data using the str() function, which in this case shows a 2 x 3 numeric matrix whose dimnames attribute is a list of two elements: the row names, p and q, and the column names, which are NULL:

> str(r)
 num [1:2, 1:3] 1 1 2 1.5 3 2
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:2] "p" "q"
  ..$ : NULL
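
To confirm that rbind() returned a matrix rather than a list or a data frame, a few base R checks can help. This is a minimal sketch that rebuilds p, q, and r so that it runs on its own; the checks are an addition and not part of the original walkthrough:

```r
# Rebuild the two vectors and bind them as rows
p = seq(1, 3)
q = seq(1, 2, by = 0.5)
r = rbind(p, q)

is.matrix(r)  # TRUE
is.list(r)    # FALSE
dim(r)        # 2 3 (two rows, three columns)
rownames(r)   # "p" "q"
colnames(r)   # NULL
```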

Now, let's put them together as columns using cbind():

> s = cbind(p,q)

> s
     p   q
[1,] 1 1.0
[2,] 2 1.5
[3,] 3 2.0

To put this in a data frame, use the as.data.frame() function. After that, examine the structure:

> s = as.data.frame(s)

> str(s)
'data.frame':	3 obs. of  2 variables:
 $ p: num  1 2 3
 $ q: num  1 1.5 2
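
As an aside, the same two variables can be placed in a data frame directly with data.frame(), skipping the intermediate matrix. This sketch is an alternative to the route taken above, not a replacement for it; s2 is a hypothetical name used only for illustration. One difference is worth noting: a matrix holds a single type, so cbind() converts p to numeric, whereas data.frame() keeps p as an integer column.

```r
p = 1:3                    # same values as p above, stored as integers
q = seq(1, 2, by = 0.5)

# Build the data frame directly from the two vectors
s2 = data.frame(p, q)

str(s2)
nrow(s2)          # 3
ncol(s2)          # 2
is.integer(s2$p)  # TRUE: data.frame() did not coerce p to numeric
```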

We now have a data frame, s, that has two variables of three observations each. We can change the names of the variables using names():

> names(s) = c("column 1", "column 2")

> s
  column 1 column 2
1        1      1.0
2        2      1.5
3        3      2.0
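
Names that contain a space, such as "column 1", are not syntactically valid R names, so they need backticks when used with $ and quotes when used inside brackets. A small sketch, rebuilding s so that it runs on its own:

```r
s = as.data.frame(cbind(p = 1:3, q = seq(1, 2, by = 0.5)))
names(s) = c("column 1", "column 2")

s$`column 1`      # backticks are required because of the space
s[["column 2"]]   # quoted name inside double brackets
s[, "column 2"]   # quoted name in the column position

# make.names() shows what a syntactically valid version would look like
make.names(names(s))  # "column.1" "column.2"
```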

Let's have a go at putting this into a matrix format with as.matrix(). Some R packages require the data to be in a data frame, while others require a matrix. You can switch back and forth between the two as needed:

> t = as.matrix(s)

> t
     column 1 column 2
[1,]        1      1.0
[2,]        2      1.5
[3,]        3      2.0
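
One thing to keep in mind when switching back and forth: a matrix holds a single type, so calling as.matrix() on a data frame with mixed column types converts every value to character. A minimal sketch with a made-up data frame (mixed and its columns are hypothetical and not part of the example above):

```r
# Hypothetical data frame with one numeric and one character column
mixed = data.frame(id = 1:3, label = c("a", "b", "c"))

m = as.matrix(mixed)
is.character(m)    # TRUE: the id column was coerced to character

# Converting only the numeric column keeps a numeric matrix
num_m = as.matrix(mixed[, "id", drop = FALSE])
is.numeric(num_m)  # TRUE
```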

You can also retrieve a specific value from a matrix or data frame by its position. For instance, suppose we want the value of the first observation of the first variable. In this case, we specify the first row and first column in brackets as follows:

> t[1,1]
column 1 
       1

Let's assume that you want to see all the values in the second variable (column). Then, just leave the row blank but remember to use a comma before the column(s) that you want to see:

> t[,2]
[1] 1.0 1.5 2.0

Conversely, let's say we want to look at the first two rows only. In this case, use the colon operator to give the row range and leave the column position after the comma blank:

> t[1:2,]
     column 1 column 2
[1,]        1      1.0
[2,]        2      1.5
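
Note that t[,2] above returned a plain vector rather than a one-column matrix. When the matrix shape should be kept, the drop = FALSE argument of the bracket operator can be added. A sketch under that assumption, using m as a hypothetical matrix name because t is also the name of the base transpose function t():

```r
m = cbind("column 1" = 1:3, "column 2" = seq(1, 2, by = 0.5))

m[, 2]                # a plain numeric vector of length 3
m[, 2, drop = FALSE]  # still a 3 x 1 matrix
m[, "column 2"]       # columns can also be selected by name
m[1:2, ]              # first two rows, all columns
```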

Assume that you have a data frame or matrix with 100 observations and ten variables and you want to create a subset of the first 70 observations and variables 1, 3, 7, 8, 9, and 10. What would this look like?

Well, using the colon, comma, concatenate function c(), and brackets, you could simply do the following:

> new = old[1:70, c(1,3,7:10)]
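
To try this out, the line above needs an object named old. The following sketch builds a stand-in with made-up values (the numbers 1 to 1000 arranged in 100 rows and 10 columns); the data is purely illustrative:

```r
# Hypothetical data: 100 observations of 10 variables
old = as.data.frame(matrix(1:1000, nrow = 100, ncol = 10))

# First 70 observations, variables 1, 3, 7, 8, 9, and 10
new = old[1:70, c(1, 3, 7:10)]

dim(new)    # 70 6
names(new)  # "V1" "V3" "V7" "V8" "V9" "V10"
```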

Notice how easily you can control which observations and variables to include. You can just as easily exclude variables. Say that we just want to exclude the first variable; a negative sign in front of its index does this:

> new = old[,-1]
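
Negative indexing also works with several variables at once, and a variable can be excluded by name with setdiff(). A short sketch, again using a made-up stand-in for old; the object names no_first, no_three, and no_v2 are hypothetical:

```r
# Hypothetical data: 100 observations of 10 variables
old = as.data.frame(matrix(1:1000, nrow = 100, ncol = 10))

no_first = old[, -1]                      # drop variable 1
no_three = old[, -c(1, 3, 5)]             # drop variables 1, 3, and 5
no_v2 = old[, setdiff(names(old), "V2")]  # drop a variable by name

ncol(no_first)  # 9
ncol(no_three)  # 7
ncol(no_v2)     # 9
```

Positive and negative indices cannot be mixed in the same position, so a selection such as old[, c(-1, 3)] raises an error.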

This syntax is very powerful in R for the fundamental manipulation of data. In the main chapters, we will also bring in more advanced data manipulation techniques.
