Mastering Machine Learning with R

Master machine learning techniques with R to deliver insights for complex projects

Product type: Paperback
Published: Oct 2015
ISBN-13: 9781783984527
Length: 400 pages
Edition: 1st Edition
Author: Lesmeister

Data frames and matrices


We will now work toward creating a data frame, which is a collection of variables (vectors) of equal length. First, we will create a vector of 1, 2, and 3 and another vector of 1, 1.5, and 2.0. Once this is done, the rbind() function will allow us to combine them as rows:

> p = seq(1, 3)

> p
[1] 1 2 3

> q = seq(1,2, by=0.5)

> q
[1] 1.0 1.5 2.0

> r = rbind(p,q)

> r
  [,1] [,2] [,3]
p    1  2.0    3
q    1  1.5    2

The result is a numeric matrix of two rows with three values each. You can always determine the structure of your data using the str() function, which in this case shows a 2 x 3 numeric matrix whose dimnames attribute is a list of two elements: the row names, p and q, and the column names, which are NULL:

> str(r)
 num [1:2, 1:3] 1 1 2 1.5 3 2
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:2] "p" "q"
  ..$ : NULL
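
If the str() output is hard to read, the same facts can be queried one at a time. The following is a minimal sketch using only base R functions; it rebuilds p and q so that it runs on its own:

```r
# Rebuild the two vectors and bind them as rows
p <- c(1, 2, 3)
q <- seq(1, 2, by = 0.5)
r <- rbind(p, q)

is.matrix(r)   # TRUE: rbind() on two vectors returns a matrix
is.list(r)     # FALSE: the only list involved is the dimnames attribute
dim(r)         # 2 3 (two rows, three columns)
rownames(r)    # "p" "q"
colnames(r)    # NULL
```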

Now, let's put them together as columns using cbind():

> s = cbind(p,q)

> s
     p   q
[1,] 1 1.0
[2,] 2 1.5
[3,] 3 2.0

To put this in a data frame, use the as.data.frame() function. After that, examine the structure:

> s = as.data.frame(s)

> str(s)
'data.frame':	3 obs. of  2 variables:
 $ p: num  1 2 3
 $ q: num  1 1.5 2
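
As an alternative to going through cbind(), a data frame can also be built directly with data.frame(). The sketch below is illustrative only: the label column and the name s2 are hypothetical additions, included to show that each variable in a data frame keeps its own type:

```r
p <- c(1, 2, 3)
q <- seq(1, 2, by = 0.5)
label <- c("a", "b", "c")   # hypothetical character variable, not in the text

# stringsAsFactors = FALSE keeps label as character in every R version
s2 <- data.frame(p = p, q = q, label = label, stringsAsFactors = FALSE)

str(s2)
sapply(s2, class)   # p and q are "numeric", label is "character"
```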

We now have a data frame, s, that has two variables of three observations each. We can change the names of the variables using names():

> names(s) = c("column 1", "column 2")

> s
  column 1 column 2
1        1      1.0
2        2      1.5
3        3      2.0
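
Names that contain spaces, such as the ones above, are legal but need some care when you refer to them later. The following sketch, which rebuilds s so that it is self-contained, shows two ways to access such a column and one way to convert the names to syntactic ones with the base function make.names():

```r
s <- data.frame(p = c(1, 2, 3), q = c(1, 1.5, 2))
names(s) <- c("column 1", "column 2")

first  <- s$`column 1`     # backticks are required with $ for names with spaces
second <- s[["column 2"]]  # a quoted string works inside [[ ]]

# Optionally replace the spaces so that the names can be used without backticks
names(s) <- make.names(names(s))
names(s)                   # "column.1" "column.2"
```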

Let's have a go at putting this into a matrix format with as.matrix(). Some R packages require the analysis to be done on a data frame, while others require a matrix. You can switch back and forth between a data frame and a matrix as you require:

> t = as.matrix(s)

> t
     column 1 column 2
[1,]        1      1.0
[2,]        2      1.5
[3,]        3      2.0
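
One caveat when switching between the two: a matrix holds a single data type, so a data frame with mixed column types does not survive the round trip unchanged. A small sketch, with a hypothetical data frame df that is not part of the example above:

```r
# Hypothetical data frame with one numeric and one character column
df <- data.frame(x = c(1, 2), y = c("a", "b"), stringsAsFactors = FALSE)

m <- as.matrix(df)
is.character(m)     # TRUE: the numeric column was converted to text

# An all-numeric data frame converts to a numeric matrix
n <- as.matrix(df["x"])
is.numeric(n)       # TRUE
```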

You can also extract a specific value from a matrix or data frame by its position. For instance, suppose we want the value of the first observation of the first variable. In this case, we specify the first row and first column in brackets as follows:

> t[1,1]
column 1 
       1

Let's assume that you want to see all the values in the second variable (column). Then, just leave the row blank but remember to use a comma before the column(s) that you want to see:

> t[,2]
[1] 1.0 1.5 2.0
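
Note that selecting a single column returned a plain vector rather than a one-column matrix. If the matrix form is needed, the drop argument of the bracket operator keeps it. The sketch below uses a stand-in matrix m with the same values and column names as above:

```r
m <- matrix(c(1, 2, 3, 1, 1.5, 2), ncol = 2,
            dimnames = list(NULL, c("column 1", "column 2")))

v <- m[, 2]                 # dimensions are dropped: a numeric vector
k <- m[, 2, drop = FALSE]   # dimensions are kept: a 3 x 1 matrix
w <- m[, "column 2"]        # columns can also be selected by name

is.null(dim(v))             # TRUE
dim(k)                      # 3 1
```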

Similarly, let's say we want to look at the first two rows only. In this case, use the colon operator to give the range of rows and leave the column position after the comma blank:

> t[1:2,]
     column 1 column 2
[1,]        1      1.0
[2,]        2      1.5

Assume that you have a data frame or matrix with 100 observations and ten variables and you want to create a subset of the first 70 observations and variables 1, 3, 7, 8, 9, and 10. What would this look like?

Using the colon operator, the comma, the concatenate function c(), and brackets, you could simply do the following:

> new = old[1:70, c(1,3,7:10)]
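
To try this out, you need an object named old. The sketch below creates one from random numbers purely for illustration; the contents and the default column names V1 to V10 are assumptions, not data from the book:

```r
set.seed(1)   # make the random values reproducible

# Hypothetical data: 100 observations of 10 variables
old <- as.data.frame(matrix(rnorm(1000), nrow = 100, ncol = 10))

new <- old[1:70, c(1, 3, 7:10)]

dim(new)      # 70 6
names(new)    # "V1" "V3" "V7" "V8" "V9" "V10"
```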

Notice how easily you can control which observations and variables to include. You can also easily exclude variables. Say that we just want to exclude the first variable; we can do so by putting a negative sign in front of its index:

> new = old[,-1]
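
Negative indices extend to several variables at once, and exclusion by name is possible with a logical test. A short sketch, again with a hypothetical old built only for illustration:

```r
# Hypothetical data: 2 observations of 10 variables named V1 to V10
old <- as.data.frame(matrix(1:20, nrow = 2, ncol = 10))

new1 <- old[, -1]                                  # drop the first variable
new2 <- old[, -c(1, 3)]                            # drop variables 1 and 3
new3 <- old[, !(names(old) %in% c("V1", "V3"))]    # same result, by name

ncol(new1)   # 9
ncol(new2)   # 8
```

Positive and negative indices cannot be mixed in the same position; R raises an error for something like old[, c(-1, 2)].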

This syntax is very powerful in R for the fundamental manipulation of data. In the main chapters, we will also bring in more advanced data manipulation techniques.
