Configuring and running Spark on Amazon Elastic MapReduce
Launch a Hadoop cluster with Spark installed using Amazon Elastic MapReduce (EMR). Perform the following steps to create an EMR cluster with Spark installed:
- Launch an Amazon EMR cluster.
- Open the Amazon EMR console at https://console.aws.amazon.com/elasticmapreduce/.
- Choose Create cluster:

- Choose an appropriate Amazon AMI version, 3.9.0 or later, as shown in the following screenshot:

- In the applications to be installed field, choose Spark 1.5.2 or later from the list shown in the user interface and click on Add.
- Select other hardware options as necessary:
    - The instance type
    - The key pair to be used with SSH
    - Permissions: the IAM roles (Default or Custom)

Refer to the following screenshot:

- Click on Create cluster. The cluster will start instantiating, as shown in the following screenshot:

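The console steps above can also be scripted. A rough AWS CLI equivalent is sketched below; the cluster name, release label, key name, instance type, and instance count are illustrative assumptions, not values from the original walkthrough (the EMR 4.2.0 release label ships Spark 1.5.2, matching the version chosen in the console):

```shell
# Hedged sketch: create an EMR cluster with Spark installed via the AWS CLI.
# All values (name, release label, key name, instance settings) are placeholders.
aws emr create-cluster \
  --name "spark-cluster" \
  --release-label emr-4.2.0 \
  --applications Name=Spark \
  --use-default-roles \
  --ec2-attributes KeyName=rd_spark-user1 \
  --instance-type m3.xlarge \
  --instance-count 3
```

The command prints the new cluster's ID, which you can then poll with `aws emr describe-cluster` until the cluster is ready.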
- Log in to the master node. Once the EMR cluster is ready, you can SSH into the master:
$ ssh -i rd_spark-user1.pem [email protected]
The output will be similar to the following listing:
Last login: Wed Jan 13 10:46:26 2016

       __|  __|_  )
       _|  (     /   Amazon Linux AMI
      ___|\___|___|

https://aws.amazon.com/amazon-linux-ami/2015.09-release-notes/
23 package(s) needed for security, out of 49 available
Run "sudo yum update" to apply all updates.
[hadoop@ip-172-31-2-31 ~]$
- Start the Spark Shell:
[hadoop@ip-172-31-2-31 ~]$ spark-shell
16/01/13 10:49:36 INFO SecurityManager: Changing view acls to: hadoop
16/01/13 10:49:36 INFO SecurityManager: Changing modify acls to: hadoop
16/01/13 10:49:36 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
16/01/13 10:49:36 INFO HttpServer: Starting HTTP Server
16/01/13 10:49:36 INFO Utils: Successfully started service 'HTTP class server' on port 60523.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.5.2
      /_/

scala> sc
- Run a basic Spark sample on the EMR cluster:
scala> val textFile = sc.textFile("s3://elasticmapreduce/samples/hive-ads/tables/impressions/dt=2009-04-13-08-05/ec2-0-51-75-39.amazon.com-2009-04-13-08-05.log")
scala> val linesWithCartoonNetwork = textFile.filter(line => line.contains("cartoonnetwork.com")).count()
Your output will be as follows:
linesWithCartoonNetwork: Long = 9
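The filter-and-count pattern above is ordinary Scala collection logic lifted onto an RDD. A minimal local sketch of the same logic, using a plain Scala collection and made-up log lines standing in for the S3 impression data, is:

```scala
// Minimal sketch of the filter/count pattern used above, applied to a
// plain Scala collection instead of an RDD. The log lines are invented.
object FilterCountExample {
  def main(args: Array[String]): Unit = {
    val lines = Seq(
      "13/04/2009 GET http://cartoonnetwork.com/ad/1 impression",
      "13/04/2009 GET http://example.com/ad/2 impression",
      "13/04/2009 GET http://cartoonnetwork.com/ad/3 impression"
    )
    // Same predicate as the RDD version: keep lines mentioning cartoonnetwork.com.
    val linesWithCartoonNetwork = lines.count(_.contains("cartoonnetwork.com"))
    println(linesWithCartoonNetwork) // prints 2
  }
}
```

The only difference on EMR is that `sc.textFile` distributes the lines across the cluster, so `filter(...).count()` runs in parallel over S3 partitions rather than in a single JVM.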