You're reading from Apache Spark 2.x Machine Learning Cookbook Over 100 recipes to simplify machine learning model implementations with Spark

Product type Paperback

Published in Sep 2017

Publisher Packt

ISBN-13 9781783551606

Length 666 pages

Edition 1st Edition

Languages

Scala

Tools

Apache Spark

Concepts

Machine Learning

Authors (5):

Amirghodsi

Mohammed Guller

Shuen Mei

Rajendran

Hall

Exploring ML pipelines and DataFrames using logistic regression in Spark 2.0

We have gone out of our way to present the code in detail and as simply as possible so you can get started without the additional syntactic sugar that Scala uses.

Getting ready

In this recipe, we combine ML pipelines with logistic regression to demonstrate how you can chain various steps into a single pipeline that operates on DataFrames as they are transformed and travel through the pipe. To keep the program short, we skip some steps, such as splitting the data and evaluating the model, and reserve them for later chapters. The recipe still provides a full treatment of pipelines, DataFrames, estimators, and transformers.

This recipe explores the details of the pipeline and DataFrames as they travel through the pipeline and get operated on.
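
To make the idea concrete before the recipe's own steps, here is a minimal sketch of a pipeline that ends in logistic regression. It is not the recipe's code: the object name `PipelineSketch`, the toy rows, the column names, and the choice of `Tokenizer` and `HashingTF` as the transformer stages are all assumptions made for illustration, and the parameter values are arbitrary. It assumes Spark 2.x with the `spark-mllib` module on the classpath.

```scala
import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.sql.SparkSession

// Hypothetical object name, used only for this illustration.
object PipelineSketch {

  // Chains two transformers and one estimator into a single pipeline.
  def buildPipeline(): Pipeline = {
    // Transformer: splits the "text" column into a "words" column.
    val tokenizer = new Tokenizer()
      .setInputCol("text")
      .setOutputCol("words")

    // Transformer: hashes the words into a fixed-length "features" vector.
    val hashingTF = new HashingTF()
      .setInputCol("words")
      .setOutputCol("features")
      .setNumFeatures(1000)

    // Estimator: reads the default "features" and "label" columns.
    val lr = new LogisticRegression()
      .setMaxIter(10)
      .setRegParam(0.01)

    new Pipeline().setStages(Array(tokenizer, hashingTF, lr))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("PipelineSketch")
      .getOrCreate()

    // Toy training rows (made up for this sketch): id, text, label.
    val training = spark.createDataFrame(Seq(
      (0L, "spark ml pipeline", 1.0),
      (1L, "plain old text", 0.0),
      (2L, "spark dataframe stages", 1.0),
      (3L, "nothing relevant here", 0.0)
    )).toDF("id", "text", "label")

    // fit() runs each stage in order and returns a fitted PipelineModel.
    val model: PipelineModel = buildPipeline().fit(training)

    // Unlabeled rows pass through the same stages when transform() is called.
    val unseen = spark.createDataFrame(Seq(
      (4L, "spark pipeline"),
      (5L, "some other text")
    )).toDF("id", "text")

    model.transform(unseen)
      .select("id", "text", "probability", "prediction")
      .show(false)

    spark.stop()
  }
}
```

Each stage reads one or more columns of the incoming DataFrame and appends a new column, so the DataFrame grows as it travels through the pipe. Calling `fit()` on the pipeline produces a `PipelineModel` in which the estimator has been replaced by its fitted model, and that fitted pipeline is itself a transformer.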