
You're reading from Scala for Data Science: Leverage the power of Scala with different tools to build scalable, robust data science applications

Product type Paperback

Published in Jan 2016

Publisher

ISBN-13 9781785281372

Length 416 pages

Edition 1st Edition

Languages

Scala

Concepts

Application Development

Author (1):

Bugnion


Table of Contents (22) Chapters

Scala for Data Science

Credits

About the Author

About the Reviewers

www.PacktPub.com

Preface

1. Scala and Data Science

2. Manipulating Data with Breeze

3. Plotting with breeze-viz

4. Parallel Collections and Futures

5. Scala and SQL through JDBC

6. Slick – A Functional Interface for SQL

7. Web APIs

8. Scala and MongoDB

9. Concurrency with Akka

10. Distributed Batch Processing with Spark

11. Spark SQL and DataFrames

12. Distributed Machine Learning with MLlib

13. Web APIs with Play

14. Visualization with D3 and the Play Framework

Pattern Matching and Extractors

Index

Chapter 10. Distributed Batch Processing with Spark

In Chapter 4, Parallel Collections and Futures, we discovered how to use parallel collections for "embarrassingly" parallel problems: problems that can be broken down into a series of tasks that require no (or very little) communication between the tasks.
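As a reminder of what an "embarrassingly" parallel problem looks like, here is a minimal sketch using futures from the Scala standard library. It uses futures rather than `.par` because, from Scala 2.13 onwards, parallel collections live in a separate module, while futures run without extra dependencies. The object name `EmbarrassinglyParallel` and the task `sumOfSquaresUpTo` are hypothetical, chosen only for illustration; they are not taken from the book.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object EmbarrassinglyParallel {
  // Hypothetical task: each input is processed on its own,
  // with no shared state and no communication with other tasks.
  def sumOfSquaresUpTo(n: Int): Long =
    (1L to n.toLong).map(i => i * i).sum

  // Start one future per input, then gather the results in input order.
  def runAll(inputs: List[Int]): List[Long] = {
    val tasks: List[Future[Long]] = inputs.map(n => Future(sumOfSquaresUpTo(n)))
    Await.result(Future.sequence(tasks), 30.seconds)
  }

  def main(args: Array[String]): Unit = {
    val inputs = List(10, 100, 1000)
    inputs.zip(runAll(inputs)).foreach { case (n, total) =>
      println(s"n=$n -> $total")
    }
  }
}
```

Because no task depends on another, the tasks can be scheduled in any order on any available core, which is the property that makes the problem easy to parallelize.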

Apache Spark provides behavior similar to Scala parallel collections (and much more), but, instead of distributing tasks across different CPUs on the same computer, it allows the tasks to be distributed across a computer cluster. This provides arbitrary horizontal scalability, since we can simply add more computers to the cluster.
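To give a feel for how close the Spark programming model is to ordinary Scala collections, the following is a minimal sketch, not an excerpt from the chapter. It assumes the `spark-core` library is on the classpath and uses the `local[*]` master, which runs Spark inside the current JVM; on a real cluster the master URL would differ. The object name `SparkSketch` and the helper `sumOfSquares` are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SparkSketch {
  // Hypothetical helper: distribute the numbers 1 to n as an RDD,
  // square each element, and add the squares together.
  def sumOfSquares(n: Int): Long = {
    // Assumption: "local[*]" (all local cores) stands in for a cluster here.
    val conf = new SparkConf().setAppName("spark-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val numbers = sc.parallelize(1 to n)           // RDD[Int] split into partitions
      numbers.map(i => i.toLong * i).reduce(_ + _)   // map and reduce run per partition
    } finally {
      sc.stop()                                      // always release Spark resources
    }
  }

  def main(args: Array[String]): Unit =
    println(s"Sum of squares: ${sumOfSquares(1000)}")
}
```

The `map` and `reduce` calls read like their counterparts on a Scala collection, but the work is split into partitions that Spark can schedule on different machines when the application is pointed at a cluster.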

In this chapter, we will learn the basics of Apache Spark and use it to explore a set of emails, extracting features with a view to building a spam filter. We will explore several ways of actually building a spam filter in Chapter 12, Distributed Machine Learning with MLlib.
