
Jupyter for Data Science: Exploratory analysis, statistical modeling, machine learning, and data visualization with Jupyter

Product type Paperback

Published in Oct 2017

Publisher Packt

ISBN-13 9781785880070

Length 242 pages

Edition 1st Edition

Languages

Python

Tools

Jupyter

Concepts

Data Analysis

Author (1):

Toomey


Table of Contents (17) Chapters

Title Page

Credits

About the Author

About the Reviewers

www.PacktPub.com

Customer Feedback

Preface

1. Jupyter and Data Science

2. Working with Analytical Data on Jupyter

3. Data Visualization and Prediction

4. Data Mining and SQL Queries

5. R with Jupyter

6. Data Wrangling

7. Jupyter Dashboards

8. Statistical Modeling

9. Machine Learning Using Jupyter

10. Optimizing Jupyter Notebooks

Chapter 4. Data Mining and SQL Queries

PySpark exposes the Spark programming model to Python. Spark is a fast, general engine for large-scale data processing. Because Jupyter can run Python, we can use Spark from a Jupyter notebook through PySpark.

Installing Spark requires the following components to be installed on your machine:

Java JDK.
Scala from http://www.scala-lang.org/download/.
Python: We recommend downloading Anaconda with Python (from http://continuum.io).
Spark from https://spark.apache.org/downloads.html.
winutils: This is a command-line utility that exposes Linux commands to Windows. There are 32-bit and 64-bit versions available at:
- 32-bit winutils.exe at https://code.google.com/p/rrd-hadoop-win32/source/checkout
- 64-bit winutils.exe at https://github.com/steveloughran/winutils/tree/master/hadoop-2.6.0/bin
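
Before going further, it can help to see which of these components the system can already find. The following is a minimal sketch, not part of the original installation steps. It assumes that each component provides a command on your PATH and that the commands are named java, scala, python, spark-submit, and winutils. A component can be installed correctly without being on PATH, so a missing entry is only a hint.

```python
import shutil

# Assumed command names, one per component listed above.
# Adjust them if your installation uses different names.
REQUIRED_COMMANDS = ["java", "scala", "python", "spark-submit", "winutils"]


def find_commands(commands):
    """Map each command name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in commands}


def missing_commands(commands):
    """Return the command names that could not be located on PATH."""
    return [name for name, path in find_commands(commands).items() if path is None]


if __name__ == "__main__":
    # Print one line per command so gaps are easy to spot.
    for name, path in find_commands(REQUIRED_COMMANDS).items():
        print(f"{name:14} {path or 'NOT FOUND on PATH'}")
```

Running this in a notebook cell or from a terminal prints one line per command. winutils is only expected on Windows, so it will normally be reported as missing on other systems.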

Then set environment variables that point to the locations of the preceding components:

JAVA_HOME: The directory where you installed the JDK (the folder that contains bin, not the bin folder itself)
PYTHONPATH: Directory where Python was installed
HADOOP_HOME: Directory...
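
Once the variables are set, a quick check from Python can confirm that each one is visible to the process and points at a real directory. The following is a minimal sketch under two assumptions: only the three variables named above are checked (the list in the source is cut off, so there may be more), and each variable holds a single directory path. The helper name check_env is hypothetical.

```python
import os

# Variable names taken from the list above. The original list is truncated,
# so further variables may be required that are not checked here.
REQUIRED_VARS = ["JAVA_HOME", "PYTHONPATH", "HADOOP_HOME"]


def check_env(names, environ=None):
    """Report, for each variable name, whether it is set and is a directory.

    Assumes each variable holds one directory path. `environ` defaults to
    the current process environment and can be replaced for testing.
    """
    environ = os.environ if environ is None else environ
    report = {}
    for name in names:
        value = environ.get(name)
        if not value:
            report[name] = "not set"
        elif not os.path.isdir(value):
            report[name] = f"set to {value!r}, which is not an existing directory"
        else:
            report[name] = "ok"
    return report


if __name__ == "__main__":
    for name, status in check_env(REQUIRED_VARS).items():
        print(f"{name}: {status}")
```

If a variable is reported as not set inside Jupyter but is set in your shell, restart the notebook server so that it picks up the new environment.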
