Hands-On Data Science and Python Machine Learning

You're reading from Hands-On Data Science and Python Machine Learning: Perform data mining and machine learning efficiently using Python and Spark

Product type: Paperback

Published in: Jul 2017

Publisher: Packt

ISBN-13: 9781787280748

Length: 420 pages

Edition: 1st Edition

Languages: Python

Tools: NumPy

Concepts: Data Mining

Author (1): Frank Kane

Introducing MLlib

Fortunately, you don't have to do things the hard way in Spark when you're doing machine learning. Spark has a built-in component called MLlib that sits on top of Spark Core, and it makes it easy to run complex machine learning algorithms on massive datasets while distributing the processing across an entire cluster of computers. So, very exciting stuff. Let's learn more about what it can do.

Some MLlib Capabilities

So, what are some of the things MLlib can do? Well, one is feature extraction.

One thing you can do at scale is compute term frequency and inverse document frequency (TF-IDF), which is useful for creating, for example, search indexes. We will actually go through an example of that later in the chapter. The key, again, is that MLlib can do this across a cluster using massive datasets, so you could potentially make your own search engine for the web with it. It also offers basic statistics functions, chi-squared tests, Pearson or Spearman correlation, and some...
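To make the term frequency and inverse document frequency idea concrete, here is a minimal sketch in plain Python. It is not MLlib code and does not run on a cluster; it only illustrates the calculation on three made-up sentences. The function name `tf_idf`, the sample documents, and the choice of a smoothed IDF, log((N + 1) / (df + 1)), are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document (illustrative only)."""
    # Naive tokenization: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: how many documents contain each term.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        term_counts = Counter(tokens)  # raw term frequency
        weights.append({
            # Smoothed IDF (an assumption): log((N + 1) / (df + 1)).
            term: count * math.log((n_docs + 1) / (doc_freq[term] + 1))
            for term, count in term_counts.items()
        })
    return weights


# Hypothetical sample documents.
docs = [
    "spark makes machine learning on a cluster easy",
    "mllib runs on top of spark core",
    "search engines rank documents by term weight",
]
scores = tf_idf(docs)
```

A term that appears in only one document, such as "search", ends up with a higher weight than a term shared by several documents, such as "spark". That is the property that makes these weights useful for ranking search results.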

Authors (1)

Frank Kane

Frank Kane has spent nine years at Amazon and IMDb, developing and managing the technology that automatically delivers product and movie recommendations to hundreds of millions of customers all the time. He holds 17 issued patents in the fields of distributed computing, data mining, and machine learning. In 2012, Frank left to start his own successful company, Sundog Software, which focuses on virtual reality environment technology and teaches others about big data analysis.
