Scala Machine Learning Projects: Build real-world machine learning and deep learning projects with Scala

Product type Paperback

Published in Jan 2018

Publisher Packt

ISBN-13 9781788479042

Length 470 pages

Edition 1st Edition

Languages

Scala

Tools

Apache Spark

Concepts

Deep Learning

Author (1):

Karim

Table of Contents (17) Chapters

Title Page

Packt Upsell

Contributors

Preface

1. Analyzing Insurance Severity Claims

2. Analyzing and Predicting Telecommunication Churn

3. High Frequency Bitcoin Price Prediction from Historical and Live Data

4. Population-Scale Clustering and Ethnicity Prediction

5. Topic Modeling - A Better Insight into Large-Scale Texts

6. Developing Model-based Movie Recommendation Engines

7. Options Trading Using Q-learning and Scala Play Framework

8. Clients Subscription Assessment for Bank Telemarketing using Deep Neural Networks

9. Fraud Analytics Using Autoencoders and Anomaly Detection

10. Human Activity Recognition using Recurrent Neural Networks

11. Image Classification using Convolutional Neural Networks

Other Books You May Enjoy

Leave a review - let other readers know what you think

Index

Summary

In this chapter, we saw how to combine big data tools such as Spark, H2O, and ADAM to handle a large-scale genomics dataset. We applied the Spark-based K-means algorithm to genetic variant data from the 1000 Genomes Project in order to cluster genotypic variants at the population scale.
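As a reminder of what the clustering step does, here is a minimal sketch of K-means (Lloyd's algorithm) in plain Scala. It is an illustration only, not the code from the chapter: the chapter relies on Spark's distributed implementation (available as `org.apache.spark.ml.clustering.KMeans`), while the `KMeansSketch` object, its method names, and the explicit initial centroids below are assumptions made for this example.

```scala
object KMeansSketch {
  // A point is a dense feature vector, e.g. per-variant genotype counts (hypothetical encoding).
  type Point = Vector[Double]

  // Squared Euclidean distance between two points of equal dimension.
  def sqDist(a: Point, b: Point): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Index of the centroid closest to point p.
  def closest(p: Point, centroids: Vector[Point]): Int =
    centroids.indices.minBy(i => sqDist(p, centroids(i)))

  // Component-wise mean of a non-empty group of points.
  def mean(points: Seq[Point]): Point =
    points.transpose.map(col => col.sum / col.size).toVector

  // Lloyd's algorithm: alternate the assignment and centroid-update steps
  // until the centroids stop moving or maxIter is reached.
  def fit(data: Vector[Point], init: Vector[Point], maxIter: Int = 20): Vector[Point] = {
    var centroids = init
    var changed = true
    var iter = 0
    while (changed && iter < maxIter) {
      val groups = data.groupBy(p => closest(p, centroids))
      val updated = centroids.indices.map { i =>
        // A centroid with no assigned points is left where it is.
        groups.get(i).map(mean).getOrElse(centroids(i))
      }.toVector
      changed = updated != centroids
      centroids = updated
      iter += 1
    }
    centroids
  }
}
```

For example, calling `KMeansSketch.fit` on the four toy points `(0,0)`, `(0,1)`, `(2,2)`, `(2,1)` with initial centroids `(0,0)` and `(2,2)` converges to the centroids `(0.0, 0.5)` and `(2.0, 1.5)`. A distributed implementation follows the same two steps but spreads the assignment work across the cluster, which is what makes population-scale data tractable.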

We then applied an H2O-based deep learning (DL) algorithm and Spark-based Random Forest models to predict geographic ethnicity. Along the way, we learned how to install and configure H2O for DL, knowledge that will be used in later chapters. Finally, we learned how to use H2O to compute variable importance in order to select the most important features in a training set.

In the next chapter, we will see how effectively the Latent Dirichlet Allocation (LDA) algorithm can find useful patterns in data. We will compare LDA with other topic modeling algorithms and examine its scalability. In addition, we will utilize Natural Language Processing (NLP) libraries such as...