
You're reading from Hands-On Graph Analytics with Neo4j: Perform graph processing and visualization techniques using connected data across your enterprise

Product type: Paperback

Published: Aug 2020

Publisher: Packt

ISBN-13: 9781839212611

Length: 510 pages

Edition: 1st Edition

Languages: Cypher

Tools: Neo4j

Concepts: Database Programming

Author: Estelle Scifo

Table of Contents (18 chapters)

Preface

1. Section 1: Graph Modeling with Neo4j

2. Graph Databases

3. The Cypher Query Language

4. Empowering Your Business with Pure Cypher

5. Section 2: Graph Algorithms

6. The Graph Data Science Library and Path Finding

7. Spatial Data

8. Node Importance

9. Community Detection and Similarity Measures

10. Section 3: Machine Learning on Graphs

11. Using Graph-based Features in Machine Learning

12. Predicting Relationships

13. Graph Embedding - from Graphs to Matrices

14. Section 4: Neo4j for Production

15. Using Neo4j in Your Web Application

16. Neo4j at Scale

17. Other Books You May Enjoy


Creating the train and test samples with scikit-learn

Splitting the data into a training set and a testing set is not as simple as it may seem. Both sets have to be representative of the full dataset. If your dataset contains apartment sizes ranging from 15 to 200 square meters, it is probably not a good idea to use the observations with an area below 50 square meters as the training set and the rest as the testing set: a model trained only on small apartments would then be evaluated on sizes it has never seen. Both the train and the test samples must cover the whole range of areas. Randomly splitting the data is often sufficient and usually gives a good representation of the features in both sets.
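To make the contrast concrete, here is a minimal stdlib-only sketch comparing the threshold split described above with a shuffled random split. The areas are synthetic (uniformly drawn between 15 and 200 square meters), and the `random_split` helper is hypothetical, written for illustration; scikit-learn's `train_test_split` performs the same kind of shuffled split by default.

```python
import random

def random_split(rows, test_ratio=0.2, seed=42):
    """Hypothetical helper: shuffle a copy of rows and cut it into (train, test)."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

# Synthetic apartment areas (square meters) spread over the 15-200 range
rng = random.Random(0)
areas = [rng.uniform(15, 200) for _ in range(1000)]

# Naive split: small apartments for training, all the others for testing
naive_train = [a for a in areas if a < 50]
naive_test = [a for a in areas if a >= 50]

# Random split: 80% train, 20% test
train, test = random_split(areas, test_ratio=0.2, seed=42)

for name, tr, te in [("naive", naive_train, naive_test), ("random", train, test)]:
    print(f"{name:>6}: train {min(tr):6.1f}-{max(tr):6.1f} m2, "
          f"test {min(te):6.1f}-{max(te):6.1f} m2")
```

With the naive split, the training areas stop below 50 square meters while the testing areas start at 50, so the two ranges never overlap. With the random split, both samples span roughly the full 15 to 200 range.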

However, some situations require a different approach, for example when the target variable (or any of the categorical features) is unbalanced, meaning that some classes are much more frequent than others. In this case, we need to make sure that both the train and the test samples preserve the same class proportions...
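A stratified split handles this case by splitting each class separately, so every class contributes the same fraction of its rows to the test set. The sketch below is a stdlib-only illustration under simple assumptions (a hypothetical `stratified_split` helper and a synthetic target with 95% "no" and 5% "yes"). In scikit-learn, the equivalent is to pass the target to the `stratify` parameter, for example `train_test_split(X, y, test_size=0.2, stratify=y)`.

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, test_ratio=0.2, seed=42):
    """Hypothetical helper: return (train_idx, test_idx) so that each class
    keeps the same proportion in both sets."""
    rng = random.Random(seed)
    # Group row indices by class label
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    # Shuffle and split each class on its own
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_ratio)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Synthetic unbalanced target: 950 "no" and 50 "yes"
labels = ["no"] * 950 + ["yes"] * 50
train_idx, test_idx = stratified_split(labels, test_ratio=0.2)

def proportions(idx):
    """Share of each class among the given row indices."""
    counts = Counter(labels[i] for i in idx)
    return {label: counts[label] / len(idx) for label in sorted(counts)}

print(proportions(train_idx))  # {'no': 0.95, 'yes': 0.05}
print(proportions(test_idx))   # {'no': 0.95, 'yes': 0.05}
```

A purely random split of the same data would only preserve the 95/5 ratio on average; with just 50 "yes" rows, the test set could easily end up with noticeably more or fewer of them, which stratification rules out.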


About the author

Estelle Scifo has over 7 years of experience as a data scientist, after receiving her PhD from the Laboratoire de l'Accélérateur Linéaire, Orsay (affiliated with CERN in Geneva). As a Neo4j certified professional, she uses graph databases on a daily basis and takes full advantage of their features to build efficient machine learning models out of this data. She is also a data science mentor, guiding newcomers into the field. Her domain expertise and deep insight into beginners' needs make her an excellent teacher.
