You're reading from Hands-On Graph Analytics with Neo4j: Perform graph processing and visualization techniques using connected data across your enterprise

Product type Paperback

Published in Aug 2020

Publisher Packt

ISBN-13 9781839212611

Length 510 pages

Edition 1st Edition

Languages

Cypher

Tools

Neo4j

Concepts

Database Programming

Author (1):

Scifo

Table of Contents (18) Chapters

Preface

1. Section 1: Graph Modeling with Neo4j

2. Graph Databases

3. The Cypher Query Language

4. Empowering Your Business with Pure Cypher

5. Section 2: Graph Algorithms

6. The Graph Data Science Library and Path Finding

7. Spatial Data

8. Node Importance

9. Community Detection and Similarity Measures

10. Section 3: Machine Learning on Graphs

11. Using Graph-based Features in Machine Learning

12. Predicting Relationships

13. Graph Embedding - from Graphs to Matrices

14. Section 4: Neo4j for Production

15. Using Neo4j in Your Web Application

16. Neo4j at Scale

17. Other Books You May Enjoy

Plotting the ROC curve

First, we split the dataset into training and test samples, preserving the class distribution in both:

from sklearn.model_selection import train_test_split

X = df[["score"]]
y = df.label
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,
    random_state=42,
    # make sure both the train and test samples are representative
    # of the whole dataset in terms of class imbalance
    stratify=y,
)
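To make the effect of stratify=y concrete, here is a minimal pure-Python sketch of a stratified split. It is an illustration only, not scikit-learn's implementation; the stratified_split helper and the toy labels are hypothetical names introduced for this example.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_idx, test_idx), splitting each class in the same proportion."""
    rng = random.Random(seed)
    # Group sample indices by class label
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Send the same fraction of every class to the test sample
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical imbalanced toy labels: 90 negatives and 10 positives
toy_labels = [0] * 90 + [1] * 10
train_idx, test_idx = stratified_split(toy_labels)

# Share of positives in each sample
train_pos_rate = sum(toy_labels[i] for i in train_idx) / len(train_idx)
test_pos_rate = sum(toy_labels[i] for i in test_idx) / len(test_idx)
print(train_pos_rate, test_pos_rate)
```

Both samples keep the same share of positives as the full toy dataset, which is the property the stratify argument is meant to guarantee.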

As we noticed earlier, our dataset is imbalanced. We can randomly undersample the majority class to restore class balance in the training set:

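from imblearn.under_sampling import RandomUnderSampler

# Assumed value, matching random_state in the split above;
# drop this line if SEED is already defined in your session
SEED = 42

rus = RandomUnderSampler(random_state=SEED)
X_train, y_train = rus.fit_resample(X_train, y_train)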
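The idea behind random undersampling can be sketched in a few lines of plain Python: keep every sample of the smallest class and draw the same number at random from each of the other classes. This is a simplified illustration, not imbalanced-learn's implementation; random_undersample and the toy data are hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_undersample(rows, labels, seed=42):
    """Randomly drop samples so that every class has as many rows as the smallest one."""
    rng = random.Random(seed)
    # Group sample indices by class label
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Size of the smallest class: the target size for every class
    n_min = min(len(indices) for indices in by_class.values())

    kept = []
    for indices in by_class.values():
        # Sample without replacement from each class
        kept.extend(rng.sample(indices, n_min))
    kept.sort()
    return [rows[i] for i in kept], [labels[i] for i in kept]


# Hypothetical toy training set: 72 negatives and 8 positives
toy_scores = [i / 100 for i in range(80)]
toy_labels = [0] * 72 + [1] * 8

balanced_scores, balanced_labels = random_undersample(toy_scores, toy_labels)
class_counts = Counter(balanced_labels)
print(class_counts)
```

Note that undersampling discards data from the majority class, so it is applied to the training sample only; the test sample keeps the original class distribution.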

To compute the false positive rate (FPR) and true positive rate (TPR) at different thresholds, we can use scikit-learn's roc_curve function:

from sklearn.metrics import roc_curve

# roc_curve takes the true labels and the predicted scores
fpr, tpr, thresholds = roc_curve(y_train, X_train["score"])
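As a sanity check on what these quantities mean, the FPR and TPR at a single threshold can be computed by hand. The sketch below is a plain-Python illustration under the usual convention that a sample is predicted positive when its score is greater than or equal to the threshold; fpr_tpr_at and the toy data are hypothetical names, not part of scikit-learn.

```python
def fpr_tpr_at(y_true, scores, threshold):
    """Return (FPR, TPR) when samples with score >= threshold are predicted positive."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1  # true positive
        elif predicted_positive and label == 0:
            fp += 1  # false positive
        elif label == 1:
            fn += 1  # false negative
        else:
            tn += 1  # true negative
    fpr = fp / (fp + tn)  # share of negatives wrongly flagged as positive
    tpr = tp / (tp + fn)  # share of positives correctly flagged
    return fpr, tpr


# Hypothetical toy labels and scores
toy_y = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]

# One (FPR, TPR) point per candidate threshold, from strictest to loosest
roc_points = [fpr_tpr_at(toy_y, toy_scores, t) for t in (0.8, 0.4, 0.35, 0.1)]
print(roc_points)
```

Each threshold yields one point in the (FPR, TPR) plane; lowering the threshold flags more samples as positive, so both rates can only stay the same or increase.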

To plot...

Authors (1)

Scifo

Estelle Scifo has over 7 years of experience as a data scientist, after receiving her PhD from the Laboratoire de l'Accélérateur Linéaire, Orsay (affiliated to CERN in Geneva). As a Neo4j certified professional, she uses graph databases on a daily basis and takes full advantage of their features to build efficient machine learning models out of this data. In addition, she is a data science mentor who guides newcomers into the field. Her domain expertise and deep insight into beginners' needs make her an excellent teacher.
