In this chapter, we will start with a classical CSV file and use it to review the different steps of a machine learning project, before enriching it to move on to graph analysis.
The context is the following: during a conference centered around graphs, you hand out a questionnaire to the attendees in order to learn more about them. One of the questions asks whether the attendee has contributed directly to Neo4j. Unfortunately, not all of the participants answered that question, but you would like to infer the status of those who did not from the answers of those who did. This is a supervised classification problem with two target categories: contributed to Neo4j and didn't contribute to Neo4j.
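To make the framing concrete, here is a minimal sketch of how such a survey file splits into rows that can train a classifier and rows whose status must be predicted. The column names (`attendee_id`, `years_with_graphs`, `contributed_to_neo4j`) and all values are invented for illustration and are not the chapter's actual dataset.

```python
import csv
import io

# Hypothetical survey export: column names and values are assumptions
# made for this sketch, not the real questionnaire data.
RAW_CSV = """attendee_id,years_with_graphs,contributed_to_neo4j
1,5,yes
2,1,no
3,3,
4,7,yes
5,2,
"""


def split_by_label(csv_text, target="contributed_to_neo4j"):
    """Separate rows with a known target from rows whose target is missing."""
    labelled, unlabelled = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row[target].strip():
            labelled.append(row)    # answered: usable as training data
        else:
            unlabelled.append(row)  # unanswered: status to be inferred
    return labelled, unlabelled


labelled, unlabelled = split_by_label(RAW_CSV)
print(len(labelled), "rows to train on,", len(unlabelled), "rows to predict")
```

The labelled rows play the role of the training set, and the trained model would then be applied to the unlabelled rows.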
Whether this problem can be solved with data and statistical models depends on the availability and...